  1. Aug 2023
    1. We were not many days in the merchant’s custody before we were sold after their usual manner, which is this:—On a signal given, (as the beat of a drum), the buyers rush at once into the yard where the slaves are confined, and make choice of that parcel they like best. The noise and clamour with which this is attended, and the eagerness visible in the countenances of the buyers, serve not a little to increase the apprehension of the terrified Africans, who may well be supposed to consider them as the ministers of that destruction to which they think themselves devoted. In this manner, without scruple, are relations and friends separated, most of them never to see each other again. I remember in the vessel in which I was brought over, in the men’s apartment, there were several brothers who, in the sale, were sold in different lots; and it was very moving on this occasion to see and hear their cries at parting.

      This shows how common it was for families to be separated, with many never seeing each other again, which is horrible. He is describing his personal experience of being sold to different places and suffering emotionally. It's sad, and I want to keep reading to find out about his sister's well-being; it makes me wonder whether he ever got to see her again, as he did last time.

    2. The first object which saluted my eyes when I arrived on the coast was the sea, and a slave-ship, which was then riding at anchor, and waiting for its cargo. These filled me with astonishment, which was soon converted into terror, which I am yet at a loss to describe, nor the then feelings of my mind. When I was carried on board I was immediately handled, and tossed up, to see if I were sound, by some of the crew; and I was now persuaded that I had got into a world of bad spirits, and that they were going to kill me. Their complexions too differing so much from ours, their long hair, and the language they spoke, which was very different from any I had ever heard, united to confirm me in this belief. Indeed, such were the horrors of my views and fears at the moment, that, if ten thousand worlds had been my own, I would have freely parted with them all to have exchanged my condition with that of the meanest slave in my own country. When I looked round the ship too, and saw a large furnace or copper boiling, and a multitude of black people of every description chained together, every one of their countenances expressing dejection and sorrow, I no longer doubted my fate, and, quite overpowered with horror and anguish, I fell motionless on the deck and fainted. When I recovered a little, I found some black people about me, who I believed were some of those who brought me on board, and had been receiving their pay; they talked to me in order to cheer me, but all in vain. I asked them if we were not to be eaten by those white men with horrible looks, red faces, and long hair? They told me I was not; and one of the crew brought me a small portion of spirituous liquor in a wine glass; but, being afraid of him, I would not take it out of his hand. 
One of the blacks therefore took it from him and gave it to me, and I took a little down my palate, which, instead of reviving me, as they thought it would, threw me into the greatest consternation at the strange feeling it produced having never tasted any such liquor before. Soon after this, the blacks who brought me on board went off, and left me abandoned to despair. I now saw myself deprived of all chance of returning to my native country, or even the least glimpse of hope of gaining the shore, which I now considered as friendly: and even wished for my former slavery, in preference to my present situation, which was filled with horrors of every kind, still heightened by my ignorance of what I was to undergo. I was not long suffered to indulge my grief; I was soon put down under the decks, and there I received such a salutation in my nostrils as I had never experienced in my life; so that with the loathsomeness of the stench, and crying together, I became so sick and low that I was not able to eat, nor had I the least desire to taste any thing. I now wished for the last friend, Death, to relieve me; but soon, to my grief, two of the white men offered me eatables; and, on my refusing to eat, one of them held me fast by the hands, and laid me across, I think, the windlass, and tied my feet, while the other flogged me severely. I had never experienced any thing of this kind before; and although not being used to the water, I naturally feared that element the first time I saw it; yet, nevertheless, could I have got over the nettings, I would have jumped over the side; but I could not; and, besides, the crew used to watch us very closely who were not chained down to the decks, lest we should leap into the water; and I have seen some of these poor African prisoners, most severely cut for attempting to do so, and hourly whipped for not eating. This indeed was often the case with myself. 
In a little time after, amongst the poor chained men, I found some of my own nation, which in a small degree gave ease to my mind. I inquired of them what was to be done with us? they gave me to understand we were to be carried to these white people’s country to work for them. I then was a little revived, and thought, if it were no worse than working, my situation was not so desperate: but still I feared I should be put to death, the white people looked and acted, as I thought, in so savage a manner; for I had never seen among any people such instances of brutal cruelty; and this not only shewn towards us blacks, but also to some of the whites themselves. One white man in particular I saw, when we were permitted to be on deck, flogged so unmercifully, with a large rope near the foremast, that he died in consequence of it; and they tossed him over the side as they would have done a brute. This made me fear these people the more; and I expected nothing less than to be treated in the same manner. I could not help expressing my fears and apprehensions to some of my countrymen: I asked them if these people had no country, but lived in this hollow place the ship? they told me they did not, but came from a distant one. ‘Then,’ said I, ‘how comes it in all our country we never heard of them?’ They told me, because they lived so very far off. I then asked, where were their women? had they any like themselves? I was told they had: ‘And why,’ said I, ‘do we not see them?’ they answered, because they were left behind. I asked how the vessel could go? they told me they could not tell; but that there were cloths put upon the masts by the help of the ropes I saw, and then the vessel went on; and the white men had some spell or magic they put in the water when they liked in order to stop the vessel. I was exceedingly amazed at this account, and really thought they were spirits. 
I therefore wished much to be from amongst them, for I expected they would sacrifice me: but my wishes were vain; for we were so quartered that it was impossible for any of us to make our escape. While we staid on the coast I was mostly on deck; and one day, to my great astonishment, I saw one of these vessels coming in with the sails up. As soon as the whites saw it, they gave a great shout, at which we were amazed; and the more so as the vessel appeared larger by approaching nearer. At last she came to anchor in my sight, and when the anchor was let go, I and my countrymen who saw it were lost in astonishment to observe the vessel stop; and were now convinced it was done by magic. Soon after this the other ship got her boats out, and they came on board of us, and the people of both ships seemed very glad to see each other. Several of the strangers also shook hands with us black people, and made motions with their hands, signifying, I suppose, we were to go to their country; but we did not understand them. At last, when the ship we were in had got in all her cargo they made ready with many fearful noises, and we were all put under deck, so that we could not see how they managed the vessel. But this disappointment was the least of my sorrow. The stench of the hold while we were on the coast was so intolerably loathsome, that it was dangerous to remain there for any time, and some of us had been permitted to stay on the deck for the fresh air; but now that the whole ship’s cargo were confined together, it became absolutely pestilential. The closeness of the place, and the heat of the climate, added to the number in the ship, which was so crouded that each had scarcely room to turn himself, almost suffocated us. This produced copious perspiration, from a variety of loathsome smells, and brought on a sickness amongst the slaves, of which many died, thus falling victims to the improvident avarice, as I may call it, of their purchasers. 
This wretched situation was again aggravated by the galling of the chains, now become insupportable; and the filth of the necessary tubs, into which the children often fell, and were almost suffocated. The shrieks of the women, and the groans of the dying, rendered the whole a scene of horror almost inconceiveable. Happily perhaps for myself I was soon reduced so low here that it was thought necessary to keep me almost always on deck; and from my extreme youth I was not put in fetters. In this situation I expected every hour to share the fate of my companions, some of whom were almost daily brought upon deck at the point of death, which I began to hope would soon put an end to my miseries. Often did I think many of the inhabitants of the deep much more happy than myself; I envied them the freedom they enjoyed, and as often wished I could change my condition for theirs. Every circumstance I met with served only to render my state more painful, and heighten my apprehensions and my opinion of the cruelty of the whites. One day they had taken a number of fishes; and when they had killed and satisfied themselves with as many as they thought fit, to our astonishment who were on deck, rather than give any of them to us to eat, as we expected, they tossed the remaining fish into the sea again, although we begged and prayed for some as well as we could, but in vain; and some of my countrymen, being pressed by hunger, took an opportunity, when they thought no one saw them, of trying to get a little privately; but they were discovered, and the attempt procured them some very severe floggings.

      To me, this part is kind of dark. He was on the slave ship, where he saw men being tortured and chained up. The men running the ship were so cruel to them. It's sad that he had to witness that. It's heartbreaking that he was first kidnapped along with his sister, then separated from her, sold into slavery, sold on to many different places, and then had to witness men being tortured on the ship. This is very dark and traumatizing. He also still feels hopeless about ever returning home and reuniting with his sister. It makes me wonder whether he will ever reunite with her.

    1. Students can simultaneously become both "unstuck" (distanced from the ways they have always thought, no longer so complicit with oppression) and "stuck" (intellectually paralyzed so that they need to work through feelings and thoughts before moving on with the more "academic" part of a lesson). Though paradoxical and in some ways traumatic, this condition should be expected: by teaching students that the very ways in which we think and do things can be oppressive, teachers should expect their students to get upset

      From my readings in EDSCI so far, this is the first time I have seen someone address the heaviness that may come with being a transformative learner. Many of our biases and our students' biases, as well as oppressive ideologies, might be the only ways of thinking they have learned. The word trauma exemplifies the impact of unlearning things that perhaps have been the building blocks of your identity. I think about my 7th graders, primarily Latinx, primarily Christian, primarily male, and primarily from low-income households surrounding our school; they might experience this trauma when presented with ideas that deviate from what they hold to be truth. However, their truth has been their reality, and rather than being negated, they have to be part of the conversation.

    1. Here, we aim to unpack these new behaviours as well as to dismantle some broad narratives of ‘young people’. Instead, we consider how social natives (18–24s) – who largely grew up in the world of the social, participatory web – differ meaningfully from digital natives (25–34s) – who largely grew up in the information age but before the rise of social networks – when it comes to news access, formats, and attitudes.

      Social natives may be more interactive on social media and more inclined to socialize on their devices instead of in person. This is my hypothesis because I see a lot of younger people FaceTiming and using social media more than I do, and even though I could be as active as them, I am just not as inclined to participate on a personal account. I would be more inclined to use social media to promote a company I work for than to interact on it personally as much as the younger generation does. I think that digital natives may be more skilled at media literacy because they have some background knowledge of what existed before the flood of mass misinformation on social media platforms like Twitter and Facebook, and because they saw how misinformation was perpetuated in the media for the generation older than them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments. Here we provide a point-by-point response to their reviews. All additional experiments that are present in the revised manuscript, or that we plan to include in the final manuscript, are numbered.

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The concept introduced by this paper is exciting and novel. However, the current paucity of presented data can lead to incorrect interpretations of the findings and speculations that might not hold true after a more rigorous assessment of the observed phenomenon.

      The premise of this study builds upon an interaction between the PAXT complex and nuclear YTH domain containing proteins. However, figures 1B and C should be improved. The interacting band for the ZFC3H1 presented in panel B does not seem to match the size of the construct used in panel C. Is the Flag version of ZFC3H1 expressing a smaller isoform for this protein? __

      The reviewer is correct in that endogenous ZFC3H1 (which migrates at 250kD with a minor band at 150kD, see Figure 1B in the initial manuscript) appears to differ from the FLAG-tagged construct as expressed from a plasmid transfected in HEK293 cells (which migrates as two bands at 180kD and 200kD, see Figure 1C in the initial manuscript). For the endogenous protein, the predicted molecular weight is 226kD and the 250kD band disappears when cells are transduced with lentivirus containing shRNAs against ZFC3H1 (see Figure 4A in the initial manuscript), indicating that it is the correct protein. Both the 250kD endogenous protein (*) and the 200kD overexpressed protein (**) in transfected HEK293 and U2OS cells are detected in immunoblots using anti-ZFC3H1 antibodies (see Figure 1 in this document) indicating that the over-expressed protein is indeed ZFC3H1.

      [ Figure 1]

      _Figure 1. Molecular Weight Size Comparison of Endogenous ZFC3H1 and FLAG-ZFC3H1 (1-1233)._ Lysates from HEK293 and U2OS cells that were either untransfected or transfected with FLAG-ZFC3H1 (1-1233) plasmid. We labelled the bands corresponding to the endogenous ZFC3H1 “*” and FLAG-ZFC3H1 (1-1233) “**”.

      We have sequenced the plasmid, and discovered that it contains an additional sequence inserted within the middle of the ZFC3H1 cDNA with a premature stop codon. As such, the version of the protein that is expressed from the plasmid only contains amino acids 1-1233 of the endogenous protein and is missing amino acids 1234-1989. The deleted region only contains TPR repeats, and is not known to interact with any of the well characterized interactors of ZFC3H1 (Wang, Nuc Acid Res 2021, Figure 3). We have renamed this construct FLAG-ZFC3H1 (1-1233). Given these new considerations, our results are consistent with the idea that the N-terminal portion of ZFC3H1 interacts with U1-70K, YTHDC1 and YTHDC2. We will change the text to reflect this.

      We are currently in the process of deleting the small insertion to obtain a plasmid that encodes a full length version of human ZFC3H1. For the final manuscript:

      Experiment #1) We will repeat the co-immunoprecipitation experiment with the full length FLAG-ZFC3H1 to determine whether it interacts with YTHDC1 and YTHDC2. This will take a few weeks.

      __Also, the YTHDC1-2 interaction in panel C is not as convincing considering the negative controls lane show some degree of binding. __

      Although the reviewer is correct that there is substantial background binding in the YTHDC1 immunoblot, we disagree with their characterization of the results with the YTHDC2 immunoblot (see Figure 1B-C in the initial manuscript). In the new manuscript we have included:

      Experiment #2) A new co-immunoprecipitate of the FLAG-tagged ZFC3H1 (1-1233) from HEK293 cells under more stringent conditions where the background level of YTHDC1 binding to beads is negligible. We have already completed this experiment (see Figure 1D in the revised manuscript).

      __Additionally, can the authors test if their RNaseA treatment worked? __

      In the new manuscript we have included:

      Experiment #3) A new co-immunoprecipitate of FLAG-YTHDC1 immunoblotted for eIF4AIII from HEK293 cell lysates. We find that without RNase, there is some eIF4AIII in the precipitates but that the levels diminish substantially after RNase A/T1 treatment. We have already completed this experiment (see Figure 1B in the preliminary revised manuscript).

      __Why do you need 18 hours to observe the nuclear export of your modifiable construct when inhibiting METTL3 in figure 3? Is it possible that your observation is secondary to phenotypes these cells develop as a result of blocking METTL3? __

      We treated cells for this period of time so that during the expression of the reporter, all of the newly synthesized mRNA is expressed in the absence of m6A methyl transferase activity. For shorter treatment times, it is unclear whether the bulk of the reporter mRNA, which would be synthesized before the treatment, would lose any pre-existing m6A marks, making a negative result hard to interpret. Previously we found that although 50% of intronic polyadenylated (IPA) transcripts from our reporters are rapidly degraded, about 50% are stable and are nuclear retained over extended periods of time (see Lee et al., PLOS ONE 2015; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122743 Figure 3B-G). We believe that the bulk of the reporter mRNA that we are visualizing is stable and accumulates in the nucleus. Given that METTL3-depletion inhibits nuclear retention and that versions of the IPA reporter that lack m6A modification motifs are exported, we think that the most likely interpretation of the 18 hour STM2457 treatment experiments is that the lack of methyltransferase activity had a direct effect, rather than an indirect effect, on nuclear retention. We would be open to performing more experiments if the editors insist; however, we ordered STM2457 four weeks ago and it has yet to arrive from Sigma-Millipore. Performing this experiment may substantially delay our ability to resubmit the manuscript in a timely manner.

      __Is ALKBH5 nuclear and/or cytoplasmic in the cell system used? __

      According to The Human Protein Atlas, ALKBH5 is predominantly nuclear in U2OS cells, with some present in the cytoplasm (https://www.proteinatlas.org/ENSG00000091542-ALKBH5/subcellular#human).

      In the revised manuscript we have included:

      Experiment #4) Data from subcellular fractionation demonstrating that ALKBH5 is present in both the nucleus and cytoplasm that we have already performed (see Figure 4J in the preliminary revised manuscript).

      __Reviewer #1 (Significance (Required)):

      The study is highly significant __

      ------

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.__

      We thank the reviewer for pointing this out. We have included a few sentences in the new preliminary revised manuscript pointing out that the location of the m6A modification in IPA transcripts, with respect to intact 5’SS and poly(A) signals, may play a role in promoting nuclear retention.

      __The current work builds on previous findings from these authors identifying factors critical for retention of intronic polyadenylated (IPA) transcripts. The present study identified m6A modification as one of the signals for the retention of such transcripts. The authors use reporters for their analysis and also examine validated endogenous IPA transcripts. The data presented supports the conclusions albeit they show a surprising finding for one of the m6A erasers, ALKBH5. However, there is some controversy over the mechanism by which ALKBH5 functions and whether the m6A mark is truly reversible, so these results may continue to add to this point of view.

      Major Comments: One experiment that might add to the argument would be overexpression of Mettl3 as compared to catalytically inactive Mettl3. The prediction would be that the reporter transcript with intact DRACH sequences would be even more retained in the nucleus in a manner that depends on Mettl3 catalytic activity. For some of the data presented, the reporter is already wholly nuclear so no difference could be detected, but in the U2OS cells shown in Figure 2B, it appears that an increase in nuclear localization might be evident. Such an experiment would add an orthogonal approach to demonstrate that the methylation by Mettl3 is required for retention. Such an experiment might also work with the endogenous IPA transcripts shown in Figure 4, but these transcripts may already be too nuclear to detect any increase in nuclear retention.

      __

      We have performed two experiments that try to address this but they gave negative results:

      Experiment #5) We have over-expressed wildtype and a methyl transferase mutant FLAG-METTL3 and assessed the nuclear export/retention of ftz-Δi-5’SS mRNA. There was no effect (see Figure 2 in this document).

      [Figure 2]

      __Figure 2. Over-expression of METTL3 does not increase the nuclear retention of ftz-Δi-5’SS.__ U2OS cells were co-transfected with ftz-Δi-5’SS reporter and either FLAG-METTL3 or FLAG-METTL3-D395A, which lacks methyl-transferase activity (Wang, Mol Cell 2016). Cells were fixed, stained for ftz mRNA by fluorescent in situ hybridization and METTL3 using anti-FLAG antibodies. The nuclear and cytoplasmic distribution of ftz mRNA was quantified as described in the manuscript. Note that this is the average of one independent experiment (each bar consisting of the average of at least 50 cells). We plan to repeat this two more times, but we anticipate that these will show the same result.

      We could include this negative data as a supplemental figure. We believe that there are two possible reasons for this experimental result. First, as the reviewer points out, the reporter transcripts are already too nuclear to detect any significant change. Second, METTL3 is part of a larger complex that includes several proteins including METTL14, WTAP and potentially other proteins (for example see Covelo-Molares, Nuc Acid Res 2021). We may need to co-express all of these proteins to see an effect.

      Experiment #6) We have also expressed versions of ftz-Δi and ftz-Δi-5’SS mRNA with optimized m6A modification (i.e. DRACH) motifs (AGACT) to enhance methylation (“e-m6A-ftz”). We only observed a slight increase in nuclear retention but it is not significant (see Figure 2A,C in the revised manuscript).

      Again, this result could be explained by the fact that the reporter is too nuclear to detect any significant increase in retention. We had originally performed this in parallel with the no-m6A-ftz-Δi-5’SS reporters but did not report this negative data in the original manuscript.

      __Some rather minor changes to the presentation of the data could enhance the impact of this study.

      Specific Comments:

      The primary question in this manuscript is comparing reporters with m6A site (intact DRACH sequences) to those without. For this reason, organizing the data to the +/- DRACH sites are adjacent to one another might make the most sense. This point is evident in Figure 1C where perhaps simply changing the order of the bars presented to put the ones directly compared adjacent would be preferable. Then the p-value would compare sets of data directly adjacent to one another. __

      We thank the reviewer for this suggestion and we have made these changes to the figures in the preliminary revised manuscript.

      __While the authors show representative fields/cells for most assays, they do an excellent job of providing quantitation as well. One exception is Figure 3D, which shows a single cell image for the most key panel (the 5'SS-containing reporter upon Mettl3 depletion). If there is not a field with more cells, the authors could create a montage. __

      In the revised manuscript, we have replaced this image with one containing multiple cells expressing the reporter.

      __Minor Comments:

      Figure presentation:

      The text in a number of the figures is VERY small (Figures 1B,1C, and 4A) for example. __

      We have fixed this in the new manuscript.

      __Figure 3A includes the label "shRNA:" at the top, but these cells are treated with Mettl3 inhibitor and there does not appear to be any shRNA employed, so this seems like a labeling error. __

      We have fixed this in the new manuscript.

      __In Figure 3C, the immunoblot of Mettl3, there are three bands that all disappear completely upon knockdown of Mettl3- are these all Mettl3? This should at least be mentioned in the legend and perhaps indicated in the figure. The authors do mention in the text employing shRNAs to target multiple Mettl3 isoforms, so likely this is the case. __

      We have clarified these issues in the new manuscript.

      __Minor points (some really minor to just polish the presentation for clarity):

      The word "since" should only be used if there is a time element- otherwise the word "as" is preferable.

      For example on p. 4, the sentence: "Since inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),.." would more precisely read "As inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),..". __

      We thank the reviewer for pointing this out. We have fixed this issue in the revised manuscript.

      __Reviewer #2 (Significance (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.

      This study would be of significant interest to those that study gene expression in any context as well as cell biologists as the data add to our understanding of export of mRNA from the nucleus. This work also adds to our understanding of the biological consequences of m6A modification, which is an area of significant interest. In my opinion, the authors could make a broader conclusion than they do, which is that the location of the modification significantly dictates function- an extension of previous findings mostly focused on processed mRNA transcripts. __

      -------

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Quality control of mRNA is vital for all types of cells. In eukaryotic cells, nuclear export of misprocessed mRNAs containing the 5' splice site is prevented. In this manuscript, Lee and colleagues demonstrate that the nuclear retention of intronic polyadenylated transcripts is dependent on m6A modification. Based on the results shown in yeast, they perform immunoprecipitation experiments and demonstrate the interaction between ZFC3H1, a component of the PAXT complex, and YTHDC1 and YTHDC2, nuclear YTH RNA-binding proteins that recognize m6A-modified transcripts. The study also shows the interaction of U1-70K with YTHDC1 and with ZFC3H1. Depletion of YTHDC1/2 prevents the nuclear retention of IPA transcripts. Additionally, CLIP-seq analysis is performed, demonstrating that m6A modification is enriched around the 5' splice site motif and the 3' polyadenylation site in IPAs. From these observations, they conclude that m6A modification contributes to the quality control of mRNA by promoting nuclear retention of misprocessed transcripts.

      Major Points 1. The interaction between ZFC3H1 and YTHDC1 is clearly shown by immunoprecipitation of FLAG-tagged YTHDC1 in Figure 1B. However, the co-purification of YTHDC1 with FLAG-tagged ZFC3H1 in Figure 1C is rather ambiguous. Additionally, the immunoprecipitated samples do not appear to show signals corresponding to FLAG-tagged ZFC3H1, making it unclear if the immunoprecipitation is working. It is essential to provide a better quality result to clarify these observations. __

      Please see our responses to reviewer #1. We have repeated the co-immunoprecipitation of FLAG-ZFC3H1 (1-1233) with YTHDC1 under more stringent conditions and have reduced the background binding (see Figure 1B and D in the new manuscript). We have also determined why the FLAG-ZFC3H1 is smaller than expected: the construct contains a premature stop codon. As explained above, we are in the midst of generating a full-length FLAG-ZFC3H1 and we plan to repeat the co-immunoprecipitation with this new construct.

      2. While the authors demonstrate that the m6A modification is dispensable for the targeting of IPA reporter transcripts to the nuclear speckles, it would be valuable to investigate whether m6A is required for their exit from the nuclear speckles. Do reporter transcripts with m6A motifs remain in the nuclear speckles at later time points?

      We have now analyzed the colocalization of nuclear speckles (SC35) with ftz-Δi-5’SS, which contains both a 5’SS and DRACH motifs, and no-m6A-ftz-Δi-5’SS, which contains a 5’SS but lacks DRACH motifs, at steady state – i.e. after 18-24 hours of transfection (as opposed to at early time points as shown in Figure 2D-E of the initial manuscript). Unexpectedly, we see that both mRNAs continue to colocalize with nuclear speckles, although the no-m6A-ftz-Δi-5’SS mRNA is well exported from the nucleus and its signal in nuclear speckles is faint (see Figure 2F-H in the new manuscript).

      Previously, we observed that ftz-Δi-5’SS required the 5’SS motif to remain in nuclear speckles at these later time points (Lee PLOS ONE 2015 and Lee RNA 2022). Upon closer inspection, ftz-Δi-5’SS mRNA also accumulates in additional nuclear foci that are not SC35-positive. Our new results may indicate that m6A marks promote the transfer of mRNAs from nuclear speckles to other foci, but more data is required to make a firm statement. Given this, we plan to conduct further experiments which may take a month to complete:

      Experiment #7) We are now assessing whether these additional ftz-Δi-5’SS foci correspond to either YTHDC-positive foci which were previously shown to partially overlap nuclear speckles and sequester m6A-rich mRNAs (Cheng Cancer Cell 2022), or “pA+ RNA foci” which accumulate MTR4/ZFC3H1-targeted RNAs when the nuclear exosome is inhibited (Silla Cell Reports 2018). These foci are enriched in ZFC3H1. We plan on co-staining ftz-Δi, ftz-Δi-5’SS, no-m6A-ftz-Δi and no-m6A-ftz-Δi-5’SS with SC35, YTHDC1 and ZFC3H1 to determine whether m6A may help to transfer mRNAs from nuclear speckles to YTHDC1- or ZFC3H1-enriched foci.

      3. Figures 5B and 5C suggest that ZFC3H1 is required for the degradation of IPA transcripts. However, the range of the vertical axis is inappropriate and it is difficult to assess the extent of the increase in expression levels. Please adjust the vertical axis range for improved clarity.

      We thank the reviewer for the feedback. We have added additional graphs with an expanded vertical axis to demonstrate that ZFC3H1 is required for the degradation of IPA transcripts.

      Minor Points 1. page 4, line 2 "RNAse" should be corrected to "RNase".

      We thank the reviewer for catching this error. We have fixed this.

      2. page 7, line 5: Is the statement "prevents the nuclear export and decay of non-functional and misprocessed RNAs" correct? m6A modification promotes the decay of such RNAs.

      We thank the reviewer for pointing this out. We have altered the text to clarify that m6A promotes decay.

      3. Figure 2E: ftz-∆i should be ftz-∆i-5'SS.

      We thank the reviewer for catching this error. We have fixed this.

      4. Figure 5A: It would be helpful to indicate the number of IPA transcripts analyzed.

      We have included this information.

      Reviewer #3 (Significance (Required)):

      Overall, the work is sound and generally well-controlled. This study advances our understanding of the quality control of misprocessed transcripts in higher eukaryotes. This reviewer suggests a few points for clarification or improvement.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We would like to thank the editorial staff and the reviewers for their handling of our manuscript. We were very pleased with the timely communications from Review Commons, and we are grateful to have been assigned this insightful and constructive group of reviewers.

      The reviewers were well-suited to evaluate our work based on their stated areas of expertise (cancer biology, image analysis, machine learning, cell-based screening, etc.). As such, we received thoughtful and constructive feedback, which we have already incorporated into our attached revision. We are confident that these reviews have improved our manuscript.

      Our goal with this manuscript is to present a proof-of-concept study where high-content imaging and morphological profiling are used to characterize drug resistance in clonal cell lines. The main criticism from reviewers was that our original manuscript may have overstated our method’s ability to discriminate the signal of bortezomib resistance and that any extension beyond cultured cells (to patient samples for example) would require significant follow-up studies. The reviewers suggested that such work would be beyond the scope of our study, and recommended toning down our language to better reflect the limitations of this proof-of-concept work. We have embraced this suggestion, extensively revising our text, and we now believe our language and tone more accurately reflect our results. The reviewers also suggested follow-up computational analyses to more robustly characterize the bortezomib resistance signature. We have performed these analyses and added their description to our revised manuscript. We feel that these analyses have improved understanding of the signature, and will help a reader to gain a deeper understanding of our results and methodology.

      The reviewers also suggested several minor changes, many of which we embraced fully and others that we chose not to incorporate. We felt that a lack of clarity in our text contributed to the suggestions we did not incorporate. In these cases, we improved clarity in the text and responded to each comment point-by-point in the “prefer not to carry out” section. Further, we address all reviewer comments in the following document point-by-point, grouped by common themes across reviewers (e.g., tone, clarity, analyses, etc.).

      Lastly, a common theme among reviewer comments was their appreciation for our strong methodology and data transparency (examples pasted below). We are extremely gratified by this observation as we feel this is a particular strength of our manuscript. In addition, we were pleased to see reviewers engaged by our work, acknowledging the interest this manuscript is likely to generate among a broad range of scientific disciplines.

      Examples of reviewer appreciation of our strong methodology and data transparency:

      Reviewer 1: “However, this does not imply that the same approach can not achieve the goal, perhaps by using other cell painting markers for bortezomib-sensitivity, or with the same markers to assess sensitivity of different drugs. The cell painting + analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.”

      Reviewer 2: “The paper is well written, and the text is clear, as is the presentation of data and transparency of methods being utilized. The methods were applied appropriately and followed established standards in the field. The paper's premise is timely and interesting, addressing a pressing issue in cancer therapy: making informed treatment decisions fast, based on markers found in tumors early in tumor development, and using image-based screening for characterizing drug resistance before treatment could be an option. A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data.”

      Reviewer 3: “The strengths of this study are the machine learning best practice and detailed methodology. The experiments could be reproduced and statistical analysis is more than adequate. The analysis takes into account batch effects, well position, differences in cell numbers, and other sources of technical variation that complicate high-content image analysis. It is a good exemplar of how unsupervised morphological profiling can be applied to imaging data. The major limitation is the generalizability of this particular method for patient samples. This could be addressed in the Discussion.”

      2. Description of the planned revisions

      We have incorporated all planned revisions.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Text revisions already carried out

      1. [Text revision] We have materially toned down our claims in the manuscript in two distinct areas: A) model performance and B) potential clinical application.

      A) Model performance. We specifically balanced our discussion of the discriminative signal of the Bortezomib Signature. While the signature adequately separated never-before-seen wildtype and resistant clones with metrics well above randomly permuted baselines (accuracy near 80%, average precision about 70%, area under the ROC curve (AUROC) about 84%), there were many limitations that we should have more explicitly highlighted. For example, many individual profiles were incorrectly classified, some clones were predicted entirely incorrectly, and many profiles did not receive Bortezomib Signature scores above the randomly permuted baseline. We have more clearly discussed these limitations and used more balanced language (see key examples of text-based changes below). Additionally, we modified a figure (now Figure 3) to include boxplots of clones that explicitly show the Bortezomib Signature scores of each well profile and permit examination of the strength of the signature for each clone (previously found in Figure 2-Supplement 9). Lastly, we added a new supplementary figure (now Figure 5-Supplement 1) that describes a feature space analysis of misclassified samples. Please note that this figure rearrangement and new analysis helped to balance our claims, but were also performed in response to other tangential reviewer comments.

      B) Clinical application. In the abstract, introduction, and discussion, we further emphasized that this work is a proof of concept, and that more advances must be made prior to clinical application.

      We made these changes in direct response to the following reviewer comments:

      Reviewer 1 - Major Comment 1 (relevant excerpts)

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak… With Fig. 4, the data fully supports the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract" (line #28) are not supported by the data in my opinion. I would recommend the authors to tune down these and other related statements throughout the manuscript.

      Reviewer cross-commenting - Reviewer 1

      My main critic is regarding "over selling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the single cell types scales are supported by the data. Since reviewer #2 and #3 did not raise this concern I think it is worth discussion here.

      Once these statements are tuned down - I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Reviewer cross-commenting - Reviewer 2

      I agree that, perhaps, my major criticism of the paper was the manuscript's 'overselling' of claims that were only weakly supported by the data. Yes, if the authors tune down their claims and clearly state that this is an interesting starting point and proof of concept study, it might be ok to publish with only minor revisions. If the claims should be more generalized, then this study needs more data supporting the conclusions and the method's predictive power.

      Reviewer 2 - Major Comment 8

      Lastly, I find some misfits between the question, the model used, and the conclusions drawn. The authors start by exploring the problem of bortezomib resistance in cancer treatment, which they say is a devastating issue for patients with, e.g., multiple myeloma. Yet, the authors use HCT116 as their model cell line, a microsatellite instable, colorectal cell line with several intrinsic mutations that make it a difficult model to address physiologically relevant medical problems after all. The authors then go on to suppose that their method might be suitable to diagnose resistance in patient samples, but I am not convinced this conclusion can be speculated based on data from HCT cells. I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      Reviewer 3 - Major Comment 5

      It is not clear from the Discussion whether this type of analysis is more broadly applicable to cell lines derived from patients, rather than selected from a parental cell line, or if this approach would be more efficient than genotyping or next-gen sequencing. How many replicates and ground truth cell lines would be necessary for predictive confidence?

      We edited the last two sentences of the abstract to tone down specificity claims (“provide evidence”) and clarify that we are establishing a “proof-of-concept framework”.

      • This signature predicted bortezomib resistance better than resistance to other drugs targeting the ubiquitin-proteasome system. Our results establish a proof-of-concept framework for the unbiased analysis of drug resistance using high-content microscopy of cancer cells, in the absence of drug treatment.

      We revised the last paragraph of the introduction to contrast bortezomib predictions with ixazomib/CB-5083 predictions, and to remove claims about “using microscopy to guide therapy”.

      • This morphological signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the signature training dataset. Overall, our results establish a proof-of-concept framework for identifying unbiased signatures of drug resistance using high-content microscopy. The ability to identify drug-resistant cells based on morphological features provides a valuable orthogonal method for characterizing resistance in the absence of drug treatment.

      To tone down claims in the figures, we added boxplots to Figure 3 (previously Figure 2) showing the distribution of signature scores per well profile, and we updated the legend of Figure 4 (previously Figure 3).

      • Figure 4. Bortezomib Signature has limited ability to characterize clones resistant to other ubiquitin-proteasome system inhibitors.

      We modified the following text in the discussion to tone down claims of specificity and clinical utility:

      • This Bortezomib Signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the training dataset and was more specific to bortezomib-resistance given its limited ability to identify clones that were resistant to other UPS-targeting drugs.

      Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      1. [Text revision] We defined LD50 in text (originally line #97), changed description of resistant clone selection to remove main text references to LD90 (originally line #87), and stated drug concentrations used for selection in Methods. We also defined LD90 in the Methods and described its role in determining the drug concentrations to use for clone selection. This change was in response to the following comments:

      Reviewer 1 - Minor Comment 2

      What is LD90 (line #87)? LD50 (line #97)?

      Reviewer 2 - Minor Comment 5

      What was the LD 90 per drug on HCT cells? Rather than LD90 foldchanges, absolute concentrations should be used in the results and discussion to allow the reader to vet the conclusions.

      • To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib (proteasome inhibitor), ixazomib (proteasome inhibitor), or CB-5083 (p97 inhibitor) (Fig. 1-Supplement 1 A-D).
      • We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B).
      • Briefly, HCT116 cells were plated in 150 mm dishes and grown in the presence of the desired drug at a concentration that resulted in the death of the majority of cells (selection concentrations: bortezomib, 12 nM; ixazomib, 150 nM; CB-5083, 600 and 700 nM).
      • Using the data from our proliferation assays, we calculated the median lethal dose (LD50) for each of our drugs of interest by fitting data of normalized growth vs. log[drug concentration] to a sigmoidal dose-response curve using GraphPad Prism (v.9.2.0) (Fig. 1-Supplement 1 D).
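
      The LD50 estimate and fold-change comparison described in these excerpts can be illustrated with a minimal sketch. The authors fitted their curves in GraphPad Prism; the Python below is only a stand-in that assumes a two-parameter logistic curve (top fixed at 1, bottom at 0) and uses a coarse grid search instead of Prism's nonlinear regression. The function names `hill` and `fit_ld50`, the grid sizes, and the example concentrations are hypothetical and do not come from the manuscript.

```python
import math

def hill(conc, ld50, slope):
    """Normalized growth under a two-parameter logistic curve (top=1, bottom=0)."""
    return 1.0 / (1.0 + (conc / ld50) ** slope)

def fit_ld50(concs, growth, n_grid=200):
    """Grid-search the LD50 and Hill slope that minimize the squared error.

    Assumption: growth is normalized to untreated cells and all concs are > 0.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best = None
    for i in range(n_grid + 1):
        # Candidate LD50 values are evenly spaced on a log10 scale.
        ld50 = 10 ** (lo + (hi - lo) * i / n_grid)
        for j in range(1, 51):
            slope = 0.1 * j  # candidate Hill slopes from 0.1 to 5.0
            sse = sum((hill(c, ld50, slope) - g) ** 2
                      for c, g in zip(concs, growth))
            if best is None or sse < best[0]:
                best = (sse, ld50, slope)
    return best[1], best[2]

# Illustrative, noise-free data only (nM); not taken from the manuscript.
concs = [1, 3, 10, 30, 100]
parental = [hill(c, 10.0, 2.0) for c in concs]
clone = [hill(c, 30.0, 2.0) for c in concs]
ld50_parental, _ = fit_ld50(concs, parental)
ld50_clone, _ = fit_ld50(concs, clone)
fold_change = ld50_clone / ld50_parental  # LD50 relative to parental cells
```

      Reporting `fold_change` alongside the absolute LD50 values addresses the reviewer's request that readers be able to check the concentrations behind each fold change.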

      • [Text revision] We thank the reviewer for allowing us an opportunity to improve clarity on the clones we used. We now describe the total number of clones generated and removed unnecessary references to specific clones for ease of reading (originally lines #96-98). We maintain all references to specific clones in the figures, legends, supplement, and methods.

      Reviewer 1 - Minor Comment 3

      It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.

      We added the following to the second paragraph of the results:

      • Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      [Text revision] We removed duplicate text (originally lines #115-125).

      Reviewer 1 - Minor Comment 5

      1. Lines #104-111 were duplicated in lines #114-122.

      Reviewer 3 - Minor Comment 4

      Ten lines of text are duplicated on page 5.

      Reviewer 2 - Minor Comment 4

      on page 5, paragraph 4, there is a sizeable copy-and-paste error of text being identically replicated.

      1. [Text revision] We provided more intuition of the Bortezomib Signature in the results section (originally lines #150-151).

      Reviewer 1 - Minor Comment 6

      The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.

      • We used these 45 features to compute a rank-based resistance score or “Bortezomib Signature” for each well profile based on the direction-sensitive method called singscore (Foroutan et al. 2018). Singscore ranks these 45 resistance-related features on a per sample basis and calculates a normalized score between -1 and 1, with higher values expected for bortezomib-resistant clones and lower values expected for bortezomib-sensitive clones.
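
      To give a feel for how a direction-sensitive, rank-based score of this kind behaves, here is a simplified Python sketch. It is not the singscore package itself: it assumes the signature is split into features expected to be higher in resistant clones (`up_idx`) and features expected to be lower (`down_idx`), breaks ties by position, and uses the hypothetical function name `rank_signature_score`. With both sets non-empty the result lies between -1 and 1.

```python
def rank_signature_score(profile, up_idx, down_idx):
    """Simplified direction-sensitive rank score for one well profile.

    profile  : feature values for one well profile
    up_idx   : indices of features expected to be higher in resistant clones
    down_idx : indices of features expected to be lower in resistant clones
    """
    n = len(profile)
    # Ascending ranks 1..n; ties are broken by position (a simplification).
    order = sorted(range(n), key=lambda i: profile[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank

    def centered(selected):
        """Scale the mean rank of a feature set to the range [-0.5, 0.5]."""
        k = len(selected)
        if k == 0 or k == n:
            return 0.0
        mean_rank = sum(selected) / k
        lowest = (k + 1) / 2           # mean of ranks 1..k
        highest = (2 * n - k + 1) / 2  # mean of ranks n-k+1..n
        return (mean_rank - lowest) / (highest - lowest) - 0.5

    up = centered([ranks[i] for i in up_idx])
    # Reverse the ranks so that low values of "down" features score high.
    down = centered([n + 1 - ranks[i] for i in down_idx])
    return up + down
```

      A profile whose "up" features hold the highest ranks and whose "down" features hold the lowest ranks scores 1, the mirror-image profile scores -1, and profiles with no consistent ordering land near 0.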

      • [Text revision] We clarified that DNA sequencing had been performed solely on clones A and E in a previous study (originally lines #88-90). Furthermore, one of the strengths of our approach is that it can identify resistant clones in an unbiased fashion prior to molecular characterization. It is beyond scope to perform these sequencing studies in the present paper.

      Reviewer 2 - Minor Comment 3

      The authors talk about validating the mutation - PSMB5 by RNA-seq. However, the data for the genotyping/sequencing/characterization of these newly generated BZ-resistant lines are missing.

      In the results, we clarify DNA sequencing that was previously performed on clones A and E

      • We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012).

      In the last paragraph of the discussion, we highlight the strength of our unbiased approach

      • Together, our work has demonstrated the potential for morphological profiling with Cell Painting to be used as an unbiased method to characterize resistance in the absence of drug treatment. Our results indicate that different mechanisms of bortezomib resistance may generate distinct morphological profiles; with larger and broader training datasets, it may be possible to identify signatures for distinct mechanisms of bortezomib resistance as well as signatures of resistance to other drugs. Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      • [Text revision] We thank the reviewers for their suggestions. We agree that the description of the experimental design was somewhat unclear and have provided greater detail and clarity, particularly regarding the generation of clones. We used the HCT116 parental cell line to generate drug-resistant clones by identifying single surviving cells after drug treatment and allowing these cells to expand prior to isolating colonies for experimentation. We did not perform experiments to confirm whether these “clones” were isogenic and cannot exclude cell migration during expansion or genetic drift as confounding factors. However, we have provided greater detail in the descriptions of our method for clone isolation in order to address this concern.

      Reviewer 1 - Minor Comment 1

      More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.

      Reviewer 2 - Minor Comment 6

      The description of the resistant clonal populations is confusing. As I understand, no single-cell clones were isolated during the selection procedure. Thus, the training lines are not yet isogenic clones but oligoclonal sub-populations of the parental cell line. The authors could provide more details here and discuss the different characteristics of their sub-populations, e.g., their growth kinetics or molecular alterations.

      We bolstered the description in the results.

      • We first isolated and characterized drug-resistant cells (Fig. 1 A). To isolate drug-resistant clones, we used an approach we have described previously (Wacker et al. 2012; Kasap, Elemento, and Kapoor 2014) and the HCT116 cell line. These cancer cells express multidrug resistance pumps at low levels and are mismatch repair deficient, providing a genetically heterogeneous polyclonal population of cells (Umar et al. 1994; Papadopoulos et al. 1994; Teraishi et al. 2005) allowing for isolation of drug-resistant clones in 2-3 weeks. We hypothesize that a rapid selection of resistance could favor the isolation of clones with intrinsic resistance. To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib, ixazomib, or CB-5083 (Fig. 1-Supplement 1 A-D). We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two published bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012). We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) for bortezomib were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B). In contrast, bortezomib-sensitive clones had LD50s for bortezomib that ranged from ~0.7- to ~1.2-fold that of HCT116 parental cells (Fig. 1-Supplement 2 A). Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      We also updated the legend for Figure 1A.

      • Figure 1. Experimental design for using Cell Painting to examine morphological profiles of drug-resistant cells. (A) Graphic of the experimental workflow: we isolated drug-resistant clones by treating parental HCT116 cells with a high dose of the desired drug and then expanded them for experiments. We isolated drug-sensitive clones by diluting HCT116 cells and then expanded them for experiments. We then performed proliferation assays on select clones to screen for multidrug resistance. Next, we performed Cell Painting on both drug-resistant and -sensitive clones, using multiplexed high-throughput fluorescence microscopy of fixed cells followed by feature extraction and morphological profiling to search for features that contribute to a signature of drug resistance.

      • [Text revision] We clarified that the Bortezomib Signature did not correspond to well position (originally lines #155-157).

      Reviewer 1 - Minor Comment 9

      Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".

      We found that the pattern of Bortezomib Signatures did not correspond to well position relative to the plate (Fig. 2-Supplement 7 B), indicating that the well position for each clone was not strongly contributing to its Bortezomib Signature.

      1. [Text revision] We explicitly described the result that some misclassified clones (WT10, WT15, and BZ06) did not have unexpected bortezomib sensitivity as determined by proliferation assays. We also moved the supplementary figure to an updated Figure 3 to better highlight this result (described below in “Figure revisions already carried out”). Lastly, we added a new figure (Figure 5-Supplement 1) to more explicitly analyze the misclassified lines (described below in “New analyses already carried out”).

      Reviewer 3 - Minor Comment 3

      The bortezomib sensitivity of the WT lines used in the last experiments was determined and did not seem to be greater than parental. This could be mentioned in the text; the figure raises the question and the answer is provided, but it's in the supplemental material.

      While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      1. [Text revision] We clarified that the metrics (accuracy and average precision) were based on median Bortezomib Signature scores of all replicate well-level profiles per clone. We can compare samples based on rank and on their difference from the 95% confidence interval of the permuted data. Our method currently has no way to assign a likelihood. Also note that we have updated the discussion to cover alternative metrics (see Reviewer 1 - Minor Comment 7). These are very important distinctions, and we are grateful to the reviewer for bringing them up.

      Reviewer 3 - Major Comment 3

      The study classifies cells as binary sensitive or resistant, but would results be improved by scoring based on likelihood of being resistant/sensitive?

      Reviewer 3 - Minor Comment 2

      It is not clear whether the accuracy was based on a percentage of replicates per cell line that were classified correctly or whether that was referring to classification of the cell line overall as sensitive/resistant.

      • We next examined whether the Bortezomib Signature was able to predict the bortezomib resistance of a clone based on morphological profiling data (Fig. 3 A-E and Fig. 3-Supplement 2 A and B). We called the clone bortezomib-resistant if the median Bortezomib Signature of all replicate well profiles was greater than zero and bortezomib-sensitive if the median Bortezomib Signature was less than zero. In the training dataset, the Bortezomib Signature correctly predicted the bortezomib resistance of all ten clones, with median Bortezomib Signatures for eight out of ten clones beyond the 95% confidence interval for the randomly permuted data (Fig. 3 A). The accuracy of the Bortezomib Signature was 88% while the average precision was 81% for the training dataset (Fig. 3-Supplement 2 A and B) (see Methods). The signature performed similarly well in the validation dataset (Fig. 3 B), with an accuracy of 92% and an average precision of 89% (Fig. 3-Supplement 2 A and B). In the test dataset the Bortezomib Signature correctly predicted the bortezomib resistance of all clones, though only HCT116 parental cells had a median Bortezomib Signature outside the 95% confidence interval for the randomly permuted data (Fig. 3 C). The test dataset had an accuracy of 80% and an average precision of 68% (Fig. 3-Supplement 2 A and B). Similarly, in the holdout dataset the Bortezomib Signature had an accuracy of 78% and an average precision of 69% (Fig. 3-Supplement 2 A and B), and correctly predicted the bortezomib resistance of twelve out of thirteen clones, with WT01 misclassified as bortezomib-resistant (Fig. 3 D). In the holdout dataset, four of the twelve correctly characterized clones had median Bortezomib Signatures outside the 95% confidence interval for the randomly permuted data.
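
      The clone-level calling rule quoted above (median of replicate well scores compared with zero) can be sketched in a few lines of Python. This is one reading of the rule, not the authors' code: the function names, clone names, and scores are made up, a median of exactly zero is treated as sensitive because the text does not specify that case, and the accuracy shown is a simple fraction of correctly called clones, which may differ from how the reported accuracy values were computed.

```python
from statistics import median

def call_clone(well_scores):
    """Label a clone from the median of its replicate well-level scores.

    Assumption: a median of exactly zero is called "sensitive".
    """
    return "resistant" if median(well_scores) > 0 else "sensitive"

def clone_accuracy(scores_by_clone, truth_by_clone):
    """Fraction of clones whose median-based call matches the known status."""
    correct = sum(call_clone(scores) == truth_by_clone[clone]
                  for clone, scores in scores_by_clone.items())
    return correct / len(scores_by_clone)

# Toy values for illustration only; clone names and scores are made up.
scores = {
    "clone_R1": [0.30, 0.12, -0.05],
    "clone_S1": [-0.20, -0.10, 0.04],
    "clone_S2": [0.08, 0.02, -0.01],
}
truth = {
    "clone_R1": "resistant",
    "clone_S1": "sensitive",
    "clone_S2": "sensitive",
}
```

      In this toy example `clone_S2` is miscalled as resistant even though one of its wells scores below zero, which mirrors the distinction between clone-level calls and well-level scores that Reviewer 3 asked about.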

      We also mirrored language when discussing the ixazomib and CB-5083 results.

      • However, only two of the four correctly identified ixazomib-resistant clones and one of the three CB-5083-resistant clones had median Bortezomib Signatures outside the 95% confidence interval of the randomly permuted data. The area under the ROC (AUROC) curve for ixazomib-resistant and CB-5083-resistant clones (0.63 and 0.60, respectively) was lower than those calculated for the training, validation, test, and holdout datasets. In addition, many of the Bortezomib Signatures for well profiles of ixazomib- and CB-5083-resistant clones, particularly those for CB-5083-resistant clones, landed within the 95% confidence interval of the randomly permuted data. These results suggest that the Bortezomib Signature is not a general signature of UPS-targeting drug resistance and instead has some specificity for bortezomib.
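
      For readers less familiar with AUROC, the following Python sketch shows what the reported values measure: the probability that a randomly chosen resistant profile receives a higher score than a randomly chosen sensitive one, so 0.5 corresponds to chance and 1.0 to perfect separation. The function name `auroc` and the 0/1 label encoding are assumptions made for illustration; the authors' own pipeline may compute the metric with a different tool.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison (equivalent to the Mann-Whitney U statistic
    divided by the number of positive-negative pairs).

    labels: 1 for resistant (positive), 0 for sensitive (negative).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

      Under this definition, the values of 0.63 and 0.60 reported for ixazomib- and CB-5083-resistant clones sit much closer to chance than the roughly 0.84 cited earlier in this letter for bortezomib.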

      • [Text revision] We added an explicit note that our image analysis pipelines are also publicly available. Our data processing pipelines are fully documented, well above the standards in our field. Linking the publicly available resources with these methods maximizes reproducibility.

      Reviewer 1 - Minor Comment 10

      Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.

      We include all image analysis pipelines at https://github.com/broadinstitute/profiling-resistance-mechanisms (G. Way et al. 2023).

      1. [Text revision] We have compared our approach to the on-disease/off-disease scores introduced in (Heiser et al. 2020). We agree with the reviewer that a discussion of these two methods would help clarify our phenotypic signature concept. The on/off score measures the degree to which a perturbation pushes a disease state towards a healthy state. In that case there are three sets of data: healthy samples (used for training), disease samples (used for training), and the sample we want to score, which should be of the form "disease + perturbation". With our approach, based on singscore, we also have three sets of data: sensitive samples (used for training), resistant samples (used for training), and the sample we want to score. Here, the sample we want to score could be anything, not necessarily of the form "resistance + perturbation". Furthermore, singscore does not have the concept of orthogonality to resistance/sensitivity. This would become relevant if we were exploring perturbations or conditions that would induce a resistant cell line to become sensitive, but we are not doing that here. There are other statistical differences (projection vs. rank-based, etc.), but the key difference is the applicability of the method to the specific problem at hand.

      Reviewer 1 - Minor Comment 7

      How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.

      We added the following to the discussion.

      • The Bortezomib Signature is conceptually similar to the on-disease/off-disease score (Heiser et al. 2020). Both require three phenotypic measurements: a target phenotype representing ideal, a disease phenotype, and a new phenotype to classify. However, our approach is technically different (non-parametric compared to linear projection) and our goals are different (phenotypic classification compared to perturbation alignment). Other methods also enable phenotype labeling, but they focus on single-sample annotation without regard to a target phenotype (Wawer et al. 2014; Rohban et al. 2017; Simm et al. 2018; Nyffeler et al. 2020).
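To make the "non-parametric" (rank-based) side of this comparison concrete, here is a simplified, direction-aware rank score written in the spirit of singscore (Foroutan et al. 2018). It is not the singscore implementation and omits its normalization details; the function names, feature names, and values are hypothetical.

```python
def rank_fractions(profile):
    """Map each feature to its rank (1 = lowest value) divided by the feature count."""
    ordered = sorted(profile, key=profile.get)
    n = len(ordered)
    return {feature: (i + 1) / n for i, feature in enumerate(ordered)}


def directional_rank_score(profile, up_features, down_features):
    """Positive when 'up' features rank high and 'down' features rank low."""
    ranks = rank_fractions(profile)
    up_mean = sum(ranks[f] for f in up_features) / len(up_features)
    down_mean = sum(ranks[f] for f in down_features) / len(down_features)
    # Center each component at 0.5 so a profile with no signal scores near zero.
    return (up_mean - 0.5) + (0.5 - down_mean)


# Hypothetical single-sample profile with four features.
profile = {"a": 5.0, "b": 1.0, "c": 3.0, "d": 0.5}
score = directional_rank_score(profile, up_features=["a"], down_features=["d"])
```

Because only within-sample ranks are used, any single profile can be scored on its own, which is the property that distinguishes this style of score from a projection onto a healthy-to-disease axis.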

      Figure revisions already carried out

      1. [Figure revision] We moved all boxplots from the original Fig. 2-Supplement 9 to the main text (also splitting Fig. 2 into Fig. 2 and 3). From the original Figure 2, we moved the accuracy and average precision bar graphs to the supplement. We also note that this change increases transparency of the discriminative signal of our signature.

      Reviewer 1 - Minor Comment 8

      I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9. in Fig. 2. This was the main measurement used throughout the manuscript and in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.

      1. [Figure revision] We adjusted the position of the legend in the accuracy and average precision bar graphs (originally Fig. 2 C and D, now Fig. 3-Supplement 2) for clarity. We also note that keeping the bar chart here is standard best practice (compared to a dot plot).

      Reviewer 1 - Minor Comment 4

      I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|) The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.

      1. [Figure revision] We changed the point distribution in the boxplots (from expanded to standard) to minimize overlap with the boxplot lines. We also updated the legend text to indicate that individual points in boxplots represent the Bortezomib Signature for well profiles. Note, we paste a representative example of this change above (new Figure 3).

      Reviewer 3 - Minor Comment 1

      I found the box plots somewhat difficult to interpret (especially where the WT lines had a lot of overlap with the red shaded area). Do the points in these charts correspond to replicate wells?

      We also update the figure legend.

      • Plots show values for individual well profiles (points), range (error bars), 25th and 75th percentiles (box boundaries), and median.

      • [Figure revision] [Response to Reviewer 2 - Major Comment 7] We thank the reviewer for allowing us an opportunity to clarify the mechanism. We feel that it is beyond the scope of this manuscript to disentangle the molecular alterations that cause bortezomib resistance based on our Cell Painting insights. This wet lab experimental process is arduous and cost-prohibitive, and we argue that one of the benefits of taking a morphology approach to resistance status is that we can detect resistant cells (and therefore cells that won’t die when presented with a treatment) without knowing the molecular mechanism.

      Nevertheless, the reviewer has encouraged us to enhance the ability for a reader to view and interpret the signature to perhaps more easily facilitate future work. Previously, we presented our signature in text form in Figure 2-Supplement 4 and in heatmap form in Figure 2-Supplement 5. Here, we add a new figure (Figure 2-Supplement 6; pasted below) which will improve interpretability.

      Reviewer 2 - Major Comment 7:

      Next to feature importance, the authors do not discuss (or I missed) what biology the features represent. Such the reader is left wondering what the actual mechanism of bortezomib resistance could be and if cell painting could shed light on the molecular alterations that cause the treatment resistance. While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      • Figure 2-Supplement 6. Bortezomib Signature visualized by CellProfiler features. Visualization of CellProfiler features contributing to the Bortezomib Signature. Features with high values (mean signature estimates) in resistant cells are purple while features with low values in resistant cells are green. The mean signature estimates were based on Tukey's Honestly Significant Difference test score and the number in each box represents the number of features used to calculate the mean signature estimate.

      Additionally, we add the following to the results section:

      • We then examined the grouping of features across compartments and channels and found radial distribution features were higher in resistant cells (Fig 2-Supplement 6).

      The code change to generate the signature visualization summary is available at: https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/131

      New analyses already carried out

      1. [New analysis] [Response to Reviewer 2 - Major Comment 5] We agree that a systematic analysis of feature selection methods will provide additional insights not already in the manuscript. Therefore, we have performed two new computational experiments to compare our linear modeling feature selection approach against other standard approaches. We demonstrate that our linear modeling approach is effective at isolating the core differences between resistant and sensitive classes.

      Specifically, we performed two analyses: A) UMAP and B) k-means cluster analysis. We analyzed profiles defined by four different feature selection approaches: 1) Using all traditional CellProfiler features; 2) Using the traditional CellProfiler feature selection approach (removing low-variance features, highly correlated features, etc.); 3) Using 45 random features (same size as the Bortezomib Signature); and 4) Using only the Bortezomib Signature features. We performed Fisher’s exact tests to derive odds ratios of cluster membership by resistance status and calculated Silhouette widths to quantify relative proximity of clusters.

      This analysis generates a new supplementary figure (see below), and demonstrates that the linear-modeling-based feature selection isolated the features driving the differences between the clone types (resistant vs. wildtype), while the standard approaches do not separate the clone types as effectively.

      Reviewer 2 - Major Comment 5:

      A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data. Yet, I wondered why the authors do not discuss other means of feature selection or dimensionality reduction; further, they need to show how the features cluster the cell lines or why impact (information content) different features deliver. For an audience interested in the technical aspects of cell painting analysis and machine learning based on the data, that would, IMHO, be the most exciting questions.

      • Figure 3-Supplement 3. Benchmarking linear-modeling feature selection to separate clones by bortezomib resistance. Uniform Manifold Approximation and Projection (UMAP) analysis of the qualitative separability of (A) resistance status and (B) Bortezomib Signature scores across four different feature spaces. (C) k-means clustering from k=2 to k=14 of average odds ratio, maximum odds ratio (Fisher’s exact test), and Silhouette width using Bortezomib Signature features.

      Additionally, we add the following to the results section:

      • We then compared our linear-modeling approach to feature selection against other feature spaces and found that the Bortezomib Signature clusters same-type clones (bortezomib-resistant vs. bortezomib-sensitive) with higher enrichment compared to the full feature space, standard feature selection (see Methods), or a random selection of 45 features (Fig 3-Supplement 3).

      And methods section, describing this analysis:

      • We were also interested in comparing the ability of different feature spaces to cluster clones of the same type (resistant vs. sensitive). This analysis would determine if the Bortezomib Signature features, which we derived using linear modeling to isolate biological from technical variables, had a greater ability to cluster. We compared the Bortezomib Signature against three other feature spaces: 1) the full feature space, 2) standard feature selection (see Image data processing methods), and 3) 45 randomly selected features. We performed two analyses using these four feature spaces including Uniform Manifold Approximation and Projection (UMAP) (McInnes et al. 2018) and k-means clustering. For UMAP, we used default umap-learn parameters to identify two UMAP coordinates per feature space. We then visualized the clusters by their resistance status and Bortezomib Signature score. The UMAP analysis represents a qualitative analysis. Next, we applied k-means clustering with 25 initializations across a range of 2-14 clusters (k). Prior to clustering and for each feature space, we applied principal component analysis (PCA) and transformed each feature space into 30 principal components. This step was necessary to compare k-means clustering metrics, which are sensitive to the feature space dimensionality. We applied a Fisher’s exact test to each cluster using a two-by-two contingency matrix that specified cluster membership for each clone classification (resistant vs. sensitive). We visualized the mean odds ratio and max cluster odds ratio for each feature space across k. A high odds ratio tells us that the feature space effectively clusters clones of the same resistance status. Lastly, we calculated Silhouette width (the average proximity between samples in one cluster to the second nearest cluster) for each feature space across k.
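The per-cluster enrichment step described above (a two-by-two contingency table of cluster membership against resistance status, followed by Fisher's exact test) can be sketched as follows. This is a minimal illustration that assumes cluster labels are already available, for example from k-means after PCA; the function names and the toy labels are hypothetical, and a library routine such as `scipy.stats.fisher_exact` would normally be used in place of the hand-written test.

```python
from math import comb


def contingency(cluster_labels, statuses, cluster):
    """2x2 counts: (in cluster & resistant, in cluster & sensitive,
    out of cluster & resistant, out of cluster & sensitive)."""
    a = b = c = d = 0
    for label, status in zip(cluster_labels, statuses):
        in_cluster = label == cluster
        resistant = status == "resistant"
        if in_cluster and resistant:
            a += 1
        elif in_cluster:
            b += 1
        elif resistant:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when the off-diagonal product is zero."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided p-value: sum of table probabilities no larger than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x resistant samples inside the cluster.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    low, high = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Toy example: eight samples assigned to two clusters.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
status = ["resistant"] * 3 + ["sensitive"] + ["resistant"] + ["sensitive"] * 3
table = contingency(labels, status, cluster=0)
```

Repeating this for every cluster and every k, then summarizing the mean and maximum odds ratio, mirrors the comparison across feature spaces that the text describes.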

      The code change to derive the UMAP coordinates, perform clustering, and generate the figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/132

      1. [New analysis] [Response to Reviewer 3 - Major Comment 1] We thank the reviewer for this suggestion, which allowed us to explore the misclassified samples in more depth. We added a new supplementary figure in which we summarized all bortezomib clones (wildtype and resistant) in their accuracy based on the bortezomib signature (panel A). We did not include training set samples in this analysis. Using samples that were consistently incorrectly classified with high confidence (three samples: WT15, BZ06, WT10) we performed two separate two-sample Kolmogorov–Smirnov (KS) tests. Specifically, we compared high incorrect wildtype to high correct wildtype and high incorrect resistant to high correct resistant. Our results indicate that most bortezomib signatures were significantly different between correct and incorrect assignments (panel B), and that the signature features varied between resistant and wildtype misclassification tests (panel C).

      Reviewer 3 - Major Comment 1:

      While the claims are largely substantiated, there are a few points where further consideration would improve the manuscript. Several cell lines were mis-classified with what appears to be a high degree of certainty. Can the authors tell what was driving those predictions? Was there something in the morphological signature that weighed more heavily in those cases?

      • Figure 5-Supplement 1. Examining the accuracy of clone classification and misclassification of clones. (A) Proportion of high-confidence correct, low-confidence correct, low-confidence incorrect, and high-confidence incorrect predictions of well profiles across clones in the test, holdout, and validation sets. High-confidence predictions (high) had Bortezomib Signatures greater than (resistant clones) or less than (sensitive clones) the 95% confidence interval of randomly permuted data, while low-confidence predictions (low) had Bortezomib Signatures within the 95% confidence interval of randomly permuted data. (B) Visualization of Kolmogorov-Smirnov (KS) test statistic means of feature groups across channels and cellular compartments. (C) Plot of the KS test statistic means for feature groups in bortezomib-resistant vs. -sensitive cells. Each feature group is color coded by the imaging channel.

      Additionally, we add the following to the results section:

      • While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      And methods section, describing this analysis:

      Some profiles were consistently predicted incorrectly with high confidence but in the opposite direction (see Figure 5-Supplement 1). For a well-level profile to be categorized as high-confidence (in either the correct or incorrect directions), it needed to score beyond the 95% confidence interval of the randomly permuted data range. For example, a high-confidence incorrect resistant profile would have a Bortezomib Signature below the 95% confidence interval of the randomly permuted data. To evaluate the features driving the differences in these samples, we applied two-sample Kolmogorov–Smirnov (KS) tests per Bortezomib Signature feature. We applied these tests to two separate groups: 1) misclassified bortezomib-sensitive vs. high-confidence accurate bortezomib-sensitive and 2) misclassified bortezomib-resistant vs. high-confidence accurate bortezomib-resistant.
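The two steps in this paragraph, labeling each well profile by confidence and then comparing feature distributions with a two-sample KS test, can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the interval bounds are assumed inputs with `lower < 0 < upper`, the function names are hypothetical, and only the KS statistic (not its p-value) is computed. In practice `scipy.stats.ks_2samp` provides both.

```python
from bisect import bisect_right


def confidence_category(signature, true_status, lower, upper):
    """Label one well profile against an assumed 95% permutation interval."""
    if signature > upper:
        call, level = "resistant", "high"
    elif signature < lower:
        call, level = "sensitive", "high"
    else:
        # Inside the interval: call by sign. A value of exactly zero is treated
        # as sensitive here; the source does not specify this case.
        call, level = ("resistant" if signature > 0 else "sensitive"), "low"
    outcome = "correct" if call == true_status else "incorrect"
    return f"{level}-confidence {outcome}"


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

Applying `ks_statistic` to one signature feature at a time, with the misclassified profiles as one sample and the high-confidence correct profiles of the same clone type as the other, gives one statistic per feature, which can then be averaged within feature groups.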

      The code change to generate the UMAP coordinates and figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/130

      Description of analyses that authors prefer not to carry out

      1. [Response to Reviewer 2 - Minor Comments 1 and 2]: These are interesting suggestions! Still, we prefer not to speculate on the biological mechanism of the Bortezomib signature. Connecting morphological features identified as contributing to the Bortezomib Signature by Cell Painting to specific biological pathways would demand considerable cell-based assays to validate. In addition, our analyses suggest that the features contributing to the Bortezomib Signature are spread across a range of cellular compartments and channels, making it difficult to pin down specific mechanisms or pathways as likely contributors to bortezomib resistance. However, we are adding a figure to increase interpretability of the signature, which will aid in developing future hypotheses. Note that the signature was not possible to detect by eye (Fig. 2 A).

      Reviewer 2 - Minor Comment 1:

      There could be some speculation on the mechanism of Bortezomib resistance concerning the literature with the existing image data. For example, Bortezomib resistance is connected to serine synthesis and how a particular feature could contribute to the known mechanism.

      Reviewer 2 - Minor Comment 2:

      Along the same lines, the authors could show that larger cells lead to resistance with microscopic images.

      2. [Response to Reviewer 2 - Major Comment 8]: We appreciate the reviewer’s concern that our work using HCT116 clonal cell lines may not directly reflect results from patient samples. Our choice was based on previously published work demonstrating the efficiency with which HCT116 cells generate resistant clones due to diminished DNA mismatch repair and decreased expression of drug efflux pumps. Since our work is a proof of concept rather than a comprehensive demonstration of translating morphological profiling into clinical practice, we believe that experiments using multiple patient cell lines from different tissues, as well as digital pathology records, are beyond the scope of this work. We instead chose to tone down the language of our manuscript to more clearly acknowledge the limitations of our work and clarify this as a proof of concept.

      Reviewer 2 - Major Comment 8 (relevant excerpt):

      I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      3. [Response to Reviewer 3 - Major Comment 2]: The bortezomib sensitivity of ixazomib- and CB-5083-resistant clones was not determined, and hence cannot be ruled out as a possible explanation for their high Bortezomib Signature scores. However, we prefer not to conduct additional proliferation assays for the misclassified clones (IX02, WT06, CB14, CB16) in the presence of bortezomib to determine whether coincidental bortezomib resistance might explain the signature performance. Our rationale is that three other misclassified clones (WT10, WT15, and BZ06) had the expected bortezomib sensitivity in proliferation assays (Fig. 1-Supplement 2), meaning that additional proliferation assays may not reveal any insights regarding the signature performance.

      Reviewer 3 - Major Comment 2:

      Was the bortezomib sensitivity of the IX (or CB) resistant cell lines determined? If there were differences, this could explain some of the variation in the morphological signatures. This could be easily done in one or two growth experiments.

      4. [Response to Reviewer 2 - Major Comment 7]: Thank you for pointing this out. Our goal is to keep the study multi-disciplinary. We are adding a figure to increase interpretability of the signature, and adding text-based clarifications.

      Reviewer 2 - Major Comment 7 (relevant excerpt):

      While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      5. [Response to Reviewers 2 and 3 - Major Comments 6 and 4]: We prefer not to expand the scope of the model to predict other drug signatures. This would require a substantial amount of work to generate the appropriate drug-resistant clones, collect the imaging data, and analyze it, and we think it important to convey that the purpose of our paper is proof of concept. We do not feel that the time invested in performing this analysis would result in adequate returns beyond what we already demonstrate.

      Reviewer 2 - Major Comment 6.

      Interestingly, the Bortezomib signature is specific to the drug and not a broad range of proteasomal inhibitors. However, seeing the common features between all the proteasomal inhibitors would be interesting.

      Reviewer 3 - Major Comment 4

      There was some predictive ability of the Bortezomib Signature for ixazomib resistance. Were there some features that were correlated with IX-resistance, i.e. UPS pathway, versus specific to bortezomib? Do the features suggest anything about resistance mechanisms or is the feature set too abstruse to interpret?

      References

      Foroutan, Momeneh, Dharmesh D. Bhuva, Ruqian Lyu, Kristy Horan, Joseph Cursons, and Melissa J. Davis. 2018. “Single Sample Scoring of Molecular Phenotypes.” BMC Bioinformatics 19 (1): 404.

      Heiser, Katie, Peter F. McLean, Chadwick T. Davis, Ben Fogelson, Hannah B. Gordon, Pamela Jacobson, Brett Hurst, et al. 2020. “Identification of Potential Treatments for COVID-19 through Artificial Intelligence-Enabled Phenomic Analysis of Human Cells Infected with SARS-CoV-2.” bioRxiv. https://doi.org/10.1101/2020.04.21.054387.

      McInnes, Leland, John Healy, Nathaniel Saul, and Lukas Großberger. 2018. “UMAP: Uniform Manifold Approximation and Projection.” Journal of Open Source Software 3 (29): 861.

      Nyffeler, Johanna, Clinton Willis, Ryan Lougee, Ann Richard, Katie Paul-Friedman, and Joshua A. Harrill. 2020. “Bioactivity Screening of Environmental Chemicals Using Imaging-Based High-Throughput Phenotypic Profiling.” Toxicology and Applied Pharmacology 389 (January): 114876.

      Rohban, Mohammad Hossein, Shantanu Singh, Xiaoyun Wu, Julia B. Berthet, Mark-Anthony Bray, Yashaswi Shrestha, Xaralabos Varelas, Jesse S. Boehm, and Anne E. Carpenter. 2017. “Systematic Morphological Profiling of Human Gene and Allele Function via Cell Painting.” eLife 6 (March). https://doi.org/10.7554/eLife.24060.

      Simm, Jaak, Günter Klambauer, Adam Arany, Marvin Steijaert, Jörg Kurt Wegner, Emmanuel Gustin, Vladimir Chupakhin, et al. 2018. “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery.” Cell Chemical Biology 25 (5): 611–18.e3.

      Wacker, Sarah A., Benjamin R. Houghtaling, Olivier Elemento, and Tarun M. Kapoor. 2012. “Using Transcriptome Sequencing to Identify Mechanisms of Drug Action and Resistance.” Nature Chemical Biology 8 (3): 235–37.

      Wawer, Mathias J., Kejie Li, Sigrun M. Gustafsdottir, Vebjorn Ljosa, Nicole E. Bodycombe, Melissa A. Marton, Katherine L. Sokolnicki, et al. 2014. “Toward Performance-Diverse Small-Molecule Libraries for Cell-Based Phenotypic Screening Using Multiplexed High-Dimensional Profiling.” Proceedings of the National Academy of Sciences of the United States of America 111 (30): 10911–16.

      Way, Gregory, Yu Han, David Stirling, and Shantanu Singh. 2023. Broadinstitute/profiling-Resistance-Mechanisms: Analysis for Preprint. Zenodo. https://doi.org/10.5281/ZENODO.7803787.

      Way, Gregory P., Maria Kost-Alimova, Tsukasa Shibue, William F. Harrington, Stanley Gill, Federica Piccioni, Tim Becker, et al. 2021. “Predicting Cell Health Phenotypes Using Image-Based Morphology Profiling.” Molecular Biology of the Cell 32 (9): 995–1005.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The authors use Cell Painting, a high-content image-based phenotypic assay, to distinguish between clonal cancer cell lines that are resistant versus sensitive to a proteasome inhibitor anti-myeloma drug called bortezomib. The authors characterized a high-dimensional cell morphology signature for bortezomib-resistance, evaluated it on an independent subset of cell lines, and evaluated specificity in respect to other drugs targeting the ubiquitin-proteasome system. The authors thus propose image-based morphology characterization as an alternative method for characterizing drug resistance.

      Strengths: solid methodology - cell lines validation of drug resistance, extensive data collection, thorough validation of the analysis pipeline, avoiding potential confounders, biases and proper data partitioning to test and hold-out (what the authors refer to as "machine learning best practices").

      Weakness: weak discriminative signal. Some aspects of the writing could be improved to make the manuscript easier to follow (see Minor comments).

      Major comments:

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak. Specifically, it is not clear whether the signature can effectively capture the drug resistance of a single cell line. In Figure 2-Supplement 9, considering the test (C) and the holdout (D), only 1/9 BZ clones' median signatures were beyond the 95% confidence interval, with 4/6 and 2/6 WT cell types with median signatures beyond the positive and negative 95% confidence interval correspondingly. When defining bortezomib-sensitivity according to the median signatures' sign (>0 or <0) of a cell line, Figure 2-Supplement 9 shows that in the test+holdout there are 9/9 correct bortezomib-resistance (BZ) and 6/7 correct bortezomib-sensitive (WT) predictions. However, similar discrimination levels also appeared in the other drugs (ixazomib, CB-5083), making the statements about specificity less grounded. When the authors evaluate the AUROC they report ~0.6 (line #194) for the non-specific (ixazomib, CB-5083) drugs versus ~0.75 for bortezomib-resistance (line #202). With Fig. 4, the data fully supports the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract" (line #28) are not supported by the data in my opinion. I would recommend the authors to tune down these and other related statements throughout the manuscript. An alternative would be to increase the number of wells and see whether this weak signal can indeed be statistically amplified with many replicates to make a robust and specific characterization of a cell line's bortezomib-sensitivity (but I assume this is a lot of work and probably out of scope of this manuscript). 
      I think it is also important to discuss in more detail the interpretation of these results (including Figure 2-Supplement 9), in this context, in the Discussion.

      Minor comments:

      Suggested clarifications (some might be less relevant if the manuscript is designed for experts in the more clinical domain who are familiar with these terms / style):

      1. More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.
      2. What is LD90 (line #87)? LD50 (line #97)?
      3. It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.
      4. I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|) The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.
      5. Lines #104-111 were duplicated in lines #114-122.
      6. The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.
      7. How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.
      8. I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9. in Fig. 2. This was the main measurement used throughout the manuscript and in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.
      9. Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".
      10. Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.

      Referees cross-commenting

      My main criticism concerns "overselling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the scale of single cell types are supported by the data. Since reviewers #2 and #3 did not raise this concern, I think it is worth discussing here.

      Once these statements are toned down, I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single-cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Significance

      Cell Painting has been applied to many applications, but as far as I am aware this is the first attempt at an image-based phenotypic characterization of drug resistance. While the authors established that this approach can measure, to some extent, bortezomib sensitivity, in the current state of the results I am not convinced that Cell Painting can be practically used to assess the bortezomib sensitivity of a single cell line. However, this does not imply that the same approach cannot achieve the goal, perhaps by using other Cell Painting markers for bortezomib sensitivity, or with the same markers to assess sensitivity to different drugs. The Cell Painting and analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: Forer and Otsuka provide first-rate evidence for tethers fixed in place between separating anaphase chromosomes using electron tomography. The authors traced the anaphase movement of a number of living cells before fixation for examination using electron tomography. The manuscript is clearly written and provides an excellent introduction and discussion of the known literature. The reader will have an excellent background to see the importance of this work.

      Major comments:
      - Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      No further experiments are needed. The data are very supportive, and extremely clear.
      - Are the data and the methods presented in such a way that they can be reproduced? Yes.
      - Are the experiments adequately replicated and statistical analysis adequate? Yes.

      Minor comments:
      - Are prior studies referenced appropriately? Yes.
      - Are the text and figures clear and accurate? Absolutely.
      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The authors are to be congratulated on their major contribution to this study on tethers between separated daughter chromosomes. It is a tour de force to go from the living cells to fixing and identifying the same separated chromosomes using electron tomography to see the ultrastructure of the fibers seen fir.

      Referees cross-commenting
      Thank you reviewer #2. The manuscript should be published. It is an excellent contribution.

      We thank the reviewer for the appreciation of the clarity and quality of our work.

      Reviewer #1 (Significance):

      Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested.

      This manuscript is the first to use electron tomography to identify the tethers between separated anaphase chromosomes. Forer and the late Michael Berns and their co-authors have published a number of papers using phase microscopy and lasers to report on the physical nature and elastic properties of these fibres in the past. Forer and Otsuka have presented first-rate evidence for the reality of these structures using electron tomography. This manuscript should be highlighted in the published journal.
      The chemical identity of these fibers, as the authors state, is unclear.

      The following aspects are important:

      • Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field?

      This exciting contribution will be read by anyone interested in mitosis. It will be of interest to all Cell Biologists because of the careful manner in which the living cells were studied before they were fixed for examination using electron tomography. The readers will be dreaming how they can use this process on their Cell Biology problems.

      • Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am a cell biologist who has made contributions, both in light microscopy and in transmission microscopy, on dividing cells, both in tissue culture and in situ in avian and zebrafish embryos.

      We thank the reviewer for appreciating the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this paper, the authors use light microscopy and electron tomography to study anaphase chromosomes in crane fly spermatocytes. They find that there are two "tether" structures that connect telomeres of sister chromatids. One tether is thicker (denser) and extends between sister chromatids during early but not late anaphase, whereas a second, less-dense tether maintains contact with both sister chromatids in all examined stages of anaphase. The paper makes arguments as to what the tethers could or could not be. Specifically, they are too numerous to be ultrafine DNA bridges seen in various normal or abnormal segregation events and they also do not affect anaphase chromosome motion the same way ultrafine DNA bridges do.

      Major comments:
      The major claim that there are tethers that connect sister chromatids in anaphase is supported by the data. Moreover, the data resolve two types of tethers on the basis of their density. While it is unclear what the composition of the tethers is, the paper makes a convincing case that they cannot be the DNA ultrafine bridges seen in other studies. The discussion has sufficient caveats that most readers will see that more work is needed to identify the composition of the two tethers. In my opinion, no further experiments are needed to support the modest claims of this paper. Therefore, I only have minor comments that may hopefully improve the paper's clarity.

      We thank the reviewer for the positive evaluation of our work.

      Minor comments:
      It was argued that the tethers reported here were also seen in other species and cellular contexts, where the imaging work was done with projection EM imaging. Presumably, what is new here is the usage of electron tomography. It would help readers if the authors explained why the electron tomography done here was essential to arrive at key conclusions.

      Thank you for the useful comment. We have added the explanation of why electron tomography was critical to visualise small tether structures to the last paragraph of the Discussion on page 7.

      p.3 mitochondria appeared to be fixed properly ... (e.g., Figs. 1C, 2B) - I don't see any mitochondria in any figures. Perhaps this observation should be noted as "not shown"?

      We thank the reviewer for pointing this out. We have added an electron micrograph of mitochondria to the Supplementary Figure 1.

      p.3 The images shown in Figs. 1, 2, 4 - The figures should be called out in the order; in this case, Fig 3 has not been called out yet.

      We have corrected the order of the figures.

      p.4 we did not find any other connecting structures - Because the sample was processed by traditional EM methods, it's safer to add a caveat that other connecting structures could be missed if they were disrupted by sample prep or if they did not pick up stain as well as the two structures presented in this paper.

      We have clarified that our sample was chemically fixed in the first paragraph of the Discussion on page 4. Because the details of how our samples were prepared are described in the Methods section, we did not add further details to this paragraph.

      p.7 we expect such structures to be commonly seen in other cell types as well if they are examined carefully - Instead of saying that examinations should be done "carefully", it would be more helpful to specify how other cell types should be examined. This work shows that the bridges can be found if the cells are either sectioned parallel to the spindle axis or if a sufficiently large volume is sampled.

      We have now clarified that 3D electron microscopy techniques such as electron tomography are critical to visualise small tether structures in the last paragraph of the Discussion on page 7.

      Please use consistent spelling/hyphenation of ultrafine/ultra-fine and word choice (strands vs. bridges).

      Referees cross-commenting
      I agree with my co-reviewers' comments and have no further suggestions.

      Reviewer #2 (Significance):

      This may be the first use of electron tomography to study the structural details of tethers that connect chromosomes in anaphase cells. The data is of sufficient quality to reveal differences in density. Namely, one class of tether appears to be an extension of the chromosome while the other class is composed of thin filaments. This study is novel in that it characterizes a mitosis-associated complex that is poorly studied compared to the microtubule-based spindle apparatus and the kinetochore. Hopefully, the tethers will draw more attention and further characterization by methods like super-resolution microscopy and cryo-electron microscopy. My expertise is in chromatin, mitotic machines, and cryo-electron tomography.

      We thank the reviewer for appreciating the novelty and the impact of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Tethers between telomeres of chromosomes in anaphase were inferred from earlier studies of laser microbeam cutting experiments. The current paper presents images from electron tomography of crane fly spermatocytes that substantiates the earlier inference. The authors deduce that the darker filaments and the lighter filaments that they visualize may be the structural tethers at telomeres.

      Major comments:

      The experiments are carefully done, and the conclusions are appropriately worded to qualify any caveats. This short communication is well-presented, and I have only a few comments.

      We thank the reviewer for appreciating the clarity and quality of our work.

      The authors should expand their list of references on bridges to include those listed by Warecki et al (Curr Biol 33:1-17, 2023; refs 15-26, etc).

      We do not think it is necessary to expand the list of references for ultra-fine DNA bridges. In the article we submitted, we discussed the Warecki et al. article in the penultimate paragraph of the Discussion; we concluded that the bridges that Warecki et al. described are different from ours in having so few per cell that they couldn’t be tethers, and further that there was no evidence that those bridges were elastic. For those reasons, we do not find discussion of those proteins relevant to tethers, any more than listing all the proteins associated with ultra-fine DNA bridges would be relevant to the elastic tethers.

      In the Discussion, we discussed data suggesting that a known elastic protein, titin, was present; that is as far as we wanted to go in speculating on what the elastic component of tethers might be.

      The authors present arguments that the tethers are not the DNA bridges observed by others. However, they should try to address this experimentally by treatment of their preparations with DNase to see if the thick and/or thin filaments disappear.

      While we agree that it would be important to identify the components of the tethers, we are concerned that those experiments are beyond the scope of this manuscript. Nevertheless, we appreciate the constructive suggestion for future research.

      Moreover, they should discuss in more detail the possible functions of (DNA) bridges, including the recent model from Bill Sullivan's lab (Warecki et al, Curr Biol, 2023) that they help to retain fragments of broken chromosomes. In addition, the authors should summarize the various proteins that may be associated with the bridges (as enumerated in the Warecki et al 2023 paper).

      As we describe above, we concluded that the bridges Warecki et al. described are different from the tethers that we report in our manuscript. Therefore, we do not think it is necessary to expand the discussion on the proteins and functions associated with ultra-fine DNA bridges.

      The authors could add a sentence to the Results or Discussion of whether the thicker tethers might become stretched as anaphase progresses to become the thinner tethers (Fig. 4G).

      We thank the reviewer for this suggestion. We actually mentioned this possibility in the third paragraph of our Discussion on page 7.

      The authors may want to add a few sentences to the Discussion about the "chromosomal bouquet" stage of leptotene of meiosis prophase I where the telomeres of chromosomes seem pulled together and associate with the nuclear envelope --- they could speculate if this might also be due to the tethers that they describe in spermatocytes.

      This is a very interesting possibility. While we would refrain from adding this speculation to our manuscript as it is beyond the scope of the main points, it is certainly an interesting avenue of future research.

      Minor comments:

      A few additional comments are as follows:

      p. 2 last sentence of first paragraph -modify the wording about "no structural evidence that identifies physical connections between separating telomeres", since there is some information from genetic and cell biology light microscopy experiments. Perhaps simply change "structural" to "ultrastructural".

      We have changed the wording as the reviewer recommended.

      p. 6, 5th line of second paragraph - change "ribosome DNA" to "ribosomal DNA"

      We have corrected it.

      Figure 1D - add the chromosome to the right of the schematic model (as suggested by Fig. 1B).

      We are sorry for the confusion. In Figure 1D, only the left half of the tethers is modelled in 3D and shown. We have clarified this point by modifying the legend of Figure 1D.

      p. 17 (Methods), line 10 of first paragraph - state if this is light or heavy Halocarbon oil (give details).

      It is a mixture of heavy and light Halocarbon oil. We have clarified it on page 17.

      p. 17 (Methods), line 12 of first paragraph- state the concentration for fibrinogen and for thrombin.

      As we wrote in the original manuscript, the procedures are described in detail in our previous publication (Forer A. & Pickett-Heaps J. (2005) Fibrin clots keep non-adhering living cells in place on glass for perfusion or fixation. Cell Biology International 29: 721–730). Nonetheless, to clarify this point, we have modified the text on page 17.

      p. 17 (Methods), line 4 of second paragraph - is there any data to show that the filaments (tethers) occur if there is no cold shock?

      Yes, we do see similar filamentous structures in the sample without cold shock. For your information, we show one of the electron micrographs below. In our manuscript, we show the data from the samples prepared with cold shock, because it better visualizes the filamentous structures. We now show these electron micrographs in the Supplementary Figure 2.

      Referees cross-commenting
      I concur with Reviewers #1 and #2 that this is a fine paper that should be published. My detailed comments submitted with my review are simply meant as revisions to further strengthen this paper.

      We thank the reviewer for supporting the publication of our manuscript.

      Reviewer #3 (Significance):

      Strengths: This is an important conceptual advance and the carefully done ultrastructural imaging provides the foundation for future studies that could delve into the molecular composition and functional significance of the tethers at telomeres of anaphase chromosomes seen here by 3D electron microscopy.

      Limitations: the molecular composition and functional roles are not yet known for the tethers seen here by 3D electron microscopy, but to do so would involve an entire new program of experimentation.

      Advances: there have only been two earlier ultrastructural papers on tethers at telomeres, and the tethers were peripheral to the main focus of those papers. Thus, the current paper extends our ultrastructural information about tethers.

      Audience: this work is of importance for scientists who study the mechanics of chromosome movement on spindles, including regulation to combat aneuploidy. This work will also be important for a broader audience to inform them about transmission of the hereditary information to daughter cells.

      We thank the reviewer for appreciating the significance and the impact of our work.

    1. While there are rich areas of study in animal communication and interspecies communication, our focus in this book is on human communication. Even though all animals communicate, as human beings we have a special capacity to use symbols to communicate about things outside our immediate temporal and spatial reality (Dance & Larson, 1976). For example, we have the capacity to use abstract symbols, like the word education, to discuss a concept that encapsulates many aspects of teaching and learning. We can also reflect on the past and imagine our future. The ability to think outside our immediate reality is what allows us to create elaborate belief systems, art, philosophy, and academic theories. It’s true that you can teach a gorilla to sign words like food and baby, but its ability to use symbols doesn’t extend to the same level of abstraction as ours. However, humans haven’t always had the sophisticated communication systems that we do today.

      With 126 published definitions of "communication," it is vital for a speech class to touch on forms of communication other than speaking. Humans have one of the widest ranges of speech (i.e., many different languages), and these often do not translate seamlessly, so other universal abstract symbolism is needed alongside spoken communication to bridge the gap. Even our written language, and the meaning we assign to certain methodical squiggles on paper, varies widely. So do less obvious ways of communicating, such as gestures and body language, which may seem inconsequential to one person but be monumentally offensive to another. The intricately woven methods of communication that we as a species have created are a fascinating study, well beyond merely standing in front of a group of peers and talking at them for 3-5 minutes about a chosen topic.

    2. Like other forms of communication, intrapersonal communication is triggered by some internal or external stimulus. We may, for example, communicate with our self about what we want to eat due to the internal stimulus of hunger, or we may react intrapersonally to an event we witness. Unlike other forms of communication, intrapersonal communication takes place only inside our heads.

      Everyone on this planet has intrapersonal communication. I talk to myself every day, and I have conversations with myself about what I'm going to do or what I need to do. Some people talk to themselves to calm down, or they journal to ease their minds. When something surprising happens, people usually react somehow in their head; basically, when anything happens, people react to themselves. Just as the text states, "We also use intrapersonal communication or “self-talk” to let off steam, process emotions, think through something, or rehearse what we plan to say or do in the future." Intrapersonal communication happens almost every second throughout one person's day.

    1. In fact, it might be good if you make your first cards messy and unimportant, just to make sure you don’t feel like everything has to be nicely organized and highly significant.

      Making things messy from the start as advice for getting started.

      I've seen this before in other settings, particularly in starting new notebooks. Some have suggested scrawling on the first page to get over the idea of perfection in a virgin notebook. I also think I've seen Ton Zijlstra mention that his dad would ding every new car to get over the new feeling and fear of damaging it. Get the damage out of the way so you can just move on.

      The fact that a notebook is damaged, messy, or used for the smallest things may be one of the benefits of a wastebook. It averts the internal need some may find for perfection in their nice notebooks or work materials.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Rigger and Brenner details the role of the vimentin network in advancing OA pathogenesis by exacerbating premature senescence. The data are well presented and the study is of interest, in that little is known about vimentin in cartilage biology.
      The authors used OA-derived cartilage explants and chondrocyte cultures, which were graded for severity and compared accordingly. Figure 1 shows that markers of senescence are increased with structural damage, which is well established and consistent with the literature. Using a DOX model, the authors induce premature senescence, and the cells exhibit a disrupted vimentin network. However, upon KD of CDKN2A, a marker of senescence, they did not observe complete reversal of CSV presentation.
      Next, the authors show in Figures 4 and 5 that the reduction or dismemberment of vimentin structures is linked to senescence and may act as a contributing factor.
      Figures 6 and 7 then go on to show that, upon advanced passage, chondrocytes lose their vimentin network and tend to senesce and mineralize.

      Reviewer #1 (Significance):

      Strength:
      This is a very novel study showing a link between vimentin and senescence in chondrocytes. The data are in line with other data. The work is clearly written, structured, and well displayed.

      Author's response:
      We thank reviewer #1 for their interest in our work and their overall positive report.

      Suggestions for improvement:

      While the study is very thorough in describing the markers of senescence and the vimentin network, it lacks insight regarding the mechanism, which isn't completely deciphered. Are there links to key transcription factors?

      Author's response:
      The transcriptional regulation of vimentin in human cells is very complex. The VIM promoter region comprises multiple elements, such as an NF-kB-binding site, a PEA3-binding site and two AP1-binding sites (Zhang et al., 2003). Moreover, it was recently demonstrated that redox signaling is involved in vimentin expression at the wound margin after tissue injury in zebrafish (LeBert et al., 2018). However, it has also been reported that IL-1β stimulation results in reduced gene expression of vimentin via p38-signalling in cartilage degeneration and OA progression (see manuscript REF. 36,37).

      In our study, we observed that enhanced CSV levels are associated with a decreased vimentin gene expression, indicating a lower stability of the mRNA or decreased transcription of VIM in senescent chondrocytes (maybe due to enhanced p38-signalling as mentioned above). Since the transcriptome in senescent cells is radically changed, this question cannot be answered easily.

      In future studies, we will rather try to clarify the underlying mechanism of vimentin externalization. There are still many questions to be answered: is the CSV anchored in the cell membrane (which anchor protein?) and is there still a connection to the intracellular vimentin network? Which proteins are involved in the externalization process: maybe comparable to phosphatidylserine exposure, mediated by flippases, scramblases, and lipid transfer proteins or rather by vesicles?

      Literature mentioned above (not included in manuscript):

      LeBert et al., 2018: Damage-induced reactive oxygen species regulate vimentin and dynamic collagen-based projections to mediate wound repair. DOI: 10.7554/eLife.30703

      Zhang et al., 2003: ZBP-89 represses vimentin gene transcription by interacting with the transcriptional activator, Sp1. DOI: 10.1093/nar/gkg380

      It is also unclear if disruption of the network is more detrimental than KD in promoting senescence.

      Author's response:
      KD of Vimentin led to a gradual decrease of intracellular Vimentin content and consequent stress. The cells were analyzed 7 days after induction of the KD and exhibited a stable senescent phenotype, comparable to Doxorubicin-treated chondrocytes (treated with very low concentrations over several days to produce only mild but ongoing stress). These models might reflect the pathophysiologic situation: We think that cellular stress due to mechanical impact and subsequent oxidative stress/low-grade inflammation might lead to a gradual disruption or re-organization of the vimentin network, which is accompanied by decreased vimentin gene expression.

      In the case of the disruption of the vimentin network by Simvastatin, the stress response was very intense and rapid (24 h), and the experiment was only conducted as a proof of principle. Despite the upregulation of some senescence-associated markers, we don't think that permanent Simvastatin treatment would be suitable to obtain a stable senescent phenotype, but rather expect the cells to die due to excessive stress.

      It would have been good to include murine OA models to understand these processes better, and make a stronger physiological connection with OA of the joint.

      Author's response:
      The CSV antibody is only suitable for human cells and cannot be used for immunohistochemistry. Therefore, all previous reports of CSV are based on human (isolated) cells. At the current time point, it would not be possible to stain CSV in joints of mice after induction of PTOA due to the methodological limitations. We actually tested the CSV antibody in isolated lapine chondrocytes and found a high percentage of CSV-positive cells, even at low passages. Although stress increased the amount of CSV-positive lapine cells, we did not consider the results reliable due to the high percentage in unstressed cells, which might result from unspecific antibody binding.

      Overall, we think that the usage of clinical OA samples is convincing and reflects the pathophysiologic situation in the human OA joint.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript provides solid evidence for an association between cell surface vimentin (CSV) and chondrocyte senescence. Human cartilage and cultured chondrocytes are used with a wide range of approaches to provoke senescence: natural osteoarthritis, traumatic loading ex vivo, doxorubicin to cells in monolayer, vimentin siRNA, and simvastatin. In contrast, relatively little was done to try and interrupt or reverse the role of CSV in senescence, with CDKN2A siRNA representing one attempted intervention. The manuscript is well written and the data are presented in a logical and clear manner, with a high likelihood of being reproduced in subsequent studies.

      Author's response:
      We thank reviewer #2 for their interest in our work and their mainly positive report.
      Regarding their comment on our attempts to reverse CSV on senescent chondrocytes, we would like to add the following: Reversal of cellular senescence is a very ambitious challenge. But in fact, we are currently preparing a manuscript in which we characterize an appropriate senolytic strategy to “rejuvenate” human chondrocytes and plan to use this approach to reduce the amount of senescent and thus CSV-positive cells in future experiments.

      Major comments:

      In the doxorubicin experiments, the senescent cells show a spread morphology as expected. Given the importance of vimentin in cell spreading (as the authors' own data show), the possibility that spread morphology itself (and not senescence) leads to CSV should probably be examined. This could perhaps be achieved by plating with different concentrations of fibronectin or other matrix proteins that produce a spread morphology to a degree that matches the doxo. If the cells remain spread for ~10 days but don't become senescent and don't have CSV, this would provide further support for a direct relationship.

      Author's response:
      We agree that cell spreading is associated with various cellular processes (for example by the YAP signaling pathway). Moreover, we would like to thank the reviewer for the proposed experiment.

      Seeding of cartilage cells on fibronectin coated plates is a commonly used procedure to isolate chondrogenic stem progenitor cells, due to their higher affinity to fibronectin. The cells are usually cultured for several days on the coated plates and do not exhibit a flattened, senescent-like phenotype (as we observe for Doxorubicin-treated cells), but an elongated, fibroblast-/ stem cell-like shape. Our results (Figure 6E) demonstrate that CSPC have no increased CSV levels, despite their elongated (not flat) morphology.

      There are some findings supporting the assumption that CSV leads to enhanced cell adhesion, but not that adhesion or cell spreading promotes CSV: we included experiments with HeLa (low CSV levels) and SaOS-2 (high CSV levels), which demonstrated that high CSV levels are associated with increased plastic adhesion (Figure S5). In line with this, we demonstrated that higher CSV levels on chondrocytes were associated with enhanced fibronectin and vitronectin binding, which might explain increased plastic adhesion. Moreover, Simvastatin stimulation and subsequent cellular stress by Vimentin disruption resulted in enhanced CSV but did not lead to cell spreading (Actin not affected, cells rather elongated, not flattened).

      Minor comments:

      The CSV antibody and staining method appeared to have generated some signal from debris, which makes it challenging to assess the localization of true staining. Presumably the true staining would be present only on the cell surface. While the wide-field view is appreciated, perhaps insets with a higher magnification would clarify.

      Author's response:
      In Figure 2h and Figure 2i, we provide insets of the IF staining and an exemplary image made by scanning electron microscopy (SEM). CSV is not localized on debris; Figure 2h actually represents the cell surface. The magnified, Doxo-treated cell is highly senescent and thus flattened. The uneven (rather spotted) staining pattern of CSV and the unusual shape of the cell might give the impression that this is debris rather than the cell membrane.

      For figure 1k, it is a bit surprising that CDKN2A would peak so early after injury and then drop off. Most studies in other systems show a gradual increase in CDKN2A levels with persistent stress as opposed to a rapid increase in response to acute stress. Could the drop-off be due to preferential death of these cells? The CSV % in 1m was taken from 7d after trauma (plus 7 days in monolayer it appears). Further discussion on the timing of traditional senescence markers as compared to the emergence of CSV would be useful.

      Author´s response:

      We would like to thank the reviewer for this comment. That CDKN1A was induced by mechanical trauma without significant decrease at the later time points was in line with the P53 expression, which we detected via immunohistochemistry (IHC; positive staining of chondrocyte nuclei in cartilage). P53 and P21 are regarded as interconnected senescence markers. Interestingly, P53 is not regulated at the gene expression level upon cartilage trauma or Doxorubicin stimulation, but there is a significant increase in P53 nuclear translocation.

      Although such a discrepancy between gene expression and protein activity has not been reported in the case of P16 or P21, we plan to investigate the dynamics of these cell cycle regulators and their connection to CSV after cartilage trauma in more detail in future studies.

      We included the following statement in the discussion part:

      “In the current study, we observed that CSV on chondrocytes was reduced by siRNA-mediated silencing of CDKN2A and increased after Doxo treatment or cartilage trauma. While we confirmed that mRNA levels of both CDKN1A and CDKN2A were significantly enhanced upon injury but exhibited different expression levels over time, we determined CSV-positive cells only at one time point after ex vivo cartilage trauma. Future studies might also consider earlier and later time points after cartilage injury to identify a potential time-dependent peak or decline in CSV-positive chondrocytes. In this way a potential association between CSV and the expression levels of CDKN1A and CDKN2A, which are thought to play differential roles in initiating and maintenance of senescence, respectively [50], might be clarified.”

      [50] Stein G, Drullinger L, Soulard A, and Dulić V. Differential Roles for Cyclin-Dependent Kinase Inhibitors p21 and p16 in the Mechanisms of Senescence and Differentiation in Human Fibroblasts. Mol Cell Biol. 1999;19(3): 2109–2117. https://doi.org/10.1128/mcb.19.3.2109.

      There is no CSV staining shown for figures 4 and 5. While the quantification of CSV was done by flow cytometry, it would be nice confirmation to see the increase in CSV on the surface of cells with either siRNA for vimentin or the simvastatin.

      Author´s response:

      CSV-IF of simvastatin-treated chondrocytes is provided in Figure 5 (b). We did not perform exemplary staining of CSV after VIM-KD, because the quantification was performed via flow cytometry.

      Reviewer #2 (Significance):

      The strengths of the study include a rigorous design and the establishment of a potential new cell surface marker of chondrocyte senescence. The main limitation is that the conclusions are largely descriptive in nature.

      If CSV is confirmed as a robust marker of senescence, this would be of value to the field. While this marker has been explored previously in other systems, there is value in this manuscript given the wide range of contexts investigated for a cell type in which senescence likely has an important role.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study presents a sound piece of science in the puzzle about extracellular vimentin in the differentiation/dedifferentiation of human chondrocytes and senescence and osteoarthritis. Even though no mechanism is elucidated, the results clearly point towards a correlation of the amount of extracellular vimentin and the level of chondrocyte senescence, and therefore signs of osteoarthritic changes in the cultivated chondrocytes. The methods applied are state-of-the-art and provide the means to generate meaningful results in this experimental setting. The paper is concise and clearly written; there are only minor remarks.

      Author´s response:

      We thank reviewer #3 for their interest in our work and their overall positive report.

      Minor comments:

      1. The main clue of the paper is extracellular vimentin around chondrocytes in culture; please provide better pictures (1g) to support this. Why is the extracellular staining seen so broad and not concentrated on the cell surface? The pictures chosen imply a huge amount of vimentin to be externalized in disease states. They also indicate that in diseased chondrocytes no intact or semi-intact vimentin network is found intracellularly. Please comment.

      Author´s response:

      In Figure 1g, CSV is located on the cell membrane. The pattern of the staining was surprising to us, as well. CSV was not equally distributed on the membrane, but rather represented an inconsistent pattern. Sometimes the staining was located at the filopodia of the cells, sometimes the whole cell was covered by spots. We also observed this on cancer cells, which was in line with other studies using this antibody. It remains unclear whether the distribution of the CSV has any effect. But we assume that the high abundance in filopodia might be connected with cell adhesion and mobility, which was positively associated with CSV.

      Yes, chondrocytes isolated from highly degenerated tissue exhibited higher CSV levels as compared to cells derived from macroscopically intact regions. Although we did not investigate the vimentin network of these cells, our observations in Doxo-treated cells imply, indeed, that intracellular vimentin might be altered in diseased chondrocytes. In line with this, Blain et al. (Ref. 13) reported that there is a disassembly of the intracellular vimentin network in OA chondrocytes, which can disturb the chondrocyte phenotype and contribute to the development of OA (see discussion).

      1. In the doxo experiment no extracellular vimentin is found? Please explain.

      Author´s response:

      Doxo-treated cells are highly positive for CSV (= extracellular vimentin on membrane). However, the intracellular vimentin is strongly decreased and some cells seem to be negative. We have not yet clarified the underlying mechanism, but it seems that senescence/disease progression negatively affects the transcription of vimentin and, at the same time, promotes the externalization of the existing intracellular vimentin. Altogether, this might result in a decline in intracellular vimentin.

      1. The SEM picture is showing what. IGH? The red dots are colloidal gold particles? In any case the quantity of stain gathered EM level would not correlate to the huge amount seen in LM staining. Please comment.

      Author´s response:

      For the SEM analysis, a gold particle-coated secondary antibody was used. The positive signal usually appears in white and was subsequently colored using software. In IF and ICC staining, we had a signal amplification due to the biotin-streptavidin system, and the magnification, of course, makes a huge difference.

      1. Why the ICC in Fig. 3c? The siRNA is not detected in the KD? A reduction of Vimentin could be shown via WB.

      Author´s response:

      In Figure 3c, the KD of P16 was confirmed at the protein level. In addition to the gene expression analysis, we chose the ICC (IF) to confirm that there is a decline in active (nuclear) CDKN2A. In the case of P53, our experience was that gene expression and the amount of cytoplasmic/nuclear protein might not be consistent.

      In Figure 4, we confirmed the successful KD of vimentin on mRNA and protein level (flow cytometry plus IF). Of course, WB would also be possible, but we decided to use the methods in which the antibody was well established and we wanted to visualize the disturbance of the intracellular vimentin network upon KD.

      1. Fig. 4c, why are there no remnants of the vimentin networks seen in the chondrocytes? A Knock-down, not a KO is shown.

      Author´s response:

      In fact, most of the intracellular vimentin seems to be gone. However, there are some remnants (condensed fibers/ bundles) of the former vimentin network. We applied the VIM-KD over seven days. Usually, a KD experiment is only conducted for 2-3 days. But since we were not sure how stable the vimentin protein would be, we chose seven days. This long-lasting KD might have resulted in a strong decline of the protein. Moreover, the CSV levels on these cells were very high, indicating that existing vimentin was externalized and additionally decreased the amount of intracellular vimentin.

      1. Please comment on the concentration of simvastatin, why not nmolar?

      Author´s response:

      The concentration of Simvastatin was chosen in accordance with Trogden et al. (Ref. 26), who first described the effects of simvastatin on the vimentin network. A lower concentration might have had the advantage that the effects were less severe, allowing a longer observation time than 24h. However, as a proof-of-principle model to demonstrate the connection between vimentin network collapse and CSV expression, the concentration worked quite well.

      1. CSV+ is misleading in Fig. 6g, it's not an over expression.

      Author´s response:

      We would like to thank the reviewer for this comment and removed the “+” to make it less misleading.

      1. The concept of EMT is debatable, at least in kidney fibrosis, and chondrocytes are not epithelial cells. Please add a more critical discussion point.

      Author´s response: The authors agree with the reviewer’s argument that chondrocytes are not epithelial cells and that the term EMT doesn’t seem to be appropriate. However, this is one leading hypothesis proposed by the working group of Prof. Mayán, who described CX43 and other EMT-markers on/in senescent chondrocytes (see reference 31; more recently: Cell Death Dis. 2022;13(8):681. doi: 10.1038/s41419-022-05089-w).

      We added the following passage in the discussion part to indicate that this hypothesis is a controversial concept:

      “Nevertheless, the hypothesis that chondrocytes might undergo an EMT-like process remains controversially discussed, because chondrocytes are mesenchymal and not epithelial cells. In a recent review, Gems and Kern propose to consider senescent chondrocytes as activated and hyperfunctional remodeling cells occurring during OA progression [49]. Accordingly, chondrosenescence might represent an unsuccessful attempt of tissue repair. They further suppose that the senescent or activated chondrocytes are associated with a hypertrophic, bone-forming phenotype, following the process of bone development rather than hyaline cartilage formation. In line with this, we observed that CSV was associated with enhanced osteogenic capacities and a decline in chondrogenic properties.”

      [49] Gems and Kern. Geroscience. 2022;44(5):2461-2469. doi: 10.1007/s11357-022-00652-x.

      Reviewer #3 (Significance):

      The manuscript provides novel insight into the role of intermediate filaments, i.e. vimentin, in chondrocyte senescence and osteoarthritic changes in vitro. Its strength is a thorough elucidation of the connection with a wealth of experimental data; a weakness is the missing elucidation, or first experiments in the direction, of the cell biological mechanism. It is well suited for a broad audience, because it deals with fundamental cell biological phenomena, and it is definitely important for the OA/chondrocyte biology community.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We don't see the case for 1,5-IP8 as settled in plants, and none of the papers mentioned above draws this strong conclusion. This may be due to several limitations in the available data. The mentioned studies do not make it possible to differentiate the effects of 1-IP7 and 1,5-IP8 and, where binding or competition experiments have been performed, e.g. on the transcription factors, the differences in the Kd values for IP7 and IP8 were minor. Furthermore, 1,5-IP8 levels and the Pi starvation response do not always correlate. ITPK1 mutants, for example, show Pi overaccumulation and low 5-IP7, but normal 1,5-IP8 (Riemer et al., 2021). Finally, plants are complex organisms with multiple tissue types that serve for accumulating, exporting, transporting or finally consuming Pi. Therefore, correlating inositol pyrophosphate levels from whole-plant extracts with a Pi starvation response is problematic, unless both sets of data can be obtained from the same cell types or at least the same tissues.

      The comment of the reviewer made us recognize that the complex situation in plants deserves a more detailed coverage and we have therefore adjusted the introduction accordingly.

      Results: "We determined the corresponding lysines in Pho81 (Fig. S3), created a point mutation in the genomic PHO81 locus that substitutes one of them, K154, by alanine, and investigated the impact on the PHO pathway."

      In my opinion, it would be important to test here in a quantitative in vitro binding assay if (i) the SPX domain of Pho81 can bind PP-InsPs including 1,5-InsP8, (ii) if the dissociation constant is in agreement with the cellular levels of 1,5-InsP8 in yeast (compare Fig. 2) and (iii) if the K154A mutation blocks or reduces the binding of 1,5-InsP8. Without such experimentation, I find the statement "this result underlines the efficiency of the K154A substitution in preventing PP-IP binding to the Pho81 SPX domain." to be overly speculative, as no binding experiment has been conducted.

      We agree with the comment of the reviewer concerning the overstatement in the phrase. It has been deleted.

      As mentioned already in our previous work (Wild et al., 2016), Pho81SPX counts among the SPX domains that we could not express recombinantly. Likewise, full-length Pho81, which would be the relevant object for correlating in vitro binding studies with the cellular concentrations, has not been accessible. Expression in yeast did not provide sufficient material for ITC or other quantitative techniques. Therefore, we refrained from pursuing binding studies. Nevertheless, given the high conservation of the positively charged patch on SPX domains and the fact that, in every case where it has been tested so far, SPX domains showed inositol polyphosphate binding activity, we find it a conservative assumption that the Pho81SPX binds them as well. This is supported by the effects of the binding site mutant, which mimics the effect of ablating IP8 synthesis.

      Results: "Inositol pyrophosphate binding to the SPX domain labilizes the Pho81-Pho80 interaction." Again, in the absence of any protein-protein interaction assay I find this statement not to be supported by the experiments outlined in the manuscript. The best way to address this point would be to perform either co-IP or in vitro pull-down experiments between Pho81-SPX and Pho81-85, in the presence and absence of 1,5-InsP8 and/or using the Pho81 point-mutants described in the text.

      Since Pho81 could not be produced recombinantly, either by us or by others who worked on this protein previously, quantitative in vitro binding assays are not accessible for now. A simple IP suffers from the problem that Pho81 interacts with Pho85-Pho80 not only through the SPX domain but also through the minimum domain. The latter interaction may be constitutive. Since the main point of the manuscript is not to dissect the exact mechanisms of Pho85-Pho80 regulation, but only to address why the postulated inactivation of this kinase by a 1-IP7/minimum domain complex makes no sense, we prefer not to show a profound (and more complex) analysis of how the different Pho81 domains contribute to binding.

      To test the potential of the SPX domain for binding Pho85/Pho80 in vivo, we have created a GFP-fusion of the SPX domain of Pho81. This fusion protein localizes mainly to the cytosol when cells are on high-Pi. Upon Pi starvation, it concentrates in the nucleus. This concentration is not observed in a pho80 mutant background (New Fig. S7).

      In line with this, I would suggest to move the molecular modelling/docking studies from the discussion into the results section and to use these models to design some interface mutations that could be tested in coIP and/or pull-down assays. Alternatively, the authors may choose to omit the discussion section starting with: "Even though the minimum domain is unlikely to function as a receptor for PP-IPs this does not ... and ending with . In sum, multiple lines of evidence support the view that the SPX domain exerts dominant, 1,5-IP8 mediated control over Pho81 activity in response to Pi availability."

      We have now moved the modelling data to the Results section. The structure prediction of the interface is experimentally validated. Data on the effect of interface substitutions are already published, although these substitutions had not been recognized as affecting a common interface at the time. Substituting the interface residues either on the side of Pho80 or of Pho81 constitutively activates Pho85-Pho80 kinase and destabilizes its interaction with Pho81. This was shown by Co-IP experiments from cell extracts by Huang et al. We mention the respective substitutions in the manuscript and cite the paper in which their effect on PHO pathway activation had been described.

      Reviewer #2 (Recommendations For The Authors):

      Some points need additional attention by the authors:

      • In general, it would be helpful to introduce abbreviations more thoroughly (certain enzyme names, PA, MD, ...)

      We paid more attention to this.

      • Also in general, the authors may want to think about the nomenclature of inositol pyrophosphates. Given the expansion of PP-IPs that are being detected in different organisms these days it may be a good time to convert to a more precise nomenclature, i.e. 5PP-IP5 instead of 5-IP7; and 1,5(PP)2-IP4, instead of 1,5-IP8. The latter could just be stated once, and then be abbreviated as IP8.

      To our understanding the field has not yet come up with a unified nomenclature. Therefore, we prefer to stick with the more practical nomenclature that we have chosen, which also corresponds to what is commonly used in presentations and discussions among colleagues. We have now introduced a sentence making the link to the nomenclature that the reviewer has proposed.

      • p. 1, Abstract: "negative bioenergetic impacts" - the phrasing seems really vague

      Agreed, but we find it difficult to be more explicit and precise in the abstract while remaining concise and not distracting from the main message. This aspect is better explained in the introduction.

      • p. 3, Significance statement: "... unified model across all eukaryotic kingdoms" While the intended meaning of this wording is better explained in the text later, the phrasing here suggests a more all-encompassing study at hand, instead of a conclusion that fits more closely with established reports from other organisms. Please rephrase.

      We have adapted the phrase to avoid this impression.

      • p. 4: "IPTKs" - are the ITPKs meant here?

      Yes, that was a typo.

      • p. 7, the introduction ends abruptly and could use a concluding sentence.

      Done

      • p.7, "enzymes diphosphorylation either the..."; I understand what the authors are trying to say with diphosphorylating, but the enzymes are phosphorylating a phosphorylated substrate.

      Yes. We changed the phrase to "....adding phosphate groups at the 1- or 5-positions....".

      • p. 7, subtitle "...concentrations and kinetics of..."; kinetics of what? Synthesis/turnover?

      We corrected this subtitle

      • p. 8, with regards to the recovery experiment: Was this recovery determined elsewhere (please cite)? Otherwise it would be beneficial to include an extra figure to illustrate these recoveries in the supplementary information. And do the authors suspect some hydrolysis of IP8 given the lower recovery?

      We have now added the experiment testing recovery of IPPs as the new Fig. S1.

      • p. 9: It is appreciated that the authors point out the concentration of IP6 in S. cerevisiae. I found that concentration rather low, and the authors could highlight this a bit more, given their ability to carry out absolute quantification.

      This was a leftover from a previous version of the paper. Since the paper does not treat IP6 or lower inositol polyphosphates, we have deleted this phrase.

      • p. 9, Fig 2: The exponential decay of 5-IP7 is very nicely shown in Figure 2c. But one of the most important discussion points is IP8 being the key controller of the PHO pathway - it would therefore be beneficial for the argument to also show the same kind of graph for IP8 and if possible, fit a function to the data points to better quantify and compare the decay processes (e.g. via "half-life time" of PP-IPs during starvation, in addition to the suggested "critical concentration" which was only discussed for 5-IP7 thus far).

      Kinetic resolution is an issue here. The approach shown in Figs. 2 and 5 is not apt to determine a critical concentration of IP8 because the decline upon transfer to starvation conditions is too fast and difficult to relate to the equally rapid induction of the PHO pathway. We shall address this point in a more appropriate setup in a future study.
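
      As a minimal sketch of the half-life comparison the reviewer describes, and not the analysis used in the manuscript, a first-order decay C(t) = C0·exp(−k·t) can be fitted by linear regression on ln(C), with the half-life given by ln 2 / k. The function name, time points and concentrations below are hypothetical placeholders generated from a known curve, not measurements from this study.

```python
import math

def fit_exponential_decay(times, concentrations):
    """Fit ln(C) = ln(C0) - k*t by ordinary least squares.

    Returns (C0, k, half_life). Assumes strictly positive concentrations
    and simple first-order decay, which is an assumption of this sketch.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                      # slope of ln(C) versus t, equals -k
    k = -slope
    c0 = math.exp(mean_y - slope * mean_t)  # intercept back-transformed to C0
    return c0, k, math.log(2) / k

# Synthetic, noise-free illustration (hypothetical values, arbitrary units).
times = [0, 15, 30, 60, 120]               # minutes after transfer to starvation
true_c0, true_k = 1.0, 0.02
concs = [true_c0 * math.exp(-true_k * t) for t in times]

c0, k, half_life = fit_exponential_decay(times, concs)
print(f"C0 = {c0:.3f}, k = {k:.4f} per min, half-life = {half_life:.1f} min")
```

      Such a fit is only meaningful when the sampling interval is short relative to the decay, which is the kinetic-resolution limitation noted in the response above.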

      • p.9, Fig 2a: Where does the 5-IP7 come from in the kcs1Δ strain? In the text the authors state that 5-IP7 in kcs1Δ was not detected, but the figure suggests otherwise. Please explain.

      Currently, we do not know where these residual signals stem from. One possibility is that they represent other isomers that exist in minor concentrations and that are not resolved from 5-IP7 in CE. We added a sentence to the figure legend to indicate this.

      • p. 10: "IP8 was undetectable in kcs1Δ and decreased by 75% in vip1Δ. kcs1Δ mutants also showed a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that a significant source of 1-IP7 is synthesis of 1,5-IP8 through successive action of Kcs1 and Vip1, followed by dephosphorylation to 1-IP7." - Please specify this statement. Do the authors mean that 1,5-IP8 is only produced transiently below the detection capabilities of the method but that there still is a (reduced) flux from 5-IP7 to 1,5-IP8 to 1-IP7? Otherwise it would seem paradoxical to have a dependency on a non-existing metabolite in that cell line.

      This was not clearly expressed. The revised version now says: " ... a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that, in the wildtype, most 1-IP7 stems from the conversion of 5-IP7 to 1,5-IP8, followed by dephosphorylation of 1,5-IP8 to 1-IP7.". We hope that this clarifies the matter.

      • p. 10: "pulse-labeling approaches are not available for PP-IPs." While this statement is correct, a recent paper co-authored by Qui and Jessen showed nice pulse-labeling data for the lower IPs and could be cited here (PMID: 36589890).

      Yes, indeed, we should have been more precise here. What we wanted to express was that rapid pulse-labeling methods for following phosphate group turnover were lacking, with a temporal resolution of minutes rather than hours. Existing pulse labeling approaches, including the study mentioned by the reviewer, do not provide that. We have changed the phrase accordingly.

      • p. 10: continuation of caption of Fig 2: "were extracted [and] analyzed"

      Corrected. Thank you.

      • p. 12: How is 1-IP7 made in the vip1 kcs1 double mutant?

      As explained above, we suspect that these may be side products of IPMKs, which accumulate in the absence of vip1 phosphatase.

      • p. 13, caption to Figure 3: "XXX cells were analyzed" please replace the place holder XXX.

      Done. Thank you.

      • p. 13, Fig 3B, C, D and p. 50, Fig. S4: On screen the contrast between the different shades of grey of the bars is just visible enough, but not on paper. I suggest using a higher contrast/different colouring scheme.

      We enhanced the contrast.

      • p. 24, 25, Fig 7.: I could not really appreciate the AlphaFold part, and found it unnecessary. No docking or molecular dynamics simulations were carried out here, and it was not clear to me what information should be gleaned from this part.

      Following this comment, we have modified the respective part of the text. This part refers to a publication from the O'Shea lab (Nat. Chem Biol. 4,25) proposing the model that 1-IP7 and the Pho81 minimum domain bind competitively to the active site of Pho85 to inhibit its kinase activity. Modeling of complexes between Pho81, Pho80 and Pho85, which we present in the manuscript, rather suggests binding of the minimum domain to a groove in Pho80. This is important because it provides a viable alternative model for the action of the minimum domain. It suggests the minimum domain as a constitutive linker that attaches Pho80 to Pho85. Importantly, this model accounts perfectly for the results of previous random mutagenesis studies on Pho80 and on the minimum domain, which had independently identified both the Pho80 groove and the minimum domain residues that bind it in the prediction as critical residues for inhibition of Pho85, and for integrity of the Pho85/Pho80/Pho81 complex. We find this alternative explanation for Pho85-Pho80 regulation by Pho81, which we can derive by combining the predictions with already published experimental data, an important element to re-evaluate the relevance of 1-IP7 in PHO pathway regulation and resolve one of the existing discrepancies.

      • p. 28: No experiments were carried out with plants or mammals. The relevance for plants or mammalian systems therefore seems to be overstated at this point in time.

      We are not quite sure how to interpret this remark. We do not claim that our data support a role for IP8 in mammals and plants. But we refer to and cite studies providing the strongest evidence in favor of it in these systems. The relevance of our current study lies in refuting seemingly strong evidence from yeast, which had been diametrically opposed to the data obtained in plants and mammals. The revision of the situation in yeast now paves the way to drawing a coherent concept for fungi, plants and mammals. We feel that this is important and should be underlined.

      • p. 31: "300 mL of 3% ammonium" - 300 µL?

      Yes. Thank you.

      • p. 45, CE-ESI-MS parameters: "1IP8"

      Corrected.

      • p. 47: Figure S1: Please include more experimental details in the caption and/or methods section. Was a similar analysis software used as e.g. Figure S2 (NIS Elements Software)? Please also include all the analysis software in the Methods section under "fluorescence microscopy". Unless these additional experimental details already clarify the following point: Can the authors briefly comment on why the morphological determination in S1 requires trypan blue staining while in later experiments the yeast cells are readily recognized by the software in "simple" brightfield images?

      Trypan blue staining is not strictly required for this. It is just a simple method to fluorescently stain the cell wall. There are many other ways of delineating the cells. It could also have been done in a brightfield image.

      We updated the figure legend to better describe how these measurements were done and deposited the script and training file on figshare.

      • p. 48: "can be downloaded from **" please insert the link once the script is available online.

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      Reviewer #3 (Recommendations For The Authors):

      1) Italicize the scientific names of the organisms; this was inconsistent throughout the manuscript. Also, gene names should be italicized; this was also inconsistent (e.g., p.12 "... did not induce the PHO84 and PHO5 [sic] promoters...).

      Done

      2) Summary of the Figure 2A data in the text (p.9) probably has swapped the determined concentrations for 1-IP7 and IP8 (0.3 µM or 0.5 µM) as compared with the data figure.

      Yes, indeed. We have corrected this.

      3) Figure 2A: which of the mutant PP-IP levels are significantly different from the WT control?

      We have now added asterisks to indicate the significance for every mutant.

      4) In the discussion on the data (Fig. 2A), I was tripped up by the verb tense in this phrase "5-IP7 has not been detected in the kcs1Δ mutant and 1-IP7 has been strongly reduced..."; I think you want to use the past tense "was" in both cases [as is used in the next sentence]. It made me wonder if there was a difference in the detection of 5-IP7 and IP8 in the kcs1Δ mutant, you could detect 5-IP7 but not IP8; if so, where did the 5-IP7 come from?

      We have corrected the tense. Thank you for highlighting this. As for the residual inositol pyrophosphate signal in kcs1Δ, we do not know its origin. One possibility, which we now mention in the text, is that it stems from IPMK side activity. It should be underlined that all signals disappear upon Pi starvation.

      5) Figure 2C, include the data points that the lines are built from (suggestion).

      We refrained from that for the line graphs. For reasons of consistency, we should do this for every line graph. If we did that, Fig. 4B would become quite hard to read.

      6) Figure 3B-D, please check that the stipples or hatches are in the figure - the printed copy lacked them although I could see them in the electronic version; this was also true for Figures 5 and 6 (I do not know if it is a printer issue, but other hatches were visible: e.g., not seen in S4 but seen in S5).

      They are visible in our copies, also after printing. They may have been lost during file conversion at the journal.

      7) The text description of the Pho4-yEGFP, Pho5-yEGFP and Pho84-yEGFP says that the kcs1Δ mutant "showed Pho4-yEGFP constitutively in the nucleus already ... and PHO5 and PHO84 were activated". However, the data is more complex than that: whereas the localization of Pho4-yEGFP is constitutively nuclear, there is a higher basal (repressed) expression of both Pho5 and Pho84 as well as increased expression of both proteins under -Pi conditions. What accounts for the increased expression when Pho4 is already nuclear? This is also seen in the vip1Δ kcs1Δ mutant.

      We agree with the reviewer, but we cannot explain this effect with certainty. One possibility could be a wider dysregulation of Pi metabolism in kcs1 mutants. To name a few possibilities: Wildtype cells have polyphosphate reserves that are gradually mobilized during the first hours of P-starvation. kcs1 mutants don't have those and might fall into a "deeper" state of starvation faster. It should be kept in mind that the starvation response is also regulated at the level of chromatin structure, and by antisense transcripts. The influence of kcs1 on these processes is unclear.

      8) Figure 9 legend: please add a definition of the MP region (in red) and include it more explicitly in the described model.

      We now mention the relevant region also in the legend and have labeled the relevant regions in the images (Huang et al., 2001).

      9) Figure S2 legend: information is missing (downloading link).

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      10) Figure S4 and S5, missing statistics.

      They have been added to the new Fig. S6, which interprets differences between strains and conditions. Fig. S4 (now S3) shows timecourses of IPPs down to zero. Adding statistics for all pairwise differences between the timepoints would be almost an overkill.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      It is very important to find practical and efficient means in order to increase agricultural productivity. Drawing on data from variable field environments, this study provides a useful theoretical framework to identify new factors that could increase agricultural production. There is solid evidence to support the authors' claims, though following the fate of candidate species after introduction into rice fields would have strengthened the study. Plant biologists and ecologists working in nature and fields will find the work interesting.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our framework useful. We have revised our manuscript according to the "Recommendations for the Authors" to improve our manuscript.

      Public Review

      Reviewer #1 (Public Review):

      This manuscript describes the identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you so much for evaluating our manuscript. We have carefully read and responded to your comments. We hope our responses resolve your concerns on our study.

      The strength of this manuscript is to attempt application of eDNA analysis-based plant growth differentiation. The weakness is too preliminary data and experimental set-up to make any conclusion. The trials of authors experiments are ideal. However, the process of data analysis did not meet certain levels. For example, eDNA analysis of different time points on rice growth stages resulted in two influential organisms for rice growth. Then they cultivate two species and applied rice seedlings. Without understanding of fitness and robustness, how we can know the effect of the two species on rice growth.

      We agree with your comments that we did not have the fitness data of the two species and/or rice seedlings. Thus, it is still difficult to obtain deep understanding of the mechanisms of our findings that the species introduced in the system would influence rice growth. Nonetheless, our study demonstrated the effectiveness of our research framework as we found evidence that the species that were discovered by the eDNA monitoring and time series analysis indeed cause changes in the system. We believe that the first step is to show that the framework is workable and that detailed understanding of the mechanisms or genetic pathway was not a focus of our study. To avoid misunderstanding, we have added several explanations regarding this point in L426–431 and L447. For example, in L426, we have added the following statement: "... the detailed dynamics of the two introduced species was unclear (i.e., the fate of the introduced species). This is particularly important for understanding how the introduced organisms affected rice performance...".

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      We did not check the fate of the two species (except measuring the eDNA concentrations of the species), and it is true that we cannot show evidence of "how" these two species influence the rice gene expression. Understanding molecular mechanisms of the phenomenon that we found is important (especially from the viewpoint of molecular biology), but our primary objective was to demonstrate that our "eDNA x time series analysis" framework is feasible for detecting previously overlooked but influential organisms. To this end, we believe that we achieved our objective and repeating the validation experiment should be for a different purpose (i.e., for understanding molecular mechanisms). We have clarified these points in L426–431 and L447 as explained above.

      Reviewer #2 (Public Review):

      The manuscript "Detecting and validating influential organisms for rice growth: An ecological network approach" explores the influence of biotic and abiotic entities that are often neglected on rice growth. The study has a straightforward experimental design, and well thought hypothesis for explorations. Monitoring data is collected to infer relationships between species and the environment empirically. It is analyzed with an up-to-date statistical method. This allowed the manuscript to hypothesize and test the effects most influential entities in a controlled experiment.

      Thank you so much for your careful evaluations. We are pleased to see that you evaluated our manuscript positively. We have further revised our manuscript according to your comments and hope the revision has resolved your concerns.

      The manuscript is interesting and sets up a nice framework for future studies. In general, the manuscript can be improved significantly, when this workflow is smoothly connected and communicated how they follow each other more than the sequence and dates provided. It is valuable philosophical thinking, and the research community can benefit from this framework.

      Thank you for your suggestions. In order to improve the logic flow and readability of our manuscript, we have revised the descriptions of workflow and clarified how the experimental and statistical steps were connected to each other. To do so, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). Also, we have moved all of the Supplementary Materials and Methods to the main text. We have thoroughly revised the manuscript, and we hope that all the parts of our manuscript have been connected more smoothly than in the original manuscript.

      I understand the length and format of the manuscript make it difficult to add more details, but I am sure it can refer to/clear some concepts/methods that might be new for the audience. How/why variables are selected as important parts of the system, a tiny bit of information about the nonlinear time series analysis in the early manuscript, and the biological reasoning behind these statistically driven decisions are some examples.

      We have explained how/why variables are selected (in L125), added more information about the nonlinear time series analysis (in L129 and L175), and added the biological reasoning behind the statistical decisions (L195).

      Reviewer #3 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      Thank you so much for your evaluation of our manuscript. We have revised our manuscript based on your comments, and hope it has been improved compared with the original version.

      Strength

      In actual field data, since many variables are involved in a specific phenomenon, it is necessary to effectively eliminate false positives. Based on the metabarcoding technique, various variables that may affect rice growth were quantitatively measured, although not perfectly, and the causal relationship between these variables and rice growth was analyzed by using information transfer analysis. Using this method, two new players capable of manipulating rice growth were verified, despite their unknown functions until now. I found this process to be very logical, and I think it will be valuable in subsequent ecological studies.

      We are very pleased to see that you found our framework is very logical and potentially beneficial for future ecological studies.

      Weaknesses

      CK treatment's effectiveness remains questionable. Rice's growth was clearly altered by CK treatment. The validation of the CK treatment itself is not clear compared to the GN treatment, and the transcriptome data analysis results do not show that DEG is not present. The possibility of a side effect caused by a variable that the author cannot control remains a possibility in this case. Even though this part is mentioned in Discussion, it is necessary to discuss various possibilities in more detail.

      We agree that the effectiveness of the CK treatment was questionable. We have added some more discussion about this point in L376: "The unclear effects of the CK treatment relative to those of the GN treatment could be due to the relatively unstable removal method (i.e., C. kiiensis larvae were manually removed by a hand net) or incomplete removal of the larvae (some larvae might have remained after the removal treatment)."

      Reviewer #1 (Recommendations For The Authors):

      Comment #1-1 This manuscript describes identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you for your careful evaluations of our manuscript. We are pleased to see you found that our approach is novel. We have revised our manuscript in accordance with your comments, and we hope that the revision and responses resolved your concerns.

      Comment #1-2 1. Experimental setting: Authors made up small scale pot system in 2017 and then expanded manipulative experiment. I do not understand how two influencing organism sequences were identified from the single treatment depending on different time points. How they can be convince the two organisms affect the rice growth rather than other biological and environmental factors.

      In 2017, we performed intensive monitoring of the experimental rice plots and obtained a large time series dataset (122-day consecutive monitoring x 5 plots = 610 data points). The time series data were analyzed using an information-theoretic causal analysis. This analysis is critically different from correlational analyses and is designed to identify causal relationships among variables. Although we understand that field manipulation experiments are a common and straightforward approach to identifying causal relationships among organisms, we chose the "field-monitoring + time-series-based causal analysis" approach. This is because, as explained in the main text, there are numerous factors that could influence rice performance, and it is practically impossible to perform manipulative experiments for all of them. In contrast, our "field-monitoring + time-series-based causal analysis" approach has the potential to identify multiple factors under field conditions, even with a single experimental treatment.

      Nonetheless, we must admit that our time-series-based approach still has a chance to misidentify causal factors. Our framework relies on statistics, so the chance of false-positive detection of causality cannot be zero. This was exactly the reason why we performed the "validation" experiment in 2019. To complement the statistical results of the 2017 experiments, we performed another experiment in 2019.

      Comment #1-3 2. eDNA technology: The eDNA analysis based on four universal primers 16s rRNA, 18s rRNA, ITS, and COI regions must not be enough to identify specific species. The resolution of species classification may not meet to confirm exact species. Thus, the accuracy of two species that they selected for further experiment is difficult to be confirmed. Authors also referred to "putative Globisporangium".

      Your point is correct. The DNA barcoding regions we selected are short and it is often difficult to identify species. However, this limitation could not have been overcome even if we had chosen a different genetic marker. The long-read sequencing technology could partially solve the issue, but the number of sequence reads generated by the long-read technique is less than that by the short-read sequencing technology, and comprehensive detection of all species in an ecological community was still challenging. Our approach struck a balance among the identification resolution, comprehensiveness of the analysis, and sequencing costs. In addition, even though we could not identify most ASVs at the species level, some ASVs could be identified at the species level (52 ASVs among the 718 ASVs which had causal influences on rice growth), and we selected the two species (G. nunn and C. kiiensis) from the 52 species.

      Further, the taxonomic assignment algorithm we used here (i.e., Claident; Tanabe & Toju 2012 PLoS ONE 10.1371/journal.pone.0076910) adopts conservative criteria for species identification and has a low false-positive probability.

      More importantly, this is also the reason why we performed the "validation" experiment in 2019. The species identified in the 2017 experiment are still "potential" organisms that influence rice growth (i.e., the hypothesis-generating phase), and we tested the hypothesis in 2019.

      Nonetheless, we must admit that clear description of potential limitations is important. Thus, we have discussed this in L418: "As for the second issue, short-read sequencing has dominated current eDNA studies, but it is often not sufficient for lower-level taxonomic identification. Using long-read sequencing techniques (e.g., Oxford Nanopore MinION) for eDNA studies is a promising approach to overcome the second issue".

      Comment #1-4 3. Biological relevance 1: Authors identify two organisms as influencing organism for rice growth. As conducting the first experiment in 2017, the 2019 experiment was different from natural condition. The two experiments in 2017 and 2019 were conducted under different conditions. How do they compare the experiments? At least, the eDNA analyses in 2017 and 2019 should be very similar. I cannot find such data.

      The experimental conditions were different between 2017 and 2019 because they were conducted in different years. Theoretically, it is ideal if the experimental conditions in 2019 are covered by the range of experimental conditions in 2017 (e.g., rice variety, air temperature, rainfall, and solar radiation). If this condition were satisfied, the attractor (i.e., rice growth trajectory delineated in the state space) in 2019 would be within that in 2017, and our model from 2017 could be used to predict the dynamics in 2019 accurately. To fulfill the conditions, we made as much effort as possible: we used the same rice variety and soils in 2019 as those used in 2017, and started our experiment at the same time of year in 2019 as in 2017.

      Although natural ecological dynamics cannot be precisely controlled, our monitoring revealed that the ecological dynamics in 2019 were qualitatively similar to those in 2017. To demonstrate that the experimental conditions and eDNA community data were similar between the two experiments, we have presented the climate and eDNA data in an inset figure in Figure 3a, Figure 1–figure supplement 2, and Figure 3–figure supplement 2. We must admit that these dynamics are not identical, but we hope that this resolves your concern.

      Comment #1-5 4. Lack of detail description: In the Materials and Methods, there are many parts which lack on detail description. For instance, authors must described the two species cultivation, application concentrations, and application methods.

      We have moved Supplementary Materials and Methods to the main text and added more detailed descriptions in Materials and Methods. Also, to improve the logical flow and readability of our manuscript, we have added brief explanations about what/how we did at the first sentence of Results subsections (some of these explanations were only in Materials and Methods in the original manuscript). We have added references for how G. nunn was cultivated in L608 (Kobayashi et al., 2010; Tojo et al., 1993), together with the application concentrations (C. kiiensis was not cultivated but removed from the system, as described in Materials and Methods). Application methods are described in the Materials and Methods section "Field manipulation experiments in 2019" (L596).

      Comment #1-6 5. Validation: Application of one species clearly resulted to promote rice growth. They must include appropriate control treatment. If they pick same genus but different species that identified no specific effect on rice growth through eDNA analysis, no effect on growth can be provided. Generally application of large population of certain non-harmful organism confer plant growth promotion. It is not surprising result. Authors need to prove effectiveness of eDNA analysis. In addition, the field experiments required at least two years of consistent data for publication because environmental factors are so dynamic.

      Thank you for pointing this out. We agree with your comment that species that were predicted to have no effect should not promote rice growth in a validation experiment. Including such species in our 2019 manipulation experiment was one of our initial experimental plans, but we could not include them because of limitations of time, labor, and money. More extensive validation of the statistical results of the 2017 data, including multi-year experiments, would further validate the effectiveness of our approach and should be done in future studies. To clarify this point, we have added statements in the paragraph starting at L396.

      Comment #1-7 In conclusion, I suggest that authors need more large data analysis and validate with more accurate and meaningful protocol.

      As we explained in the revised manuscript and the Response to Comments #1-2 to #1-7, our study demonstrated a novel research framework to detect previously overlooked influential organisms under field conditions. We agree that larger data analysis would be ideal to further validate our approach, but whether and how to collect larger data is constrained by time, money, and labor. We believe that our study was designed carefully and could provide meaningful avenues for developing novel, ecological-network-based, and environmentally friendly agricultural solutions.

      Reviewer #2 (Recommendations For The Authors):

      Comment #2-1 Lines 97-110: This is so cool. Modeling with empirical data is very powerful. But a rice field is an open system consisting of metacommunity dynamics. Maybe a tiny bit of biological and biogeochemical background here would be good.

      Thank you for your comments. We have added a few examples of how and in which systems these methods were used to evaluate community dynamics and detect biological interactions in L109-L118.

      Comment #2-2 Lines 111-126: I like the summary of the study here. I think the influential species concept can be a little more elevated. Paine's famous keystone species work has been cited but a couple more pieces of literature can help to enhance the ecological importance of this work.

      We have explained the work by Paine (1966) a bit more and added one more paper that showed the effect of multiple predator species on the system dynamics at L88. We have also added a relevant sentence at L137 to emphasize the ecological/agricultural significance of our work.

      Comment #2-3 Experimental design/Figure 1:

      Is there any rationale behind choosing red individuals to measure the growth?

      Is there any competition between the individuals in the pots?

      Figure 1e: It is nice to show the ASVs in time. I wonder how the plot would look like when normalized by biomass/DNA content/coverage/rarefaction because of the seasonality.

      As for the first question, we chose the four individuals to minimize the edge effects (i.e., effects of microclimates and neighboring rice would be different between the four rice individuals and those planted in the edge regions). We have mentioned this in the legend of Figure 1.

      As for the second question, there might be competition among the individuals in the pot. However, we did not measure the effect of competition (e.g., by comparing the growth with/without other rice individuals).

      As for the third question, we published detailed dynamics of ecological community in the Supplementary Figures in Ushio (2022) Proceedings B https://doi.org/10.6084/m9.figshare.c.5842766.v1. In addition, we have uploaded a video showing the temporal dynamics of some top (= most abundant) ASVs in https://doi.org/10.6084/m9.figshare.23514150.v2.

      We have mentioned the supporting information in L153.

      Comment #2-4 Line 146-147: Is this damage influence the inferences? Maybe it is better to justify.

      While we occasionally observed physical damage, it is unlikely that it affected our causal inference because the changes in rice height due to damage were smaller and less frequent than those due to growth. We have noted this at L151.

      Comment #2-5 Line 161-162: Maybe refer readers to the methods section where you explain UIC analysis. It'd be easier to interpret the figures.

      Mentioned.

      Comment #2-6 Line 175-176: I believe very brief information in the intro about the organisms might help explain the hypothesis and interpret the results better.

      We have included brief information of the two species at L197.

      Comment #2-7 Figure 2: Species interaction strength: Are these proxies to the Jacobians? Is there a threshold for the influence we can consider strong/weak? For example, influential species compared to diagonal elements of the Jacobians (intraspecies interactions) could be shown as a mean vertical line in Figure 2b.

      "Influences to rice growth" in Figure 2b is transfer entropy (TE) from a target ASV to rice growth. They are not proxies of the Jacobians, but they might positively correlate with the absolute value of the Jacobians. We have clarified this point in the legend (L953). More direct estimations of the Jacobian can be done using the MDR S-map method (Chang et al. 2021 DOI:10.1111/ele.13897), but we did not perform the MDR S-map in the present manuscript (see Ushio et al. 2023 https://doi.org/10.7554/eLife.85795 for the application of the MDR S-map). As for TE, there is no clear threshold to distinguish strong/weak interactions.
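
      To make the quantity in this response concrete, the following is a minimal sketch of a plug-in transfer entropy estimate for two discrete time series with a lag of one step. It is an illustration only: the function name `transfer_entropy` and the toy series are assumptions, and the manuscript's own analysis (referred to elsewhere in this response as UIC) is not reproduced here.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, lag 1, discrete data.

    TE = sum p(y_next, y_now, x_now)
             * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    """
    triples = Counter()   # counts of (y_next, y_now, x_now)
    pairs_yx = Counter()  # counts of (y_now, x_now)
    pairs_yy = Counter()  # counts of (y_next, y_now)
    singles = Counter()   # counts of y_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Toy example (hypothetical data): y simply copies x with a one-step delay,
# so information flows from x to y but not from y to x.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
```

      In this toy case the estimate from x to y is close to one bit and the reverse estimate is close to zero. As the response notes, TE is a non-negative information quantity with no agreed threshold for strong versus weak influence, and it is not an estimate of a Jacobian element.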

      Comment #2-8 Figure 2: Looking at panels c and d, it looks like there is a negative frequency selection between two influential species. Is it a reasonable observation?

      This is an interesting point. In this manuscript, we have not carefully examined the interspecific relationship between these two particular species. However, the interspecific interactions were examined in detail and reported in Ushio (2022) (Proceedings of the Royal Society B, DOI:10.1098/rspb.2021.2690). We re-checked the result in Ushio (2022); although there is a negative correlation between them, we did not find any (statistical) causal relationship between them.

      Comment #2-9 Line 209: What is t-SNE analysis? Because of the manuscript's format, maybe methods should be shortly referred to in the relevant section or explained in brackets.

      We have spelled out t-SNE.

      Comment #2-10 Line 212-214: Maybe briefly explain what the hypotheses are for the alternative analysis, and what is the contribution of the results to the study.

      We have added a brief explanation at L241: "Alternative statistical modeling that included the treatments (the control versus GN or CK treatments) and manipulation timing (i.e., before or after the manipulation), which simultaneously took the temporal changes of all the treatments into account, also showed qualitatively similar results (Supplementary file 4), further supporting the results."

      Comment #2-11 Figure 3b/c: Maybe species names as panel titles could be helpful. d: Treatment names with initials in the legend could be also helpful to read the plots.

      We have added species name as panel titles of Figure 3b,c. Treatment names were included in the legend of Figure 3.

      Comment #2-12 Line 233: Maybe mention why the manuscript uses the word "clear".

      We have mentioned this in L185.

      Comment #2-13 Line 234-236: I think that these alternative tests should be explained somewhere.

      We have revised the sentence so that it includes some explanations (L241). Also, we have referred to Materials and Methods.

      Comment #2-14 Figure 4: The title says ecological community compositions, and panels show the growth rates and cumulative growth.

      Thank you for pointing this out. This was a typo and we have corrected it.

      Comment #2-15 Lines 246-269: Can these expression patterns be transient and relevant to the time point that the sample is taken?

      Yes, these expression patterns were transient. We collected rice leaf samples for RNA-seq 1 day before the first manipulation and 1, 14, and 38 days after the third manipulation (see Supplementary file 3 for the sampling design). When we merged the pot locations, we observed no difference in gene expression for samples taken 1 day before the first manipulation and 14 and 38 days after the third manipulation (except for two genes in samples 38 days after the manipulation), and thus we consider that the DEGs appeared only in the short period after the manipulation. We have mentioned this in L278 and L383: "We found almost no DEGs for leaf samples taken one day before and 14 and 38 days after the third manipulation (the leaf sampling event 1, 3, and 4), suggesting that the influences of the treatments on the gene expression patterns were transient." (L278) and "These changes were observed relatively quickly and transient." (L383)

      Comment #2-16 I wonder if a conceptual framework figure would help to generalize the workflow that can be used for other studies.

      Thank you for your suggestion. Although we agree with your comment that such a figure would be helpful to generalize the workflow, we believe that our framework is clear and decided not to include it in the present manuscript. We might consider including such a figure (like Figure 1a in Ushio 2022) if we have an opportunity to write a review paper regarding this topic.

      Comment #2-17 Lines 329-335: I feel this information is unclear in the early manuscript. Maybe it's necessary to clearly communicate in the beginning.

      We have explained that we could not find any relevant information at least at the time we detected the ASVs in L189.

      Comment #2-18 Lines 336-337: Can these species be identified in the previous data set from the ASV sequences?

      Yes, these species were identified in the DNA data set obtained in 2017.

      Comment #2-19 Lines 387-397: Are there any measurements such as total biomass, and statistical methods to help with the eDNA bias and data compositionality?

      We have confirmed that our quantitative eDNA metabarcoding generates comparable results with the fluorescence-based method and quantitative PCR (e.g., see Supplementary Figures in Ushio 2022) (mentioned in L310 in the revised manuscript). However, at least in this study, we could not perform a direct comparison of the eDNA data with species abundance and/or biomass. This is partly because the number of our target species was too large (> 1,000 species). The accurate estimation of species abundance and/or biomass is one of our next goals.

      Comment #2-20 Line 472: Maybe mention transfer entropy somewhere in the early manuscript.

      We have mentioned this in L175.

      Comment #2-21 Lines 494-503: Maybe a summary of this reasoning should be mentioned somewhere in the early manuscript too.

      We have described a brief summary of the reasoning in L195.

      Comment #2-22 Lines 29-33 If this sentence is simplified it might be easier to follow.

      The sentence has been divided into two sentences in L28. Also, each sentence has been simplified.

      Comment #2-23 Line 38 Maybe "macrobes" can be explicitly mentioned. Fungi, protozoa, etc.

      Mentioned.

      Comment #2-24 Line 139: I am not sure if the date should be in the title.

      Similar monitoring was done in 2017 and 2019. Thus, we think the date is necessary in the section title.

      Comment #2-25 Figure 1: There are 4 red individuals in the design but 5 measurements in the plots.

      Heights and SPAD of the four individuals were measured for each plot and the averaged values were used as representative values for each plot. Therefore, 20 measurements (= 4 rice individuals x 5 plots) were done every day, but each plot has one rice height for each day. We have clarified this in the legend of Figure 1: "the average values of the four individuals were regarded as representative values for each plot."
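
      The aggregation described in this response can be sketched as follows. The plot labels and height values are made up for illustration, and the helper name `plot_representative_heights` is an assumption rather than anything taken from the authors' code.

```python
from statistics import mean


def plot_representative_heights(heights_by_plot):
    """Average the individual rice heights within each plot."""
    return {plot: mean(values) for plot, values in heights_by_plot.items()}


# One hypothetical monitoring day: 5 plots x 4 individuals = 20 measurements (cm).
example_day = {
    "plot1": [30.0, 31.0, 29.0, 30.0],
    "plot2": [28.0, 28.0, 30.0, 30.0],
    "plot3": [32.0, 31.0, 33.0, 32.0],
    "plot4": [27.0, 29.0, 28.0, 28.0],
    "plot5": [30.0, 30.0, 31.0, 29.0],
}

n_measurements = sum(len(values) for values in example_day.values())
daily_plot_heights = plot_representative_heights(example_day)
```

      Each day therefore contributes twenty raw measurements but only five plotted values, one per plot, which is why the panels show five series.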

      Comment #2-26 Figure 1b: Maybe use the same axis length for the temperature as the other plots?

      Corrected.

      Comment #2-27 Lines 259-261: Are there the names of the genes in databases?

      Yes, these are gene names used in the rice databases (e.g., The Rice Annotation Project Database; https://rapdb.dna.affrc.go.jp/index.html).

      Reviewer #3 (Recommendations For The Authors):

      Comment #3-1 Additionally, RGR is not statistically significant, but statistical significance is observed only in cumulative growth because data presentation does not reflect plant characteristics. RGR changes according to the developmental stage of the plant. Therefore, if RGR data are shown separately according to the rice growing season, the cumulative growth pattern and the pattern will appear similar.

      RGRs were calculated daily (i.e., cm/day) and they changed depending on the developmental stage of the rice (Figure 1 and Figure 4–figure supplement 1). Therefore, we might find similar RGR patterns if we focus on a specific period of the growing season. However, unfortunately, we performed the intensive (i.e., daily) monitoring in 2019 only during the field manipulation period (mid-June to mid-July 2019), and we cannot investigate the changes in cumulative growth throughout the growing season (this depends on how many days we add up RGR to calculate the cumulative growth, though). We agree that, if we had investigated the detailed pattern of RGR throughout the growing season in 2019, we could have found similar patterns between RGR and cumulative growth at a certain period in the growing season. In Figure 4, the cumulative growths were calculated based on the RGRs before the third manipulation or during 10 days after the third manipulation. We clarified this in the legend of Figure 4.
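
      The relationship between daily growth rate and cumulative growth mentioned here can be illustrated with a short sketch. It assumes that the daily rate is the day-to-day difference in height (cm/day) and that cumulative growth is the sum of daily rates over a chosen window; the function names and the height values are hypothetical.

```python
def daily_growth_rates(heights):
    """Day-to-day height differences (cm/day) from consecutive daily heights."""
    return [later - earlier for earlier, later in zip(heights, heights[1:])]


def cumulative_growth(rates, start, end):
    """Sum of the daily growth rates over the window rates[start:end]."""
    return sum(rates[start:end])


# Hypothetical heights (cm) of one plot on four consecutive days.
heights = [10.0, 10.5, 11.5, 13.0]
rates = daily_growth_rates(heights)
total = cumulative_growth(rates, 0, len(rates))
```

      Because the cumulative value depends on which days are summed, the same daily rates can yield different cumulative growth for different windows, which is the caveat the response raises about the 10-day window used in Figure 4.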

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper describes the development and initial validation of an approach-avoidance task and its relationship to anxiety. The task is a two-armed bandit where one choice is 'safer' - has no probability of punishment, delivered as an aversive sound, but also lower probability of reward - and the other choice involves a reward-punishment conflict. The authors fit a computational model of reinforcement learning to this task and found that self-reported state anxiety during the task was related to a greater likelihood of choosing the safe stimulus when the other (conflict) stimulus had a higher likelihood of punishment. Computationally, this was represented by a smaller value for the ratio of reward to punishment sensitivity in people with higher task-induced anxiety. They replicated this finding, but not another finding that this behavior was related to a measure of psychopathology (experiential avoidance), in a second sample. They also tested test-retest reliability in a sub-sample tested twice, one week apart and found that some aspects of task behavior had acceptable levels of reliability. The introduction makes a strong appeal to back-translation and computational validity, but many aspects of the rationale for this task need to be strengthened or better explained. The task design is clever and most methods are solid - it is encouraging to see attempts to validate tasks as they are developed. There are a few methodological questions and interpretation issues, but they do not affect the overall findings. The lack of replicated effects with psychopathology may mean that this task is better suited to assess state anxiety, or to serve as a foundation for additional task development.

      We thank the reviewer for their kind comments and constructive feedback. We agree that the approach taken in this paper appears better suited to state anxiety, and further work is needed to assess/improve its clinical relevance.

      Reviewer #1 (Recommendations For The Authors):

      1) For the introduction, the authors communicate well the appeal of tasks with translational potential, and setting up this translation through computational validity is a strong approach. However, I had some concerns about how the task was motivated in the introduction:

      a) The authors state that current approach-avoidance tasks used in humans do not resemble those used in the non-human literature, but do not provide details on what exactly is missing from these tasks that makes translation difficult.

      Our intention for the section that the reviewer refers to was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we note that the phrasing was perhaps unfair to recent tasks that were explicitly designed to be translatable across species. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases, for example by using joysticks to approach/move towards positive stimuli and avoid/move away from negative stimuli, which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      b) Although back-translation to 'match' human paradigms to non-human paradigms is useful for research, this isn't the end goal of task development. What really matters is how well these tasks, whether in humans or not, capture psychopathology-relevant behavior. Many animal paradigms were developed and brought into extensive use because they showed sensitivity to pharmacological compounds (e.g., benzodiazepines). The introduction accepts the validity of these paradigms at face value, and doesn't address whether developing human tests of psychopathology based on sensitivity to existing medication classes is the best way to generate new insights about psychopathology.

      We agree that whilst paradigms with translational and computational validity have merits of their own for neuroscientific theory, clinical validity (i.e. how well the paradigm reflects a phenomenon relevant to psychopathology) is key in the context of clinical applications. While our findings of associations between task performance and self-reported (state) anxiety suggest that our approach is a step in the right direction, the lack of associations with clinical measures was disappointing. Although future work is needed to more directly test the sensitivity of the current approach to psychopathology, this may mean that it, and its non-human counterparts, do not measure behaviours relevant to pathological anxiety. Since our primary focus in this paper was on translational and computational validity, we have opted to discuss the reviewer’s suggestion in the ‘Discussion’ section, as follows:

      Further, it is worth noting that many animal paradigms were developed and widely adopted due to their sensitivity to anxiolytic medication (Cryan & Holmes, 2005). Given the lack of associations with clinical measures in our results, it is possible that current translational models of anxiety may not fully capture behaviours that are directly relevant to pathological anxiety. To develop translational paradigms of clinical utility, future research should place a stronger emphasis on assessing their clinical validity in humans.

      c) The authors may want to bring in the literature on the description-experience gap (e.g., PMID: 19836292) when discussing existing decision tasks and their computational dissimilarity to non-human operant conditioning tasks.

      We thank the reviewer for this useful addition to the introduction. We have now added the following to the 'Introduction’ section:

      Moreover, evidence from economic decision-making suggests that explicit offers of probabilistic outcomes can impact decision-making differently compared to when probabilistic contingencies need to be learned from experience (referred to as the ‘description-experience gap’; Hertwig & Erev, 2009); this finding raises potential concerns regarding the use of offer-based tasks in humans as approximations of non-human tasks that do not involve explicit offers.

      d) How does one evaluate how computationally similar human vs. non-human tasks are? What are the criteria for making this judgement? Specific to the current tasks, many animal learning tasks are not learning tasks in the same sense that human learning tasks are, in terms of the number of trials used and if the animals are choosing from a learned set of contingencies versus learning the contingencies during the testing.

      The computational similarity of human and non-human strategies in a given translational task can be tested empirically. This can be done by fitting models to the data and assessing whether similar models explain choices, even if parameter distributions might vary across species due to, for example, physiological differences. Indeed, non-human animals require much more training to perform even uni-dimensional reinforcement learning, but once they are trained, it should be possible to model their responses. In fact, it should even be possible to take training data into account in some cases. For example, the training phase of the Vogel/Geller-Seifter preclinical tests requires an animal to learn to emit a certain action (e.g. lever press) simply to obtain some reward. In the next phase, an aversive outcome is introduced as an additional outcome, but one could model both the training and test phase together – the winning model in our studies would be a suitable candidate to model behaviour here. As we also discuss predictive validity in the ‘Discussion’ section, we opted to add the following text there too:

      … computational validity would also need to be assessed directly in non-human animals by fitting models to their behavioural data. This should be possible even in the face of different procedures across species such as number of trials or outcomes used (shock or aversive sound). We are encouraged by our finding that the winning computational model in our study relies on a relatively simple classical reinforcement learning strategy. There exist many studies showing that non-human animals rely on similar strategies during reward and punishment learning (Mobbs et al., 2020; Schultz, 2013); albeit to our knowledge this has never been modelled in non-human animals where rewards and punishment can occur simultaneously.
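
      As an illustration only, the kind of model referred to above can be sketched as follows. The response does not spell out the model equations, so the form here is an assumption: separate delta-rule estimates of reward and punishment probability, each with its own learning rate, combined through reward and punishment sensitivities in a logistic choice rule. All function and parameter names are hypothetical, and the winning model in the manuscript may differ in its details.

```python
import math

def update(estimate, outcome, learning_rate):
    """Delta-rule update of an outcome-probability estimate (outcome is 0 or 1)."""
    return estimate + learning_rate * (outcome - estimate)

def p_choose_conflict(q_rew_conflict, q_pun_conflict, q_rew_safe,
                      reward_sens, punish_sens):
    """Logistic (two-option softmax) probability of choosing the conflict option.

    Assumed option value: sensitivity-weighted reward estimate minus
    sensitivity-weighted punishment estimate. The safe option is never punished.
    """
    v_conflict = reward_sens * q_rew_conflict - punish_sens * q_pun_conflict
    v_safe = reward_sens * q_rew_safe
    return 1.0 / (1.0 + math.exp(-(v_conflict - v_safe)))

# Same learned estimates, two hypothetical participants who differ only in
# punishment sensitivity (illustrative values, not fitted parameters).
q_rew_c, q_pun_c, q_rew_s = 0.7, 0.4, 0.5
balanced = p_choose_conflict(q_rew_c, q_pun_c, q_rew_s, reward_sens=3.0, punish_sens=1.0)
avoidant = p_choose_conflict(q_rew_c, q_pun_c, q_rew_s, reward_sens=3.0, punish_sens=4.0)
print(round(balanced, 3), round(avoidant, 3))
```

      Under these assumptions, raising punishment sensitivity relative to reward sensitivity lowers the probability of choosing the conflict option while leaving the learned estimates unchanged.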

      2) What do the authors make of the non-linear relationship between probability of punishment and probability of choosing the conflict stimulus (Fig 2d), especially in the high task-induced anxiety participants? Did this effect show up in the replication sample as well?

      Figures 2c-e were created by binning the continuous predictors of outcome probabilities into discrete bins of equal interval. Since punishment probability varied according to Gaussian random walks, it was also distributed with more of its mass in the central region (~ 0.4), and so values at the extreme bins were estimated on fewer data and with greater variance. The non-linear relationships are likely thus an artefact of our task design and plotting procedure. The pattern was also evident in the replication sample, see Author response image 1:

      Author response image 1.

      However, since these effects were estimated as linear effects in the logistic regression models, and to avoid overfitting/interpretations of noise arising from our task design, we now plot logistic curves fitted to the raw data instead.
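
      The binning issue described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the task code: the number of trials, starting value, step size and number of bins are placeholders, and the function names are hypothetical.

```python
import random

def random_walk(n_trials, start=0.4, step_sd=0.03, seed=0):
    """Gaussian random walk clipped to [0, 1] (start and step_sd are assumed values)."""
    rng = random.Random(seed)
    p, walk = start, []
    for _ in range(n_trials):
        walk.append(p)
        p = min(1.0, max(0.0, p + rng.gauss(0.0, step_sd)))
    return walk

def bin_counts(values, n_bins=5):
    """Count observations falling in equal-width bins spanning [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # a value of exactly 1.0 goes in the last bin
        counts[idx] += 1
    return counts

walk = random_walk(n_trials=150)  # placeholder trial count
counts = bin_counts(walk)
print(counts)  # bins with few observations give noisy estimates of choice proportion
```

      Bins that receive few trials yield choice-proportion estimates with high variance, which is why curves fitted to the raw data are less prone to showing spurious non-linearities than binned means.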

      3) How correlated were learning rate and sensitivity parameters? The EM algorithm used here can sometimes result in high correlations among these sets of parameters.

      As the reviewer suspects, the parameters were strongly correlated, especially across the punishment-specific parameters. The Pearson’s r estimates for the untransformed parameter values were as follows:

      Reward parameters: discovery sample r = -0.39; replication sample r = -0.78

      Punishment parameters: discovery sample r = -0.91; replication sample r = -0.85

      We have included the correlation matrices of the estimated parameters as Supplementary Figure 2 in the ‘Computational modelling’ section of the Supplement.

      We have now also re-fitted the winning model using variational Bayesian inference (VBI) via Stan, and found that the cross-parameter correlations were much lower than when the data were fitted using EM. We also ran a sensitivity analysis assessing whether using VBI changed the main findings of our studies. This showed that the correlation between task-induced anxiety and the reward-punishment sensitivity index was robust to fitting method, as was the mediating effect of reward-punishment sensitivity index on anxiety’s effect on choice. This indicates that overall our key findings are robust to different methods of parameter-fitting.

      We now direct readers to these analyses from the new ‘Sensitivity analyses’ section in the manuscript, as follows:

      As our procedure for estimating model parameters (the expectation-maximisation algorithm, see ‘Methods’) produced high inter-parameter correlations in our data (Supplementary Figure 2), we also re-estimated the parameters using Stan’s variational Bayesian inference algorithm (Stan Development Team, 2023) – this resulted in lower inter-parameter correlations, but our primary computational finding, that the effect of anxiety on choice is mediated by relative sensitivity to reward/punishment was consistent across algorithms (see Supplement section 9.8 for details).

      We have included the relevant analyses comparing EM and VBI in the Supplement, as follows:

      [9.8 Sensitivity analysis: estimating parameters via expectation maximisation and variational Bayesian inference algorithms]

      Given that the expectation maximisation (EM) algorithm produced high inter-parameter correlations, we ran a sensitivity analysis by assessing the robustness of our computational findings to an alternative method of parameter estimation – (mean-field) variational Bayesian inference (VBI) via Stan (Stan Development Team, 2023). Since, unlike EM, the results of VBI are very sensitive to initial values, we fitted the data 10 times with different initial values.

      Inter-parameter correlations

      The VBI produced lower inter-parameter correlations than the EM algorithm (Supplementary Figure 8).

      Sensitivity analysis

      Since multicollinearity in the VBI-estimated parameters was lower than for EM, indicating less trade-off in the estimation, we re-tested our computational findings from the manuscript as part of a sensitivity analysis. We first assessed whether we observed the same correlations between task-induced anxiety and punishment learning, and reward-punishment sensitivity index (Supplementary Figure 9a). Punishment learning rate was not significantly associated with task-induced anxiety in any of the 10 VBI iterations in the discovery sample, although it was in 9/10 in the replication sample. On the other hand, the reward-punishment sensitivity index was significantly associated with task-induced anxiety in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. This suggests that the correlation of anxiety and sensitivity index is robust to these two fitting approaches.

      We also re-estimated the mediation models, where in the EM-estimated parameters, we found that the reward-punishment sensitivity index mediated the relationship between task-induced anxiety and task choice proportions (Supplementary Figure 9b). Again, we found that the reward-punishment sensitivity index was a significant mediator in 9/10 VBI iterations in the discovery sample and all iterations in the replication sample. Punishment learning rate was also a significant mediator in 9/10 iterations in the replication sample, although it was not in the discovery sample for all iterations, and this was not observed for the EM-estimated parameters.

      Overall, we found that our key results, that anxiety is associated with greater sensitivity to punishment over reward, and this mediates the relationship between anxiety and approach-avoidance behaviour, were robust across both fitting methods.

      As an aside, we were unable to run the model fitting using Markov chain Monte Carlo sampling approaches due to the computational power and time required for a sample of this size (Pike & Robinson, 2022, JAMA Psychiatry).

      4) What is the split-half reliability of the task parameters?

      We thank the reviewer for this query. We have now included a brief section on the (good-to-excellent) split-half reliability of the task in the manuscript:

      We assessed the split-half reliability of the task by correlating the overall proportion of conflict option choices and model parameters from the winning model across the first and second half of trials. For overall choice proportion, reliability was simply calculated via Pearson’s correlations. For the model parameters, we calculated model-derived estimates of Pearson’s r values from the parameter covariance matrix when first- and second-half parameters were estimated within a single model, following a previous approach recently shown to accurately estimate parameter reliability (Waltmann et al., 2022). We interpreted indices of reliability based on conventional values of < 0.40 as poor, 0.4 - 0.6 as fair, 0.6 - 0.75 as good, and > 0.75 as excellent reliability (Fleiss, 1986). Overall choice proportion showed good reliability (discovery sample r = 0.63; replication sample r = 0.63; Supplementary Figure 5). The model parameters showed good-to-excellent reliability (model-derived r values ranging from 0.61 to 0.85 [0.76 to 0.92 after Spearman-Brown correction]; Supplementary Figure 5).
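
      For readers unfamiliar with the correction mentioned above, the standard Spearman-Brown formula for a test split into two halves is r_full = 2r / (1 + r). The sketch below applies it to the endpoints of the reported range and labels values using the quoted bands; the function names are hypothetical, and how values falling exactly on a band boundary are classified is an assumption.

```python
def spearman_brown(r_half):
    """Predicted full-length reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)

def reliability_label(r):
    """Bands quoted in the text (Fleiss, 1986); boundary handling is assumed."""
    if r < 0.40:
        return "poor"
    if r < 0.60:
        return "fair"
    if r <= 0.75:
        return "good"
    return "excellent"

for r in (0.61, 0.85):
    corrected = spearman_brown(r)
    print(f"r = {r:.2f} -> corrected {corrected:.2f} ({reliability_label(corrected)})")
```

      Applied to 0.61 and 0.85, the formula gives 0.76 and 0.92 after rounding, matching the corrected range reported above.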

      5) The authors do a good job of avoiding causal language when setting up the cross-sectional mediation analysis, but depart from this in the discussion (line 335). Without longitudinal data, they cannot claim that "mediation analyses revealed a mechanism of how anxiety induces avoidance".

      Thank you for spotting this. We have now amended the text to:

      … mediation analyses suggested a potential mechanism of how anxiety may induce avoidance.

      Reviewer #2 (Public Review):

      Summary:

      The authors develop a computational approach-avoidance-conflict (AAC) task, designed to overcome limitations of existing offer based AAC tasks. The task incorporated likelihoods of receiving rewards/ punishments that would be learned by the participants to ensure computational validity and estimated model parameters related to reward/punishment and task induced anxiety. Two independent samples of online participants were tested. In both samples participants who experienced greater task induced anxiety avoided choices associated with greater probability of punishment. Computational modelling revealed that this effect was explained by greater individual sensitivities to punishment relative to rewards.

      Strengths:

      Large internet-based samples, with discovery sample (n = 369), pre-registered replication sample (n = 629) and test-retest sub group (n = 57). Extensive compliance measures (e.g. audio checks) seek to improve adherence.

      There is a great need for RL tasks that model threatening outcomes rather than simply loss of reward. The main model parameters show strong effects and the additional indices with task based anxiety are a useful extension. Associations were broadly replicated across samples. Fair to excellent reliability of model parameters is encouraging and badly needed for behavioral tasks of threat sensitivity.

      We thank the reviewer for their comments and constructive feedback.

      The task seems to have lower approach bias than some other AAC tasks in the literature. Although this was inferred by looking at Fig 2 (it doesn't seem to drop below 46%) and Fig 3d seems to show quite a strong approach bias when using a reward/punishment sensitivity index. It would be good to confirm some overall stats on % of trials approached/avoided overall.

      The range of choice proportions is indeed an interesting statistic that we have now included in the manuscript:

      Across individuals, there was considerable variability in overall choice proportions (discovery sample: mean = 0.52, SD = 0.14, min/max = [0.03, 0.96]; replication sample: mean = 0.52, SD = 0.14, min/max = [0.01, 0.99]).

      Weaknesses:

      The negative reliability of punishment learning rate is concerning as this is an important outcome.

      We agree that this is a concerning finding. As reviewer 3 notes, this may have been due to participants having control over the volume used to play the aversive sounds in the task (see below for our response to this point). Future work with better controlled experimental settings will be needed to determine the reliability of this parameter more accurately.

      This may also have been due to the asymmetric nature of the task, as only one option could produce the punishment. This means that there were fewer trials on which to estimate learning about the occurrence of a punishment. Future work using continuous outcomes, as the reviewer suggests below, whilst keeping the asymmetric relationship between the options, could help in this regard.

      We have included the following comment on this issue in the manuscript:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed punishment sensitivity). Further, the asymmetric nature of the task may have impacted our ability to estimate the punishment learning rate, as there were fewer occurrences of the punishment compared to the reward.

      The Kendall's tau values underlying task induced anxiety and safety reference/ various indices are very weak (all < 0.1), as are the mediation effects (all beta < 0.01). This should be highlighted as a limitation, although the interaction with P(punishment|conflict) does explain some of this.

      We now include references to the effect sizes to emphasise this limitation. We also note, as the reviewer suggests, that this may be due to crudeness of overall choice proportion as a measure of approach/avoidance, as it is contaminated with variables such as P(punishment|conflict).

      One potentially important limitation of our findings is the small effect size observed in the correlation between task-induced anxiety and avoidance (Kendall's tau values < 0.1, mediation betas < 0.01). This may be attributed to the simplicity of using overall choice proportion as a measure of approach/avoidance, as the effect of anxiety on choice was also influenced by punishment probability.

      The inclusion of only one level of reward (and punishment) limits the ecological validity of the sensitivity indices.

      We agree that using multi-level outcomes will be an important question for future work and now explicitly note this in the manuscript, as below:

      Using multi-level or continuous outcomes would also improve the ecological validity of the present approach and interpretation of the sensitivity parameters.

      Appraisal and impact:

      Overall this is a very strong paper, describing a novel task that could help move the field of RL forward to take account of threat processing more fully. The large sample size with discovery, replication and test-retest gives confidence in the findings. The task has good ecological validity and associations with task-based anxiety and clinical self-report demonstrate clinical relevance. The authors could give further context but test-retest of the punishment learning parameter is the only real concern. Overall this task provides an exciting new probe of reward/threat that could be used in mechanistic disease models.

      We thank the reviewer again for helping us to improve our analyses and manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional context:

      In the introduction "cognitive tasks that bear little semblance to those used in the non-human literature" seems a little unfair. One study that is already cited (Ironside et al, 2020) used a task that was adapted from non-human primates for use in humans. It has almost identical visual stimuli (different levels of simultaneous reward and aversive outcome/punishment) and response selection processes (joystick) between species and some overlapping brain regions were activated across species for conflict and aversiveness. The later point that non-human animals must be trained on the association between action and outcome is well taken from the point of view of computational validity but perhaps not sufficient to justify the previous statement.

      Our intention for this section was to briefly convey that historically, approach-avoidance conflict would have been measured either using questionnaires or joystick-based tasks which have no direct non-human counterpart. However, we agree that this phrasing is unfair to recent studies such as those by Ironside and colleagues. Therefore, we have amended the text to the following:

      In humans, on the other hand, approach-avoidance conflict has historically been measured using questionnaires such as the Behavioural Inhibition/Activation Scale (Carver & White, 1994), or cognitive tasks that rely on motor biases to approach/move towards positive stimuli and avoid/move away from negative stimuli which have no direct non-human counterparts (Guitart-Masip et al., 2012; Kirlic et al., 2017; Mkrtchian et al., 2017; Phaf et al., 2014).

      It would be good to speculate on why task induced anxiety made participants slower to update their estimates of punishment probability.

      Although a meta-analysis of reinforcement learning studies using reward and punishment outcomes suggests a positive association between punishment learning rate and anxiety symptoms (and depressed mood), we paradoxically found the opposite effect. However, previous work has suggested that distinct forms of anxiety associate differently with punishment learning rate (Wise & Dolan, 2020, Nat. Commun.), where somatic anxiety was negatively correlated with punishment learning rate whereas cognitive anxiety showed the opposite effect. We have now added the following to the manuscript, and noted that future work is needed to understand the potentially complex relationship between anxiety and learning from punishments:

      Notably, although a recent computational meta-analysis of reinforcement learning studies showed that symptoms of anxiety and depression are associated with elevated punishment learning rates (Pike & Robinson, 2022), we did not observe this pattern in our data. Indeed, we even found the contrary effect in relation to task-induced anxiety, specifically that anxiety was associated with lower rates of learning from punishment. However, other work has suggested that the direction of this effect can depend on the form of anxiety, where cognitive anxiety may be associated with elevated learning rates, but somatic anxiety may show the opposite pattern (Wise & Dolan, 2020) and this may explain the discrepancy in findings. Additionally, parameter values are highly dependent on task design (Eckstein et al., 2022), and study designs to date may be more optimised in detecting differences in learning rate (Pike & Robinson, 2022) – future work is needed to better understand the potentially complex association between anxiety and punishment learning rate. Lastly, as punishment learning rate was severely unreliable in the test-retest analyses, and the associations between punishment learning rate and state anxiety were not robust to an alternative method of parameter estimation (variational Bayesian inference), the negative correlation observed in our study should be treated with caution.

      Were those with more task-based anxiety more inflexible in general?

      The lack of associations across reward learning rate and task-induced anxiety suggest that this was not a general inflexibility effect. To test the reviewer’s hypothesis more directly, we conducted a sensitivity analysis by examining the model with a general learning rate – this did not support a general inflexibility effect. Please see the new section in the Supplement below:

      [9.10 Sensitivity analysis: anxiety and inflexibility]

      As anxious participants were slower to update their estimates of punishment probability, we determined whether this was due to greater general inflexibility by examining the model including two sensitivity parameters, but one general learning rate (i.e. not split by outcome). The correlation between this general learning rate and task-induced anxiety was not significant in either sample (discovery: tau = -0.02, p = 0.504; replication: tau = -0.01, p = 0.625), suggesting that the effect is specific to punishment.

      Was the 16% versus 20% of the two samples with clinically relevant anxiety symptoms significantly different? What about other demographics in the two samples?

      The proportions were not significantly different (χ² = 2.33, p = 0.127). The discovery sample included more females and was older on average compared to the replication sample – information which we now report in the manuscript:

      The discovery sample consisted of a significantly greater proportion of female participants than the replication sample (59% vs 52%, χ² = 4.64, p = 0.031). The average age was significantly different across samples (discovery sample mean = 37.7, SD = 10.3, replication sample mean = 34.3, SD = 10.4; t(785.5) = 5.06, p < 0.001). The differences in self-reported psychiatric symptoms across samples did not reach significance (p > 0.086).
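
      The fractional degrees of freedom in the age comparison are characteristic of Welch's unequal-variance t-test. As a rough check, the statistic can be approximated from the rounded summary values alone. This sketch assumes the sample sizes given in Reviewer #2's summary (n = 369 and n = 629) and that every participant reported an age; because the inputs are rounded, the result should land near, but not exactly on, the reported values. The function name is hypothetical.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations and sample sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded summary statistics from the text; sample sizes are assumed as noted above.
t, df = welch_t(37.7, 10.3, 369, 34.3, 10.4, 629)
print(f"t({df:.1f}) = {t:.2f}")
```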

      It would be interesting to know how many participants failed the audio attention checks.

      We have now included information about what proportion of participants failed each of the task exclusion criteria in the manuscript:

      Firstly, we excluded participants who missed a response to more than one auditory attention check (see above; 8% in both discovery and replication samples) – as these occurred infrequently and the stimuli used for the checks were played at relatively low volume, we allowed for incorrect responses so long as a response was made. Secondly, we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4% and 6% in discovery and replication samples, respectively). Lastly, we excluded those who did not respond on 20 or more trials (1% and 2% in discovery and replication samples, respectively). Overall, we excluded 51 out of 423 (12%) in the discovery sample, and 98 out of 725 (14%) in the replication sample.
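
      The three criteria above can be expressed as a short screening function. This is a hypothetical sketch rather than the authors' code: the data layout (one entry per trial or per attention check, with None marking a missing response) and the choice to let a missing response break a run of identical key presses are assumptions.

```python
def longest_run(keys):
    """Longest run of identical consecutive key presses; None (no response) breaks a run."""
    longest = current = 0
    previous = None
    for key in keys:
        if key is not None and key == previous:
            current += 1
        else:
            current = 1 if key is not None else 0
        previous = key
        longest = max(longest, current)
    return longest

def should_exclude(responses, attention_checks,
                   max_missed_checks=1, run_limit=20, miss_limit=20):
    """Apply the three exclusion criteria described in the text.

    responses: one key per trial, or None when no response was made.
    attention_checks: one entry per auditory check, or None when it was missed
    (incorrect responses are allowed, so only missing ones count).
    """
    missed_checks = sum(r is None for r in attention_checks)
    missed_trials = sum(r is None for r in responses)
    return (missed_checks > max_missed_checks
            or longest_run(responses) >= run_limit
            or missed_trials >= miss_limit)

# Hypothetical participant who pressed the same key on 20 consecutive trials.
print(should_exclude(["a"] * 20 + ["b"] * 10, ["x", "x"]))  # True
```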

      There doesn't appear to be a model with only learning from punishment (i.e. no reward learning) included in the model comparison. It would be interesting to see how it compared.

      We have fitted the suggested model and found that it is the least parsimonious of the models. Since participants were monetarily incentivised based on the rewards only, this was to be expected. We have now added this ‘punishment learning only’ model and its variant including a lapse term into the model comparison. The two lowest bars on the y-axis in Author response image 2 represent these models.

      Author response image 2.

      Were sex effects examined, as these have been commonly found in AAC tasks? How about other covariates such as age?

      We have now tested the effects of sex and age on behaviour and on parameter values. There were indeed some significant effects, albeit with some inconsistencies across the two samples, which for completeness we have included in the manuscript, as follows:

      While sex was significantly associated with choice in the discovery sample (β = 0.16 ± 0.07, p = 0.028) with males being more likely to choose the conflict option, this pattern was not evident in the replication sample (β = 0.08 ± 0.06, p = 0.173), and age was not associated with choice in either sample (p > 0.2).

      Comparing parameters across sexes via Welch’s t-tests revealed significant differences in reward sensitivity (t(289) = -2.87, p = 0.004, d = 0.34; lower in females) and consequently reward-punishment sensitivity index (t(336) = -2.03, p = 0.043, d = 0.22; lower in females i.e. more avoidance-driven). In the replication sample, we observed the same effect on reward-punishment sensitivity index (t(626) = -2.79, p = 0.005, d = 0.22; lower in females). However, the sex difference in reward sensitivity did not replicate (p = 0.441), although we did observe a significant sex difference in punishment sensitivity in the replication sample (t(626) = 2.26, p = 0.024, d = 0.18).

      Minor: Still a few placeholders (Supplementary Table X/ Table X) in the methods

      We thank the reviewer for spotting these errors. We have now corrected these references.

      Reviewer #3 (Public Review):

      This study investigated cognitive mechanisms underlying approach-avoidance behavior using a novel reinforcement learning task and computational modelling. Participants could select a risky "conflict" option (latent, fluctuating probabilities of monetary reward and/or unpleasant sound [punishment]) or a safe option (separate, generally lower probability of reward). Overall, participant choices were skewed towards more rewarded options, but were also repelled by increasing probability of punishment. Individual patterns of behavior were well-captured by a reinforcement learning model that included parameters for reward and punishment sensitivity, and learning rates for reward and punishment. This is a nice replication of existing findings suggesting reward and punishment have opposing effects on behavior through dissociated sensitivity to reward versus punishment.

      Interestingly, avoidance of the conflict option was predicted by self-reported task-induced anxiety. This effect of anxiety was mediated by the difference in modelled sensitivity to reward versus punishment (relative sensitivity). Importantly, when a subset of participants were retested over 1 week later, most behavioral tendencies and model parameters were recapitulated, suggesting the task may capture stable traits relevant to approach-avoidance decision-making.

      We thank the reviewer for their useful analysis of our study. Indeed, it was reassuring to see that performance indices were reliable across time.

      However, interpretation of these findings is severely undermined by the fact that the aversiveness of the auditory punisher was largely determined by participants, with the far-reaching impacts of this not being accounted for in any of the analyses. The manipulation check to confirm participants did not mute their sound is highly commendable, but the thresholding of punisher volume to "loud but comfortable" at the outset of the task leaves substantial scope for variability in the punisher delivered to participants. Indeed, participants' ratings of the unpleasantness of the punishment were moderate and highly variable (M = 31.7 out of 50, SD = 12.8 [distribution unreported]). Despite having this rating, it is not incorporated into analyses. It is possible that the key finding of relationships between task-induced anxiety, reward-punishment sensitivity and avoidance are driven by differences in the punisher experienced; a louder punisher is more unpleasant, driving greater task-induced anxiety, model-derived punishment sensitivity, and avoidance (and vice versa). This issue can also explain the counterintuitive findings from re-tested participants; lower/negatively correlated task-induced anxiety and punishment-related cognitive parameters may have been due to participants adjusting their sound settings to make the task less aversive (retest punisher rating not reported). It can therefore be argued that the task may not actually capture meaningful cognitive/motivational traits and their effects on decision-making, but instead spurious differences in punisher intensity.

      We thank the reviewer for raising this important potential limitation of our study. We agree that how participants self-adjusted their sound volume may have important consequences for our interpretations of the data. Unfortunately, despite the scalability of online data collection, this highlights one of its major weaknesses in the lack of controllability over experimental parameters. The previous paper from which we obtained our aversive sounds (Seow & Hauser, 2021, Behav Res, doi.org/10.3758/s13428-021-01643-0) contains useful analyses with regard to this discussion. When comparing the unpleasantness of the sounds played at 50% vs 100% volume, the authors indeed found that the lower volumes led to lower unpleasantness ratings. However, the magnitude of this effect did not appear to be substantial (Fig. 4 from the paper), and even at 50% volume, the scream sounds we used were rated in the top quartile for unpleasantness, on average. This implies that the sounds have sufficient inherent unpleasantness, even when played at half intensity. We find this reassuring, in the sense that any self-imposed volume effects may not be large. Of note, our instructions to participants to adjust the volume to a ‘loud but comfortable’ level were based on the same phrasing used in this study.

      To the reviewer’s point on how this might affect the reliability of the task, we have included the following in the ‘Discussion’ section:

      Alternatively, as participants self-determined the loudness of the punishments, differences in volume settings across sessions may have impacted the reliability of this parameter (and indeed other measures).

      Please see below for analyses accounting for punishment unpleasantness ratings.

      This undercuts the proposed significance of this task as a translational tool for understanding anxiety and avoidance. More information about ratings of punisher unpleasantness and its relationship to task behavior, anxiety and cognitive parameters would be valuable for interpreting findings. It would also be of interest whether the same results were observed if the aversiveness of the punisher was titrated prior to the task.

      As suggested, we have now included sensitivity analyses using the unpleasantness ratings that show their effect is minimal on our primary inference. We report relevant results below in the ‘Recommendations For The Authors’ section. At the same time, we think it is important to acknowledge that unpleasantness is a combination of both the inherent unpleasantness of the sound and the volume it is presented at, where only the latter is controlled by the participant. Therefore, these analyses are not a perfect indicator of the effect of participant control. For convenience, we reproduce the key findings from this sensitivity analysis here:

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).

      More generally, whether or not to titrate the punishments (and indeed the rewards) is an interesting experimental decision, which we think should be guided by the research question. In our case, we were interested in individual differences in reward/punishment learning and sensitivity and their relation to anxiety, so variation in how the aversiveness of the sounds affected approach-avoidance decisions was an important aspect of our design. In studies where the aim is to understand more general processes of how humans act under approach-avoidance conflict, it may be better to tightly control the salience of reinforcers.

      Ultimately, the best test of the causal role of anxiety on avoidance, and against the hypothesis that our results were driven by spurious volume control effects, would be to run within-subjects anxiety interventions, where these volume effects are naturally accounted for. This will be an important direction for future studies using similar measures. We have added a paragraph in the ‘Discussion’ section on this point:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Although the procedure and findings reported here remain valuable to the field, claims of novelty including its translational potential are perhaps overstated. This study complements and sits within a much broader literature that investigates roles for aversion and cognitive traits in approach-avoidance decisions. This includes numerous studies that apply reinforcement learning models to behavior in two-choice tasks with latent probabilities of reward and punishment (e.g., see doi: 10.1001/jamapsychiatry.2022.0051), as well as other translationally-relevant paradigms (e.g., doi: 10.3389/fpsyg.2014.00203, 10.7554/eLife.69594, etc).

      We agree with the reviewer that our approach builds on previous work in reinforcement learning, approach-avoidance conflict and translational measures of anxiety. Whilst there are by now many studies using two-choice learning tasks with latent reward and punishment probabilities, our main aim, and the one we refer to as ‘novel’, was to bring these fields together so as to model anxiety-related behaviour.

      We note that we do not make strong statements about whether these effects speak to traits per se, and as Reviewer 1 notes, the evidence from our study suggests that the present measure may be better suited to assessing state anxiety. While computational model parameters can be, and certainly often are, interpreted as constituting stable individual traits, a simpler interpretation of our findings may be that state anxiety is associated with a momentary preference for punishment avoidance over reward pursuit. This can still be informative for the study of anxiety, especially given the notion of a continuous relationship between adaptive/state anxiety and maladaptive/persistent anxiety.

      Having said that, we agree with the underlying premise of the reviewer’s point that how the measure relates to trait-level avoidance/inhibition measures will be an interesting question for future work. We appreciate the importance of using tasks such as ours and those highlighted by the reviewer as trait-level measures, especially in computational psychiatry. We have now included a discussion on the potential roles of cognitive/motivational traits, in line with the reviewer’s recommendation – briefly, we have included the suggested references by the reviewer, discussed the measure’s potential relevance to cognitive/motivational traits, and direct interested readers to the broader literature. Please see below for details.

      Reviewer #3 (Recommendations For The Authors):

      As stated in the public review, punisher unpleasantness and its relationship to key findings (including for retest) should be reported and discussed.

      We signpost readers to our new analyses, incorporating unpleasantness ratings into the statistical models, from the main manuscript as follows:

      Since participants self-determined the volume of the punishments in the task, and therefore (at least in part) their aversiveness, we conducted sensitivity analyses by accounting for self-reported unpleasantness ratings of the punishment (see the Supplement). Our finding that anxiety impacts approach-avoidance behaviour was robust to this sensitivity analysis (p < 0.001); however, the mediating effect of the reward-punishment sensitivity index was not (p > 0.1; see Supplement section 9.9 for details).

      We reproduce the relevant section from the Supplement below. Overall, we found that the effect of anxiety on choices (via its interaction with punishment probability) remained significant after accounting for unpleasantness; however, the mediating effect of reward-punishment sensitivity was no longer significant when unpleasantness ratings were included in the model. As noted above, unpleasantness ratings are not a perfect measure of self-imposed sound volume, and indeed punishment sensitivity is essentially a computationally-derived measure of unpleasantness, which makes it difficult to interpret the mediation model which contains both of these measures. However, since we found that anxiety affected choice over and above any effects of self-imposed sound volume (using unpleasantness ratings as a proxy measure), we argue that the task still holds value as a model of anxiety-related avoidance.

      [Supplement Section 9.9: Sensitivity analyses of punishment unpleasantness]

      Distribution of unpleasantness

      The punishments were rated as unpleasant by the participants, on average (discovery sample: mean rating = 31.1 [scored between 0 and 50], SD = 13.1; replication sample: mean rating = 32.1, SD = 12.7; Supplementary Figure 10).

      Approach-avoidance hierarchical logistic regression model

      We assessed whether approach and avoidance responses, and their relationships with state anxiety, were impacted by punishment unpleasantness, by including unpleasantness ratings as a covariate into the hierarchical logistic regression model. Whilst unpleasantness was a significant predictor of choice (positively predicting safe option choices), all significant predictors and interaction effects from the model without unpleasantness ratings survived (Supplementary Figure 11). Critically, this suggests that punishment unpleasantness does not account for all of the variance in the relationship between anxiety and avoidance.

      Mediation model

      When unpleasantness ratings were included in the mediation models, the mediating effect of the reward-punishment sensitivity index did not survive (discovery sample: standardised β = 0.003 ± 0.003, p = 0.416; replication sample: standardised β = 0.004 ± 0.003, p = 0.100; Supplementary Figure 12). Pooling the samples resulted in an effect that narrowly missed the significance threshold (standardised β = 0.004 ± 0.002, p = 0.068).
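
      To make the logic of adding a covariate to a mediation model concrete, the following is a minimal product-of-coefficients sketch. It is an assumption-laden illustration, not the authors' method: it uses ordinary least squares on continuous variables, all names (`ols`, `indirect_effect`, `covariate`) are hypothetical, and it returns only a point estimate, without the standard errors or p-values reported above.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X; each row of X starts with 1.0 (intercept)."""
    p = len(X[0])
    # Augmented normal equations: [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def indirect_effect(x, m, y, covariate=None):
    """Indirect effect a*b of x on y via mediator m, optionally adjusting for a covariate."""
    extra = [covariate] if covariate is not None else []

    def design(*cols):
        return [[1.0, *vals] for vals in zip(*cols)]

    a = ols(design(x, *extra), m)[1]     # path a: x -> m
    b = ols(design(x, m, *extra), y)[2]  # path b: m -> y, holding x fixed
    return a * b
```

      In this framing, x would be task-induced anxiety, m the reward-punishment sensitivity index, y a measure of choice, and the covariate the unpleasantness rating. If the covariate shares variance with the mediator, paths a and b can both shrink once it is included.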

      Test-retest reliability of unpleasantness

      The test-retest reliability of unpleasantness ratings was excellent (ICC(3,1) = 0.75), although participants gave significantly lower ratings in the second session (t56 = 2.7, p = 0.008, d = 0.37; mean difference of 3.12, SD = 8.63).
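
      The combination of high reliability with a significant drop in mean ratings is not contradictory, as a short sketch can show. The code below assumes the standard Shrout and Fleiss definition of ICC(3,1), (MSR - MSE) / (MSR + (k - 1) MSE), computed from a two-way participants-by-sessions decomposition. The function name and data layout are hypothetical, and this is not the authors' code.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of per-participant lists, one value per session.
    """
    n = len(scores)      # participants
    k = len(scores[0])   # sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((mu - grand) ** 2 for mu in row_means)  # between participants
    ss_cols = n * sum((mu - grand) ** 2 for mu in col_means)  # between sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    msr = ss_rows / (n - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)
```

      Because the between-session sum of squares is removed before the residual is computed, a uniform shift in ratings from one session to the next leaves this consistency-type ICC unchanged.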

      Reliability of other measures with/out unpleasantness

      To assess the effect of accounting for unpleasantness ratings on reliability estimates of task performance, we extracted variance components from linear mixed models, following a standard approach (Nakagawa et al., 2017) – note that this was not the method used to estimate reliability values in the main analyses, but we used this specific approach to compare the reliability values with and without the covariate of unpleasantness ratings. The results indicated that unpleasantness ratings did not have a material effect on reliability (Supplementary Figure 14).

      We discuss the findings of these sensitivity analyses in the ‘Discussion’ section, as follows:

      Relatedly, participants had some control over the intensity at which the punishments were presented, which may have driven our findings relating to anxiety and putative mechanisms of anxiety-related avoidance. Sensitivity analyses showed that our finding that anxiety is positively associated with avoidance in the task was robust to individual differences in self-reported punishment unpleasantness, whilst the mediation effects were not. Future work imposing better control over the stimuli presented, and/or using within-subjects designs will be needed to validate the role of reward/punishment sensitivities in anxiety-related avoidance.

      Introduction and discussion should spend more time relating the task and current findings to existing procedures and findings examining individual differences in avoidance and cognitive/motivational correlates.

      We thank the reviewer for the opportunity to expand on the literature. Whilst there are numerous behavioural paradigms in both the human and non-human literature that involve learning about rewards and punishments, our starting point for the introduction was the state-of-the-art in translational approach-avoidance conflict models of anxiety. Therefore, for the sake of brevity and logical flow of our introduction, we have opted to bring in the discussion on other procedures primarily in the ‘Discussion’ section of the manuscript.

      We have now included the reviewer’s suggested citations from their ‘Public Review’ as follows:

      Since we developed our task with the primary focus on translational validity, its design diverges from other reinforcement learning tasks that involve reward and punishment outcomes (Pike & Robinson, 2022). One important difference is that we used distinct reinforcers as our reward and punishment outcomes, compared to many studies which use monetary outcomes for both (e.g. earning and losing £1 constitute the reward and punishment, respectively; Aylward et al., 2019; Jean-Richard-Dit-Bressel et al., 2021; Pizzagalli et al., 2005; Sharp et al., 2022). Other tasks have been used that induce a conflict between value and motor biases, relying on prepotent biases to approach/move towards rewards and withdraw from punishments, which makes it difficult to approach punishments and withdraw from rewards (Guitart-Masip et al., 2012; Mkrtchian et al., 2017). However, since translational operant conflict tasks typically induce a conflict between different types of outcome (e.g. food and shocks/sugar and quinine pellets; Oberrauch et al., 2019; van den Bos et al., 2014), we felt it was important to implement this feature. One study used monetary rewards and shock-based punishments, but also included four options for participants to choose from on each trial, with rewards and punishments associated with all four options (Seymour et al., 2012). This effectively requires participants to maintain eight probability estimates (i.e. reward and punishment at each of the four options) to solve the task, which may be too difficult for non-human animals to learn efficiently.

      We have also included a discussion on the measure’s potential relevance to cognitive/motivational traits as follows:

      Finally, whilst there is a broad literature on the roles of behavioural inhibition and avoidance tendency traits on decision-making and behaviour (Carver & White, 1994; Corr, 2004; Gray, 1982), we did not replicate the correlation of experiential avoidance and avoidance responses or the reward-punishment sensitivity index. Since there were also no significant correlations across task performance indices and clinical symptom measures, our findings suggest that the measure may be more sensitive to behaviours relating to state anxiety, rather than more stable traits. Nevertheless, how performance in the present task relates to other traits such as behavioural approach/inhibition tendencies (Carver & White, 1994), as has been found in previous studies on reward/punishment learning (Sharp et al., 2022; Wise & Dolan, 2020) and approach-avoidance conflict (Aupperle et al., 2011), will be an important question for future work.

      We also now direct readers to a recent, comprehensive review on applying computational methods to approach-avoidance behaviours in the ‘Introduction’ section:

      A fundamental premise of this approach is that the brain acts as an information-processing organ that performs computations responsible for observable behaviours, including approach and avoidance (for a recent review on the application of computational methods to approach-avoidance conflict, see Letkiewicz et al., 2023).

      I am curious why participants were excluded if they made the same response on 20+ consecutive trials. How does this represent a cut-off between valid versus invalid behavioral profiles?

      We apologise for the lack of clarity on this point in our original submission – this exclusion criterion was specifically if participants used the same response key (e.g. the left arrow button) on 20 or more consecutive trials, indicating inattention. Since the left-right positions of the stimuli were randomised across trials, this did not exclude participants who repeatedly chose the same option. However, as we show in the Supplement, this, along with the other exclusion criteria, did not affect our main findings.

      We have now clarified this as follows:

      … we excluded those who responded with the same response key on 20 or more consecutive trials (> 10% of all trials; 4%/6% in discovery and replication samples, respectively) – note that as the options randomly switched sides on the screen across trials, this did not exclude participants who frequently and consecutively chose a certain option.
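
      The exclusion rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and each trial is assumed to record the response key as a string such as "left" or "right".

```python
from itertools import groupby


def longest_same_key_run(keys):
    """Length of the longest run of identical consecutive response keys."""
    return max((sum(1 for _ in run) for _, run in groupby(keys)), default=0)


def should_exclude(keys, threshold=20):
    """Flag a participant who pressed the same key on `threshold` or more consecutive trials."""
    return longest_same_key_run(keys) >= threshold
```

      The check operates on keys, not options. A participant who always chooses the conflict option will produce a mixed sequence of left and right presses whenever the option's screen position varies, and so is not flagged by this rule.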

    1. 12:3 Those who are wise[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever. https://www.americamagazine.org/politics-society/2020/05/08/its-time-rethink-electoral-college https://www.npr.org/sections/itsallpolitics/2011/12/20/144016912/we-the-people-npr-readers-would-ratify-four-new-amendments https://constitutioncenter.org/blog/vote-now-an-amendment-to-end-the-electoral-college https://www.nytimes.com/2020/02/09/opinion/letters/electoral-college.html https://www.latimes.com/opinion/readersreact/la-ol-le-electoral-college-20180904-story.html https://slate.com/news-and-politics/2014/05/amending-the-constitution-is-much-too-hard-blame-the-founders.html we the people rise again https://slate.com/news-and-politics/2012/06/fix-the-constitution-amending-by-national-referendum.html safe souls, safe fu https://slate.com/news-and-politics/2012/06/fixing-the-constitution-protecting-informational-privacy.html https://slate.com/news-and-politics/2020/05/new-reconstruction-constitution-democracy.html We the People of Slate … The U.S. Constitution, as you mighta been, shoulda [“come” on … its someday] rewrϕte it. "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead of two four-year terms? What if your state’s population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn’t like in the Constitution and pencil in their hearts’ desires.
Here’s what the document would look like with their best ideas." Slate: u_s_constitution as_rewritten by_slate_legal_experts_and_readers 多也了了夕 "with a wand of scheffilara, 并#亦太 he begins … "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon." Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlaping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land … to a story of many “two cities” that culminates in a cultural or societal or “evolutionary” link to Sodom and Gomorrah and the city-state of Babylon (and it’s Hanging Gardens) and also of course to Paris and Troy and “Masstodon” and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the “Cable” to see state to “London” … recently I called it “the city of realms” … I started out logically intending to link “game theory” and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness … turned into a premonitory discourse of “two cities” and how sometimes even things as obvious as the number of letters in the word “two” don’t do a good enough job of conveying … how and/or why one is simply never enough, and two isn’t much better–but in the end a circle … is drawn; the perfect circle in our imaginary mathematical perfection … I see a parted “line” in the letter pronounced “tea” (and beginning that word); and two “vee” (pron. of “v”) symbols joined together in a word we pronounce as “double-you” … and symbolically because I know “V” is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals– It’s important to pause; here. 
I am going to write a more detailed piece on “the two cities” as I work through this maze like crossroads between “them” and “demo…” … here demorigstrably I am trying to fuse together an evolutionary change in … lit. biological evolution as well as an echelon leap forward in "self-government" … in a place where these two things are unfathomable and unspokenly* connected. https://www.google.com/search?q=prometheuslocke+%2Bsite%3Agodlikeproductions.com “Silence is betrayal” -MLK To a question on the idiom; is Bablyon about “the law” or “of the land of Nod?” “What is democracy” … the song, Metallica’s “ONE” echoes and repeats; as we apparently scrive together the word “THEM” … I question myself … if Babylon were the capital city of some mythical Nation of Time … if it were the central “turning point” of Sheol; ... >|< Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turn Hell into something so much closer to Heaven I just do not understand---why we cant stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?" ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;" It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our governments foundation ... 
by the framers [not just of the USA; but English .. and every language]  ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I. Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" .... in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ... and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance." https://www.google.com/search?q=%22I+AM%22+%22WE+ARE%22+%2Bsite%3Afromtaws "SOIS" a key--in two languages conjugated literally as both "I AM" and "WE ARE" simultaneously; Search: I know that if I am than so are you ... and it is because we have overcome .... something I truly cannot figure out, fathom, or believe ... was truly here before us--a spiralling series of failures ... speaking: to the heavens; but in secret and in action; "doing everything possible to succeed." It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. 
This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell. I neglected to mention "Har-Wer" and "Tower of Babel" which are both related linguistically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and its (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" are a time shifted ... overlayed map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space--- it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--it's not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happened. I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ...
"rediscovering" those things--- I can't figure out why I am the only person screaming "this is Hell." That's also, Hell. ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and it's "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time. This name may be viewed either as meaning "father of many" in Hebrew or else as a contraction of ABRAM (1) and הָמוֹן (hamon) meaning "many, multitude". The biblical patriarch Abraham was originally named Abram but God changed his name (see Genesis 17:5). https://en.wikipedia.org/wiki/Yeshua#Yeshua,_Yehoshua,_and_Yeshu_in_the_Talmud K=3:11 ... 
to a handle on the music, the DHD of the gate and the *ring of David's "sling" ... ---and that's a relationship of "3 is to 11" as [the SAT style "analog]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]" I worry sometimes that important things are missing, or will disappear---for instance Mirriam Webster, which is a "canonical/standard dictionary) should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as 2 MANY ALSO ICI; 1twoⅱ ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain." 
http://www.goodmath.org/blog/2015/07/21/arabic-numerals-have-nothing-to-do-with-angle-counting/ https://gizmodo.com/no-this-viral-image-does-not-explain-the-history-of-ar-1719306568 https://en.wikipedia.org/wiki/Chinese_word_for_%22crisis%22 https://dictionary.hantrainerpro.com/chinese-english/translation-ji_howmany.htm https://dictionary.hantrainerpro.com/chinese-english/translation-duo_many.htm https://en.wikipedia.org/wiki/Euripides, Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens. The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect. J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this." Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... 
our Hypothes.is" I imagine I'll be adding some wiki/IPFS stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumanist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new Git/Subversion/CVS-style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "Wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such a thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democratic" party that had some Anarchist Cookbook version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN." Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. 
At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to perma-store our ideas and comments on both a blockchain (probably a new one; now that I've worked a little with Ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and I such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wish they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?" NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazoned smiling rainbow and upside-down river ... specifically the antithesis of "Angel Falls," there's a lagoon too--actually a chain of several ponds underneath the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Elysium and the "Six-Axis ride" of the JFK Center's "Spacecamp." 
I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, it's cure could possibly solve a number of other problems. Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush.. .reversed to hSE'Ah, and a story about the Second Coming parting our holy waters. This answer connects the magical Rod's of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....   
Since the very earliest days of this story, I have asked for better for you, even than see Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Cocoon and Ponce de León could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean--- https://www.youtube.com/watch?v=M8CyN1awWls https://link.springer.com/chapter/10.1057/9780230366688_16 https://www.youtube.com/watch?v=YDo5zvYNn3A Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Philip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/HoloLens/Surface will become---a series of short "focus groups" .... to gauge and discuss the information in the "CITIES-D5AM-MERCK" ... 
how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining halls all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song announcing the dawn of the age of Aquarius--- It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagittarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... 
and just in case you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagittarius; which is the obvious ... heart of our astrological center and link to "other galaxies." ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There are a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the Big Bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss." https://archive.org/details/texts http://zlibraryexau2g3p.onion.pet/ * all "asterisks" in the above document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy Latin for "three times a day") where the "t" there is an abbreviation of "ter" ... 
and suppose the link between K and 11 and 3 noting it's alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex, and binarryy ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman." The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definately not a subset) of HTML. Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work. Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. 
Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded" Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool." If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure. While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, its also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage. Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. 
The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared. https://www.britannica.com/topic/Menelaus-Greek-mythology Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2] In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3] 3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..." https://en.wikipedia.org/wiki/Clymene_(mythology) Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 
'to sing' or 'the one that is melodious'), initially the Muse of Chorus, she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other. Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto. In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2] See also [AREXMACHINA] Muses in popular culture The Nine Muses Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097. Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. 
Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city. Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s. The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66. #PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ... pps: a magnanimous decision ... I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. 
I spent several months in Kentucky telling a story of a post-apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind. Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half-working monorails appearing in the sky---literally right before my eyes. I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echoed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra. Apparently the name of a castle; though I wasn't aware of that until much later. It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throes and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that the name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. 
To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all." I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true. 
I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit. "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight." Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... 
a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm. It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope. Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... 
something I just "assume" the world understands and ... doesn't seem to fear (either); as if Churchill had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss." Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these things all hinge and congeal around a change in your hearts; hoping you will choose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander. These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s). 
I don't think it's hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve. It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitous" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civilized overnight" and then ... I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of foreboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ... that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of-- "waking up" to a nu def of #Neopoliteran. 
^Like the previous notation; the below text comes from an email previously sent; and while i stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world. Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis" ... search high and lo ... the depths all the way to above the heavens ... for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;] — @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020 I ... TERON; Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." 
and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see. I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me." I have to admit I found some joy in the epiphany that the hidden city of Zion and its fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... 
I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for its underground caverns and its above ground "Space Mountain" and that great golf ball in the heart of it all. The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... 
to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses its grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk. The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think. I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously. I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... 
"the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask. I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears to be one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come." Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well-- Searching for a Saturday; a sabbath to be made Holy once more ... 
"at the Rubycon" The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana. -- Chapter 1, Verse 10[3] In a semi-fit of shameless arexua-self recognition i'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulcrum-ciricculum-vitae on "colonization education" ... tying together Transcendence, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe i'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer's Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :). While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neural lace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol). The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like SpaceMountain taking you to the future of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards." THE SOFTWARE, SINGERS, AND SHIELD(S) OF HEIROSOLYMITHONEYY Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... 
which then of course leads to a similar onomatopoeia of "weapons and ..." where-with-all to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the deity in my son's middle name defines well here, as "waking up," stretching its tributaries and its winding wonders and wistfully .... Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism. andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast; ... who engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ... . there's no place like home. there's no place like home. there's no place like home. and so it begins ... "f: r e l i g i o n find out what it means to me. faucet, ever single one, stream of purity ... from Fort Myers ... f ... flicks ... Flint. " ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. If there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obfuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC 5321 attached. 
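The contact-link convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the "@" links use the standard `mailto:` scheme, and the function names, the example addresses, and the "Reply to:" line are hypothetical and do not come from the email itself. It builds a message without sending anything, and repeats the sender's address inside the body so that a reply remains possible even if a relay hides the headers.

```python
from email.message import EmailMessage
from urllib.parse import urlparse


def is_mail_link(href: str) -> bool:
    """Return True when an anchor's href uses the mailto: scheme."""
    return urlparse(href).scheme.lower() == "mailto"


def build_message(href: str, body: str, reply_address: str) -> EmailMessage:
    """Build (but do not send) a message for a mailto: link.

    The reply address is written into the body itself, because an
    anonymizing relay may strip or replace the sender headers.
    """
    if not is_mail_link(href):
        raise ValueError("not a mailto: link")
    recipient = urlparse(href).path  # the part after "mailto:"
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Contact"  # hypothetical subject line
    msg.set_content(f"{body}\n\nReply to: {reply_address}\n")
    return msg


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    message = build_message("mailto:reader@example.org", "Hello", "me@example.org")
    print(message)
```

An "@" that merely appears somewhere inside an ordinary web address does not make it a mail link, which is why the sketch checks the scheme instead of searching for the character.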
In most cases "anonymous" also means that you will not have the recipient's direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above. A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFCs including (but not limited to) RFCs numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium" ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilineal predecessors to the actual event .... still to be defined ... 
but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table*" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reaching the age of 18 years old; growing up in or being born in the United States of America (related spec.* to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read:the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant). The current state of American representative democracy is the highest standard to which I am currently knowledgeable of "extant*"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. 
It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e., e.g, for "iv" example, id est, exempli gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al.. To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements) alas, babylon. i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like gandhi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, gandhi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; something that are fundamental to the ethos of the planet i call "athens." i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place. 
"fallen, fallen, is [the city of] babylon the great" ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible" "a dwelling place [of] (the) demons (say), it has become." www.icann.org/news/blog/the-problem-with-the-seven-keys kauri on IPFS: has-abaslom-and-the-ethos-of-arcadia

      12:3 Those who are wise[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever.

      you are offline

      we the people rise again

      safe souls, safe fu


      We the People of Slate ...

      The U.S. Constitution, as you [mighta been, shoulda "come" on ... its somedayrewrϕte it.

      "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead two four-year terms? What if your state's population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn't like in the Constitution and pencil in their hearts' desires. Here's what the document would look like with their best ideas."

      多也了了夕 "with a ~~wand~~ of scheffilara, 并#亦太 he begins ... "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon."

      Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlapping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land ... to a story of many "two cities" that culminates in a cultural or societal or "evolutionary" link to Sodom and Gomorrah and the city-state of Babylon (and its Hanging Gardens) and also of course to Paris and Troy and "Masstodon" and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the "Cable" to see state to "London" ... recently I called it "the city of realms" ... I started out logically intending to link "game theory" and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness ... turned into a premonitory discourse of "two cities" and how sometimes even things as obvious as the number of letters in the word "two" don't do a good enough job of conveying ... how and/or why one is simply never enough, and two isn't much better--but in the end a circle ... is drawn; the perfect circle in our imaginary mathematical perfection ... I see a parted "line" in the letter pronounced "tea" (and beginning that word); and two "vee" (pron. of "v") symbols joined together in a word we pronounce as "double-you" ... and symbolically because I know "V" is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals--

      It's important to pause; here. I am going to write a more detailed piece on "the two cities" as I work through this maze like crossroads between "them" and "demo..." ... here demorigstrably I am trying to fuse together an evolutionary change in ... lit. biological evolution as well as an echelon leap forward in "self-government" ... in a place where these two things are unfathomable and unspokenly* connected.

      To a question on the idiom; is Babylon about "the law" or "of the land of Nod?"

      "What is democracy" ... the song, Metallica's "ONE" echoes and repeats; as we apparently scrive together the word "THEM" ... I question myself ... if Babylon were the capital city of some mythical Nation of Time ... if it were the central "turning point" of Sheol; ... >|<

      Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turn Hell into something so much closer to Heaven I just do not understand---why we can't stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?"

      ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;"

      It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our government's foundation ... *by the framers [not just of the USA; but English .. and every language] *

      ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I.

      Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" ....

      in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ...

      and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance."

      It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell.

      • I neglected to mention "Har-Wer" and "Tower of Babel" which are both related linguistically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and its (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" are a time shifted ... overlaid map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space---

      it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--it's not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happened.

      • I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ... "rediscovering" those things---

      • I can't figure out why I am the only person screaming "this is Hell." That's also, Hell.

      ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and its "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time.
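The base-conversion point behind the joke can be checked with a small sketch. This is a minimal illustration, not anything from the original email: the helper name `to_decimal` is made up here, and it only handles the digits 0 through 9. It reads the same two characters, "10", once as a base-2 numeral and once as a base-10 numeral.

```python
def to_decimal(digits: str, base: int) -> int:
    """Read a string of digits (0-9 only) as a numeral in the given base."""
    value = 0
    for ch in digits:
        digit = int(ch)
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        # Shift the running value one place to the left, then add the new digit.
        value = value * base + digit
    return value


if __name__ == "__main__":
    print(to_decimal("10", 2))   # "10" read as binary
    print(to_decimal("10", 10))  # "10" read as decimal
    print(format(2, "b"))        # the decimal number two written in binary
```

Read in base 2, "10" is one two plus zero ones, which is two; read in base 10 it is ten. The joke depends on the reader not knowing which base the writer meant.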

      K=3:11 ... to a handle on the music, the DHD of the gate and the *ring of David's "sling" ...

      ---and that's a relationship of "3 is to 11" as [the SAT style "analogy)]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]"

      • I worry sometimes that important things are missing, or will disappear---for instance Merriam-Webster, which is a "canonical/standard dictionary," should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as

      2 MANY ALSO ICI; 1two ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain."

      • https://en.wikipedia.org/wiki/Euripides

      Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens.

      • The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect.

      J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this."

      Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... our Hypothes.is"

      I imagine I'll be adding some wiki/ipfs stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo, and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumanist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new git/subversion/cvs style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such a thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democratic" party that had some Anarchist Cook Book version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN."

      Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to perma-store our ideas and comments on both a blockchain (probably a new one; now that I've worked a little with ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and me such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wish they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?"
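      The "permanent record of all comments; all versions ... and changes/revisions" idea above can be sketched in a few lines. This is only a rough illustration under stated assumptions: it uses a plain SHA-256 hash as a stand-in for content addressing and an in-memory dictionary as a stand-in for storage. It does not call IPFS, Ethereum, or hypothes.is, and the names `content_id` and `RevisionStore` are hypothetical, invented here for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def content_id(payload: dict) -> str:
    """Derive an identifier from the content itself (SHA-256 of canonical JSON).

    Assumption: this only imitates the idea of content addressing;
    it is not the identifier format used by IPFS or any blockchain.
    """
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class RevisionStore:
    """Hypothetical append-only store: records are added, never overwritten."""

    records: dict = field(default_factory=dict)

    def put(self, kind: str, body: str, parent: Optional[str] = None) -> str:
        # Each record points at the revision (or document) it builds on,
        # so the identifier of a revision depends on its whole ancestry.
        payload = {"kind": kind, "body": body, "parent": parent}
        cid = content_id(payload)
        self.records[cid] = payload
        return cid

    def history(self, cid: str) -> list:
        """Walk parent links back to the first version, oldest first."""
        chain = []
        current: Optional[str] = cid
        while current is not None:
            payload = self.records[current]
            chain.append(payload["body"])
            current = payload["parent"]
        return chain[::-1]


store = RevisionStore()
v1 = store.put("document", "first rough draft")
v2 = store.put("document", "second draft", parent=v1)
note = store.put("comment", "I don't think I trust this.", parent=v2)
print(store.history(v2))
```

      Because every identifier is computed from the content and its parent, storing the same text twice yields the same identifier, and two forks of one page are simply two records sharing a parent.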

      NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazoned smiling rainbow and upside-down river ... specifically the antithesis of "angel falls," there's a lagoon too--actually a chain of several ponds underneath the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Eslyeum and the "Six-Axis ride" of the JKF Center's "Spacecamp." I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, its cure could possibly solve a number of other problems.

      Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush.. .reversed to hSE'Ah, and a story about the Second Coming parting our holy waters.

      This answer connects the magical Rod's of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key

      my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....

      Since the very earliest days of this story, I have asked for better for you, even than see

      Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Coccoon and Ponce De'Leon could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean---

      Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Philip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/Hololens/Surface will become---a series of short "focus groups" .... to gauge and discuss the information in the "CITIES-D5AM-MERCK" ... how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining halls all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song announcing the dawn of the age of Aquarius---

      It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagittarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... and just in case you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagittarius; which is the obvious ... heart of our astrological center and link to "other galaxies."

      ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There's a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the big bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss."

      "The Last Question." (And if you read two, why not "The Last Answer"?). Find these readings added to our collection, 1,000 Free Audio Books: Download Great Books for Free.

      * all "asterisks" in the abovə document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy latin for "three times a day") where the "t" there is an abbreviation of "ter" ... and suppose the link between K and 11 and 3 noting its alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex, and binarry ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman."] The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definitely not a subset) of HTML.
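      The arithmetic behind the link between K, 11, and 3 (and behind the "10 kinds of people" joke) can be checked directly. A small sketch, assuming nothing beyond Python's standard library; the helper names `letter_position` and `read_in_base` are made up for this illustration.

```python
import string


def letter_position(letter: str) -> int:
    """Return the 1-based position of a letter in the English alphabet."""
    return string.ascii_uppercase.index(letter.upper()) + 1


def read_in_base(digits: str, base: int) -> int:
    """Interpret a string of digits in the given base as an integer."""
    return int(digits, base)


k_position = letter_position("K")                    # K is the 11th letter
k_read_as_binary = read_in_base(str(k_position), 2)  # the digits "11" read in base 2
joke = read_in_base("10", 2)                         # the digits "10" read in base 2
hex_b = read_in_base("B", 16)                        # hexadecimal B is decimal eleven

print(k_position, k_read_as_binary, joke, hex_b)
```

      Read this way, the same two digits "11" name eleven in decimal and three in binary, which is the relationship written as K=3:11.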

      Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work.

      Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded"

      Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool."

      If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure.

      While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, it's also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage.

      Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared.

      Μυκῆναι, Μυκήνη

      [Image: The Lion Gate at Mycenae, the only known monumental sculpture of Bronze Age Greece]

      Coordinates: 37°43′49″N 22°45′27″E

      Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2]

      In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3]

      3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..."

      Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 'to sing' or 'the one that is melodious'), initially the Muse of Chorus, she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other.

      Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto.

      In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2]

      See also [AREXMACHINA]

      Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097.

      Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city.

      Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s.

      The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66.

      PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ...

      pps: a magnanimous decision ...

      I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. I spent several months in Kentucky telling a story of a post apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind.

      Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half working monorails appearing in the sky---literally right before my eyes.

      I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echo'ed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra.

      Apparently the name of a castle; though I wasn't aware of that until much later.

      It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throes and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all."

      I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true.

      I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit.


      "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight."

      Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm.

      It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope.


      Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... something I just "assume" the world understands and ... doesn't seem to fear (either); as if Churchill had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss."


      Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these things all hinge and congeal around a change in your hearts; hoping you will choose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander.

      These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s).

      I don't think it's hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve.

      It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitous" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civilized overnight" and then ...

      I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of forboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ...

      that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of--

      "waking up" tio a nu def of #Neopoliteran.

      ^Like the previous notation; the below text comes from an email previously sent; and while i stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world.

      Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis"

      ... search high and lo ... the depths all the way to above the heavens ... for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;]

      --- @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020

      I ... TERON;

      Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see.

      I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me."


      I have to admit I found some joy in the epiphany that the hidden city of Zion and its fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for its underground caverns and its above ground "Space Mountain" and that great golf ball in the heart of it all.

      The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses its grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk.

      The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think.


      I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously.


      I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... "the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask.

      I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears to be one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come."


      Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well--

      Searching for a Saturday; a sabbath to be made Holy once more ... "at the Rubycon"

      The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel

      The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana.

      --- Chapter 1, Verse 10[3]

      In a semi-fit of shameless arexua-self recognition i'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulcrum-ciricculum-vitae on "colonization education" ... tying together Transcendence, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe i'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer's Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :).

      While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neuralace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol).

      The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like SpaceMountain taking you to the future of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards."

      THE SOFTWARE, SINGERS, AND SHIELD(S)

      OF

      HEIROSOLYMITHONEYY

      Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... which then of course leads to a similar onomatopoeia of "weapons and ..." where-with-all to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the deity in my son's middle name defines well here, as "waking up," stretching its tributaries and its winding wonders and wistfully ....

      Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism.

      andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast;

      ... who engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ...

      .

      there's no place like home. there's no place like home. there's no place like home.

      and so it begins ... "f:

      r e l i g i o n

      find out what it means to me. faucet, ever single one, stream of purity ...

      from Fort Myers ... f ... flicks ... Flint.

      "

      ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. If there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obfuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC5321 attached. In most cases "anonymous" also means that you will not have the recipient's direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above.
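As a rough illustration of the link behaviour described above, here is a minimal Python sketch. It assumes that what actually triggers an email client is the `mailto:` scheme of the href, with the "@" appearing only inside the address. The function name `classify_href` and the example addresses are hypothetical and do not come from the original email.

```python
from urllib.parse import urlparse


def classify_href(href: str) -> str:
    """Classify an anchor's href by its URL scheme.

    Returns "email" for mailto: links, "web" for http/https links,
    and "other" for anything else (including a bare address with no scheme).
    """
    scheme = urlparse(href).scheme.lower()
    if scheme == "mailto":
        return "email"
    if scheme in ("http", "https"):
        return "web"
    return "other"


if __name__ == "__main__":
    # Hypothetical example links, not taken from the original message.
    for link in ("mailto:someone@example.org",
                 "https://html.spec.whatwg.org/",
                 "someone@example.org"):
        print(link, "->", classify_href(link))
```

Under this assumption a bare address containing "@" but no scheme is not treated as an email link, so a reader who wants a reply would still need to supply a return address in the message itself.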

      A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFC's including (but not limited to) RFC's numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium"

      ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilineal predecessors to the actual event .... still to be defined ... but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reaching the age of 18 years old; growing up in or being born in the United States of America (related spec. to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read:the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant).

      The current state of American representative democracy is the highest standard to which I am currently knowledgeable of "extant"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process*. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e.e.gfor "iv" exampleid estexemplia gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al..

      To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements)

      alas, babylon.

      i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like gandhi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, gandhi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; things that are fundamental to the ethos of the planet i call "athens."

      i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place.

      "fallen, fallen, is [the city of] babylon the great"

      ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible"

      "a dwelling place [of] (the) demons (say), it has become."

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank the Review Commons editor and three reviewers for their enthusiastic response, including their constructive suggestions and appreciation of the high impact and originality of our study. We have completed the revisions and new analyses suggested by the reviewers, and we thank the reviewers for their suggestions to increase the impact and interest in this work and for guiding us towards this much improved manuscript.

      In this response letter, we present the response to each reviewer comment and associated revisions made to the text and figures as bullet points below the reviewers' text (black text).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Yang et al. took advantage of recently published long-read-based genomic sequences of nearly homozygous genomes from complete hydatidiform moles to retrieve allelic sequences of LINE-1, currently the only active and autonomous retrotransposon of the human genome, and produced the repertoire of intact LINE-1 in a genome. The authors performed cell-culture-based retrotransposition assay measurements and in vivo fitness estimations of all identified intact LINE-1 to infer evolutionary dynamics. In this article, the authors further validate the major contribution of polymorphic LINE-1 to the de novo retrotransposition events in the human genome. They also described, at unprecedented resolution, allelic variations among LINE-1 loci and the potential impact of these variations on the interpretation of the mutagenic potential of each LINE-1 locus.

      Major comments:

      1 - The key conclusions of the article are mostly convincing. However, it would be a substantial improvement to consolidate the data of the article with information about known active LINE-1s in germ cells or in cancer by using data from recent publications of the Devine and Tubio labs (for example PMID: 34772701, 32024998, 25082706). Across the article, no mention is made of the transductions generated during LINE-1 de novo retrotransposition, which is instrumental to monitor in vivo activity of a group of LINE-1 active copies. It would be of particular interest to make a link between in vitro activity from this study with LINE-1 classification based on their observed activity in cancer (PMID: 32024998, Figure 3b).

      • We thank this and the other reviewers for this suggestion. We agree that a more explicit comparison to the often-reported counts of 3’ transductions would be a valuable addition to our analyses. We have added the 3’ transduction counts from PMID:34772701, PMID:32024998 and PMID:25082706 to Table S2 (columns Y, Z and AA), and made a comparison between these data and our Hamming-distance-based in vivo activity, as the new Figure S5. We found correlations between the two measurements in a significant proportion of LINE-1s, but some interesting exceptions exist, which likely reflect the fact that most catalogued 3’ transductions come from cancer genomes, and cancer and germline cells represent distinct cellular environments in which distinct sets of LINE-1s are able to replicate (and leave 3’ transductions). In addition to the new figure (Figure S5), we have added a discussion paragraph focused on this interesting comparison.
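For readers unfamiliar with the metric named in this response, a Hamming distance simply counts the positions at which two aligned, equal-length sequences differ. The Python sketch below is a generic illustration only: it is not the authors' pipeline, the response does not describe how distances are turned into an in vivo activity score, and the function name and toy sequences are hypothetical.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    # Toy sequences for illustration; real LINE-1 elements are about 6 kb long.
    print(hamming_distance("ACGTACGT", "ACGTTCGA"))
```

A small distance between two elements would suggest they are closely related copies, which is the kind of signal a distance-based fitness estimate could build on.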

      2 - The use of CHM1 BAC library Sanger sequencing validation and comparison with CHM13 and hg38 sequences is instrumental to support the building of the LINE-1 repertoire in the CHM1 genome, which is a valuable contribution of the article. The use of a distance-based metric to infer fitness of a LINE-1 is an interesting approach and allows LINE-1 copies to be grouped based on their in vivo activity potential. Again, it would be beneficial to correlate the inferred fitness and retrotransposition activity of copies/alleles, when known, from the above-mentioned literature.

      • The sequence validation of LINE-1 sequences in CHM1 is an important point which we have addressed in the edited manuscript. Specifically, we used three forms of sequence validation including end-sequencing of one clone of each LINE-1 after it was cloned into the retrotransposition vector and whole-plasmid sequencing of select LINE-1s with discrepant activity amongst the three clones we assayed. In addition, we sequenced the entire LINE-1 sequence for four LINE-1s which had the largest number of mutations relative to their allelic counterpart in CHM13. Please see the above response to ‘Major comment 1’ for details of our new analysis comparing the previous literature to our data.

      3 - Some aspects of the writing of the article should be improved to better support the conclusions.

      • We thank the reviewer for providing these examples of parts of the text that were particularly difficult to read and comprehend. We have deeply streamlined and improved the text throughout the manuscript based upon detailed editing for readability and clarity by two experienced scientific writers. Below, we detail how we revised the particular sections presented by the reviewer, but we think the entire manuscript is now more succinct and clearer.

      • In general, the descriptions are dense, and details could be provided in a more direct way to lighten the results section. Several redundancies in the discussion can be combined to increase clarity.

      • We have spent considerable time tightening up the text, including removing several overlapping sections from the discussion which can be seen in the included version with changes tracked.

      • There is a lack of clarity in the description of how each pair of alleles for which retrotransposition measurements vary between the study and the literature was handled (last paragraph of the "Comprehensive measurement of LINE-1 in vitro activity in a human genome" section). It is not completely clear how the analysis was done, and the way the data is presented in File S3 does not help to support the conclusion. It could be useful to include some illustrative examples in a panel of Figure 2.

      • We agree that this description was hard to parse, and we have rewritten this and accompanying methods to simplify our explanation of these results. In addition, we have revised Figure 2 to show the data in much more detail. To further aid the logic flow related to this section, we moved the previous Figure 5B to Figure 2B, updated it with more suitable examples and edited the associated descriptions.

      • Regarding inferred in vivo activity, the text contains alternative descriptions with the use of "fit" / "unfit", in vivo "active" / "inactive" or "no closely related LINE-1s" terms. The authors should find a way to clearly define and systematically use one set of terms to enhance clarity along the article. To parallel with in vitro active/inactive, it would be useful to use in vivo fit/unfit.

      • We thank the reviewer for this suggestion and agree with their suggested unified use of ‘in vivo fit/unfit’. To clarify and simplify these terms as much as possible, we added detailed explanations of in vivo / in vitro activity and systematically defined in vitro "active/inactive" (page 5, right column, line 50) and in vivo "fit/unfit" (page 8, left column, line 26) at their first appearance in the article, and we changed most instances of "in vivo activity" to "in vivo fitness" when context permits.

      4 - The authors suggest that in vitro activity can be predicted by integration of population frequency and in vivo activity (/fitness) (second paragraph of the "An analysis of LINE-1 evolutionary history [...] and in vivo activity" section). It would be beneficial to strengthen the writing of this section and ultimately validate/test the model by including data from some of the previous studies (e.g. Brouha 2003, Lutz 2003, Seleme 2006, Beck 2010, Rodriguez-Martin 2020, Chuang 2021).

      • We have thoroughly revised this section of the results (see response to ‘Major comment 3’ above), per the reviewer's suggestion, to increase reader comprehension of these important data. In addition, we greatly appreciate the reviewer’s suggestion of a very interesting experimental direction – moving beyond a single long-read-based genome to many diverse genomes, and ultimately calculating the in vivo fitness of the LINE-1s from these diverse genomes. For a long time this has not been possible, but the recent publication of the Human Pangenome presents an opportunity to study this interesting question. Though beyond the scope of this paper, our lab is actively working on this fascinating question, and we appreciate the reviewer’s shared interest in it.

      5 - The identification of adaptive mutations is only partially described and not strongly supported by experimental or analytical data. It would be interesting to explore the role of phylogenetically informative sites described in Figure 5B/C by testing non CHM1 alleles in retrotransposition assay (by introducing amino acid changes into the cloned CHM1 LINE-1 alleles) or by positioning the sites in ORF1p or ORF2p structure and/or domains to infer impact on functionality.

      • The reviewer rightly points out that this is one of the most interesting and novel findings of our manuscript. However, the testing of potentially adaptive mutations is complicated and nuanced. Specifically, we don’t know the mechanism by which these mutations might be adaptive. It is possible that they simply increase in vivo germline retrotransposition activity and this increase would be reflected by an increase of in vitro retrotransposition activity. However, another possibility is that these adaptive phenotypes only show themselves in vivo or in the context of the host restriction factors expressed in the germline. We strongly agree with the reviewer that experimental and analytical data on the phylogenetically informative sites associated with the Figure 5 phylogeny are the key to finding out the mechanisms by which these changes affect LINE-1 activity/fitness, and we are, indeed, exploring this very question in the lab now with related projects. We respectfully suggest that these (extremely cool) experiments are beyond the scope of this paper, but we have also added some more detailed description and analyses of the potentially adaptive LINE-1 variations from Figure 5 (from page 9, right column, line 50 to page 10, left column, line 5).

      Minor comments:

      1 - Regarding the in vitro retrotransposition assay, it would be beneficial to provide more data. The current Figure 2 could be enriched by the addition of data related to the variation in the replicates of the experiment (technical but mostly biological with the three clones per LINE-1 tested). Figure 2 could include a dashed line for 100% L1RP and 5% (since it is used as a threshold). It would be useful to provide an additional panel in Figure 2 to illustrate alleles of LINE-1 that are active in this study and compare the values obtained previously in other studies. Similarly, a supplemental table or alignment could be provided to document amino acid changes in the two alleles of each pair (see comment above in the Major Comment 5). The L1Hs subfamilies could also be included in the graph of Figure 2 to support the conclusions of remaining active old L1Hs at allelic forms in the human genome.

      • Upon consideration of this helpful comment, we now augment the presentation of our in vitro activity data with a remade Figure 2 that uses boxplots to show the variation of the data, a horizontal dashed line to show the activity cutoff, and asterisks to show which LINE-1s belong to L1Hs or L1PA2.

      2 - Also, the validation of cloning is not well described. The choice of PCR validation must be supported by more technical details on the design of the primers used to validate each copy. The authors should clearly state that the strategy chosen for the retrotransposition assay does not rely on transcription from the LINE-1 5'UTR but from a strong upstream promoter, ruling out the role of potential mutations in the LINE-1 promoter.

      • As detailed above in the response to ‘Major Comment 1’, we used a combination of end sequencing, whole plasmid sequencing, and multi-read Sanger sequencing to validate the sequences of each LINE-1 cloned from a CHM1 clone. When cloning each LINE-1, we used a specific set of primers designed for the ends of the UTRs for each LINE-1. We have updated the methods and text to clarify this cloning step, and the sequences of these oligos are included in Table S2.
      • To clarify the fact that our retrotransposition assays use a common, strong promoter, we added text in several places stating this setup and discussing (paragraph that starts at page 11, right column, line 18) how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity.

      3 - There are discrepancies with the reported numbers of LINE-1s between Figure 1A and Table S1: 154 vs. 151 in CHM1, 144 vs. 143 in CHM13, respectively.

      • We thank the reviewer for spotting this error on our part. The numbers in Figure 1 and the main text were correct, and we have revised Table S1 to reflect this data.

      4 - The choice of colors in Figure 3 is not perfectly clear and sometimes not as reported in the text (green highlight and orange highlight). Part of the Figure 3 legend is missing. It should include a description of the color code chosen for the right histogram.

      • We thank the reviewer for bringing this inconsistency to our attention. Based upon feedback from all reviewers, we have simplified the color scheme in Figure 3 and Figure 5 to focus on the core conclusions of these two figures. Specifically, in Figure 3, we have removed the quadrant shading and more clearly presented the cutoffs of ‘polymorphic/high frequency’ and ‘in vitro active/inactive’ as dashed lines in the scatter plot. In Figure 5, we have simplified to two colors – black for in vivo unfit and orange to show the in vivo fit LINE-1s which is also used in Figure 4 to show the definition of in vivo activity. These updated colors are now defined in the figure legends and main text, and we have made references to these colors consistent throughout.

      5 - For Figure 4, it would be useful to define in the legends the color code for the top histogram. To better read the scatter plot, the words "fit" and "unfit" could be added on each side of the vertical dashed line.

      • We thank the reviewer again for suggestions to improve the clarity of our figures. As mentioned above in ‘Minor comment 1’, we have removed unnecessary colors including the gradient of the histograms in Figure 3 and Figure 4, since the boundaries of each bin are already defined by the axis labels and ticks. As suggested, we have also added ‘fit’ and ‘unfit’ labels to the dashed cutoff line in Figure 4 to clarify the meaning of this line.

      6 - In panel B of Figure 5, it seems that the color code and hot/cold description is not fully formatted.

      • This formatting error has been corrected.

      Reviewer #1 (Significance (Required)):

      In this article, Yang and colleagues present an unprecedented view of the allelic diversity of young LINE-1 copies related to variable retrotransposition activity in an individual genome. One key aspect of their work is the description of the presence of young active LINE-1 alleles that are absent or non-intact in other genome assemblies, while described at a lower scale in initial work from the Kazazian and Moran labs, cited in the manuscript. The work of Yang et al. demonstrates the requirement of multiple approaches and long-read-based sequencing of individual genomes to fully infer the mutagenesis risk of LINE-1 activity.

      The data and methods provided by the authors open the door to a more systematic analysis of mutations and rare allelic forms to understand both mechanistic aspects and evolution of LINE-1 retrotransposition in the human genome. The identification of rare allelic forms of old LINE-1s that retain activity despite previously being considered inactive is particularly interesting in the light of LINE-1 evolution in the human genome. The authors also describe allelic diversity inside the Ta1d subfamily, suggesting further diversification and emergence of LINE-1 subgroups. Together with the identification of nucleotide polymorphism among LINE-1 copies, these findings strengthen the notion that each individual genome carries its own set of potentially mutagenic LINE-1 alleles.

      The findings and methods described in this article are of great interest to a wide audience, including research fields focusing on human genome evolution, transposable elements, genomic instability, human genetic variation, and personalized medical diagnostics.

      Aurélien J. Doucet CNRS - Université Côte d'Azur

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript is an interesting and well-crafted study of LINE-1 activity at the level of a single human genome using long read-based haploid assemblies. The manuscript has some real gems and addresses critical aspects of LINE-1 biology that are typically not rigorously examined. The authors are to be commended for undertaking this exercise and for providing interesting perspectives that challenge the dogma that dominates the field in several areas. Despite the noted strengths of the contributions, the manuscript ignores the clear limitations inherent to the approaches taken and at times appears as dogmatic as the dogma that they themselves are trying to challenge. These deficiencies should be addressed before this manuscript is published.

      • We thank Reviewer 2 for their enthusiastic appreciation of the value and innovation of our manuscript. We also thank the reviewer for encouraging us to carefully consider the missing references relevant to our findings. We have had two researchers with experience in relevant fields edit our text for readability, clarity, and proper inclusion of relevant references. We have added these throughout and taken care to replace ‘dogmatic’ statements with clear presentations of the data and thorough referencing of the relevant literature.

      Several major and minor points to consider during revision include:

      Major:

      1. Several strategies have been published in the past that have confidently assigned LINE-1s to specific loci despite the use of shorter reads. These works should be acknowledged, even if, as stated in the manuscript, the use of longer reads will only continue to add confidence and validity to future assignments.

      • We thank the reviewer for this suggestion, and we apologize for the omission of these important publications. As noted above, we have added numerous relevant references (references 17-27 in the revised text) throughout the text, including previous work that used short reads to confidently assign polymorphic/non-reference LINE-1s to specific loci. For example, we now cite the MELT pipeline to detect de novo L1 insertions with short reads (PMID: 28855259), and Iskow et al. 2010, which detects LINE-1s with junction fragment sequencing (PMID: 20603005). We have also added additional text to clarify that short reads are, indeed, often sufficient to place new LINE-1 insertions, while long reads are especially useful for resolving the sequence and location of these insertions. The new text (page 2, left column, lines 22-30) presents the advantages/disadvantages of both short reads and long reads.

      2. One of the important requirements for precise quantification of LINE-1 activity and predicted risk scores cited in the manuscript was the need to predict activity based on sequence and location. This requirement, as posited in the manuscript, ignores the critical role of epigenetic control in the regulation of LINE-1 activity. As such, a discussion that acknowledges the critical roles of histone and DNA covalent modifications, and that integrates epigenomic insight into predictions of LINE-1 activity must be included in the manuscript.

      • We thank the reviewer for suggesting this important discussion point. In response, we have expanded our discussion of this topic to place our data in the context of other literature on the effects of epigenomic regulation on in vivo LINE-1 activity, including histone and DNA modifications, as well as the effects of post-transcriptional restriction factors (paragraph starting at page 11, right column, line 42).

      3. The limitations associated with the use of CHM1 were not addressed in the manuscript. While CHM1 contains a paternal-only genome, with no maternal contribution, the moles may arise from fertilization of an anuclear empty ovum by a haploid 23,X sperm or fertilization by two sperm giving rise to a 46,XX or 46,XY karyotype. As such, generalizable conclusions about CHM1 genetics should be carefully made given that the loss of maternal epigenetic imprinting and gain of paternally imprinted expression may result in abnormal gene expression, including that of LINE-1s. These variances will in turn impact LINE-1 activity profiles.

      • We thank the reviewer for pointing out this confusingly written section of our manuscript, and we agree with the reviewer that LINE-1 activity measurements could be complicated in the CHM cell lines; however, all of our retrotransposition assays were carried out in the common background of 293T cells (chosen because of their low expression of known LINE-1 restriction factors (PMID: 25182477)). We have modified the text (page 11, right column, line 52) to clarify these points.

      Minor

      1. Important citations of previously published work are not properly referenced throughout the manuscript. These are too numerous to identify individually, but the authors should carefully read the manuscript to ensure that proper documentation and reference to previous work is duly acknowledged.

      • Please see our above response to ‘Major point 1’.

      There are several typos and missing prepositions that should be corrected. For instance, on page 7, the word "great" should be "greater".

      • Please see our above response to ‘Major point 1’ and Reviewer 1’s ‘Major comment 3’ for details on our in-depth editing of the manuscript.

      Reviewer #2 (Significance (Required)):

      The contribution is highly significant as it challenges previously held concepts and advances our understanding of critical structure and function relationships of LINE-1s.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Yang et al. perform an in-depth analysis of potentially mobile source L1 alleles in a single human genome (CHM1) previously subjected to Pacbio whole genome sequencing. The retrotransposition efficiencies of source L1 alleles with intact ORFs were tested in vitro, and these efficiencies compared to a model of in vivo activity based on Hamming distance to other ORF-intact L1 alleles. Comparisons of CHM1 L1 alleles are made to CHM13 (used for the recent T2T reference assembly), and also to population-scale sequencing efforts to establish how widespread each source L1 allele is. These data showcase the advantages of being able to resolve L1 alleles with long-read sequencing, allowing the field to make much more accurate predictions of retrotransposition potential in a given genome. The core analyses appear robust and for the most part enough detail is provided to follow what was done.

      • We thank Reviewer 3 for their in-depth reading and analysis of our manuscript and data, and for their enthusiasm about the importance of this work in the context of foundational research from their lab and many others in the field. We have carefully considered each comment and completed several new analyses of our data and related data from other publications. We feel that our manuscript is much improved with this new data, as detailed below.

      Comments:

      1) The text overlooks the potential importance of L1 5'UTR mutations in L1 activity and evolution, as per PMID:25274305, PMID:1701022, and other studies, as well as the impact of genomic context on source L1 activity, as per PMID:27016617, PMID: 33186547 etc. L1 promoter evolution is arguably a major driver of L1 lineage emergence.

      • We thank the reviewer for suggesting these important additions. To present the relevance of 5'UTR mutations to LINE-1 activity and evolution, we added a discussion paragraph (paragraph starting at page 11, right column, line 16) to address how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity. Several key references have been added and discussed in the paragraph: PMID:25274305 reported the regulation of human LINE-1 by the evolution of its 5'UTR; PMID:1701022 was one of the earliest papers to find an effect of the 5'UTR promoters on human LINE-1 retrotransposition; PMID: 27016617 and PMID: 33186547 reported specific L1 loci regulated by different promoters and were included in the discussion; PMID:9430649 was one of the examples of non-human LINE-1 lineages emerging because of different promoters and was cited in the added discussion paragraph. We have also added discussion points to make clear that genomic context has a clear role in the activity of source LINE-1s (paragraph starting at page 11, right column, line 42).

      2) The way the retrotransposition assay is done here (I think) removes parts of the UTRs as part of introducing L1s into retrotransposition vectors, meaning that the assay tests the biochemical activity of the ORFs. It would be helpful to readers to have a more detailed method for this assay, including the origins of the reporter plasmids, whether there is a CMVp boosting the L1 promoter etc, and some clarity about how much of each L1 was cloned into the assay.

      • We have added relevant details to the results (page 6, left column, line 5), discussion (page 11, right column, line 52), and methods (page 13, right column, line 16 and 30) sections to clarify the reviewer’s important points. The LINE-1s tested for in vitro activity were cloned in their entirety (UTRs and ORFs) but driven by both their native promoters in the 5'UTR as well as an upstream CMV promoter. Also, please see our response to Reviewer 1 ‘Minor comment 2’ above.

      3) PacBio long-read sequencing has been used previously to locate and characterise L1 alleles in human DNA. The Introduction states: "These represent the first scalable methods to catalog LINE-1 locations and sequences in individual human genomes". The "first" here is questionable. Citations to PMID:31853540 and PMID:34772701 should be included. The latter is particularly relevant as it not only resolves source L1 sequences with PacBio sequencing but also summarises their retrotransposition efficiencies in vitro and population frequencies.

      • We apologize for leaving out these and other important references, and we agree that the “first” claim is unnecessary. We have added the references suggested by the reviewer as well as several other important references as detailed in the above response to Reviewer 2 ‘Major point 1’. In addition, we have revised the adjacent text and deleted any references to our work as the “first” in these approaches.

      4) I am very interested in the two source L1s (on chr7 and chr9) that were found here to be more active in vitro than L1RP (to my knowledge the most active such element isolated to date, or close to it). Is there anything unusual about these two L1s? A quick look at the supplemental suggested the chr9 element was 5' truncated, was it tested as such in vitro? Also I think it would be worth contrasting the assay (all in HEKs) used here to test efficiency with the assay used by Brouha ... I feel readers may be surprised to find two L1s more mobile than L1RP in one genome.

      • To provide more details about the two active L1s (chr7 and chr9), we investigated key changes that could be related to the in vitro activity of these elements and now show them in Figure 2B and File S3. In the process of this updated analysis and the modifications to Figure 2 suggested by this reviewer and Reviewer 1, we saw that the chr7 L1, mentioned here, had one very high activity measurement pulling its activity above L1RP. As such, we decided to more rigorously normalize our data by using the positive and negative controls across all plates of each day instead of normalizing to the controls of individual plates, as we had previously done. In addition, for any L1 with discrepant activity among the three clones we assayed, we used whole plasmid sequencing to confirm the identity and consistency of all three clones. In three cases, we found that one or two of the three clones were the wrong L1, and hence excluded them from the in vitro activity calculation. After this validation and testing of additional clones, all clones from the same L1 have consistent in vitro activity (see updated Figure 2). The updated in vitro activity of the chr7 L1 is at 86.7% L1RP, and the chr9 L1 is at 261.4% L1RP, in addition to the chr17 LINE-1 with 117% L1RP and two additional LINE-1s that have near-L1RP activity levels (Table S2, column S). These changes in L1 activity were updated in the text, figures, and supplemental materials. Also, we note that the chr9 element is 6019 bp in length and was tested as such in vitro. Current work in the lab is attempting to understand the mechanisms of increased LINE-1 in vitro and in vivo activity, as described in detail in response to Reviewer 1’s ‘Major comment 5’.
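
The change from per-plate to per-day control normalization described above can be illustrated with a small Python sketch. The response does not give the exact formula, so the background-subtracted percentage below, the function name `percent_of_l1rp`, and all readings are assumptions made purely for illustration.

```python
from statistics import mean


def percent_of_l1rp(raw, negative_controls, positive_controls):
    """Express a raw reading as a percentage of the positive control (L1RP).

    Assumed formula: subtract the mean negative-control background, then
    scale by the background-subtracted positive-control mean.
    """
    background = mean(negative_controls)
    reference = mean(positive_controls)
    if reference <= background:
        raise ValueError("positive control must exceed negative control")
    return 100.0 * (raw - background) / (reference - background)


# Hypothetical control readings from two plates assayed on the same day.
plates = {
    "plate_1": {"neg": [1.0, 1.0], "pos": [21.0, 19.0]},
    "plate_2": {"neg": [1.0, 1.0], "pos": [11.0, 9.0]},
}
raw_reading = 8.0  # hypothetical test L1 measured on plate_2

# Per-plate normalization: only plate_2's own controls are used.
per_plate = percent_of_l1rp(
    raw_reading, plates["plate_2"]["neg"], plates["plate_2"]["pos"]
)

# Per-day normalization: controls are pooled across every plate of the day.
day_neg = [v for plate in plates.values() for v in plate["neg"]]
day_pos = [v for plate in plates.values() for v in plate["pos"]]
per_day = percent_of_l1rp(raw_reading, day_neg, day_pos)

print(f"per-plate: {per_plate:.1f}% of L1RP, per-day: {per_day:.1f}% of L1RP")
```

In this toy case a weak positive control on one plate inflates the per-plate estimate, while pooling the day's controls dampens that plate-level noise.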

      5) In several places it is mentioned how L1 alleles may differ from sequences provided in reference assemblies, and may therefore explain discrepancies between assay results here and in other studies (e.g. Brouha). The Seleme and Lutz papers are correctly mentioned here, but arguably the most complete demonstration of this concept, from PMID:31230816, is overlooked. This study reports a chr13 source L1 that was previously found to be inactive by Brouha, and with broken ORFs in the reference genome, has both mobile and immobile alleles in the human population. This L1 is actually in CHM13, but not CHM1, and is "hot" in some individuals and not others. There are several places in the manuscript where this earlier study is very relevant and it would be fair to ask it to be mentioned, especially as the results are concordant. The same concept is reinforced by an even more recent paper (PMID:35728967), except in macaque, showing that this is a general consideration for primate L1 lineages, and actually that source L1 is relatively old and yet jumps extremely well in vitro, which fits an observation made in the present study. Mutually supporting observations like these really add confidence that what is reported in the present study is robust.

      • We thank the reviewer for their suggestion to include these highly relevant and important papers; we apologize for this initial omission. We have now added several sentences to the introduction and discussion (top left paragraph page 11) in addition to citations of these papers.

      6) Hamming distance between ORF-intact source L1 alleles is used to assess in vivo activity. This seems reasonable. However, in other works, transductions have been used to identify families of very closely related L1s. I realise that many highly mobile source L1s will rarely generate insertions carrying transductions, and yet I wonder if any of the youngest L1s in the present study form transduction families, and whether estimates of in vivo activity based on transductions found in population-scale data would reconcile better with in vitro retrotransposition assay data.

      • We thank the reviewer for pointing out our exclusion of data on 3' transductions, the most commonly used surrogates of in vivo activity, while also acknowledging that only a small percentage of new L1 retrotranspositions carry 3' transductions. Please see our above response to Reviewer 1’s ‘Major comment 1’ for details on our newly added comparison of our in vivo activity data to the 3' transduction-based somatic LINE-1 retrotransposition landscapes reported in PMID:34772701, PMID:32024998 and PMID:25082706.
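
For readers unfamiliar with the Hamming distance mentioned in this exchange, the following Python sketch shows the basic computation on pre-aligned sequences. It is not the authors' in vivo activity model, which is not described in this thread; the allele names, toy sequences, and the nearest-neighbour summary are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_distances(alleles: dict) -> dict:
    """For each allele, the smallest Hamming distance to any other allele."""
    return {
        name: min(
            hamming_distance(seq, other_seq)
            for other_name, other_seq in alleles.items()
            if other_name != name
        )
        for name, seq in alleles.items()
    }


# Hypothetical, already-aligned toy sequences standing in for L1 alleles.
toy_alleles = {
    "L1_a": "ACGTACGT",
    "L1_b": "ACGTACGA",
    "L1_c": "TCGTTCGA",
}

print(nearest_neighbor_distances(toy_alleles))
```

An allele with a very close neighbour is consistent with a recent copying event, which is the intuition behind using sequence distance as a proxy for in vivo activity.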

      7) In the Introduction, it is stated that L1 only transmits vertically. It may be prudent to mildly qualify this position, based on PMID:29983116.

      • The referenced text in the introduction has been changed from "LINE-1s only transmit vertically" to "LINE-1s generally transmit vertically with few exceptions", with the addition of the suggested citation.

      8) A column in Table S2 looks mislabelled: Column R should be CHM1 not CHM13?

      • We thank the reviewer for seeing this error. Column P (Column R in the previous version) of Table S2 is now correctly labeled as "CHM1 L1 intactness".

      Geoff Faulkner (University of Queensland)

      Reviewer #3 (Significance (Required)):

      This is a well-executed study of considerable interest to the mobile DNA field, and anyone working with long-read DNA sequencing. Its strengths are the genomic and bioinformatic analysis, leveraging the PacBio long-read data and BAC library available for CHM1 to full effect. One limitation (in current form) is its near-exclusive focus on ORFs to encapsulate how mobile a given L1 allele is, when genomic context and L1 promoter mutations could also contribute heavily. Although I liked the manuscript very much and enjoyed reviewing it, some of the conceptual advances are encroached upon by other work (including some very relevant and yet uncited literature). These issues can very likely be addressed via a revision; additional analyses may be required but not new experiments.


    1. Author Response

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We are planning the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, we plan to explore the implementation of this option and assess its impact on the overall results. We will also investigate replacing the ANOVA test with a Kruskal-Wallis test. Instead of upfront data normalization, we will consider using the PLINK --pheno-quantile-normalize option. Both options will be compared on a set of phenotypes where we can analyze the output (i.e., for phenotypes where we expect to find specific variants), to determine whether these strategies enhance the detection power.

      2) We also agree with both reviewers that gene expression information is of interest. However, we recognize that incorporating such information would entail substantial work (as elaborated in our response to comments below). We feel that this extensive work is beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we are committed to enhancing user experience by including more gene-level outlinks to Flybase. Additionally, we will link variants and gene results to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      3) In agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we are in the process of developing a gene-centric tool that will allow users to query the database based on gene names. Moreover, we intend to integrate ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      4) Finally, we also concur with both reviewers about making minor edits to the manuscript to address their feedback.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated and pooled and for which they provide a user-friendly website. It is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We are relatively well-connected to the D. melanogaster community and aim to leverage this connection to render the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. This is also why we often bridge our results with other resources, such as linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern, which in a more traditional review setting I would flag to the editor as something the authors must address on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is to ensure that the interface offers an inverse normalisation function (rank and then project on a Gaussian), and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a Kruskal-Wallis test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we consider it acceptable to perform linear regression (including ANOVA tests) on non-normally distributed data, as only the prediction errors need to follow a normal distribution.

      Furthermore, the ANOVA test is solely conducted to assess whether any of the potential covariates (such as well-established inversions and symbiont infection status) are associated with the phenotype of interest. PLINK2 automatically corrects for the effects of these covariates during GWAS by considering them as part of the regression model.

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we commit to exploring the impact of data normalization on the overall outcomes. Additionally, we will consider replacing the ANOVA test with a Kruskal-Wallis test, and using the PLINK --pheno-quantile-normalize option. We intend to compare both approaches using a set of phenotypes where we can compare the output (i.e., where specific variants are expected to be identified). This comparison will help us determine if either method enhances the detection power.
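
The inverse normalisation the reviewer describes (rank the values, then project the ranks onto a Gaussian) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not DGRPool's or PLINK's implementation; the rank offset of 0.5 is one common convention and is an assumption here, as are the function names and the toy phenotype values.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles of their (offset) ranks."""
    n = len(values)
    standard_normal = NormalDist()
    return [
        standard_normal.inv_cdf((rank - offset) / n)
        for rank in average_ranks(values)
    ]


# Hypothetical, strongly skewed per-line phenotype means.
phenotype = [0.1, 0.2, 0.3, 10.0, 250.0]
transformed = inverse_normal_transform(phenotype)
print([round(z, 3) for z in transformed])
```

The transformed values keep the original ordering of the lines but are symmetric around zero, so a single extreme line no longer dominates a linear model.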

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We will cite these resources in the Introduction and check the GWAS catalog submission guidelines to compare to the ones we are proposing in this paper.

      Some authors coming from a human genetics background will interpret correlations of phenotypes more in the genetic variant space (e.g., LD score regression), rather than as a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We appreciate this potential issue and we will make this distinction clearer in the manuscript to avoid any confusion.

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but this is for another paper, one that I suspect this harmonised pool of data can help with.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point raised by the reviewer, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. Then, it also requires adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. So, as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue, especially given the strong transcriptomics background of our lab, in a potential follow-up paper.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a higher FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, this is not an uncommonly used threshold (Li et al., PLoS Biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al., eLife, 2023), and one that seems robust enough in our analysis since similar phenotypes are significant in different studies. Nevertheless, we will revisit these results and explore how a more stringent threshold may impact the results.
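
To make the effect of the threshold concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to the same set of p-values at 25% and at 5% FDR. The thread does not say which FDR procedure the manuscript uses, so Benjamini-Hochberg is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= k * fdr / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p-values for eight phenotype associations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.09, 0.20, 0.60, 0.90]

hits_at_25 = sum(benjamini_hochberg(p_values, fdr=0.25))
hits_at_05 = sum(benjamini_hochberg(p_values, fdr=0.05))
print(f"significant at 25% FDR: {hits_at_25}, at 5% FDR: {hits_at_05}")
```

Under a 25% FDR, roughly one in four reported associations is expected to be a false discovery, which is the substance of the reviewer's concern.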

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      This section of the paper was intended to investigate anecdotal evidence suggesting that certain DGRP lines consistently rank at the top or bottom when examining fitness-related traits. If accurate, this observation could imply that inbreeding might have made these lines generally weaker, potentially introducing bias into studies aimed at uncovering the genetic basis of complex traits. However, as per the analyses presented, we did not discover support for this phenomenon. Nevertheless, we consider this message important to convey. In response to the reviewer's feedback, we intend to provide a clearer explanation of the reasoning behind this section of the paper and its main conclusion.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      Thank you. We will update the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      Thank you. We will update the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al. provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse and so new tools present appeal.

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. This could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as mouse, rat, or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we intend to introduce a gene-centric tool that enables users to query the database based on gene names. Additionally, we will establish links to ortholog databases within the GWAS results, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we have plans to link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes. We are considering integrating this feature into the new gene-centric tool as well.

      Another potential downstream analysis we are considering is gene-set enrichment. This analysis would involve assessing the enrichment of genes in Gene Ontology or other pathway databases directly from the GWAS results page.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the positive feedback of the reviewers and have modified the manuscript to address their comments, including changes to the text, figures, and methods. We believe that these revisions have strengthened and improved the manuscript. Reviewers’ comments in blue and detailed responses in black are below.

      Reviewer #1 Weaknesses:

      • Is "function" of the ISNs to balance "nutrient need" or osmolarity? Balancing hemolymph osmolarity for physiological homeostasis is conceptually different from balancing thirst and hunger.

      We have added the following text to the introduction to address this: “Thus, the ISNs sense both AKH and hemolymph osmolality, arguing that they balance internal osmolality fluctuations and nutrient need (Jourjine, Mullaney et al., 2016).” (ln 80-82).

      • The final schematic nicely sums up how the different peptidergic pathways might work together, but it is unclear which connections are empirically-validated or speculative. It would be informative to show which parts of the model are speculative versus validated. For example, does FAFB volume synapse = functional connectivity and not just anatomical proximity? A bulk of the current manuscript relies on "synapses of relatively high confidence" (according to Materials and methods: line 522). I recommend distinguishing empirically tested & predicted connections in the final schematic, and maybe reword/clarify throughout the manuscript as "predicted synaptic partners"

      We modified the schematic to clarify EM based connections versus functionally validated connections. We also clarified the EM predicted synaptic partners, using “predicted synaptic partners” throughout the manuscript.

      Reviewer #2 Areas for further development:

      • Does BIT inhibit all of the IPCs or some of them? I think it is critical to indicate the ROIs used for each neuron in the methods. Which part of the neuron is used for imaging experiments? Dendrites, cell bodies, or synaptic terminals?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

      • The discussion section is not giving big picture explanation of how these neurons work together to regulate sugar and water ingestion. Silencing and activation experiments are good, but without showing the innate activity of these neural groups during ingestion, it is not clear what their functions are in terms of regulating fly behavior.

      We agree that how these peptidergic neurons coordinately regulate feeding is unclear. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is challenging. Acute imaging during feeding would only in part address this challenge, as cumulative changes in nutrient need signals may impart circuit changes that are not apparent by monitoring the acute activity of peptidergic neurons. We modified a paragraph in the discussion to address this (ln 434-443).

      “Overall, our work sheds light on neural circuit mechanisms that translate internal nutrient abundance cues into the coordinated regulation of sugar and water ingestion. We show that the hunger and thirst signals detected by the ISNs influence a network of peptidergic neurons that act in concert to prioritize ingestion of specific nutrients based on internal needs. We hypothesize that multiple internal state signals are integrated in higher brain regions such that combinations of peptides and their actions signify specific needs to drive ingestion of appropriate nutrients. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is a future challenge to further illuminate homeostatic feeding regulation.”

      Reviewer #1 (Recommendations For The Authors):

      • For the final schematic figure, it may be informative to include nanchung and AKHR in the schematic.

      We now include this (Fig 6).

      • For the ingestion duration with optogenetic activation, I don't think the right way to represent the data is by normalizing them to the no LED control. I think it should show raw ingestion time. I understand that the normalized data make the figure "cleaner" (no need to show +/- LED separately) but I think visualization of the raw data is important.

      We now include this in a new Supplemental Figure (Fig S6).

      • Methods for ingestion with optogenetic activation should be detailed in the Methods section.

      We expanded upon this in the ‘Temporal consumption assay (TCA)’ methods section. (ln 461-466).

      Reviewer #2 (Recommendations For The Authors):

      1) I think the authors are not following the recommendations of the Flywire community which recommends that people who contributed to the tracing of neurons are offered authorship in the published papers. I see the authors are thanking other lab members who have done tracing for the neurons described in this study, but I would like them to clarify whether they are following the guidelines provided by Flywire.

      We followed the Flywire guidelines and contacted all Flywire users contributing more than 10% to neuron edits for permission to publish with acknowledgements (see Flywire guidelines https://docs.google.com/document/d/1bUkOB5JnT3u__JDvAoVDHJ3zr5NXQtV_63yx2w6Tcc/edit).

      2) The method section for voltage imaging is missing.

      We now include a section on voltage imaging (ln 496-498).

      3) ROIs for imaging are not indicated in the methods or in the figures. It is hard to judge what is the origin of neural activity plotted in the figures; are they imaging cell bodies, dendrites, or axons?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers and the editor for their insightful comments and suggestions. We are particularly glad to read that our software package constitutes a set of “well-written analysis routines” which have “the potential to become very valuable and foundational tools for the analysis of neurophysiological data”. We have updated the manuscript to address their remarks where appropriate.

      Additionally, we would like to stress that tools of this kind are in continual development. As such, the manuscript offered a snapshot of the package at one point during this process, which in this case was several months ago at initial submission. Since then, several improvements have been implemented. The manuscript has been further updated to reflect these more recent changes.

      From the Reviewing Editor:

      The reviewers identified a number of fundamental weaknesses in the paper.

      1) For a paper demonstrating a toolbox, it seems that some example analyses showing the value of the approach (and potentially the advantage in simplification, etc over previous or other approaches) are really important to demonstrate.

      As noted by the first reviewer, the online repository (i.e. GitHub page) conveys a better sense of the toolbox’s contribution to the field than the present manuscript. This is a fair remark but at the same time, it is unclear how to illustrate this in a journal article without dedicating a great deal of page space to presenting raw code, while online tools offer an easier and clearer way to do this. As a work-around, our strategy was to illustrate some examples of data analysis in Figures 4&5 by comparing each illustrated processing step to the corresponding command line used by the Pynapple package. Each step requires a single line of code, meaning that one only needs to write three lines of code to decode a feature from population activity using a Bayesian decoder (Fig. 4a), compute a cross-correlogram of two neurons during specific stimulus presentation (Fig. 4b) or compute the average firing rate of two neurons around a specific time of the experimental task (Fig. 4c). We believe that these visual aids make it unnecessary to add code in the main text of this manuscript. However, to aid reader understanding, we now provide clear references to online Jupyter notebooks which show how each figure was generated in figure legends as well as in the “Code Availability” section.

      https://github.com/pynapple-org/pynapple-paper-2023
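
      For readers who want a feel for the kind of analysis mentioned above (average firing rate around a task event), the following is a plain NumPy sketch. It is not Pynapple code and does not reproduce the authors' notebooks; the function name `peri_event_rate`, its parameters, and the spike and event times are all hypothetical.

```python
import numpy as np

def peri_event_rate(spike_times, event_times, window, bin_size):
    """Average firing rate (Hz) around events, binned over `window` (seconds).

    Hypothetical helper written with NumPy only; it is not part of Pynapple.
    """
    spikes = np.asarray(spike_times, dtype=float)
    n_bins = int(round((window[1] - window[0]) / bin_size))
    edges = window[0] + bin_size * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for t in event_times:
        # Align spikes to this event, then count them in each bin
        counts += np.histogram(spikes - t, bins=edges)[0]
    # Divide by number of events and bin width to get spikes per second
    rate = counts / (len(event_times) * bin_size)
    centers = edges[:-1] + bin_size / 2
    return centers, rate

# Invented data: one neuron, two task events
spike_times = [0.9, 1.1, 1.2, 4.8, 5.1, 5.3, 5.4]
event_times = [1.0, 5.0]
centers, rate = peri_event_rate(spike_times, event_times,
                                window=(-0.5, 0.5), bin_size=0.5)
print(centers, rate)
```

      Writing this by hand takes a dozen lines and a few choices (bin edges, normalisation) that are easy to get subtly wrong, which is the kind of repeated effort the response says a shared toolbox is meant to remove.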

      Furthermore, we have opted in to the “Executable Research Articles” feature at eLife, which will make it possible to include live scripts and figures in the manuscript once it is accepted for publication. We do not know at this stage what it entails exactly, but we hope that Figures 4&5 will become live with this feature. Readers will be able to see and edit the code directly within the online version of the manuscript.

      2) The manuscript's claims about not having dependencies seem confusing.

      We agree that this claim was somewhat unfounded. There are virtually no Python packages that do not have dependencies. Our intention was to say that the package had no dependencies outside the most common ones, which are Numpy, Scipy, and Pandas. Too many packages in the field tend to have long lists of dependencies, making long-term back-compatibility quite challenging. By keeping dependencies minimal, we hope to maximise the package’s long-term back-compatibility. We have rephrased this statement in the manuscript in the following sections:

      Figure 1, legend.

      “These methods depend only on a few, commonly used, external packages.”

      Section Foundational data processing: “they are for the most part built-in and only depend on a few widely-used external packages. This ensures that the package can be used in a near stand-alone fashion, without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      3) Given its significant relevance, it seems important to cite the FMATool and describe connections between it (or analyses based on it) and the presented work.

      Indeed, although we had already cited other toolboxes (including a review covering the topic comprehensively), we should have included this one in the original manuscript. Unfortunately, to the best of our knowledge, this toolbox is not citable (there is no companion paper). We have added a reference to it in plain text.

      4) Some discussion of integration between Pynapple and the rest of a full experimental data pipeline should be discussed with regard to reproducibility.

      This is an interesting point, and the third paragraph of the discussion somewhat broached this issue. Pynapple was not originally designed to pre-process data. However, it can, in theory, load any type of data stream after the necessary pre-processing steps. Overall, modularity is a key aspect of the Pynapple framework, and this is also the case for the integration with data pre-processing pipelines, for example spike sorting in electrophysiology and detection of regions of interest in calcium imaging. We do not think there should be an integrated solution to the problem; instead, it should be possible to use any piece of code on data irrespective of their origin. This is why we focused on making data loading straightforward and easy to adapt to any particular situation. To expand on this point and make it clear that Pynapple is not meant to pre-process data but can, in theory, load any type of data stream after the necessary pre-processing steps, we have added the following sentences to the aforementioned paragraph:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data have already been pre-processed (for example, spike sorting and detection of ROIs).”

      5) Relatedly, a description of how data are stored after processing (i.e., how precisely are processed data stored in NWB format).

      We agree that this is a critical issue. NWB is not necessarily the best option as it is not possible to overwrite an NWB file. This would require the creation of a new NWB file each time, which is computationally expensive and time consuming. It also increases the odds of write errors. Theoretically, users who need to store intermediate results in a flexible way could use any method they prefer, writing their own data files and wrappers to reload these data into Pynapple objects. Indeed, it is not easy to properly store data in an object-specific manner. This is a long-standing issue and one we are currently working to resolve.

      To do so, we are developing I/O methods for each Pynapple core object. We aim to provide an output format that is simple to read and backward compatible in future Pynapple releases. This feature will be available in the coming weeks. Note that while NWB may not be the central data format of Pynapple in future releases, it has become a central node in the neuroscience software ecosystem. Therefore, we aim to make reading and writing this format easier for users by developing a set of simple standalone functions.

      Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that often share common components. For example, an investigator might want to generate plots that relate one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the waste of time, the potential for errors, and the greater difficulty inherent in sharing highly customized code.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well-written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object-oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      There are two main weaknesses of the paper in its present form.

      First, the claims relating to the value of the library in everyday use are not demonstrated clearly. There are no comparisons of, for example, the number of lines of code required to carry out a specific analysis with and without Pynapple or Pynacollada. Similarly, the paper does not give the reader a good sense of how analyses are carried out and how the object-oriented architecture provides a simplified user interaction experience. This contrasts with their GitHub page and associated notebooks which do a better job of showing the package in action.

      As noted in the response to the Reviewing Editor and the response to the reviewer’s recommendations to the authors below, we have now included links to Jupyter notebooks that highlight how panels of Figures 4 and 5 were generated (https://github.com/pynapple-org/pynapple-paper-2023). However, we believe that including more code in the manuscript than what is currently shown (i.e. abbreviated calls to methods on top of panels in Figs 4&5) would decrease the readability of the manuscript.

      Second, the paper makes several claims about the values of object-oriented programming and the overall design strategy that are not entirely accurate. For example, object-oriented programming does not inherently reduce coding errors, although it can be part of good software engineering. Similarly, there is a claim that the design strategy "ensures stability" when it would be much more accurate to say that these strategies make it easier to maintain the stability of the code. And the authors state that the package has no dependencies, which is not true in the codebase. These and other claims are made without a clear definition of the properties that good scientific analysis software should have (e.g., stability, extensibility, testing infrastructure, etc.).

      Following the reviewer’s comment, we have rephrased and clarified these claims. We provide a detailed response to these remarks in the recommendations to authors below.

      There is also a minor issue - these packages address an important need for high-level analysis tools but do not provide associated tools for preprocessing (e.g., spike sorting) or for creating reproducible pipelines for these analyses. This is entirely reasonable, in that no one package can be expected to do everything, but a bit deeper account of the process that takes raw data and produces scientific results would be helpful. In addition, some discussion of how this package could be combined with other tools (e.g., DataJoint, Code Ocean) would help provide context for where Pynapple and Pynacollada could fit into a robust and reliable data analysis ecosystem.

      We agree that better explaining how Pynapple is integrated within data preprocessing pipelines is essential. We have clarified this aspect in the manuscript and provide more details below.

      Reviewer #1 (Recommendations For The Authors):

      Page 1

      • Title

      The authors should note that the application name, "Pynapple", could be confused with something from Apple. Users may search for "Pyapple" as many Python applications contain "py" like "Numpy". "Pyapple" indeed is a Python Apple that works with Apple products. They could consider "NeuroFrame", "NeuroSeries" or "NeuroPandas" to help users realize this is not an Apple product.

      We thank the referee for this interesting comment. However, we are not willing to make such a change at this point. The community of users has been growing in the last year and it seems too late to change the name. We note that this is the first time such a comment has been made to us, and users and collaborators do not seem to confuse the package with any Apple products.

      • Abstract

      The authors mentioned that the Pynapple is "fully open source". It may be better to simply say it is "open source".

      We agree, corrected.

      Assuming the authors keep the name, it would be helpful if the full meaning of Pynapple - Python Neural Analysis Package was presented as early as possible.

      Corrected in the abstract.

      • Highlight

      An application being lightweight and standalone does not imply nor ensure backward compatibility. In general, it would be useful if the authors identified a set of desirable code characteristics, defined them clearly in the introduction, and then described their software in terms of those characteristics.

      Thank you for your comment. We agree that being lightweight and standalone does not necessarily imply backward compatibility. Our intention was to emphasize that Pynapple is designed to be as simple and flexible as possible, with a focus on providing a consistent interface for users across different versions. However, we understand that this may not be enough to ensure long-term stability, which is why we are committed to regular updates and maintenance to ensure that the code remains functional as the underlying code base (Python versions, etc.) changes.

      Regarding your suggestion to identify a set of desirable code characteristics, we believe this is an excellent idea. In the introduction, we briefly touch upon some of the core principles that guided our development of Pynapple: a lightweight, stable, and simple package. However, we acknowledge that providing a more detailed discussion of these characteristics and how they relate to the design of our software would be useful for readers. We have added this paragraph in the discussion:

      “Pynapple was developed to be lightweight, stable, and simple. As simplicity does not necessarily imply backward compatibility (i.e. long-term stability of the code), Pynapple’s main objects and their properties will remain the same for the foreseeable future, even if the code in the backend may eventually change (e.g. not relying on Pandas in future versions). The small number of external dependencies also decreases the need to adapt the code to new versions of external packages. This approach favors long-term backward compatibility.”

      Page 2

      • The authors wrote -

      "Despite this rapid progress, data analysis often relies on custom-made, lab-specific code, which is susceptible to error and can be difficult to compare across research groups."

      It would be helpful to add that custom-made, lab-specific code can lead to a violation of FAIR principles (https://en.wikipedia.org/wiki/FAIR_datadata). More generally, any package can have errors, so it would be helpful to explain any testing regimens or other approaches the authors have taken to ensure that their code is error-free.

      We understand the importance of the FAIR principles for data sharing. However, Pynapple was not designed to handle data through their pre-processing. The only aspect that is somehow covered by the FAIR principles is the interoperability, but again, it is a requirement for the data to interoperate with different storage and analysis pipelines, not of the analysis framework itself. Unlike custom-made code, Pynapple will make interoperability easier, as, in theory, once the required data loaders are available, any analysis could be run on any dataset. We have added the following sentence to the discussion:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data has already been pre-processed (for example, spike sorting and ROI detection). According to the FAIR principles, pre-processed data should interoperate across different analysis pipelines. Pynapple makes this interoperability possible as, once the data are loaded in the Pynapple framework, the same code can be used to analyze different datasets”

      • The authors wrote -

      "While several toolboxes are available to perform neuronal data analysis ti–11,2ti (see ref. 29 for review), most of these programs focus on producing high-level analysis from specified types of data and do not offer the versatility required for rapidly-changing analytical methods and experimental methods."

      Here it would be helpful if the authors could give a more specific example or explain why this is problematic enough to be a concern. Users may not see a problem with high-level analysis or using specific data types.

      Again, we apologize for not fully elaborating upon our goals here. Our intention was to point out that toolboxes often focus on one particular case of high-level analysis. In many cases, such packages lack low-level analysis features or the flexibility to derive new analysis pipelines quickly and effortlessly. Users can decide to use low-level packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background. The simplicity of Pynapple, and the set of examples and notebooks, make it possible for individuals who start coding to be quickly able to analyze their data.

      As we do not want to be too specific at this point of the manuscript (second paragraph of the intro) and as we have clarified many of the aspects of the toolbox in the new revised version, we have only added the following sentence to the paragraph:

      “Users can decide to use low-level data manipulation packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background.”

      • The authors wrote -

      "To meet these needs, a general toolbox for data analysis must be designed with a few principles in mind"

      Toolboxes based on many different principles can solve problems. It is likely more accurate to say that the authors designed their toolbox with a particular set of principles in mind. A clear description of those principles (as mentioned in the comment above) would help the reader understand why the specific choices made are beneficial.

      We agree that these are not “universal” principles but rather the principles we had in mind when we designed the package. We have clarified these principles and made clear that they are personal points of view.

      We have rephrased the following paragraph:

      “To meet these needs, we designed Pynapple, a general toolbox for data analysis in systems neuroscience with a few principles in mind.”

      • The authors wrote -

      "The first property of such a toolbox is that it should be object-oriented, organizing software around data."

      What facts make this true? For example, React is a web development library. A common approach to using this library is to use Hooks (essentially a collection of functions). This is becoming more popular than the previous approach of using Components (a collection of classes). This is an example of how object-oriented programming is not always the best solution. In some cases, for example, object-oriented coding can cause problems (e.g. it can be hard to find the place where a given function is defined and to figure out which version is being used given complex inheritance structures).

      In general, key selling points of object-oriented programming are extension, inheritance, and encapsulation. If the authors want to retain this text (which would be entirely reasonable), it would be helpful if they explained clearly how an object-oriented approach enables these functions and why they are critical for this application in particular.

      The referee makes a particularly important point. We are aware of the limits of OOP, especially when these objects become over-complex and the inheritance becomes unclear.

      We have clarified our goal here. We believe that in our case, OOP is powerful and, overall, is less error-prone than a collection of functions. The reasons are the following:

      An object-oriented approach facilitates better interactions between objects. By encapsulating data and behavior within objects, object-oriented programming promotes clear and well-defined interfaces between objects. This results in more structured and manageable code, as objects communicate with each other through these well-defined interfaces. Such improved interactions lead to increased code reliability.

      Inheritance, a key concept in object-oriented programming, allows properties to be passed on. One important example of how inheritance is crucial in the Pynapple framework is the time support of Pynapple objects. It determines the valid epoch on which the object is defined. This property needs to be carried over during different manipulations of the object. Without OOP, this property could easily be forgotten, resulting in erroneous conclusions for many types of analysis. The simplest case is the average rate of a TS object: the rate must be computed on the time support (a property of TS objects), not from the beginning to the end of the recording (or of a specific epoch, independent of the TS). Finally, it is easier to access and manipulate the meta information of a Pynapple object than it would be without using objects.
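
      The time-support argument can be illustrated with a small, self-contained sketch. The class below is a hypothetical stand-in written for this note; its name, fields, and methods are assumptions and are not Pynapple's actual objects or API. It also assumes the epochs passed to `restrict` lie within the current support.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimestampSeries:
    """Hypothetical timestamp container; NOT Pynapple's TS class."""
    times: tuple    # event times in seconds
    support: tuple  # ((start, end), ...) epochs on which the data are valid

    def restrict(self, epochs):
        """Keep timestamps inside `epochs`; the result carries the new support."""
        kept = tuple(t for t in self.times
                     if any(s <= t <= e for s, e in epochs))
        return TimestampSeries(kept, tuple(epochs))

    def rate(self):
        """Average rate (Hz) over the time support, not the whole recording."""
        duration = sum(e - s for s, e in self.support)
        return len(self.times) / duration

# Invented recording: 100 s long, six spikes
spikes = TimestampSeries(times=(1.0, 2.0, 3.0, 50.0, 51.0, 99.0),
                         support=((0.0, 100.0),))
task = spikes.restrict([(0.0, 10.0)])

support_rate = task.rate()             # 3 spikes over the 10 s epoch
naive_rate = len(task.times) / 100.0   # forgets the epoch, divides by full recording
print(support_rate, naive_rate)
```

      Because the restricted object carries its own support, the rate is computed over the right duration without the caller having to remember which epoch was applied; the naive calculation underestimates the rate by a factor of ten in this toy case.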

      • The authors wrote -

      "drastically diminishing the odds of a coding error"

      This seems a bit strong here. Perhaps "reducing the odds" would be more accurate.

      We agree. Now changed.

      Page 3

      • The authors wrote -

      "Another property of an efficient toolbox is that as much data as possible should be captured by only a small number of objects. This ensures that the same code can be used for various datasets and eliminates the need of adapting the structure"

      It may be better to write something like - "Objects have a collection of preset variables/values that are well suited for general use and are very flexible." Capturing "as much data as possible" may be confusing, because it's not the amount that this helps with but rather the variety.

      We thank the referee for this remark. We have rephrased this sentence as follows:

      “Another property of an efficient toolbox is that a small number of objects could represent virtually all possible data streams in neuroscience, instead of objects made for specific physiological processes (e.g. spike trains).”

      • The authors wrote -

      "The properties listed above ensure the long-term stability of a toolbox, a crucial aspect for maintaining the code repository. Toolboxes built around these principles will be maximally flexible and will have the most general application"

      There are two issues with this statement. First, ensuring long-term stability is only possible with a long-term commitment of time and resources to ensure that the code remains functional as the underlying code base (python versions, etc.) changes. If that is something you are committing to, it would be great to make that clear. If not, these statements need to be less firm.

      Second, it is not clear how these properties were arrived at in the first place. There are things like the FAIR Principles which could provide an organizing framework, ideally when combined with good software engineering practices, and if some more systematic discussion of these properties and their justification could be added, it would help the field think about this issue more clearly.

      The referee makes a valid point that ensuring long-term stability requires a long-term commitment of time and resources to maintain the code as the underlying technology evolves. While we cannot make guarantees about the future of Pynapple, we believe that one of the best ways to ensure long-term stability is by fostering a strong community of users and contributors who can provide ongoing support and development. By promoting open-source collaboration and encouraging community involvement, we hope to create a sustainable ecosystem around Pynapple that can adapt to changes in technology and scientific practices over time. Ultimately, the longevity of any scientific tool depends on its adoption and use by the research community, and we hope that Pynapple can provide value to neuroscience researchers and continue to evolve and improve as the field progresses.

      It is noteworthy that the first author, and main developer of the package, has now been hired as a data scientist at the Center for Computational Neuroscience, Flatiron Institute, to explicitly continue the development of the tool and build a community of users and contributors.

      • The authors wrote -

      "each with a limited number of methods..."

      This may give the impression that the functionality is limited, so rephrasing may be helpful.

      Indeed! We have now rephrased this sentence:

      “The core of Pynapple is five versatile timeseries objects, whose methods make it possible to intuitively manipulate and analyze the data.”

      • The authors wrote that object-oriented coding

      "limits the chances of coding error"

      This is not always the case, but if it is the case here, it would be helpful if the authors explain exactly how it helps to use object-oriented approaches for this package.

      We agree with the referee that it is not always the case. As we explained above, we believe it is less error-prone than a collection of functions. Quite often, it also makes it easier to debug. We have replaced this sentence with the following one:

      “Because objects are designed to be self-contained and interact with each other through well-defined methods, users are less likely to make errors when using them. This is because objects can enforce their own internal consistency, reducing the chances of data inconsistencies or unexpected behavior. Overall, OOP is a powerful tool for managing complexity and reducing errors in scientific programming.”

      • Fig 1

      In object-oriented programming, a class is a blueprint for the classes that inherit it. Instantiating that class creates an object. An object contains any or all of these - data, methods, and events. The figure could be improved if it maintained these organizational principles as figure properties.

      We agree with the referee’s remark regarding the logic of object instantiation, but it is unclear how this could be incorporated in Fig. 1 without making it too complex. Here, objects are instantiated from the first to the second column. We have not provided details about the parent objects, as we believe these details are not important for reader comprehension. In its present form, the objects are inherited from Pandas objects, but it is possible that a future version will be based on something else. For the users, this will be transparent as the toolbox is designed in such a way that only the methods that are specific to Pynapple are needed to do most computation, while only expert programmers may be interested in using Pandas functionalities.

      • The authors wrote that Pynapple does -

      "not depend on any external package"

      As mentioned above, this is not true. It depends on Numpy and likely other packages, and this should be explained. It is perfectly reasonable to say that it depends on only a few other packages.

      As said above, we have now clarified this claim.

      Page 5.

      • The authors wrote -

      "represent arrays of Ts and Tsd"

      For a knowledgeable reader's reference, it would be helpful to refer to these either as Numpy arrays (at least at first when they are defined) or as lists if they are native python objects.

      Indeed, using the word “arrays” here could be confusing because of Numpy arrays. We have replaced this term with “groups”.

      • The authors wrote -

      "Pynapple is built with objects from the Pandas library ... Pynapple objects inherit the computational stability and flexibility"

      Here a definition of stability would be useful. Is it the case that by stability you mean "does not change often"? Or is some other meaning of stability implied?

      Yes, this is exactly what we meant when referring to the stability of Pandas. We have added the following precision:

      “As such, Pynapple objects inherit the long-term consistency of the code and the computational flexibility from this widely used package.”

      Page 6

      • Fig 2

      In Fig 2 A and B, the illustrations are good. It would also be very helpful to use toy code examples to illustrate how Pynapple will be used to carry out on a sample analysis-problem so that potential users can see what would need to be done.

      We appreciate the kind words. Regarding the toy code, this is what we tried to do in Fig. 4. Instead of including the code directly in the paper, which does not seem a modern way of doing this, we now refer to the online notebooks that reproduce all panels of Figure 4.

      • The authors wrote -

      "While these objects and methods are relatively few"

      In object-oriented programming, objects contain methods. If a method is not in an object, it is not technically a method but a function. It would be helpful if the authors made sure their terminology is accurate, perhaps by saying something like "While there are relatively few objects, and while each object has relatively few methods ... "

      We agree with the referee, we have changed the sentence accordingly.

      • The authors wrote -

      "if not implemented correctly, they can be both computationally intensive and highly susceptible to user error"

      Here the authors are using "correctly" to refer to two things - "accuracy" - getting the right answer, and "efficiency" - getting to that answer with relatively less computation. It would be clearer if they split out those two concepts in the phrasing.

      Indeed, we used the term to cover both aspects of the problem, leading to the two possible issues cited in the second part of the sentence. We have changed the sentence following the referee’s advice:

      “While there are relatively few objects, and while each object has relatively few methods, they are the foundation of almost any analysis in systems neuroscience. However, if not implemented efficiently, they can be computationally intensive and if not implemented accurately, they are highly susceptible to user error.”

      • In the next sentence the authors wrote -

      "Pynapple addresses this concern."

      This statement would benefit from just additional text explaining how the concern is addressed.

      We thank the referee for the suggestion. We have changed the sentence to this one: “The implementation of core features in Pynapple addresses the concerns of efficiency and accuracy”

      Page 9

      • The authors wrote -

      "This is implemented via a set of specialized object subclasses of the BaseLoader class. To avoid code redundancy, these I/O classes inherit the properties of the BaseLoader class."

      From a programming perspective, the point of a base class is to avoid redundancy, so it might be better to just mention that this avoids the need to redefine I/O operations in each class.

      We have rephrased the sentence as follows:

      “This is implemented via a set of specialized object subclasses of the BaseLoader class, avoiding the need to redefine I/O operations in each subclass.”
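
      As a minimal sketch of this design, the following shows a base class that owns the shared reading logic while each subclass only defines how to interpret the rows. All class and method names here are hypothetical and chosen for illustration; they do not reproduce the actual BaseLoader interface.

```python
import csv
import io


class BaseLoader:
    """Hypothetical base class: the shared I/O logic is written once, here."""

    def load(self, stream):
        # Reading and tokenizing the input is common to every format.
        rows = list(csv.reader(stream))
        return self.parse(rows)

    def parse(self, rows):
        # Each subclass supplies only its format-specific interpretation.
        raise NotImplementedError


class SpikeTimesLoader(BaseLoader):
    """One timestamp per line."""

    def parse(self, rows):
        return [float(row[0]) for row in rows]


class LabeledEventsLoader(BaseLoader):
    """One 'time,label' pair per line."""

    def parse(self, rows):
        return {label: float(time) for time, label in rows}


spike_times = SpikeTimesLoader().load(io.StringIO("0.5\n1.25\n"))
events = LabeledEventsLoader().load(io.StringIO("0.5,start\n2.0,stop\n"))
print(spike_times)
print(events)
```

      Neither subclass redefines `load`, so a change to how input is read would be made in one place only.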

      • The authors wrote -

      "classes are unique and independent from each other, ensuring stability"

      How do classes being unique and independent ensure stability? Perhaps here again the misunderstanding is due to the lack of a definition of stability.

      We thank the referee for the remark. We first changed “stability” for “long-term backward compatibility”. We further added the following sentence to clarify this claim. “For instance, if the spike sorting tool Phy changes its output in the future, this would not affect the “Neurosuite” IO class as they are independent of each other. This allows each tool to be updated or modified independently, without requiring changes to the other tool or the overall data format.”

      • The authors wrote -

      "Using preexisting code to load data in a specific manner instead of rewriting already existing functions avoids preprocessing errors"

      Here it might be helpful to use the lingo of Object-oriented programming. (e.g. inheritance and polymorphism). Defining these terms for a neuroscience audience would be useful as well.

      We do not think it is necessary to use too many technical terms in this manuscript. However, this sentence was indeed confusing. We have now simplified it:

      “[…], users can develop their own custom I/O using available template classes. Pynapple already includes several of such templates and we expect this collection to grow in the future.”

      Page 10

      • The authors wrote -

      "These analyses are powerful because they are able to describe the relationships between time series objects while requiring the fewest number of parameters to be set by the user."

      It is not clear that this makes for a powerful analysis as opposed to an easy-to-use analysis.

      We have replaced “powerful” with “easy to use”.

      Page 12

      "they are built-in and thus do not have any external dependencies"

      If the authors want to retain this, it would be helpful to explain (perhaps in the introduction) why having fewer external dependencies is useful. And is it true that these functions use only base python classes?

      We have rephrased this sentence as follows:

      “they are for the most part built-in and only depend on a few common external packages, ensuring that they can be used stand-alone without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      Other comments:

      • It would be helpful, as mentioned in the public review, to frame this work in the broader context of what is needed to go from data to scientific results so that people understand what this package does and does not provide.

      We have added the following sentence to the discussion to make sure readers understand:

      “The path from data collection to reliable results involves a number of critical steps: exploratory data analysis, development of an analysis pipeline that can involve custom-made developed processing steps, and ideally the use of that pipeline and others to replicate the results. Pynapple provides a platform for these steps.”

      • It would also be helpful to describe the Pynapple software ecosystem as something that readers could contribute to. Note here that GNU may not be a good license. Technically, GNU requires any changes users make to Pynapple for their internal needs to be offered back to the Pynapple team. Some labs may find that burdensome or unacceptable. A workaround would be to have GNU and MIT licenses.

      The main restriction of the GPL license is that if the code is changed by others and released, a similar license should be used, so that it cannot become proprietary. We therefore stick to this choice of license.

      We would be more than happy to receive contributions from the community. To note, several users outside the lab have already contributed. We have added the following sentence in the introduction:

      “As all users are also invited to contribute to the Pynapple ecosystem, this framework also provides a foundation upon which novel analyses can be shared and collectively built by the neuroscience community.”

      • This software shares some similarities with the nelpy package, and some mention of that package would be appropriate.

      While we acknowledge the reviewer's observation that Nelpy is a similar package to Pynapple, there are several important differences between the two.

      First, Nelpy includes predefined objects such as SpikeTrain, BinnedSpikeTrain, and AnalogSignal, whereas Pynapple would use only Ts and Tsd for those. This design choice was made to provide greater flexibility and allow users to define their own data structures as needed.

      Second, Nelpy is primarily focused on electrophysiology data, whereas Pynapple is designed to handle a wider range of data types, including calcium imaging and behavioral data. This reflects our belief that the NWB format should be able to accommodate diverse experimental paradigms and modalities.

      Finally, while Nelpy offers visualization and high-level analysis tools tailored to electrophysiology, Pynapple takes a more general-purpose approach. We believe that users should be free to choose their own visualization and analysis tools based on their specific needs and preferences.

      The package has now been cited.

      Reviewer #2 (Public Review):

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

      The scope of the manuscript is not clear to me, and the authors could help clarify if Pynacollada and other toolsets in the making become a future aspect of this paper (and Pynapple), or are the authors planning on building these as separate publications.

      The author writes that Pynapple can be used without the I/O layer, but the author should clarify how or if Pynapple may work outside NWB.

      Absolutely. Pynapple can be used for generic data analysis, with no requirement of specific inputs nor NWB data. For example, the lab is currently using it for a computational project in which the data are loaded from simple files (and not from full I/O functions as provided in the toolbox) for further analysis and figure generation.

      This was already noted in the manuscript, last paragraph of the section “Importing data from common and custom pipelines”

      “Third, users can still use Pynapple without using the I/O layer of Pynapple.”.

      We have added the following sentence in the discussion

      “To note, Pynapple can be used without the I/O layer and independent of NWB for generic, on-the-fly analysis of data.”

      This brings us to an important fundamental question. What are the advantages of the current approach, where data is imported into the Ts objects, compared to doing the data import into NWB files directly, and then making Pynapple secondary objects loaded from the NWB file? Does NWB natively have the ability to store the 5 object types or are they initialized on every load call?

      NWB and Pynapple are complementary but not interdependent. NWB is meant to ensure long-term storage of data and as such contains as much information as possible to describe the experiment. Pynapple does not use NWB to directly store the objects; however, it can read from NWB to organize the data in Pynapple objects. Since the original version of this manuscript was submitted, new methods address this. Specifically, in the current beta version, each object now has a “save” method. We are also developing functions to load these objects. This does not depend on NWB but on npz, a NumPy-specific file format. However, we believe it is premature to include these recent developments in the manuscript and prefer not to discuss this for now.
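
      For readers unfamiliar with npz, the sketch below shows how a time series and its time support could be written to and read back from such an archive using NumPy alone. The function names and the array keys (`t`, `d`, `start`, `end`) are assumptions made for this illustration and do not describe the layout used by the Pynapple “save” method.

```python
import io

import numpy as np


def save_series(file, times, values, support):
    """Write times, values, and support epochs to an npz archive (hypothetical layout)."""
    np.savez(file, t=times, d=values, start=support[:, 0], end=support[:, 1])


def load_series(file):
    """Read back the arrays written by `save_series`."""
    with np.load(file) as archive:
        support = np.column_stack([archive["start"], archive["end"]])
        return archive["t"], archive["d"], support


times = np.array([0.0, 0.1, 0.2])
values = np.array([1.5, 2.5, 3.5])
support = np.array([[0.0, 0.2]])

buffer = io.BytesIO()  # an in-memory file keeps the example self-contained
save_series(buffer, times, values, support)
buffer.seek(0)
loaded_times, loaded_values, loaded_support = load_series(buffer)
print(loaded_support)
```

      Because the archive stores plain arrays, a file written this way can be reopened with NumPy alone and does not require NWB.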

      Many of these functions and objects have a long history in MATLAB - which documents their usefulness, and I believe it would be fitting to put further stress on this aspect - what aspects already existed in MATLAB and what is completely novel. A widely used MATLAB toolset, the FMA toolbox (the Freely moving animal toolbox) has not been cited, which I believe is a mistake.

      We agree that the FMA toolbox should have been cited. This has now been corrected.

      Pynapple was first developed in Matlab (it was then called TSToolbox). The first advantage is of course that Python is more accessible than Matlab. It has also been adopted by a large community of developers in data analysis and signal processing, which has become without a doubt much larger than the Matlab community, making it possible to find solutions online for virtually any problem one can have. Furthermore, in our experience, trainees are now unwilling to get training in Matlab.

      Yet, Python has drawbacks, which we are fully aware of. Matlab can be very computationally efficient, and old code can usually run without any change, even many years later.

      A limitation in using NWB files is its standardization with limited built-in options for derived data and additional metadata. How are derived data stored in the NWB files?

      NWB has predetermined a certain number of data containers, which are most common in systems neuroscience. It is theoretically possible to store any kind of data and associated metadata in NWB but this is difficult for a non-expert user. In addition, NWB does not allow data replacement, making it necessary to rewrite a whole new NWB file each time derived data are changed and stored. Therefore, we are currently addressing this issue as described above. Derived data and metadata will soon be easy to store and read.

      How is Pynapple handling an existing NWB dataset, where spikes, behavioral traces, and other data types have already been imported?

      This is an interesting point. In theory, Pynapple should be able to open an NWB file automatically, without providing much information. In practice, it is challenging to open an NWB file without knowing what to look for exactly and how the data were preprocessed. This would require adapting an I/O function for a specific NWB file. Unfortunately, we do not believe there is a universal solution to this problem. There are solutions being developed by others, for example NWB Widgets. We will keep an eye on this and see whether this could be adapted to create a universal NWB loader for Pynapple.

      Reviewer #2 (Recommendations For The Authors):

      Other tools and solutions are being developed by the NWB community. How will you make sure that these tools can take advantage of Pynapple and vice versa?

      We recognize the importance of collaboration within the NWB community and are committed to making sure that our tools can integrate seamlessly with other tools and solutions developed by the community.

      Regarding Pynapple specifically, we are designing it to be modular and flexible, with clear APIs and documentation, so that other tools can easily interface with it. One important thing is that we want to make sure Pynapple is not too dependent on another package or file format such as NWB. Ideally, Pynapple should be designed so that it is independent of the underlying data storage pipeline.

      Most of the tools that have been developed in the NWB community so far were designed for data visualisation and data conversion, something that Pynapple does not currently address. Multiple packages for behavioral analysis and exploration of electro/optophysiological datasets are compatible with the NWB format but do not provide additional solutions per se. They are complementary to Pynapple.

    1. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows multiple types of polymorphism data to be analyzed simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript by Lan et al. addresses the still incompletely resolved question as to how branching morphogenesis of the embryonic mammary epithelium is regulated at the molecular and cellular level. Using (combinatorial) primary explant cultures of wildtype and genetically engineered mouse embryos, in which the authors have developed a unique expertise over many years, together with imaging and RNAseq analyses, they (i) show that the timing of epithelial branching is dictated by the biological age of the epithelium, but that an epithelial-mesenchymal interaction is required to bestow branching ability on the mammary epithelium somewhere between E13.5 and E16.5, (ii) seek to determine if and how lineage and cell proliferation affect branching, (iii) show that while salivary mesenchyme can promote growth (i.e. branching density) of the E16.5 mammary epithelium, the mode of branching (i.e. lateral branching vs tip-clefting) is an intrinsic property of the mammary epithelium, (iv) use transcriptomics to identify genes that are likely to control either mammary- or salivary gland specific growth and/or branching patterns, (v) hypothesize that low levels of WNT signaling in the mammary gland mesenchyme (due to relatively high expression of WNT signaling inhibitors) are responsible for mammary specific branching, (vi) show that hyperactivation of WNT/CTNNB1 signaling in the mesenchyme indeed induces hyperbranching, (vii) identify Eda and Igf1 as putative mediators and paracrine signaling factors that regulate branching of the mammary epithelium upon secretion from the mesenchyme downstream of WNT/CTNNB1 signaling and (viii) show that mammary gland branching is impaired in Igfr1 null embryos.

      Major comments: 1. Overall, this is a solid study that is well controlled and technically of high quality. The materials and methods should allow follow up and replication by others and the transcriptomic data have been made available via NCBI GEO. I think the authors convincingly demonstrate points (i), (iii), (iv) and (vi) and (viii). I have some questions regarding (ii), (v) and (vii) and (viii) that I will pose below.

      Our response:

      We thank the reviewer for the careful assessment and recognition of our work. In the subsequent sections, we have tried to address all the concerns raised by the reviewer.

      Re: (ii): The authors try to study the link between basal cell fate and branching. They use position of the cells (which they describe clearly and which is a good choice), since they cannot use specific markers due to the fact that the basal and luminal lineages have not yet segregated at this point. This part of the manuscript is not the most straightforward to follow. The most obvious experiment would have been to focus on the location of the cells and their associated cell cycle profile - but the authors themselves have just recently published a pre-print (their REF #54, now also out in JCB) that is an in-depth study of the link between cell proliferation + cell motility and branching, but this only becomes apparent in the discussion. In that sense, Fig2 of the current manuscript is less novel, although it is nice to see that it holds up in a slightly different analysis.

      Our response:

      We thank the reviewer for acknowledging our recently published work, which is focusing on the active branching phase during late embryogenesis/around birth. In the current proliferation analysis, however, our focus was on a different aspect of embryonic mammary gland development: understanding the mechanism underlying the ability to acquire competence to branch, i.e. how the epithelium changes between late bud and sprout stages. Our data obtained from tissue recombination and 3D culture experiments suggest that heterotypic mesenchymes or mesenchyme-free 3D organoid culture conditions do not provide sufficient signals to support branching of mammary epithelia before E16.5. We have rephrased the text to better emphasize this point.

      Instead of focusing on the cell cycle markers, the authors turn to a K14-Eda mouse model - which shows precocious branching and a temporary reduction in K8 expression. They also analyze Eda-KO embryos. Quite frankly, I find the authors' reasoning difficult to follow here and I cannot deduce how these experiments really address the question at hand (i.e. how lineage and cell proliferation affect branching), so I hope they can rewrite this section of the paper to make the arguments more clear and easy to follow for the reader who, at this point, knows little about Eda. For example, the authors present the argument that K14-Eda mice show a transient reduction in K8 expression - but we don't know if that also really means a (temporary?) change in (future?) luminal cell fate. In fact, since Eda later also makes an appearance as a candidate factor to be secreted by the mesenchyme together with Igf1, I wonder if their K14-Eda data would not be better suited to underscore that point instead and if the authors should perhaps eliminate this section altogether and just refer to their prior work in REF #45. If the authors think the current data add something more, then they need to be more explicit about this (and then also introduce the link to REF #45 in the results section).

      Our response:

      We agree with all the reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own and should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      Re: (v): Do the authors have any WNT/CTNNB1 target genes that they can include in their transcriptomics analysis to show that the WNT/CTNNB1 signaling levels are indeed lower in the mammary mesenchyme? Axin2 comes to mind, but there are some other negative feedback targets that are often induced across tissues, e.g. Rnf43 and/or Znrf3 and/or Sp5? E.g. to include in Fig. 6E?

      Our response:

      In the original manuscript (lines 339-342), we have performed the GSVA analysis comparing the KEGG database, and the significantly altered pathways comparing different mammary mesenchymes with salivary gland mesenchyme have been pooled and displayed as heatmap in Supplementary Fig 4b. The WNT signaling pathway is lower in the mammary mesenchyme, especially at E16.5.

      As suggested by the reviewer, we have analyzed Axin2, the most commonly used readout of WNT/CTNNB1 signaling activity, in our RNA-seq data, which we include as a new Supplemental Fig. 4c in the preliminary revised manuscript. Axin2 data indicate that Wnt/β-catenin signaling activity is lower in the E16.5 fat pad, where branching takes place, compared to younger stages of mammary gland and the salivary gland.

      Plan for the final revision:

      Additionally, we will provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      Re: (vii) and (viii): The authors convincingly show the phenotype of the Igfr1 KO mice, but I hope the authors concur that an epithelial only Igfr1 KO (or alternatively a mesenchymal only Igf1 KO, or epithelial/mesenchymal recombination experiments with WT vs IGFR1 null or IGF1 null tissue, or experiments with small molecule inhibitors of IGF1/IGFR1 signaling) would have given more solid mechanistic evidence regarding the presumed paracrine effect of IGF1 signaling. I am not asking the authors to perform another mouse experiment or even generate or use these conditional strains, but if the authors agree, then I do think this would merit some attention in the discussion section. See also my comments regarding Eda in point 1.

      Our response:

      As shown in the current manuscript, Igf1 is expressed in the mammary and salivary gland mesenchyme. This finding is in line with E14 in situ expression data available in Genepaint (https://gp3.mpg.de/results/Igf1) showing that overall in embryonic tissues, Igf1 is mainly produced in mesenchymal tissues. Of note, in Genepaint, a clear signal can be detected in the salivary gland mesenchyme, not the epithelium. Published E16 and E18 datasets indicate low level of Igf1 expression in the mammary epithelium (https://wahl-lab-salk.shinyapps.io/Mammary_snATAC/). Hence, we conclude that Igf1 is mainly produced by mesenchymal cells. Instead, Igf1r appears to be rather ubiquitously expressed.

      A previous study assessed BrdU incorporation in Igf1r-/- mammary buds at E14.5, and reported a specific proliferation defect in the epithelium, while no difference was detected in the mesenchyme (Fig. 9, Heckman et al., 2007; PMID:17662267). However, we cannot exclude the possibility of autocrine, mesenchymal Igf1/Igf1r signaling, which in turn could lead to upregulation of a paracrine factor to regulate epithelial growth.

      We agree with the reviewer that novel conditional mouse models are beyond the scope of the current study. However, we do not think that small molecule drugs could be used to block Igf1r activity in a tissue-specific manner either.

      Plan for the final revision:

      To further delineate the paracrine and/or autocrine role of the Igf1/Igf1r pathway during mammary epithelial growth and branching, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme, as suggested by the reviewer.

      Minor comments: - A few minor spelling/grammar errors, including a couple of "the"s missing (first line of the abstract, and also preceding "Majority" in line 148).

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      • Line 517-518: please also include the details for the Eda mice.

      Our response:

      We apologize for omitting this important information from the Materials and Methods. In the revised manuscript, we have included a short introduction of the K14-Eda mice, a new reference to the original publication describing their generation, and the Jackson Laboratory strain number for Eda-/- (a.k.a. Tabby) mice.

      • 1f spelling error: separation

      Our response:

      The spelling error has been corrected in the revised manuscript.

      **Referees cross-commenting**

      Having read all three review reports I think they are pretty much in agreement, with shared questions about the inclusion/meaning/discussion of the lineage specification data and also agreement about the overall technical solidity of the data and this approach.

      I gather that reviewer #2 asks for more controls than myself or reviewer #3 and while I think all of their points are valid, in principle, I don't think all of these are required. I should add that I am inclined to trust the authors on their ability to separate mesenchyme and epithelium as they have been developing and optimising this system over many years.

      Our response:

      We are grateful to the reviewer for their trust in the technical aspects of our experiments. We do routinely monitor tissue purity in the recombinants (for more details, see our response to reviewer #2). To demonstrate this, we have included new data in new Supplementary Fig. 1a,b and new Supplementary Fig. 3. We believe these additions will further strengthen the validity of our findings and effectively address the concerns raised by reviewer #2.

      Reviewer #1 (Significance (Required)):

      General assessment: This is a carefully executed study in which an impressive amount of (combinatorial) embryonic mammary tissue explant experiments are combined with quantitative imaging and transcriptomics analysis.

      The main limitations of the work lie in the fact that the investigation of a potential link between branching and the cell cycle is not entirely novel, as the authors themselves recently published a nice pre-print (now also out in JCB) describing similar analyses. In addition, the mechanistic link between WNT/CTNNB1 signaling in the mesenchyme and the paracrine signaling activities of the presumed downstream effectors EDA and IGF, while plausible, is not yet complete. The work also does not yet address what exactly the branching identity is that is bestowed upon the mammary epithelium between E13.5 and E16.5 and how this then becomes an intrinsic (epigenetic?) feature of the mammary gland.

      Advance: This work provides more insight into the embryonic branching of the mammary gland - a stage of mammary gland development that is still poorly understood and that is, in general, understudied. In part, the work confirms prior work in the literature (their REF #19) regarding mammary and salivary gland tissue recombination experiments. It supplements this with a more elaborate time series of heterochronic and heterologous epithelium/mesenchyme explant cultures, using genetically engineered (and fluorescently labeled) mouse tissues to allow better and quantitative imaging. The transcriptomic analysis of different mesenchyme populations is also informative and allows the researchers to propose a putative mechanism for why the mammary gland branches differently from the salivary gland. The advance is both technical and functional, as well as conceptual, with some advance in terms of mechanism.

      Audience: This work should appeal to mammary gland biologists interested in the molecular and cellular mechanisms of (early) mammary gland development, as well as to a broader community of developmental biologists studying branching morphogenesis in tissues such as lung, kidney and salivary gland.

      My expertise: WNT signaling and mammary gland biology, at the intersection of developmental, stem cell and cancer biology

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      The mammary gland is a branched structure that consists of a bilayered epithelium embedded in a specialized mesenchyme. In mice, at 11,5 days of embryogenesis, the ectoderm thickens forming 5 pairs of peculiar structures called placodes. During the following days, the placodes will grow and invaginate into the surrounding mammary mesenchyme and they will finally start to branch by the end of embryogenesis (E16). It has been suggested that the bidirectional communication between the growing mammary gland and the surrounding mesenchyme plays a pivotal role in the determination of each step of mammary gland development (placode formation, mammary bud invagination, gland outgrowth, branching). The role of different signalling molecules has already been shown, particularly for the placode growth and mammary bud invagination. Nevertheless, the pathways regulating embryonic mammary gland branching are still incompletely understood. In this manuscript, Lan and colleagues aim to decipher the correlation between different stages of mammary gland development such as proliferation, lineage segregation and ductal branching. Furthermore, they want to define which stage of mammary development is intrinsically determined by the epithelium and which one requires the supportive guidance of the mesenchyme. Lastly, they aim to discover the key signal for the growth and branching of mammary epithelium. To these purposes, they used an ex vivo model of heterochronic epithelial-mesenchymal recombination. In particular, they micro-dissected the epithelium and/or the mesenchyme from murine mammary glands at different stages of embryonic development (i.e. at E13,5 for the quiescent phase or 16,5 for branching phase) and explanted them together in different combinations using fluorescent reporters. To assess the role of the mesenchyme they also cultured the epithelium in a mesenchyme free 3d structure. 
      Through this model they demonstrated that the presence of the mesenchyme is necessary for the priming of mammary epithelium for branching, since only E16,5 epithelial cells were able to grow and branch in a mesenchyme-free 3D experiment. Nevertheless, intrinsic properties of the epithelium are necessary for the timing of branching, since E16,5 mesenchyme was not able to accelerate the outgrowth of E13,5 epithelia. In order to determine which epithelial properties are important, the authors correlated the beginning of cell proliferation in the embryonic mammary gland to the beginning of the branching phase. They indeed used the Fucci2a mouse model to carefully characterise the timing of mammary cell proliferation at different stages of embryonic development, concluding that the great majority of proliferating cells reside in the inner part of the mammary bud until E14,5, while in the external part at later stages. Regarding the importance of cell proliferation, Lan and colleagues claim that the beginning of the branching phase is not its direct consequence, thanks to the use of the K14Cre- Eda mouse model, known to have anticipated mammary gland development. Using this and the Eda-/- models, the authors also maintain that branching occurs independently of the lineage specification of the epithelium. The use of salivary mesenchyme instead of the mammary one was able to increase the number of branches of E16,5 mammary epithelium. Nevertheless, this model demonstrated that the branching pattern (side branching vs tip bifurcation) is an intrinsic feature of the epithelium. Lan and colleagues also defined the transcriptomic profiles of the mammary and salivary mesenchymes at different stages. In particular, they observed an increased expression of negative regulators of the Wnt pathway in the mammary mesenchyme compared to the salivary mesenchyme.
      Moreover, using a mouse model where β-catenin is stabilised, they observed increased tip production in the mammary gland epithelium. They also showed that IGF1 production is increased after Wnt pathway activation and they tested its function, both by treating their ex vivo cultures with exogenous IGF1 and by using Igf1r-/- mouse models.

      Major comments 1- The great majority of the results of the manuscript are based on an ex vivo model of heterochronic epithelial-mesenchymal recombination. Since the authors are studying the effect of the mesenchyme of different stages on the epithelium (and vice versa), the purity of the two compartments after the dissection is particularly important. Although they said that the purity is evaluated (line 112), it would be important to show a control staining in which they use known markers of the mesenchyme with no colocalization with the fluorescent reporter of the epithelium.

      Our response:

      We agree with the reviewer that the purity of the separated tissues is very important for our conclusions. This is why we have used genetically labeled tissues in all recombination experiments: the epithelium and the mesenchyme were always isolated from embryos ubiquitously expressing GFP or tdTomato. We find this the most reliable way to assess the origin and purity of the isolated tissues. If any carry-over mesenchyme were isolated with the GFP+ epithelium, this would be revealed as GFP+ mesenchymal cells in recombinants consisting of otherwise tdTomato+ mesenchyme. And vice versa: any carry-over tdTomato+ epithelium isolated with the mesenchyme would be revealed as tdTomato+ epithelial cells in the recombinants. We apologize for not making this clear enough in the original manuscript. In the revised manuscript, we now provide high-resolution confocal images of the recombinant explants (new Supplementary Fig. 1a,b). The explants have been co-stained with the epithelial marker EpCAM, revealing robust colocalization between EpCAM and the ubiquitously expressed fluorescent labels in the designated epithelial tissues.

      2- Another important point for understanding the quality and impact of these findings is to assess the similarities and differences, if there are, between the in vivo mesenchyme and the ex vivo one. Indeed, once explanted and put in culture, mesenchymal cells could change their transcriptomic profile and consequently change their signals to the epithelium. The authors should assess the expression of the genes and pathways studied during embryonic development in vivo.

      Our response:

      The reviewer is correct that transcriptomes will likely undergo some changes when organs are cultured ex vivo. This is why RNA-seq was done on freshly isolated tissues. However, we do not consider the potential changes taking place ex vivo relevant to the questions we are addressing in this study. The reason is (as reported in the manuscript) that all control recombinations (homochronic recombinations such as E13 epithelium + E13 mesenchyme, E16 epithelium + E16 mesenchyme etc.) branched essentially as in vivo. Therefore, we consider the results of the tissue recombination experiments, and the conclusions drawn from them, solid.

      3- The authors clearly showed that E16,5 epithelium is able to branch in a mesenchyme free 3D culture model, while epithelia from earlier stages don't. This led to the conclusion that mesenchyme is necessary for acquiring the branching ability. Nevertheless, the authors also said that early stages epithelia scarcely grow in the mesenchyme free 3D culture. Therefore, the lack of branching may be due to the lack of growth, if not the increase of death, of epithelial cells. The authors should quantify the size and the cell death of the epithelia in the different culture conditions and discuss better this point.

      Our response:

      The reviewer is correct in that one of the key functions of the mammary mesenchyme up to E16.5 may be to provide survival signals for the epithelium, and this might explain why epithelia younger than E16.5 fail to grow/branch when recombined with salivary gland mesenchyme and in mesenchyme-free organoid culture.

      Plan for the final revision:

      To address this issue, we will assess apoptosis in mammary epithelia cultured in the mesenchyme-free 3D organoid culture set-up.

      4- The Fucci2a model allowed the authors to assess the proliferation of embryonic mammary epithelium, showing that the great majority of proliferating cells are basal, at late stages of development (line 182). As it has already been shown, lineage specification is a late process during mammary gland development. The fact that the proliferating cells reside at the external part of the bud does not mean that they are basal cells yet. A p63/K8 staining could be important to understand if the increased proliferation occurred in already specified basal cells or not.

      __Our response:__

      Indeed, mammary lineage specification is a later process. As pointed out in the manuscript and by reviewer #1, the widely used basal and luminal lineage markers have not yet segregated to separate compartments at the developmental stages analyzed in our study, and therefore cannot be used as tools for this purpose. We would like to emphasize that in the manuscript, we analyzed the cells based on their position, and have used the term basal to indicate the basal position, not the prospective lineage. Accordingly, we used the term inner instead of luminal cells to indicate their location, not lineage. We have further clarified this point in the preliminary revised manuscript.

      5- The use of Fucci2a model showed that 20% of epithelial cells are proliferative at E13,5. This phase is considered as "quiescent" by the authors (line 120), but the moderate proliferation rate shown in this experiment demonstrated that it is not. A change of the nomenclature is needed.

      __Our response:__

      We have removed the word “quiescent” from the text.

      6- Through the use of K14-Eda and Eda-/- models, the authors claimed that the lineage specification is not a prerequisite for ductal branching. To support this point, they showed that the K14-Eda mice have an anticipated branching although the expression of K8 in the inner part of the bud is transitorily decreased. The authors link the K8 downregulation to a transient suppression of the luminal lineage, but this is clearly overclaimed. Although K8 is a known marker of luminal lineage, the downregulation of one marker is not sufficient to support their thesis. They should first check more markers and in particular critical regulators of luminal lineage as Notch1, Foxa1 and Elf5. Lately, the use of different models that drive embryonic epithelial cells to a forced lineage commitment (Notch1 or Δnp63 overexpression) would support more their claim. As additional evidence, the authors showed that Eda is able to promote basal cell signature. Firstly, the authors should better explain why this point would support their thesis. Secondly, the supplementary figure 2b does not show which genes are taken into account to define the basal signature. A list of these genes would be helpful, as well as staining for some representative proteins.

      Our response:

      We thank the reviewer for these constructive suggestions. We agree with all reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own to be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      7- The authors used the same mouse models to assess the importance of proliferation in the determination of ductal branching and they claimed that proliferation is not a sufficient feature. This conclusion was supported by two observations. The first one is the fact that the K14-Eda model shows an increased cell proliferation at early stages compared to wt, coupled with anticipated branching. Secondly, although having smaller glands compared to wt and showing a delay in ductal branching, Eda-/- mice have an epithelial proliferation rate very similar to wt. Again, the conclusion that proliferation is not sufficient for branching is overclaimed. Firstly, the authors should explain how the buds in wt and Eda-/- mice have different sizes although the similar proliferation (increased cell death?, cellular volume?). Secondly, to support the thesis that proliferation is not sufficient for branching, functional experiments should be performed (see point 12). For instance, the short-time treatments with inhibitors or promotors of proliferation may help to understand the effective role of proliferation in the determination of branching.

      Our response:

      We show that there is no direct link between the onset of proliferation and the acquisition of branching ability. However, we are not claiming that proliferation is not important for branching, as new cells are obviously needed as building blocks of growing tissues. In a recently published paper, we assessed the role of proliferation in branch point formation in embryonic mammary glands. Using mitomycin C to block proliferation, we showed that initiation of new branches occurs even when proliferation is blocked (Myllymäki et al., JCB, 2023, PMID: 37367826).

      The reviewer also asked why Eda-/- mammary primordia are smaller at E15.5-E16.5 despite similar proliferation rates. In the revised manuscript, we have quantified the volume of E13.5 Eda-/- and control mammary buds and show that Eda-/- buds are ~25% smaller (3.5 ± 0.8 × 10⁵ µm³ in Eda-/- vs. 4.6 ± 0.7 × 10⁵ µm³ in control, mean ± SD) already at the bud stage (new Supplementary Fig. 2c,d).

      We have also quantified cell size in Eda-/- and control mammary glands at E13.5 and E15.5 and found that mammary epithelial cells in Eda-/- embryos are ~15% smaller (new Supplementary Fig. 2e,f). Together, these data indicate that the smaller size of E15.5-E16.5 Eda-/- mammary glands is a combined effect of the smaller mammary anlage at E13.5 and the smaller cell size. These findings, while interesting on their own, do not challenge our conclusions regarding the link between the onset of proliferation and the acquisition of branching ability.

      8- The heterotypic epithelial-mesenchymal recombination using the salivary gland is interesting. Nevertheless, some stainings to assess the purity of their systems are again required (e.g., marker of salivary epithelium to verify the purity of the mesenchyme and vice versa).

      __Our response:__

      As mentioned above, all tissue recombination experiments were performed so that the epithelium and the mesenchyme originated from genetically labelled embryos expressing different fluorescent proteins. In the revised manuscript, we provide confocal images of the salivary-mammary tissue recombinants (new Supplementary Fig. 3), confirming the purity of the tissue compartments used in these experiments.

      This model clearly showed that the mammary epithelium can form more branches when combined with the salivary mesenchyme. Moreover, the salivary epithelium preferentially branches through tip bifurcation, while mammary epithelium combined with the salivary mesenchyme has a mixed pattern of tip bifurcation and side branching (typical of the mammary gland). The authors thus concluded that the branching pattern is an intrinsic feature of the epithelium. However, a comparison between the percentage of tip bifurcation and side branching in the heterotypic combination and the homotypic combination between mammary epithelium and mammary mesenchyme is crucial to understand this point. Indeed, these results are not sufficient to exclude that the branching pattern is partially determined by intrinsic features and partially by extrinsic signals. The authors should carefully quantify the branching pattern in the homotypic combination and compare that to the heterotypic one. If the percentage of tip bifurcation does not change, their conclusion is correct; if this percentage increases in the heterotypic combination, it would be a sign of a partial effect of the signals of the mesenchyme.

      Our response:

      We thank the reviewer for raising this question. We have independently generated data on the type of mammary gland branching events in two papers with somewhat different culture and imaging conditions (Lindström et al., bioRxiv, 2022 and Myllymäki et al., JCB, 2023, PMID: 37367826). Both analyses showed that in embryonic mammary glands, the majority of branching events (~70%) occur by side-branching. These data are in line with the current study, which we have now extended to include the mammary-mammary recombination experiments (revised Supplementary Video 1, revised Fig. 4b). Quantification of branching events revealed no significant difference in the type of branching events between mammary epithelia grown with salivary or mammary gland mesenchyme (revised Fig. 4c), further supporting our initial conclusions.

      9- Through the analysis of their transcriptomic data, Lan and colleagues found that the mammary mesenchyme expresses higher levels of negative regulators of Wnt pathway compared to the salivary mesenchyme. To demonstrate the value of their findings, they should confirm this in vivo, through staining of known Wnt proteins on the salivary and mammary mesenchymes at the embryonic stage.

      Our response:

      In mammals, there are 19 Wnt ligands, over a dozen secreted Wnt inhibitors, 10 Frizzled receptors, two Lrp co-receptors, and numerous other pathway modifiers that contribute to the net Wnt signaling activity in a complex manner. Furthermore, it has been “notoriously difficult to generate useful antibodies to vertebrate Wnt proteins...In general, these sera do not detect endogenous Wnt proteins in cell extracts, nor do they detect Wnt proteins in tissues by staining techniques. Hence, there are few data on Wnt protein distribution in intact vertebrate animals.” This is a direct citation from the Wnt Homepage, maintained by the Nusse Lab; https://web.stanford.edu/group/nusselab/cgi-bin/wnt/reagents#antibod.

      For all these reasons, we find this approach neither feasible nor informative.

      Instead, in the revised manuscript, we report the expression levels of Axin2, the most commonly used transcriptional readout of canonical Wnt activity, in our RNA-seq data (new Supplementary Fig. 4c). Axin2 levels are lowest in the E16 fat pad, where mammary branching takes place, and much lower than in any other tissue analyzed in the study.

      Plan for the final revision:

      To complement these findings, we will additionally provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      10- Given the ability of the salivary mesenchyme to promote a higher rate of branching in the mammary epithelium, the authors wanted to assess what the role of Wnt signalling could be. To do so, they used a mouse model where β-catenin is stabilised, allowing an increased Wnt signalling in the mammary mesenchyme. As a result, they observed increased branching in the mammary epithelium. They also found that IGF1 is a ligand regulated by the Wnt pathway in the mesenchyme. Therefore, the use of exogenous IGF1 in their ex vivo model was able to increase the branching of the mammary epithelium. Moreover, Igf1r-/- embryos showed a significant decrease of mammary gland branching. The conclusion based on these experiments was that the Wnt-Igf1-Igf1r axis plays a pivotal role in the promotion of mammary gland branching during embryogenesis. This conclusion is overclaimed for different reasons. Firstly, the normalization of the ductal branching to the body weight is insufficient to exclude that the impact of the Igf1r knockout may have severe consequences on the mammary gland formation, upstream of the ductal branching. Another parameter for this normalization is required (e.g., size of the bud before branching, proliferation status, etc).

      Our response:

      We agree with the reviewer that Igf1r knockout may affect mammary gland formation in multiple ways, including prior to the onset of branching, as already indicated in the original manuscript: “…apart from one study reporting the smaller size of the E14 mammary bud in IGF-1R deficient embryos …” (lines 398-399 in the revised version) and “…mammary gland 3 that was consistently absent.” (lines 414-415 in the revised version).

      To assess whether the reduced size and branching of E16.5/E18.5 Igf1r-/- mammary glands is merely a consequence of the smaller anlage, the revised manuscript includes new data quantifying the volume of mammary gland 2 in Igf1r-/- and wild type littermate embryos at E13.5, E16.5, and E18.5 from 3D confocal images of whole mount EpCAM-stained mammary glands. As can be seen from the new Fig. 7g-h, the mutant mammary buds are about 60% of the size of the controls at E13.5, 25% at E16.5, and only 20% at E18.5. This progressive reduction is indicative of a specific defect at the outgrowth and branching stage. This conclusion was validated by normalization to the body weight: at E13.5 the size of the Igf1r-/- mammary anlage did not differ from that of the wild type embryos (p = 0.11), at E16.5 the sprouts were smaller in the mutants, though the difference did not reach statistical significance (p = 0.08), while at E18.5, the Igf1r-/- mammary glands were significantly smaller (p = 0.000021) (new Fig. 7i). We find these data compelling evidence for a specific role for Igf1r in outgrowth and branching of the embryonic mammary gland.

      The use of alternative models to specifically knockout the receptor in the epithelium or the ligand in the mesenchyme (e.g. viruses) would be even more useful to specifically focus on the role of this pathway for ductal branching excluding side effects.

      Our response:

      We thank the reviewer for this suggestion. Unfortunately, in our experience, viral shRNA delivery is not efficient enough for effective gene silencing. This is unlike Cre delivery for a gain-of-function approach (used in the current study to flox out exon 3 of β-catenin), where the endogenous pathway activity is very low and targeting even a subset of cells is therefore sufficient to upregulate paracrine factors.

      Plan for the final revision:

      To address the question on the autocrine or paracrine role of Igf1r, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme.

      Another limit of this model is the fact that Igfr1 can be bound by Igf2 as well and we cannot exclude that this has an impact too (except if Igf2 is not expressed at this stage). A quantification of Igf2 expression may be useful.

      Our response:

      Indeed, we cannot exclude the possibility that Igf2 could also play a role (Igf2 expression was similar to Igf1 in our RNA-seq dataset, see Supplementary Fig. 5). However, mesenchymal Wnt signaling activity was connected to Igf1, not Igf2; in fact, Igf2 was somewhat downregulated in the Wnt3A-treated sample reported by Wang et al. (2021) (highlighted by an arrow in the revised Fig. 6). We have also clarified this point in the Discussion of the preliminary revised manuscript.

      11- From the experiments presented in this section it is clear that the Wnt-Igf1-Igf1r axis has to be finely regulated to have the correct amount of ductal branching in the embryonic mammary epithelium. Nevertheless, the authors just showed the RNA levels of Igf1 in the different compartments they have analysed. Stainings to see the effective presence of the ligand on the tissue are mandatory to clarify the role of this axis in the ductal branching in vivo.

      Our response:

      Igf1-Igf1r signaling has a critical growth-promoting function during embryonic and postnatal development. Igf1 expression has been detected at the RNA and protein level in almost all tissues in humans (Daughaday et al., Endocr. Rev., 1989; PMID: 2666112). Given that Igf1 is a secreted protein and that multiple Igf binding proteins (Igfbps), which regulate the bioactivity of Igf1 by sequestering it, are expressed in the mammary and salivary gland mesenchyme (Supplementary Fig. 5), we find it unlikely that Igf1 staining would provide any additional information to the current study, as it cannot be used to assess the source of Igf1 or the location of the signaling activity.

      Furthermore, as underlined by the authors, this axis is specifically important and upregulated in the salivary gland. Due to the limits of the Igf1R-/- model, we cannot exclude that, although the Wnt-Igf1-Igf1r axis is able to increase the branching ability of mammary epithelium, the normal branching rate observed in wt mice is due to other pathways.

      Our response:

      We agree with the reviewer that other pathways are also important in regulating normal mammary gland branching, for example the Eda/NF-κB and FGF pathways, as we described in the Introduction. Our results do not exclude the possibility that pathways other than Wnt also regulate Igf1 expression. The reviewer is correct that if a paracrine factor is expressed in the salivary gland but not in the mammary mesenchyme, its physiological effect may be limited to the salivary gland. Indeed, cluster 5 identified by the mFuzzy analysis (Fig. 5f) is likely to include some such genes. This is why we decided to focus on cluster 6 genes such as Igf1. In the revised manuscript, we have better highlighted the difference between cluster 5 and cluster 6 genes.

      Unfortunately, with the currently available tools, we cannot test the importance of endogenous mesenchymal Wnt signaling by inactivating it specifically in the mesenchyme at the time point when branching begins. This would require an inducible mesenchymal Cre line (mesenchymal β-catenin is essential for the early fate specification of the primary mammary mesenchyme; Hiremath et al., 2012, PMID: 23034629) and a conditional β-catenin null mouse. We do not have such mice available, and we consider these experiments beyond the scope of the current study.

      12- Lastly, once claimed to have found the key factor necessary for ductal branching promotion, the authors should also test if the proliferation and lineage segregations are unaffected in this context, confirming their dispensable role claimed in the initial part of the manuscript.

      Our response:

      Igf1/Igf1r is well known for its growth-promoting function via cell proliferation. We have no reason to think that this would not also be the case in the mammary gland, and it was not our intention to give the impression that proliferation was not affected. In fact, Hiremath et al. (2012) already reported a defect in epithelial cell proliferation in Igf1r-/- mammary buds at E14. Our key finding is that, compared to other organs, the mammary gland is particularly sensitive to loss of Igf1r during branching morphogenesis. Finally, as pointed out earlier, better tools will be needed to assess the potential link between lineage segregation and onset of branching, a topic that we hope to address in the future.

      Minor comments: 1- An important paper on mammary gland ductal branching was published in Nature in 2017 by Scheele and colleagues and should be presented in the introduction, even though it is at later stages (after birth).

      Our response:

      We thank the reviewer for the suggestion. In the revised manuscript, we have added the findings from Scheele et al. 2017 in the introduction.

      2- In line 136 and 139 the authors referred to Fig 2 but it should be Fig 1

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      3- The sentence on line 142 should be rephrased, since "advanced developmental stages" may be taken to refer to pubertal development. The authors should specify that they are talking about embryonic development.

      Our response: We apologize for the potential misunderstanding. In the revised manuscript, we have used the phrase “advanced embryonic developmental stage” to describe our conclusion more precisely.

      Reviewer #2 (Significance (Required)):

      Overall, the authors concluded that embryonic mammary gland development and branching are extremely sensitive to the loss of IGF1, normally produced by the mesenchyme. The topic of the paper is interesting, the experimental approaches are well conceived, the data are convincing and the findings are of interest to developmental biologists. Nevertheless, there are some significant points that need to be further investigated before considering the manuscript suitable for publication:

      Our response:

      We thank the reviewer for the careful assessment and positive feedback on our manuscript. We have already addressed most of the points raised and most remaining ones will be addressed in the final revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      While it has long been established that epithelial-mesenchymal interaction is necessary for mammary branching, this work shows by heterochronic recombination that initiation of mammary branching is not advanced by mesenchymal stage. By examining Fucci2a embryos the authors demonstrate that branching is preceded by a significant increase in basal cell-biased proliferation but, through further analysis of Eda gain and loss of function mice, conclude that proliferation per se does not cause branching. They show by heterotypic recombination with salivary tissue that early mammary epithelial rudiments require their own mesenchyme for survival and that although later E16.5 rudiments expand more robustly when in contact with salivary mesenchyme they nevertheless retain their characteristic mammary branch pattern. Thus, they establish that initiation and patterning are intrinsic properties of the epithelium but that early survival and later expansion/proliferation is regulated by the mesenchymal context. By transcriptomic comparison of mammary and salivary mesenchyme they reveal that genes encoding canonical Wnt attenuators and antagonists are highly expressed in early mammary mesenchyme and drop as branching ensues. The low expression of these negative regulators of Wnt signaling in salivary mesenchyme is proposed as an explanation for its growth and branch stimulating capability. In keeping with these observations, the authors show that experimental activation of mammary mesenchymal Wnt signaling augments both growth and branching. Lastly, they identify transcriptomic changes in IGF1 coincident with the initiation of mammary branching and confirm its role by extending analyses of the effects of gain and loss of function of IGF1 on embryonic mammary development.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      I have some minor criticisms that I believe can be quickly remedied in a minor rewrite and suggestions for the authors' consideration to improve the manuscript discussion as follows:

      Minor issues Abstract, line 37: The authors misuse the word "decompose" - it should be "deconstruct"

      Our response: We thank the reviewer for pointing out our mistake, which we have corrected in the revised manuscript.

      Results, p7 line 48: Add "The" to the sentence: "The majority...."

      Our response: Corrected in the revised manuscript.

      P8 line 173 This sentence refers to Figure 2G which is a quantitative plot. I would suggest replacing the word "cluster" which implies a spatial organization with the word "subset" or "significant fraction" The spatial data in Fig 2d support basal bias but do NOT to my eye show any clustering - in fact the proliferative basal cells appear to be evenly dispersed within the basal layer.

      Our response:

      We thank the reviewer for highlighting this aspect. We agree that “significant fraction” is a more suitable term than “cluster”.

      P9 line 188: The statement on basal cell lineage specification needs a reference.

      Our response:

      Following the suggestions from reviewers #1 and #3, we have removed the content about lineage segregation in Results, together with this sentence.

      P10 line 201-216 I found the section on lineage specification (fig S2) weaker than the rest and a distraction from the main thrust of the paper making it difficult for the reader to focus. I suggest omitting this section and supplemental figures associated with it altogether.

      Our response:

      We agree with all reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own that should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      P9 line 190: "displays precocious onset of branching" it is sufficient to say: displays precocious branching - the use of both "precocious" and "onset' is redundant.

      P10 line 229 Similarly, delete "the onset of branching was delayed" it is sufficient to say: branching was delayed.

      Our response: Both sentences have been corrected in the revised manuscript.

      P11 line 243: Delete "on the regulation of the" and substitute the word "to" in the sentence: "Next, we shifted our focus on the regulation of the branching pattern, which is thought to be determined by mesenchymal cues."

      Our response: Corrected in the revised manuscript.

      P11 line 241 subtitle and Figure 4 title: The disparity in titles here is jarring for the reader: Results text subtitle: "Salivary gland mesenchyme is rich in growth-promoting cues, but does not alter the mode of branch point formation of the mammary epithelium". Figure 4 Title: "Mammary mesenchyme is indispensable for the branching ability of the mammary gland". I suggest to the authors divide the figure as well as the text to make the two points indicated by their disparate titles separately.

      Our response: We thank the reviewer for the suggestion to clarify the Results part of the manuscript. As suggested, we have split the data under two separate subtitles, but due to limitations in figure numbers, we prefer to report these data in one figure panel.

      P12 line 279 From here on out the manuscript has a tendency to use the term "growth" ambiguously - in many instances it is unclear do they mean expansion, proliferation, increased branch number/ morphology?? Please try to clarify.

      Our response:

      Our aim is to use the term growth to mean tissue growth (expansion). We hope that this is clearer in the revised manuscript.

      P16 line 341 use word "prompted" instead of word "promoted"

      Our response: We thank the reviewer for spotting the slip, which we have corrected in the revised manuscript.

      P16 line 382: include word "embryonic" before "mammary development"

      Our response: We have modified the text in the revised manuscript.

      Discussion P18 line 416: Add the words "later stage (E16.5)" to the sentence: "Importantly, we demonstrate that salivary gland mesenchyme could only promote the growth of later stage (E16.5) mammary epithelium"

      Our response: We thank the reviewer for the suggestion. We have modified the text in the revised manuscript.

      P19 line 437: Given the authors statement "Instead, cell motility is critical for branch point formation in the mammary gland" they should consider a brief sentence mentioning their transcriptomic findings on cadherin 11 and Tenascin.

      Our response: We thank the reviewer for their appreciation of our transcriptomic data. In the revised manuscript, we have added the following text to the Discussion: “Accordingly, we observed significantly increased expression of cell migration promoting genes such as Cdh11 (encoding Cadherin 11), and Tnc (encoding Tenascin C) 60,61 in the E16.5 mesenchyme compared to E13.5 (Supplementary Table 2).”

      P19 line 451: Similarly, given their statement "This observation suggests that mammary epithelium itself carries the instructions dictating the mode of branching" they could consider their transcriptomic data on Ltbp1 in "mammary specific" clusters 7,8,9 as a matrix molecule initially expressed by mammary mesenchyme but which becomes expressed by luminal epithelial cells at precisely the time they acquire lineage specification and intrinsic branching capability.

      Our response: This is an excellent suggestion. We have added the following text to the Discussion: “It is worth noting that certain mesenchymal factors, such as Ltbp1, began transitioning towards epithelium-specific expression around E16.5 69. Exploring the potential impact of these factors on the self-instructed branching capacity of the mammary epithelium could yield valuable insights.”

      P20 lines 462-470 The authors should address their theory of Wnt suppression in the mammary mesenchyme in the context, albeit conflictingly, of earlier studies showing expression of Wnt signaling reporters, in either epithelial or mesenchymal locations during early stages.

      Our response: We thank the reviewer for the suggestion. In the preliminary revised manuscript, we report Axin2 expression data as new Supplementary Fig. 4c. Axin2 expression data suggest that Wnt/β-catenin activity is lowest in the E16.5 fat pad (where branching takes place) compared to all other tissues analyzed in the study.

      Plan for the final revision:

      For the final revised manuscript, we will additionally generate transgenic Wnt reporter expression data (see also our response to point 3 of Reviewer #1). These results will be discussed in light of the published Wnt reporter literature.

      Reviewer #3 (Significance (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      Our response:

      We much appreciate the positive evaluation of our manuscript. We have addressed all the feedback provided by Reviewer #3 in the preliminary revised manuscript, except the last point, which will be included in the final revision along with the new data on the Wnt reporter expression.

      Field of expertise: Embryonic and adult mammary development, Wnt signaling, cell adhesion

    1. Author Response

      eLife assessment

      This useful paper examines changes (or lack thereof) in birds' fear response to humans as a result of COVID-19 lockdowns. The evidence supporting the primary conclusion is currently inadequate, because the model used does not properly account for many potentially confounding factors that could influence the study's outcomes. If the analytic approach were improved, the findings would be of interest to urban ecologists, behavioral biologists and ecologists, and researchers interested in understanding the effects of COVID-19 lockdowns on animals.

      Many thanks for these supportive words. We did our best to improve our manuscript according to the reviewers' and editor's comments. Importantly, we regret being unclear in the Methods, as our models already controlled for most of the confounds (see below) discussed by the reviewers.

      For example, given that a single observer collected the data at most sites, site as a random intercept in the models controls also for the observer effects (which is one of the reasons why site is in the model). We added details to Methods (L352-356, see also “Statistical analyses” in the main text).

      The first reviewer asked us to use “some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here”. Our main results are now based on country-specific models and hence, the use of a single value predictor for each city is not appropriate. Please, see also below.

      The second reviewer is concerned about multicollinearity in our models because of the 0.95 correlation between Period and Stringency Index. However, these are key predictor variables of interest that have never been used within the same model as predictors. We now clearly explain this in the Methods (L458-538, 548-550) and in the legend of Figure S2.
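
      As an illustrative aside (a minimal sketch, not the authors' analysis code), the multicollinearity concern can be made concrete with the variance inflation factor (VIF). For a model with exactly two predictors correlated at r, the VIF of either predictor is 1 / (1 - r^2), so r = 0.95 would inflate coefficient variances roughly tenfold if both were entered together; fitting them in separate models avoids this. The function names and the toy data below are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r):
    """VIF of either predictor in a model with exactly two predictors
    whose pairwise correlation is r."""
    return 1.0 / (1.0 - r ** 2)


if __name__ == "__main__":
    # Hypothetical toy data: Period coded 0 (before) / 1 (during shutdown),
    # paired with made-up Stringency Index values.
    period = [0, 0, 0, 1, 1, 1]
    stringency = [0.0, 0.0, 5.0, 60.0, 70.0, 80.0]
    r = pearson_r(period, stringency)
    print(f"toy correlation: {r:.2f}, VIF if modelled together: {vif_two_predictors(r):.1f}")
    # The correlation reported in the response above is 0.95.
    print(f"VIF at r = 0.95: {vif_two_predictors(0.95):.2f}")
```

      Under these assumptions, entering Period and Stringency Index in separate models, as described above, keeps each estimate free of this inflation.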

      The third reviewer suggested that our models would benefit from controlling for day in the species-specific breeding cycle. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained most of the variance (see Tables S1-S2), as could have been expected. In other words, we do control for what the third reviewer asked for. Similarly, we account for habitat features that may influence escape distance by including site in the models. Site usually refers to a specific park (we assume that within-park heterogeneity is lower than between-park variation) and hence partly addresses the reviewer’s concern. Again, we highlight this within the Methods (L466-476).

      Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance). Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends.

      Thank you very much for your positive feedback.

      However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      Thank you. We did our best in addressing your comments (see below and updated Methods, Results and Discussion sections).

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g. would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e. we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g. the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      Thank you for this comment. First, the Oxford Stringency Index seemed to us the best available index for our purposes, i.e. to estimate people's mobility during the shutdown, because restrictions surely influenced the possibility that people would be outside, and because the index is a country-specific estimate. In addition, we have now checked all indices mentioned in Noi et al. 2022 and found only the Google Mobility Reports useful. We now use them because they (a) are publicly available, (b) are also available for territories outside the US, and (c) provide data for each city included in our dataset as well as for urban parks, where most of our data were collected. Note that some platforms no longer provide their mobility data (e.g. Apple).

      However, Google Mobility provides day-to-day variation in human mobility, whereas we are interested in overall increase/decrease in human mobility. Nevertheless, we correlated the Google mobility index with the Stringency index and found that human mobility generally decreases with the strength of the anti-pandemic measures adopted in sampled countries (albeit the effect for some countries, e.g. Poland, is small; Fig. 5).

      Moreover, we also added an analysis using the # of humans recorded directly in the field during escape trials (e.g. Fig. 6 and S6) and found that the link between # of humans and the Stringency Index or Google Mobility was weak and noisy, with 95% CIs widely crossing zero (Fig. 6).

      Importantly, if we use Google Mobility and # of humans, respectively, as predictors of escape distance, the results are qualitatively very similar to results based on the Oxford Stringency Index (Fig. S6) or Period, with tiny effect sizes for both (95% CIs: Google Mobility, -0.3 to 0.06, Table S5; # of humans, -0.12 to 0.02, Table S6), supporting our previous conclusions.

      Note that Google Mobility and the number of humans have their limitations (see our comment to the editor and the Methods section in the main manuscript, e.g. L418-433). The lack of Google Mobility data for years before the COVID-19 pandemic does not allow us to fully explore whether overall human activity decreased during COVID-19 or not (our test for period prior and during COVID-19). If the year 2022 reflects a return to “normal” (which is disputable given the COVID-19-driven rise in home office use), then 2020 and 2021 had on average lower levels of human activity (Fig. 4). Whether such a difference is biologically meaningful to birds is unclear given the immense day-to-day change in human mobility and presence (Fig. 4). Moreover, the number of humans captures within- and between-day variation rather than long-term changes in human presence.

      We added details on the new analysis to the Methods and Results sections (e.g. Fig. 4-6; L142-165, 418-438, 495-535) and Supplementary Information (Figs. S5-S9 and associated Tables) and discuss the issue accordingly. Moreover, to enhance clarity about country-specific effects (or their lack), we also added country-specific estimates to the Results (Fig. 1 and Fig. S6 and respective Tables). Finally, our statistical design and the random structure of the model allowed us to control for spatial and temporal variation in compliance with government restrictions.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      Thank you for these comments. We incorporated them in the main text (L293-329). Your point 1) corresponds to our point (i): “Most urban bird species in our sample may be relatively inflexible in their escape responses because the species may be already adapted to human presence” (L293-306); your point 2) to our point (ii): “Urban environment might filter for bold individuals (Carrete and Tella, 2013, 2010; Sprau and Dingemanse, 2017). Thus, the lack of consistent change in escape behaviour of urban birds during the COVID-19 shutdowns may indicate an absence (or low influx) of generally shy, less tolerant individuals and species from rural or less disturbed areas into the cities…” (L307-314); your point 3) to our point (iii): “Urban birds might have been already habituated to or tolerant of variation in human presence, irrespective of the potential changes in human activity patterns” (L315-329). To distinguish between (ii) and (iii) or the two from (i), individually-marked birds and comprehensive genetic analyses are needed, which we now note in the Discussion (L330-348). Importantly, we also discuss that the lack of response might be due to relatively small changes in human activity (L253-292), which we unfortunately could not fully quantify.

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species.

      Thank you for this point. We address this point in the Introduction and Discussion (L92-101, 307-314). Rural bird populations/individuals are on average less tolerant of humans than urban birds (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146) and at the same time, bird individuals seem consistent in their escape responses (Carrete & Tella 2010, Biol Lett 23:167–170; Carrete & Tella 2013, Sci Rep 3:1–7).

      Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021).

      We added the papers (L89-91). Thank you!

      There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that did observe changes in space use or abundance - i.e. changes in space use could arise precisely because responses to humans are non-plastic but the distribution and activities of humans changed.

      Thank you. Indeed, we now address this in the Discussion (L303-306): “However, some studies reported changes in the space use by wildlife (Schrimpf et al., 2021; Warrington et al., 2022; Wilmers et al., 2021), and these could arise, as our results indicate, from fixed and non-plastic animal responses to humans who changed their activities”.

      To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      Unfortunately, we have not measured changes in avian abundance or movements. However, please note that the change in human mobility in the sampled cities might not be as dramatic as initially thought, and we consider this scenario the most plausible explanation for the lack of significant differences in avian escape responses before and during the COVID-19 shutdowns (see Fig. 4). Nevertheless, we added your point to the Discussion: if our findings imply that in birds the reaction norm to human presence is fixed over the studied temporal scale, the putative changes in human presence might then imply changes in avian abundance or movement (L293 and text below it).

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g. remotely sensed variable would provide additional context here.

      Urbanity mirrors the long-term level of human presence in cities, whereas we were interested mainly in the rather short-term effects of potential changes in human presence on bird behaviour. Thus, we are not sure how adding such a variable would help elucidate the current results. Please also note that we added the country-specific analysis. Site indeed accounted for a considerable amount of the total variance in escape distance, and that is why it was included as a random intercept, which controls for the non-independence of data points from each city. This could partly help us to control for differences in habitat type (e.g. urbanization level) within cities.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      We believe the initial Fig. S4, now Figure 2, addresses this point. The between-year temporal variation in FIDs exceeds the variation due to lockdowns. This is true both for measures taken in consecutive years and for measures taken far apart.

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      We agree and have added this point (L301-303). Thank you.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g. nest provisioning, etc.

      Please, note that we controlled for the date in our analyses. Date was used as a proxy for the progress in the breeding season (L463-464 and Fig. 1 caption). Note that we collected data only from foraging or resting individuals, and data were neither collected near the nest sites nor from individuals showing warning behaviours, which we now note (L400-401).

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      We discussed this at L307-308 and 381-383. Escape behaviors from humans are highly consistent for individuals, populations, and species (Carrete & Tella 2010, Biol Lett 23:167–170; Díaz et al. 2013, PloS One 8:e64634; Mikula et al. 2023, Nat Commun 14:2146). Whether such behavior is consistent across contexts is less clear (e.g. Diamant et al. 2023, Proc Royal Soc B, in press; but see, e.g. Radkovic et al. 2019, J Ecotourism 18:100-106; Gnanapragasam et al. 2021, Am Nat 198:653-659). Escape distance is often not measured simultaneously with, for example, human presence. In other words, whereas the general level of human presence may have no effect on escape distance, the day-to-day or hour-to-hour variations might. We need studies on fine temporal scales (day-to-day or hour-to-hour) using marked individuals to elucidate this phenomenon.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      We agree. We now also use Google Mobility Reports and # of humans data to elucidate this phenomenon and have added such interpretations to L253-292 and, e.g. Fig. 4.

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci 36:585-616.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

      Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      We are grateful for your positive appraisal of our manuscript, as well as for your helpful critical comments. We toned down the discussion to claim, as suggested by you, that we did not find evidence for effects of covid-19-shutdowns on escape behaviour of birds in urban settings (see Results and Discussion sections). In general, we attempted to provide a more nuanced discussion and reporting of our findings. We also changed the manuscript title to “Urban birds' tolerance towards humans was largely unaffected by the COVID-19 shutdowns” and added validation using Google Mobility Reports (Fig. 5 & S6, Table S3a and S5) and the actual number of humans (Fig. 6 and S6; Table S3b-e and S6). Note however that there is only a single robust study on the topic of shutdown and animal escape distances (Diamant et al. 2023, Proc Royal Soc B, in press), i.e. the topic is largely unexplored (e.g. L99-101), whereas we discuss our finding in light of shutdown influences on other behaviours (L293-329).

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      We tried to clarify our reasoning and increased consistency in our claims in the Introduction. Please, note that we simplified the Introduction and now provide one main expectation: FIDs of urban birds should increase with decreased human presence. This pattern is robustly empirically documented, regardless of the mechanism involved (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146). Please, see our revised Discussion for a more comprehensive discussion of mechanisms which could explain the patterns described in our study.

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us when we approach them... Furthermore, they usually escape 5 to 20 m away. This is more distance than would be necessary just to avoid being trampled.

      We agree and have deleted mentions that humans are perceived as harmless.

      1.2) If we are harmless, why should birds spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g. raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      Thank you for this comment. We deleted this prediction and largely rewrote the Introduction based on your comments and comments from the other reviewers.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112).

      We have now attempted to write this more clearly while incorporating your suggestions. In the Discussion, we now propose various hypotheses that may, but need not, be mutually exclusive. Please note that we simplified the Introduction and now provide one main hypothesis: FIDs of urban birds should increase with decreased human presence.

      In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior would not be plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      We regret being unclear. We do accept some degree of plasticity. Yet, our study design prohibits the assessment of the degree of individual plasticity because sampled birds were not individually marked and approached repeatedly. We tried to soften the statements in our Discussion so as not to fully dismiss the possibility that urban birds have some degree of plasticity in their antipredator behaviour (L293-329). Note, however, that while our data collection was not designed to test how hour-to-hour changes in human numbers influence escape distance, the effect of the number of humans (i.e. hour-to-hour variation in human numbers) in our sample was tiny.

      The date and hour effect simply control for the particularities of the given day and hour (e.g. warm vs cold times or the time until sunset). In other words, the within species differences (even from the same park) may have little to do with individual plasticity, but instead may reflect between individual differences. We now add this issue to Methods (L471-476): “This approach enabled us to control for spatial and temporal heterogeneity and specificity in escape behaviour of birds (e.g. species-specific responses, changes in escape distances with the progress in the breeding season, spatial and temporal variation in compliance with government restrictions or particularities of the given day and hour)....”

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it.

      At L138-141 and 327-329 we discussed the within and between genera and cross-country variation and stochasticity in response to the shutdowns (Fig. 2). The reference to species-specific plots was perhaps a little bit misleading. We think that the essential figure, that we now reference at this point, is Figure 2 that shows the temporal trends and/or stochasticity that seem to have little in common with lockdowns. Please, also look at Figure 3 and S3-S4. These show that in all selected genera/species, the trends did not significantly deviate from central regression line which indicates no change in FID before and during the COVID-19 shutdowns.

      On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e. Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      We are grateful for this valuable comment. We believe the general approach makes sense as there is a general expectation about how birds should respond to changes in human presence. That is why we control for non-independence of data points in our sample. Thus, although lots of data come from a few European species, this is corrected for by the model. Note that given the sheer number of sampled species, some site- or species-specific trends may have occurred by chance. Importantly, we believe that Figure 2, with species-site specific temporal trends, reveals that the between-year stochasticity in escape distances seems greater than any effects of lockdowns. Nevertheless, we have further dealt with this issue in the revised manuscript by running country-specific models, which again clearly showed no significant effect of Period on escape behaviour of birds (including no effects in Poland and Finland).

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      We are slightly confused by these comments.

      3.1) The cities are expected to be different but (i) the difference may be smaller than imagined (e.g. park structures, managed grass cover, few shrubs and deciduous-dominated tree species) and (ii) we expect the effects of lockdowns to be similar across cities. Whether we have no people in Rovaniemi parks (which despite Rovaniemi’s small size are usually extremely well-visited) or no people in Melbourne parks should not make a difference in principle. Note, however, that to avoid overconfident conclusions, we allow for different reaction norms within cities. Please also note that we now provide country-specific results, which should identify whether shutdowns led to different reactions in the sampled countries. We found no strong effect of shutdowns in any of the sampled countries/cities.

      3.2) Because of the possible between-site differences at the starting point, we use study site as a random intercept and control for the between-site reaction norms by including the random slope of the period. In other words, such possible differences do not influence the outcomes of our models. Regardless, our a priori expectation is that the human activity levels in a given park were similar prior to covid and hence in 2014, 2018, and 2019. Again, we now provide country-specific results, which identify whether shutdowns led to different reactions in the sampled countries, which they mostly did not.
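
      The random-effects logic of point 3.2 can be written out as a simplified sketch. The notation is illustrative and not taken from the manuscript: it omits the other fixed and random effects of the actual models and assumes a response $y_{ij}$ (escape distance of observation $i$ at site $j$, on whatever scale the models use) and a binary indicator $\mathrm{Period}_{ij}$ (0 = before, 1 = during the shutdowns).

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Period}_{ij} + u_{0j} + u_{1j}\,\mathrm{Period}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      In this sketch, the random intercept $u_{0j}$ absorbs site-level differences in baseline escape distance, and the random slope $u_{1j}$ lets each site respond differently to the shutdowns, so that $\beta_1$ estimates the average Period effect across sites rather than reflecting differences among sites at the starting point.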

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      We agree that each city, or even each park within a city, has its specific environmental conditions (here including the time point of lockdown). That is why we control for city and park location in the random structure of the model (see Methods section). We now add results per country that show no clear differences (e.g. Fig. 1).

      However, the aim of our study was to test for general, global effects of lockdowns, which are minimal. Note that we now specifically test for country-specific effects in separate models on each country (e.g. Fig. 1, Fig S6) but all country-specific effects are small and still centre around zero.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      We agree. In Finland and Hungary, data were collected by two closely cooperating observers. In Poland, all data were collected by a single observer. In the Czech Republic and Australia, a single observer (P.M. and M.W., respectively) sampled 46 sites out of 56 and 32 sites out of 37, respectively. Each site was sampled by the same observer both before and during the shutdowns. We now clearly state this in the Methods (L352-356). In other words, our models already largely control for the possible observer confound by having site as a random intercept. Moreover, a previous study showed that FID estimates do not vary significantly between trained observers (Guay et al. 2013, Wildlife Research, 40, 289-293).

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      Thank you for this comment. We now validate the use of the stringency index with the Google Mobility Reports, showing that human mobility generally (albeit in some countries relatively weakly) decreases with the strength of governmental antipandemic measures. Please note that our main research question is related to the general change in human outdoor activity and not to the week-to-week, day-to-day or hour-to-hour changes captured by the stringency index, Google Mobility or the number of humans during an escape trial. Nevertheless, using Google Mobility and the number of humans as predictors led to results similar to those for the stringency index and Period (Fig. 1 and S6). Please see the extended discussion on this topic in our manuscript (L270-292).

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      We have now added the information that most birds were sampled when on the ground (79%). Importantly, previous studies have found that perch height has a minimal effect on FIDs (e.g. Bjørvik et al. 2015, J Ornithol 156:239–246; Kalb et al. 2019, Ethology 125:430-438; Ncube & Tarakini 2022, Afr J Ecol 60:533–543; Sreekar et al. 2015, Tropic Conserv Sci 8:505-512). We added this information to the Methods section (L394-395).

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      We appreciate your concern. We cannot fully exclude the possibility of sampling some individuals twice. However, we sampled during the breeding season, when most birds are territorial and active in the areas around their nests, and hence an individual switching parks is unlikely. Also, most sampled birds in our study are passerines, which have small territories (typically a few hundred square meters). Some larger birds may have larger territories and move larger distances to forage (e.g. kestrels, which often forage outside cities), but these birds represent a minority of our records and we have not sampled outside the cities.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      There were 141 unique day-years before COVID and 161 during COVID. So, the sampling effort as calculated by days does not explain the difference in species numbers. Whether the actual effort, which was 381 vs 463 h of sampling, explains the difference is unclear, which we now note in the Methods (L476-483). If not, your proposition is possible, but we would like to avoid any speculation on this topic in the manuscript as it is difficult to infer species diversity from FID sampling.
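
      The effort comparison above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this exchange (141 vs 161 day-years, 381 vs 463 h, and the 68 vs 135 species mentioned by the reviewer); the helper name `percent_increase` is ours and purely illustrative.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the reviewer's comment and the authors' response.
effort_days = percent_increase(141, 161)   # unique day-years
effort_hours = percent_increase(381, 463)  # hours of sampling
species = percent_increase(68, 135)        # species recorded

for label, value in [
    ("day-years", effort_days),
    ("hours", effort_hours),
    ("species", species),
]:
    print(f"{label}: +{value:.1f}%")
```

      On these numbers, effort rose by about 14.2% in days and 21.5% in hours, while the number of recorded species rose by about 98.5%, which is why effort alone does not obviously account for the difference.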

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      We are confused by this comment and think this reflects a misunderstanding. Period and stringency index are explanatory variables of interest that were never included in the same model, and hence their correlation does not contribute to within-model multicollinearity. To avoid further confusion, we note this in the legend of Fig. S2. However, we must be cautious when interpreting the results from the models on period, Google Mobility, # of humans and stringency index, as the four measures are similar.

      We discuss multicollinearity of explanatory variables within the manuscript (L458-538, 548-550) and noted that, with the exception of temperature and day within the breeding season (r = 0.48), the correlations among explanatory variables were minimal. We thus used only temperature as an explanatory variable (i.e. fixed factor; also because temperature reflects both season and variation in temperature across a season) whereas the day was included as a random intercept to control for pseudoreplication within day. Collinearity between all other predictors was low (|r| <0.36).
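
      A pairwise collinearity screen of the kind described above could look like the following minimal sketch. It is not the authors' code: the predictor values are invented for illustration, and the 0.4 cut-off is an arbitrary assumption chosen only so that it falls between the two values reported in the response (r = 0.48 and |r| < 0.36).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(predictors, threshold):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for illustration only (not data from the study).
toy = {
    "temperature": [5.0, 8.0, 12.0, 15.0, 19.0, 22.0],
    "day": [1, 10, 20, 30, 40, 50],
    "flock_size": [3, 1, 4, 1, 5, 2],
}

# Hypothetical cut-off; the response does not state a formal threshold.
pairs = flag_collinear(toy, threshold=0.4)
for name_a, name_b, r in pairs:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

      A flagged pair would then be handled as the response describes: only one of the two variables enters a given model as a fixed effect.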

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

      The random structure of the model controls for possible pseudoreplication in the data, that is for the cases where we have multiple data points that may not be independent and hence technically represent one. Apart from that, random structure tells us about where the variance in the data lies. This is often of interest and your previous questions about city, site or species specificities can be answered with the random part of the model. To follow up on your example, year is included in the model because data from a single year are not independent (for example because of delayed breeding season in one year vs. in another).

      We regret being unclear about the model specification and have attempted to clarify the methods (L466-476). We first specified a model with an ideal random structure that necessarily was complex (perhaps too complex). We then showed that using models with simpler random structures did not influence the outcomes. We now use a simpler model within the main text, but do keep the alternative models to show that the results are not dependent on the random structure of the model (Fig. S1 and Table S2).

      Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allow for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Thank you for your supportive and positive comments.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities; however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDs observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.

      Thank you for your suggestions. We agree. To capture the general human presence in parks, we have now incorporated an analysis using Google Mobility Reports (Fig. S6b), which directly measure human mobility in each of the sampled cities and specifically in urban parks, where most of our data were collected. We also address the further concerns that you detail below. Albeit not the main interest of our study, we have now also incorporated an analysis using the actual # of humans during an escape trial (Fig. S6c).

      Moreover, we think that including further possible confounds should not influence our conclusions. In other words, including further confounds will decrease the variance that can be explained by shutdowns and thus such shutdown effects (if any) would be tiny and hence likely not biologically meaningful.

      Specifically,

      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.

      Thank you for this essential comment. Initially, we viewed the Oxford Stringency Index as the best available index for our purposes. However, we now further acknowledge its limitations (L) and validate the Oxford Stringency Index with the Google Mobility Reports data, showing that both indices are generally negatively (albeit sometimes weakly) correlated across sampled cities (i.e. human mobility decreases with the increasing stringency index). Although other human presence indices were used in the past, e.g. Cuebiq, Descartes Labs and Maryland Uni index, Apple (see Noi et al. 2022, Int J Geograph Info Sci, 36, 585-616), we used only the Google Mobility index because (a) it is publicly available, (b) it is also available for territories outside the US, and (c) it provides data for urban parks within each city included in our dataset. Note, however, that Google Mobility data are inappropriate to answer our primary question, i.e. whether changes in human presence outdoors due to the COVID-19 shutdowns had any effect on avian tolerance towards humans. First, Google Mobility was available only for 2020-22, i.e. the baseline pre-COVID-19 data for 2018-2019 were unavailable. Thus, there was no way to check whether the human activity levels really changed during the COVID-19 years. Second, Google Mobility data are calculated as a change from a 2020 January–February baseline for each day of the week for each city and its location (here we used parks). In other words, the data are not comparable between days and cities, albeit we attempted to correct for this within the random structure of the mixed model. Also, the data may be influenced by extreme events within the 2020 Jan–Feb baseline period (see here). Third, Google Mobility varies greatly between days and across the season (see Fig. 4 & S5 or the first figure in these responses), likely more than the possible change due to shutdowns.
      Nevertheless, we found that results based on Google Mobility are qualitatively very similar to results based on the stringency index. Moreover, we showed that the relationships between # of humans and both Google Mobility and the stringency index (Figure 6) are weak and noisy, with 95% CIs widely overlapping zero (Table S3b-e). Also, similarly to other predictors of human presence, # of humans only poorly predicted changes in avian escape distances. We added details on the new analysis to the Methods, Results and Supplement (L134-165 and associated figures and tables, L415-535).
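
      The second limitation above (values expressed relative to a weekday-specific baseline) can be illustrated with a small sketch. It is not Google's implementation: the visit counts are invented, the baseline window is a hypothetical four-week span in January–February 2020, and the use of the median as the baseline statistic is our assumption.

```python
from datetime import date, timedelta
from statistics import median


def weekday_baseline(visits):
    """Median visits per weekday (0 = Monday) over a baseline window."""
    by_weekday = {}
    for day, value in visits.items():
        by_weekday.setdefault(day.weekday(), []).append(value)
    return {weekday: median(values) for weekday, values in by_weekday.items()}


def percent_change(day, value, baseline):
    """Percent change of `value` relative to the baseline for that weekday."""
    reference = baseline[day.weekday()]
    return (value - reference) / reference * 100.0


# Hypothetical baseline: four full weeks starting Monday 6 January 2020,
# with parks twice as busy on weekends as on weekdays (invented counts).
start = date(2020, 1, 6)
baseline_visits = {}
for offset in range(28):
    day = start + timedelta(days=offset)
    baseline_visits[day] = 200 if day.weekday() >= 5 else 100
baseline = weekday_baseline(baseline_visits)

# The same raw count of 150 visits on a Saturday and on a Monday.
saturday = date(2020, 4, 4)
monday = date(2020, 4, 6)
print(percent_change(saturday, 150, baseline))
print(percent_change(monday, 150, baseline))
```

      In this toy example the same number of visitors reads as a 25% decrease on the Saturday and a 50% increase on the Monday, which is the sense in which such values are not directly comparable between days.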

      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models.

      We are confused by this statement. The fact that the FIDs varied does not directly imply that our models failed to account for the variation. Nevertheless, we do control for most of the discussed confounds (see further answers below). Importantly, it is unclear how including further possible confounds should influence our conclusions, unless the lockdown effects are tiny, in which case those might not be biologically meaningful.

      Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:

      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.

      We agree. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained a relatively high portion of the variance in our data (see Table S1 and S2), perhaps something you expected.

      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.

      We agree. We don’t have such fine-scale data, but we included site identity (typically within a particular park or cemetery) which should account for the habitat heterogeneity among localities. Depending on the model, site explained relatively little variance (1-6%), indicating low heterogeneity between localities in these undescribed characteristics. Also note that park structure may be quite similar both within and between cities, i.e. managed green grass areas, with only a few shrubs and deciduous trees. Therefore, the possible minor habitat heterogeneity should not have any great impacts on our results.

      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.

      We agree that FIDs can be correlated with many possible factors. Here, we were interested in general patterns, while controlling for FID differences between species, as well as for possible species-specific reaction norms to lockdowns. Whether neophobic vs. neophilic populations or precocial vs. altricial species react differently to lockdowns might be of interest, but it is beyond the scope of this study. However, population and population-specific reaction norms explain little variation (Table S2a, 0-6% of variation), so such a confound should not substantially influence our conclusions. We do not have fine-scale data on the level of neophobia, but the effects of lockdowns seem similar for precocial (see Anas, Larus, Cygnus) and altricial (the remaining, mostly passerine) species in our dataset (see Fig. 3 and S3-S4). Please note that we sampled mainly adults (L386). Moreover, the effects for clades, which may differ in their cognitive skills, are also similar (e.g. Corvids vs. Anas or Cygnus; Fig. 3).

      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.

      While all observers were trained, the three methods may add some noise to the FID estimates. However, the FID estimates from a single method may still slightly differ between observers (so do well standardized morphology measurements; Wang, et al. 2019, PLoS Biology, 17, e3000156). Importantly, FID estimates are highly replicable among skilled observers (Guay et al. 2013, Wildlife Research 40:289-293), and we previously validated this approach and showed that distance measured by counting steps did not differ from distance measured by a rangefinder (Mikula 2014, Ardea 102:53-60), which we now explicitly state (L391-394). Importantly, we control for observer bias by specifying locality as a random intercept (see further details in our response to the Editor). Moreover, each site was sampled by the same observer both before and during the shutdowns.

      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.

      We regret being unclear in our methods. Our full model uses all data, but alternative models (see e.g. Fig. S1) used data with ≥5 as well as ≥10 observations before and during lockdowns for a given species. Importantly, Figure 2 and 3 depict data for species sampled at specific sites. We now clarify this within the Methods (L460-483) and the Results (L125-133 and associated figures) and in the figure legends (Fig. S1).

      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors are large in number compared to the total sample size (N =6369), such that models may have been over-extended.

      The number of predictors and random effects is well within the limits for the given sample size (Korner-Nievergelt et al. 2015. Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan). Importantly, simpler models give similar results as the more complex ones (Fig. S1) and the visual (model free) representations of our raw and aggregated data confirm our model results. This, we suggest, makes our findings robust and convincing.

      Overarching main conclusion

      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity).

      We find this statement confusing. We accounted for most relevant confounding factors and found little evidence for a strong effect of the pandemic. Moreover, we have now added country-specific analyses that confirm the lack of evidence, we highlight Figure 3, which shows no clear shutdown effect, and we also explore how levels of human presence changed over and within the years. Adding more possible confounds (albeit note that not many are left to add) might only further reduce the variation that could be explained by the pandemic, and hence such hypothetical effects of the pandemic will be, if anything, small and thus likely not biologically meaningful.

      Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats.

      It is unclear why you think our study lacks support for the conclusion that FIDs changed little during the pandemic, if all results show no such effects. However, we toned down our Discussion and also highlighted potential issues linked to our approach, e.g. that sampled individuals were not marked and hence we cannot distinguish between various mechanisms that might explain the described pattern (L293-329), or that human presence may not have changed (L253-269). For further details see our previous response.

      Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

      We appreciate and agree that our study demonstrates the challenges in quantifying human activity to understand bird escape distance and we added a paragraph on this topic to the discussion (L270-292).

      Nevertheless, we hope that our above responses clarify and address most of the issues you had with our manuscript. We tried to show that (a) most of your proposed controls are indeed included in our study design, models, and visualisations, and that (b) multiple lines of evidence (from models and visualisation of raw and aggregated data) support the conclusion of no overall effect. We further emphasize the temporal and between- and within-species variability in FIDs in the Results and now specifically indicate that lockdowns did not influence FIDs above such variability (Fig. 2-3, Fig. S3). In other words, the natural (e.g. temporal) variation in FIDs seems far greater than potential effects of lockdowns (Fig. 2). We believe that even if lockdowns had tiny effects that could have been detected with a more stringent experimental design (e.g. individually tagged birds) or even more complex models, such effects would be far from being biologically meaningful.

    2. Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance).

      Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends. However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g., would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e., we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g., the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species. Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021). There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that *did* observe changes in space use or abundance - i.e., changes in space use could arise precisely *because* responses to humans are non-plastic but the distribution and activities of humans changed. To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g., remotely sensed variable would provide additional context here.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.
      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g., nest provisioning, etc.
      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.
      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

    3. Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:
      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us when we approach them... Furthermore, they usually escape 5 to 20 m away. This is more distance than would be necessary just to avoid being trampled.
      1.2) If we are harmless, why should birds spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g., raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.
      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112). In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior were not plastic, I would not expect date or hour effects. By including them in their models, the authors are implicitly accepting some degree of plasticity.

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it. On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e., Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:
      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.
      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.
      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.
      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.
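
      Point 8 above concerns collinearity between predictors such as period and stringency index (reported r = 0.95). A minimal sketch of how such a check could be run is given below, assuming plain Python lists of predictor values. The function names and the toy data are hypothetical and are not taken from the authors' analysis; the variance inflation factor formula used here, 1 / (1 - r^2), holds only for a model with exactly two predictors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Hypothetical toy data: period (0 = before, 1 = during lockdown)
    # and a made-up stringency value for each observation.
    period = [0, 0, 0, 0, 1, 1, 1, 1]
    stringency = [0.0, 0.0, 5.0, 0.0, 60.0, 75.0, 70.0, 55.0]
    r = pearson_r(period, stringency)
    print(f"r = {r:.2f}, VIF = {vif_two_predictors(r):.1f}")
```

      With r = 0.95, the two-predictor VIF is roughly 10, a value often treated as a sign that the two variables should not be interpreted as separate effects in the same model.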

    1. Abstract: Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights as researchers can bypass cultivation and analyse genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with their own set of customisable parameters that affect the final output of the workflow. At the core of these workflows are the processes of assembly - combining the short input reads into longer, contiguous fragments (contigs), and binning - clustering these contigs into individual genome bins. Both processes can be done for each sample separately or by pooling together multiple samples to leverage information from a combination of samples. Here we present Metaphor, a fully-automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data, and by combining multiple binning algorithms with a bin refinement step to achieve high quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets, and the impact of available assembly and binning strategies on the final results. The workflow is freely available at https://github.com/vinisalazar/metaphor.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giad055) and has published the reviews under the same license. These are as follows.

      **Reviewer 1. Thomas Brüls**

      The authors present a snakemake-based workflow to automate and chain the main computational ingredients (assembly and binning) of genome-centric metagenomics. The authors developed a technically sound tool for this purpose, and by itself it is certainly valuable to the research community and worthy of publication. However, even if the article is cast as a technical note, hence with an emphasis on the design, implementation and assessment of the tool, I feel that a more thorough discussion of both its abilities and inabilities (e.g. strain resolution, detection of low abundance organisms, identification of virus bins, etc.) would be worthwhile for a more general audience. By the same token, a deeper discussion of some of the results obtained with their tool (see below) would be of interest and would also illustrate useful use cases. I would suggest the following modifications/additions:

      - The experiments with the strain madness dataset suggest that the genomes (or fragments thereof, i.e. the bins) resolved should be viewed as "species" genomes, or composite genomes possibly originating from multiple strains. If so, do the authors think this represents a hard limit to the assembly + binning approach, or could further existing tools (e.g. performing variant detection on top of cross-assembly before the binning step) be integrated or developed in the future for strain resolution (i.e. to identify strains not dominant in any sample)?
      - Related, a simple summary of the number of individual strains recovered in individual bins for the strain madness experiment would be interesting.
      - Another issue that would be worth discussing in my opinion is the impact of genome abundance on the recovery of corresponding bins and their quality. The platform developed by the authors appears to be well suited for this kind of analysis and the results would be of both theoretical and practical interest. To put it simply, what is the minimal initial coverage of genomes required in order for them to be recovered in bins of a given size and quality?
      - Remark: these two issues (strain-level diversity and individual strain genome abundances) likely interact to limit bin resolution, and this could be mentioned by the authors.
      - The data presented by the authors suggest that the metabat binning engine significantly outperforms the other two tools (concoct and vamb, which are both widely used), see e.g. Figure 2. What would account for that, and do the authors think this is a general observation (i.e. beyond the specific CACB setting or marine metagenome shown in Fig 2)?
      - A bin refinement step (based on the DAS tool and dereplication) is frequently mentioned but should be described in more detail (including a precise definition of the bin quality metric used).

      Further, rather minor comments:

      - In the abstract, when mentioning "technical challenges associated with...", it would be worth mentioning that algorithmic challenges are present as well.
      - In the introduction: "It is hypothesised that pooled assembly and binning may lead to improved results when analysing communities with high genetic diversity, and to poorer results when there is a high level of intraspecies/strain-level diversity". I would assume there are many instances in the real world that are both, i.e. that present both high inter-species and intra-species genetic diversity. What then?
      - In the future directions, the authors mention the identification of eukaryotic and viral contigs and bins, and could briefly elaborate on how this could be done properly.
      - The sentence "In summary, our assessment of ..." at the end of the manuscript appears to have a syntactic problem.

    1. Abstract: Hetnets, short for “heterogeneous networks”, contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.

      **Reviewer 2. Paolo Provero**

      In this work Himmelstein and collaborators introduce a statistically controlled way of extracting significant node pairs in heterogeneous networks (hetnets) without relying on a ground truth and related training. The method "explains" why two nodes are significantly connected by extracting the metapaths most responsible for the enrichment. This is based on computing a null distribution of the degree-weighted path count (DWPC), which allows a P-value to be assigned to each metapath joining two nodes and the individual paths responsible for the enrichment to be visualized. The method is novel and significant, and can in principle be applied to many hetnets, in the life sciences and beyond, when a ground truth is not available or not desirable because it would introduce bias. The software tools developed appear to be readily available to other researchers.
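
      To make the quantities in this summary concrete, the sketch below computes a degree-weighted path count for a toy set of paths and an empirical P-value against a list of null values. It is a simplified illustration under stated assumptions, not the hetmatpy implementation: it uses one degree per node (the published DWPC uses degrees specific to each edge type), the damping exponent of 0.5 is an assumed default, the null values are supplied by hand rather than derived from the authors' null model, and all function names are hypothetical.

```python
def dwpc(paths, degree, damping=0.5):
    """Degree-weighted path count: sum of path-degree products.

    paths   -- list of paths, each a list of (source, target) edges
    degree  -- dict mapping node -> degree (simplification: one degree per
               node instead of one degree per node and edge type)
    damping -- exponent that down-weights paths through high-degree nodes
    """
    total = 0.0
    for path in paths:
        path_degree_product = 1.0
        for source, target in path:
            # Each edge contributes the damped degrees of both endpoints.
            path_degree_product *= (degree[source] ** -damping) * (degree[target] ** -damping)
        total += path_degree_product
    return total


def empirical_p_value(observed, null_values):
    """Fraction of null values at least as large as the observed value.

    Uses the add-one correction so the estimate is never exactly zero.
    """
    at_least_as_large = sum(1 for value in null_values if value >= observed)
    return (at_least_as_large + 1) / (len(null_values) + 1)


# Toy example: one two-edge path a-b-c where b is a hub of degree 4.
toy_degree = {"a": 1, "b": 4, "c": 1}
toy_paths = [[("a", "b"), ("b", "c")]]
observed = dwpc(toy_paths, toy_degree)

# Hypothetical null DWPC values, e.g. from degree-preserving permutations.
null_dwpcs = [0.1, 0.2, 0.3]
print(observed, empirical_p_value(observed, null_dwpcs))
```

      With `damping=0` the function reduces to a plain path count, which is one way to see why the DWPC already accounts for node degree before any null distribution is applied.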

      Major comment: If I understand correctly, given two nodes (say "Alzheimer disease" and "Circadian rhythm") the method extracts, in a statistically controlled way, the most significant metapaths joining the two nodes, and then the individual paths responsible for the enrichment. But this is not the most obvious question a life scientist would ask the network, which would instead be something like "Which are the pathways most significantly connected to 'Alzheimer disease'?" Indeed, this type of question would be the one to ask when aiming for drug repurposing (possibly replacing "pathways" with "compounds" or "pharmacologic classes"). Based on Fig. 4A, the pathways are presented, or "suggested," in decreasing order of number of metapaths, but this is hardly a ranking by significance. Would it be possible to summarize the results in such a way as to rank the pathway nodes connected to a given disease node by significance (or more generally to rank the nodes of a certain type by the significance of their connection to a given node of another type)? This should be discussed.

      I also have several minor concerns.
      (1) The authors introduce and compute a null distribution of the DWPC which takes into account node degree in a statistically controlled way when evaluating the connectivity between two nodes. However, the DWPC itself does take into account node degree, as the name implies, and contains a tunable parameter that can be optimized, at least when a ground truth is available (as in Ref 39 by the same first author). I understand such tuning is not possible when, as in the present case, no ground truth is available, but the authors should make this point more clearly.
      (2) I find Fig. 1B a bit confusing: according to the legend, the top rows are known treatments, which should have higher than expected connectivity. However, based on the colors as explained by the legend, the bottom treatment/disease pairs seem to have higher connectivity.
      (3) The acronym DWPC is defined after it has been used several times.
      (4) The legend of Figure 2 should specify that these results apply to the nodes "Alzheimer disease" and "Circadian rhythm", although this becomes clear in Fig. 4.
      (5) I don't think Figure 3, representing the home page of the web site, is especially useful.
      (6) I found Fig. 4 confusing: the sum of the path counts for the selected metapaths in panel B is far larger than the 425 results shown in Panel C. As far as I understand, no path can belong to more than one metapath, so is there some further selection here?
      (7) The "Frontend" section of the Methods seems a bit too detailed for the GigaScience audience.

      Re-review: The authors have addressed all my comments in a satisfactory way.

    1. Author Response

      Reviewer #2 (Public Review):

      This work attempts to connect the diet of a mother to the physiology and feeding behaviors of multiple generations of her offspring. Using genetic and molecular biology approaches in the fruit fly model, the authors argue that this Lamarckian inheritance is mediated by germline-inherited chromatin and is regulated by the general activity of a histone methylase. However, many of the measured effects are small and variable, the statistical tests to prove their significance are missing or poorly described, and some experiments are inadequately described and lack important controls.

      1) The authors claim that the diet of a mother can influence the physiology of her progeny for several generations. However, the observed effects of maternal diet on later generations were small and variable for most assays (see Fig1C, S1.1A, B, D). Additionally, the effect size between F0 HSD to ND was often larger than the effect size between the progeny of F0 parents and ND. To put it another way, if the authors were to compare the F1, F2, etc. to the F0 HSD flies, they would conclude that the majority of the response to diet is not maternally transmitted, and is directly controlled by the diet of the individual being measured.

      We agree with the reviewer that the effect size of acute HSD exposure (in HSD-F0 flies) was stronger than that of transgenerational inheritance (in HSD-F1/2/3/4 flies). Similar observations were also made in other studies (see Klosin et al., Science, 2017; Bozler et al., eLife, 2019). We would argue that this difference in effect size is expected and biologically relevant.

      For all living organisms, acute environmental changes (diet change included) have direct and profound influences on their survival and reproduction, and therefore need robust and immediate responses. In comparison, ancestral environmental changes may only provide some vague and indirect indications of the current living environment of the offspring. Such information may be beneficial for the survival and reproduction of the offspring, but the effect size is expected to be much smaller, or at least smaller than that of acute environmental changes.

      Studies of the Dutch Famine offer a good example. Individuals who were prenatally exposed to famine were found to have a greater risk of metabolic diseases (Ravelli et al., NEJM, 1976). Nevertheless, direct high-fat diet exposure was still the much stronger risk factor for obesity and metabolic disorders (Bray et al., Am J Clin Nutr, 1998; Jéquier et al., Int J Obes Relat Metab Disord, 2002).

      We have added additional discussions in the manuscript for clarification.

      Furthermore, since our current study aimed to investigate the mechanism of behavioral transgenerational inheritance, we focused on the comparison between HSD-F1 flies (and their progeny) vs. ND-fed flies. As the ancestors of HSD-F1/2/3/4 flies were exposed to HSD, whereas HSD-F1/2/3/4 flies themselves were never exposed to HSD, any difference we observed between the two groups could be solely attributed to transgenerational inheritance of ancestral HSD exposure. That said, to better distinguish the effects of acute HSD exposure from those of transgenerational inheritance following ancestral HSD exposure, we re-analysed and presented the comparisons among ND, HSD-F0, and HSD-F1 data in the manuscript (Figure 1. B-E, Figure 1-figure supplement 1. A-E, Figure 1-figure supplement 2. A-D, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B).

      2) The authors chose to study PER, which had the largest average effect sizes between conditions. However, PER was highly variable in the averaged data, with some individuals showing large effects and others having no effects. A better characterization of transgenerational PER may increase the robustness of this assay and confidence in its results. For example, the authors could measure PER in lineages derived from individual flies to determine when transgenerational effects on PER decline or disappear. This form of data collection could help to explain the high variation in the averaged data presented in the paper.

      We acknowledge that PER is in general quite a variable behavioural trait (as are most, if not all, behavioural measures). This is not surprising, since animal behaviours, as complex traits, can be influenced by numerous intrinsic and extrinsic factors, such as genetic background, developmental environment, diet, population density, environmental conditions, etc. Numerous PER studies have exhibited similar variability (Masek et al., PNAS, 2010; Marella et al., Neuron, 2012; Charlu et al., Nature Communications, 2013; Wang et al., Cell Metabolism, 2016; Wang et al., Cell Reports, 2020).

      Nevertheless, in our current study we were able to identify statistically significant behavioural differences between ND-fed flies and HSD-F1/2/3 flies, demonstrating that ancestral HSD exposure had a transgenerational effect on sweet sensitivity. To further increase the robustness of the study as suggested by the reviewer, we have conducted additional repetitions of many PER experiments and further confirmed the phenotype with less variability and more statistical power (Figure 1. G-I, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B). The reviewer also suggested the use of isogenic flies, which might help to minimize variation in genetic background. However, we think that demonstrating the behavioural difference in genetically diverse fly populations is a more credible way to show that such transgenerational inheritance is a reliable and generalizable phenomenon.

      3) What do the error bars represent on any figure? There are many examples where the data is highly variable and lies completely outside of the error bars. What is the statistical test for significance that is carried out in each figure? The brief comment about statistics in the methods section is inadequate. The authors should also supply the raw data used to generate the figures so that readers can perform their own statistical tests.

      Data in the manuscript were represented as means ± SEM (standard error of the mean) in all of our figures, which is a standard practice in the field (Masek et al., PNAS, 2010, Charlu et al., Nature Comm, 2013, Wang et al., Cell Metabolism, 2016). We have provided detailed explanations of the statistical tests in the manuscript. We have also prepared raw data files as suggested by the reviewer.
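
      As a point of reference for the error-bar question, the SEM is the sample standard deviation divided by the square root of the number of observations. The sketch below uses made-up values and a hypothetical helper name; it is not the authors' analysis code.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a sample.

    The SEM is the sample standard deviation (n - 1 denominator)
    divided by sqrt(n), so at least two observations are needed.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Made-up PER-like response fractions, for illustration only.
responses = [0.2, 0.4, 0.5, 0.7, 0.9]
mean, sem = mean_and_sem(responses)
print(f"mean = {mean:.3f}, SEM = {sem:.3f}")
```

      Because the SEM describes the uncertainty of the mean rather than the spread of the data, and shrinks as the sample size grows, individual data points routinely fall outside mean ± SEM bars.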

      The model that global H3K27me3 is regulated by ancestral diet is unconvincing without further experimental validation and explanation. Points 4-10 address specific issues.

      4) The authors performed ChIP on cycle 11 embryos. This stage is extremely short (11 min) and contains roughly 10 times less chromatin than embryos only 30 minutes older. These features make it very difficult to collect large numbers of precisely staged embryos without significant contamination. It is also debatable whether early cell cycles (including and preceding cycle 11) are slow enough to deposit and propagate histone marks in the presence of new histone incorporation. See the opposing arguments in Zenk et al 2017 and Li et al 2014. The authors could perform ChIP on older embryos to avoid this controversy.

      We thank the reviewer for the clarification. Our embryo collection protocol involved allowing flies to lay eggs freely in a cage for 30 minutes, followed by 50 minutes of incubation on a juice plate, and then completing the embryo sorting within 30 minutes. Therefore, to be more precise, our embryos should be at a stage between cycles 10 and 12. We have corrected this information in the manuscript (Figure 2. A).

      Since all the embryos were sorted using the same morphological criteria within the same time frame, their developmental stages should be comparable (i.e. all from cycles 10-12). In several references we consulted, a broader range (cycles 9-13) was used for ChIP-seq analysis (for example, see Zenk et al., Science, 2017).

      Surely any maternally inherited information will also be present in cycle 14 or 15 embryos if it is to influence the development or physiology of the brain. The observed differences in global H3K27me3 levels in F1 vs ND flies could be explained by slightly different aged embryo collections or technical variations in the ChIP protocol. The authors could strengthen their conclusion by performing more ChIP replicates. Alternatively, the authors could use orthogonal approaches like antibody staining or western blots to measure global H3K27me3 levels in precisely staged embryos.

      We chose to use cycle 10-12 embryos because we aimed to identify epigenetic modulations directly transmitted through the maternal germline. Embryos in cycle 14-15 might reveal more profound changes, but since embryos in that stage had entered the zygotic phase and started the remodeling of histone modifications, we think it might mask the maternally transmitted changes we sought to identify.

      In addition, we conducted two biological replicates for each group for the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). In the current study we further verified the genes identified in the ChIP-seq analysis by RNA-seq and qPCR analysis.

      We further verified the ChIP-seq results by western blot, which showed a ~2-fold increase in H3K27me3 modification in HSD-F1 early embryos vs. ND-fed embryos, in line with the ChIP-seq data (Figure 2-figure supplement 1. B). We have also provided immunofluorescence results for embryos at cycle 13 and cycle 14, which clearly showed a significant increase in H3K27me3 modifications in HSD-F1 embryos (Figure 2-figure supplement 1. C).

      5) The authors measure PRC2 subunit mRNA levels in adult fly heads to attempt to explain the observed differences in inherited H3K27me3 levels in fly embryos. The authors should examine PRC2 components in germ cells and early embryos to understand how germ cells and early embryos generate H3K27me3 patterns.

      We have now shown that Pcl and E(z) mRNA expression in HSD-F0 flies was not significantly changed vs. ND-fed flies (Figure 2-figure supplement 2. D-G). Meanwhile, the H3K27me3 demethylase UTX and the H3K27ac acetyltransferase Cbp showed a significant decrease (Figure 2-figure supplement 2. H). Therefore, HSD exposure imposed complex epigenetic modifications in HSD-F0 flies, which then led to transmission of epigenetic marks to their progeny. Given that the main scope of this study was to understand which epigenetic program mediated the behavioral transgenerational inheritance following ancestral HSD exposure (rather than the one mediating the response to acute HSD exposure), we focused our efforts on H3K27me3, which was significantly changed in HSD-F1 flies vs. ND-fed flies.

      6) The RNAi experiment targeting PRC2 components in embryos is uninterpretable without appropriate controls and an explanation of the genotypes used in the experimental paradigm. Are the authors crossing nosNGT mothers to UAS-RNAi fathers and assaying the progeny? What is the genotype of the F1 flies and how does it compare to the genotype of the ND flies? The authors should also note that the Gal4 drivers they use are not necessarily restricted to the ovary, and could directly affect other tissues controlling PER like neurons and muscle. Additionally, the authors should supply the appropriate controls to verify that their experimental paradigm has the intended effect. PRC2 proteins are presumably loaded into embryos and would be immune to zygotic-expressed RNAi. The authors could validate when PRC2 RNAi is effective by staining embryos for H3K27me3.

      We have now added schematic diagrams and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3-figure supplement 1. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls. We have also noted in the manuscript that the GAL4 drivers we used were not restricted to the ovary.

      We have now verified the effect of PRC2 knockdown to reduce H3K27me3 in female germline by both western blot and immunofluorescence staining (Figure 3. B-C).

      7) Although the authors do not note this, nosNGT>RNAi affects the PER of ND flies (compare Gal4>RNAi to just RNAi or just Gal4 in ND columns in Fig3A-D). This could be due to RNAi expression in neurons or muscles or some other indirect effect. Regardless of the mechanism, this result makes it difficult to interpret how RNAi treatments affect the transgenerational inheritance of PER if there is an equivalently strong nontransgenerational effect.

      Although nosNGT>RNAi appeared to slightly affect PER response of ND-fed flies, there was no statistically significant difference (Figure 3-figure supplement 1. B and D, Figure 3-figure supplement 2. A-B). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3-figure supplement 1. B), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      8) The matalpha gal4 experiment is inadequately explained in the text or methods. Are the authors expressing RNAi in the ovaries of the F0 flies that are fed an HSD? Does the ovary influence their PER somehow? Similar to point 7, there appears to be a nontransgenerational component to the RNAi phenotype that clouds the interpretation of the transgenerational effect (compare F0 in S3.1A-C).

      We have now added a schematic diagram and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls.

      Similar to point 7, although Mat-tub-GAL4>RNAi might seem to affect PER responses of ND-fed flies, there was no statistically significant difference (Figure 3. D-E). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3. D), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      9) For the EED inhibitor experiments (both PER and calcium imaging), it is unclear whether the authors fed the mothers or their adult progeny the EED inhibitor. If adult progeny were fed, what tissues were affected? The authors should stain various tissues with an H3K27me3 antibody to verify the effectiveness of their inhibitor. Finally, the effect of the EED inhibitor on calcium imaging was not convincing because the variation was so large.

      We have added a new schematic diagram and provided more detailed explanations in the manuscript for pharmacological interventions (Figure 4. A-D). To verify the effect of the drug treatment, we showed that compared to the control group fed with DMSO, flies fed with the inhibitor showed a significant decrease in H3K27me3 levels, demonstrating the effectiveness of the inhibitor (Figure 4-figure supplement 1. A).

      We acknowledged the unsatisfactory quality of our calcium imaging experiments in our initial submission. We have now improved our experimental procedures to reach better data quality, while the conclusions remained consistent (Figure 4. E).

      10) In all of the PRC2 RNAi and inhibitor experiments, are there any other phenotypes that would suggest that the treatments are working? There are many published PRC2 loss-of-function phenotypes (molecular and developmental) in different tissues. The authors could assure the reader that their treatments are working as expected by doing these controls.

      As discussed above, we have now used western blot and immunofluorescence staining to validate the efficiency of PRC2 RNAi in female germline (Figure 3. B-C).

      11) The authors propose that a transgenerationally inherited state of the caudal gene is responsible for the transgenerationally inherited PER. However, the experiments investigating the methylation state and expression level of caudal are unconvincing. Cad mRNA abundance varied immensely in the ND RNAseq samples. When the authors compared cad levels across generations, the effect size was small. A single outlier in the ND sample in both the RNAseq and the RT-PCR experiments appears to drive up its mean and effect size. The H3K27me3 ChIP on cad is very similar in the F1 and ND samples and the acetylation peak on its promoter appears unchanged. The authors could vastly improve the caudal experiments in this paper by simply using cad antibodies to stain the relevant tissues that contribute to PER. For example, the authors could stain GR5a neurons for cad expression in different generations that inherit (or don't inherit) maternal PER to more accurately determine if cad levels are indeed transgenerationally regulated. The authors could also perform more ChIP experiments at a less variable stage to convincingly correlate epigenetic marks on cad with its expression level.

      As discussed above, we conducted two biological replicates for each condition of the ChIP-seq analysis, which is standard in the field (Zenk et al., Nature, 2021; Ing-Simmons et al., Nature Genetics, 2021). We have also performed western blot and immunofluorescence for H3K27me3 in ND vs. HSD-F1 embryos to further validate our ChIP-seq data (Figure 2-figure supplement 1. B-C).

      As for the Cad gene, H3K27me3 signals showed a statistically significant difference between ND-fed and HSD-F1 flies (Figure 5. D). We have also conducted additional qPCR experiments to verify the expression changes of the Cad gene (Figure 5. F, right), which were in line with the ChIP-seq data and further supported its validity.

      It is worth noting that during the developmental time window of our ChIP-seq analysis, the acetylation signals in the promoter region of cad were very low (Figure 5. D), making a comparison impossible.

      Reviewer #3 (Public Review):

      Jie Yang et al. investigated the transgenerational behavioral modification of a high-sugar diet (HSD) in Drosophila and revealed the underlying molecular and neural mechanisms. It has been reported that HSD exposure decreases sweet sensitivity in gustatory sensory neurons, resulting in reduced sugar response (proboscis extension reflex, PER) in flies. The current study reports that this effect can be transmitted across generations through the maternal germline. Furthermore, the authors show that H3K27me3 modification is enhanced in the first-generation progeny of HSD-treated flies (F1), and that genetic or pharmacological disruption of the PCL-PRC2 complex blocks the behavioral change and restores the sweet sensitivity in the Gr5a+ sweet sensory neurons. The authors further analyze the differentially expressed genes in the F1 flies. Among H3K27me3 hypermethylated regions, they focus on homeobox genes and find a transcription factor, Caudal (Cad), which shows decreased expression in the F1 flies. Knocking down Cad in Gr5a+ neurons results in decreased PER response to sucrose.

      Transgenerational changes in physiology and metabolism have been broadly studied, while inherited changes at the behavioral level are much less investigated. This work provides convincing evidence for transgenerational modification of feeding behavior and digs out the underlying molecular and neural mechanisms. However, there still are several concerns that need to be clarified.

      1) The epigenetic regulator PRC2 has been found to play an essential role in the 7d-HSD-induced modification of the PER response. In this study, it is important to clarify whether, for the transgenerational change, epigenetic modification is required in the flies exposed to HSD (F0), the progeny (F1), or both. It would be very helpful for interpretation if the procedures of HSD treatment in the RNAi experiments and the drug treatments were stated in more detail. In addition, the F0 flies should be examined as the control.

      In this current study our main scope was to understand the transgenerational influence of HSD exposure on the progeny. To this aim, we chose to study the physiological and behavioral differences between ND-fed flies vs. HSD-F1 flies (and their progeny on ND). HSD-F1 flies (and their progeny) were not exposed to HSD in their whole life cycle and therefore the physiological and behavioral changes we observed vs. ND-fed flies could be solely attributed to epigenetic modifications transmitted via germline cells from HSD-F0 flies. Therefore ND-fed flies were used as the main control.

      As for HSD-F0 flies, the acute effects of HSD exposure could be more complex. Epigenetic factors were likely involved, as is evident in Figure 3-figure supplement 1. C, Figure 3-figure supplement 3. A-B and Figure 4. C. In addition, HSD exposure might also directly affect gene expression and multiple signaling pathways in HSD-F0 flies (see Chen et al., Science China Life Sciences, 2020). Therefore, we did not aim to investigate how HSD exposure affected HSD-F0 flies in this current study. We have added additional discussion to the manuscript for clarification.

      That said, we still added more HSD-F0 flies as controls when needed (Figure 2-figure supplement 2. D-G, Figure 3-figure supplement 1. C, Figure 4. C, Figure 5. F, left).

      We have also modified the schematic diagrams and added more detailed explanations in the manuscript, in order to provide a clearer illustration of the experimental procedures (Figure 3. A, Figure 3-figure supplement 1. A, Figure 4. A, B and D). Specifically, we employed two different RNAi approaches. Firstly, we used genetic methods to obtain homozygous Mat-tub-gal4>UAS-gene X RNAi fly lines on chromosomes Ⅱ and Ⅲ for germline-specific knockdown (Figure 3, Figure 3-figure supplement 3). Secondly, we used heterozygous nosNGT-gal4>UAS-gene X RNAi flies for embryo-specific knockdown (Figure 3-figure supplement 1 and 2). Our drug experiments involved both treating the flies and measuring their PER (Figure 4. A-C) and treating the parental flies and measuring the PER of their progeny (Figure 4. D).

      2) The information on the drug treatment period is also missing for imaging experiments (Fig.4C). Moreover, the response curve is very different from those recorded in the same neurons in previous studies. What’s the reason? Please also provide a representative image showing which part of the Gr5a neurons is recorded.

      The experimental procedures for the drug treatments are now shown in Figure 4. A. We fed adult flies with specific compounds for five days after eclosion, and then measured the calcium signals of Gr5a+ neurons when flies were fed with sucrose.

      As suggested by the reviewer, we have now conducted calcium imaging experiments more carefully and thoroughly. We have now added the new data into the revised manuscript and the conclusions remained consistent (Figure 4. E). We recorded the calcium signal in the axons of Gr5a+ neurons in the SEZ.

      3) It's unclear whether the decreased Cad expression upon HSD treatment specifically occurred in Gr5a+ neurons or in many cells. If the change in gene expression is significant in the qPCR test, it should occur in a large number of cells, most likely including different types of gustatory sensory neurons. If lower cad expression led to a lower neural response and thereby a lower behavioral response, how would it specifically decrease the PER response to sucrose but not to other tastes? Is HSD-induced desensitization specific to sucrose in the offspring?

      We agree that Cad expression might decrease in many cells, including Gr5a+ neurons in the proboscis. In order to investigate whether taste perception other than sweet sensing was also affected, we conducted PER experiments with fatty acids, which are another type of appetitive taste cue, like sugars. Perception of fatty acids is mediated by ionotropic receptors such as ir25a, ir76b, and ir56b (Ahn et al., eLife, 2017; Brown et al., eLife, 2021).

      Our results indicate that PER to fatty acids in HSD-F0 and HSD-F1 flies was not significantly reduced compared to the ND-fed controls (Figure 1-figure supplement 2. E-F). This suggests that the impact of Cad on gustatory sensory neurons might be specific to the sweet sensitivity of Gr5a+ neurons.

      4) In Fig. 2D, data are sorted for genomic regions showing an up-regulated modification of H3K27me3. It's unclear whether similar sorting was performed in panel C. This needs to be clarified.

      The analyses shown in Figure 2C and 2D are linked. For 2C, we identified genomic loci with enriched H3K27me3, H3K9me3, and H3K27ac peaks, and found that H3K27me3 peaks showed the most robust changes between ND-fed and HSD-F1 flies. Therefore, we concentrated on the loci where H3K27me3 modifications were significantly changed between the two groups, and further analyzed their differences. As shown in Figure 2D, within these loci, H3K27ac modifications, which are functionally antagonistic to H3K27me3, were significantly reduced, whereas H3K9me3 signals within these loci remained unchanged. These results confirmed that ancestral HSD exposure induced robust H3K27me3 modifications in certain genomic loci.

    1. Abstract: Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep-learning framework for predicting DNA methylation sites, which is based on five popular transformer-based language models. The framework identifies methylation sites for three different types of DNA methylation, namely N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pre-train and fine-tune” paradigm. Pre-training is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA-methylation status of each type. The five models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source and we provide a web server that implements the approach.

      **Reviewer 2. Jianxin Wang**

      In this manuscript, the authors present MuLan-Methyl, a deep-learning framework for predicting 6mA, 4mC, and 5hmC sites. They use DNA sequence and taxonomic identity as features, and implement five popular transformer-based language models in MuLan-Methyl. MuLan-Methyl is open-sourced, and a web server is also provided for users to access it. Overall, I think the methodology of MuLan-Methyl is clear and innovative, and the experiments seem comprehensive. However, I do have several concerns that I believe should be addressed before the paper is accepted by GigaScience.

      Major

      1. One major concern is that, in my opinion, DNA methylation is dynamic. Cytosines in the same position of the DNA sequence may have different methylation status in different samples, different cells, or even in different developmental stages of a cell. So, how can we predict the methylation status of a site based only on its sequence (and taxonomic identity)?

      -- The authors should clarify, in the Introduction or Discussion section, in which cases MuLan-Methyl (as well as other methods that use only DNA sequence) can be used to study DNA methylation.

      -- The authors discuss motifs in Fig. 3, but only for positive samples. What about the motif distribution in the negative samples? Can I understand that this method is actually for discovering motifs (or sequence structures) that are highly correlated with methylation?

      -- How does MuLan-Methyl perform without taxonomic identity?

      2. The authors compared MuLan-Methyl against iDNA-ABF and iDNA-ABT, especially on the independent test set (Fig. 2E). I think the authors should clarify whether they trained the models of the three methods using the same training datasets. If not, the authors should explain why.

      3. I'm curious about the computational efficiency of MuLan-Methyl. How many parameters does its model have? Does MuLan-Methyl have advantages over other methods in terms of computational efficiency?

      Minor

      1. I don't understand why the references were not ordered from 1 in the main text.

      2. I suggest that the authors re-organize the Introduction section. There are too many small paragraphs in this section.

      3. At the end of Page 2, "The type 4mC type is present in 4 species" should be corrected.

      Re-review:

      The authors have addressed most of my concerns. However, I still have one minor concern about the computational efficiency. The authors' response is not convincing, as it says only that "The number of models that MuLan-Methyl need to train and test on is less than the others, thus it has better computational efficiency than other models to some extent". If possible, I strongly suggest that the authors show some data comparing how much time and how many resources (GPU/CPU/RAM) are needed by each method.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to make them larger (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate the overall survival effect of a change in reproductive effort to be close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life-history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than maybe considering the original hypothesis could be false or inflated in importance. We do not consider the reviewer’s inclination to question the premise of the data in favor of a held hypothesis to be necessarily the best scientific approach here. In many places in our manuscript we question and address issues in the underlying data and interpretation (L101-105, L149-150, L182-185 and L229-233). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival, and we are aware that other trade-offs could counter-balance or explain our findings, as discussed on L189-191 and L246-253. Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, with trade-offs, there are endless possibilities of where a trade-off might be incurred between traits. We purposefully focus on the one well-studied and theorised trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur, and we cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have done just that (L250-253).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a generalised trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; this will be most evident, for example, in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because they are typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations. We agree that there could be longer-term costs, and so our estimate of the survival cost for manipulated birds is likely to be an underestimate, meaning that our interpretation still holds – the cost to reproduce prevents individuals from laying beyond their optimal level. Note, however, that much theory is built on the immediate costs of reproduction and as such these costs are likely overinterpreted.

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see Methods lines 360-362, where we explicitly state that this is lifetime enlargement. Of course such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion also cannot be drawn from the study on jackdaws by Boonekamp et al (2014), as the treatments were life-long and therefore cannot separate annual from accrued (multiplicative) costs that are more than the sum of annual costs incurred.

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We will add additional detail to this section in our revised manuscript.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species mean.
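
      The extraction described above can be illustrated with a minimal sketch. It is not the authors' code (their scripts are in the Dryad repository cited earlier); it uses simulated birds, and every number in it (sample size, intercept, slope, the range of clutch sizes) is a hypothetical value chosen for illustration. The helper names `brood_raised` and `fit_logistic` are also invented here, and the sample mean stands in for the species mean.

```python
import numpy as np


def brood_raised(clutch_laid, manipulation):
    """Offspring actually raised: laid clutch plus the manipulation (e.g. 5 + (-1) = 4)."""
    return clutch_laid + manipulation


def fit_logistic(x, survived, n_iter=50, tol=1e-10):
    """Fit logit(P(survive)) = b0 + b1 * x by Newton-Raphson.

    Returns the coefficients [b0, b1] and their standard errors.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(survived, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Simulated data: all values below are hypothetical.
rng = np.random.default_rng(0)
n = 5000
TRUE_INTERCEPT, TRUE_SLOPE = 1.5, -0.3
clutch_laid = rng.integers(3, 8, size=n)          # clutches of 3 to 7 eggs
manipulation = rng.choice([-1, 0, 1], size=n)     # eggs removed, none, or added
raised = brood_raised(clutch_laid, manipulation)
p_survive = 1.0 / (1.0 + np.exp(-(TRUE_INTERCEPT + TRUE_SLOPE * raised)))
survived = rng.random(n) < p_survive              # binary response

# Effect size on the raw scale: change in log-odds of survival per offspring raised.
beta_raw, se_raw = fit_logistic(raised, survived)

# The same predictor, standardised and as a proportion of the (sample) mean.
z = (raised - raised.mean()) / raised.std()
prop = raised / raised.mean()
beta_z, se_z = fit_logistic(z, survived)
beta_prop, se_prop = fit_logistic(prop, survived)

print("slope per offspring:", beta_raw[1], "+/-", se_raw[1])
print("slope per SD:", beta_z[1], "+/-", se_z[1])
print("slope per proportion of mean:", beta_prop[1], "+/-", se_prop[1])
```

      Because the predictor is the number of offspring actually raised, the same regression can be run on manipulated and unmanipulated nests, which is what makes the two kinds of effect size comparable.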

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa.

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what the shape of this non-linear relationship should be, which is hard to determine a priori. We will address non-linear effects in our revised manuscript.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately made a tailored selection of studies that matched the manipulation studies (L279-282). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the observational component of our analysis and thereby fails to acknowledge the question being addressed in this study.

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows one to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 258–260, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds who have larger clutch sizes also live longer, and we suggest this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, but this is the first time we have been asked to put equations in a manuscript rather than explain them in terms that are accessible to a wider audience. Note however that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We do not think we need to repeat such equations and we cite the relevant data. For the simulation, we simply simulated the resulting effects and this is not something that we feel is captured more accurately in equations rather than in text and the associated graphs. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand the reviewer feels the simulations were not explained thoroughly. We will revise our text to see if we can add additional explanation where relevant in our revision.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individual measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. We will simplify the terminology and define terms in our revised manuscript.

      This problem is emphasised by the following sentence in the discussion: "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to understand intuitively what the "effect" is for the unmanipulated studies (the sensitivity of survival to clutch size at the population level), this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (the effect of the manipulation? Or the survival rate whatever the manipulation, in which case how could it measure a trade-off? At the population level? At the individual level?), despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      We would like to thank the reviewer for bringing to our attention the lack of clarity on the details of our methodology. We will make this more clear in our revised manuscript.

      For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season) and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      $$a \cdot c_{i,q} + b \cdot s_{i,q} = E_q$$

      where $c_{i,q}$ and $s_{i,q}$ are respectively the clutch size and the survival rate of individual $i$, which is of quality $q$, and $E_q$ is the level of "energy" that an individual of quality $q$ has available during the given time-step. Here $a$ and $b$ are constants turning the clutch size and the survival rate into the energy cost of reproduction and the energy cost of survival, and they are both quite "high", so that an extra egg ($c_{i,q}$ increased by 1) at the current time-step decreases $s_{i,q}$ markedly ($E_q$ is independent of the number of eggs produced); that is, we have strong individual costs of reproduction. Imagine now that the variance of $c_{i,q}$ (when the population is not manipulated) among individuals of the same quality group is very small (and therefore the variance of $s_{i,q}$ is also very small) and that the expectations of both are proportional to $E_q$. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size $c_{i,q}$, the higher $E_q$, and the higher the survival $s_{i,q}$.

      In the manipulated populations, however, because of the large $a$ and $b$, an artificial increase in clutch size, for a given $E_q$, will lead to a lower survival $s_{i,q}$. The "effect size" at the population level may then vary according to $a$, $b$ and the variances mentioned above. In other words, the costs of reproduction may be hidden by the data when there is variance in quality, even though there are actually strong costs of reproduction (so strong, in fact, that they are deterministic and the probability of surviving is a direct function of the number of eggs produced).
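
      The reviewer's thought experiment can be made concrete with a short simulation. This is a sketch under stated assumptions, not an analysis from the manuscript or the review: the values of `a`, `b`, the range of `E`, and the allocation fraction `f` (the share of energy spent on eggs, introduced here so that clutch size and survival are both proportional to energy) are all hypothetical, and clutch size is treated as continuous for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical constants of the energy budget a*c + b*s = E.
a, b = 1.0, 10.0

# Quality varies a lot between individuals; allocation barely varies.
E = rng.uniform(8.0, 14.0, size=n)     # energy available to each individual
f = rng.normal(0.5, 0.01, size=n)      # assumed share of energy spent on eggs

# Unmanipulated birds: both clutch size and survival scale with E.
c = f * E / a
s = (E - a * c) / b                    # survival implied by the budget

# Manipulated birds: brood changed by delta while E stays fixed.
delta = rng.choice([-1.0, 0.0, 1.0], size=n)
c_manip = c + delta
s_manip = (E - a * c_manip) / b


def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


# Among unmanipulated birds, larger clutches go with higher survival.
slope_natural = slope(c, s)

# Within individuals, each added egg costs a/b units of survival.
slope_within = slope(delta, s_manip - s)

# Regressing survival on brood size raised mixes the two sources of variation.
slope_pooled = slope(c_manip, s_manip)

print("natural variation:", slope_natural)
print("within-individual cost:", slope_within)
print("pooled over manipulated birds:", slope_pooled)
```

      With these hypothetical settings the within-individual cost is exactly $-a/b$ per egg, yet the slope estimated across manipulated birds lies between that value and the positive slope seen under natural variation, because variance in quality offsets the cost. How far it is pulled depends on the relative variances of quality and of the manipulation.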

      We would like to thank the reviewer for these comments. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: 1) overall quality effects connecting reproduction and parental survival are present 2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is right however that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L63-65, L85-88 & L237-240), but we do not quantify this as this is dependent on the unknown relationships between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there is some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation. Such information is however not available for all studies and although we explored also analyzing this, currently this is not possible to do with sufficient confidence. We will include this rationale in our revision.

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative cost of reproduction (and other individual components of the trade-off) and towards a negative relationship between survival and reproduction at the population level, we have to consider the intra-population slow-fast continuum: some types of individuals survive more and reproduce less (are slower) than others (which are faster). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It therefore seems important to me to control for generation time and, in general, to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript are not just clutch sizes and survival rates, but clutch sizes per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There are unexplained differences between temperate (seasonal) and tropical reproduction strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and the like, these species often produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life-history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies, and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we contribute to this by questioning this central trade-off. We will try to incorporate some of these suggestions in the revision where possible.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we aim to do so in the revision.

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations.', The American naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation: they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying the concepts of the trade-off in life history theory and of optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper discussion and we will add detail to our discussion in our revised manuscript.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there isn’t one. First, and importantly, we do find a trade-off but show this is only incurred when individuals lay beyond their optimal level. Secondly, we also state on lines 258-260 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. To appreciate whether this effect alone can explain why reproduction is constrained, we ran the simulations. From these simulations we find that this effect size is too small to explain the constraint so something else must be going on and we do spend a considerable amount of text discussing the possible explanations (L182-194). Note the possibly most parsimonious conclusion here is that costs of reproduction are not there so we also give that explanation some thought (L201-205 and L247-253).

      We are disappointed by the suggestion that we have dismissed complicating factors which could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. Although we feel we have addressed this (L248-255), we will add further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory.

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L248-250). We would also like to highlight that in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L249 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple: we tested whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort likely does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L246-250). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which produce precocial chicks, which are relatively independent and able to feed themselves straight after hatching. So again the relationship between care and survival is unlikely to be detectable by looking at the effect of clutch size, but that doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds to be an underestimate. Again, we would like to draw the reviewer’s attention to the fact we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.
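      The counterbalancing argument in the response above can be illustrated with a toy calculation. This is a minimal sketch, not the simulation used in the manuscript: it assumes one breeding attempt per year, a constant annual survival probability, and a linear survival cost per added egg. The function name and all parameter values are hypothetical.

```python
def expected_lifetime_output(clutch_size, baseline_clutch, baseline_survival,
                             survival_cost_per_egg):
    """Expected lifetime egg output for a bird that lays `clutch_size` every year.

    Assumption (hypothetical): annual survival falls linearly by
    `survival_cost_per_egg` for each egg laid above `baseline_clutch`.
    """
    survival = baseline_survival - survival_cost_per_egg * (clutch_size - baseline_clutch)
    # Keep survival a valid probability strictly below 1.
    survival = min(max(survival, 0.0), 0.999)
    # With constant annual survival s, the expected number of breeding
    # seasons is 1 + s + s^2 + ... = 1 / (1 - s).
    expected_breeding_seasons = 1.0 / (1.0 - survival)
    return clutch_size * expected_breeding_seasons


if __name__ == "__main__":
    for clutch in range(3, 9):
        output = expected_lifetime_output(clutch, 5, 0.5, 0.1)
        print(clutch, round(output, 2))
```

      With these illustrative values (baseline clutch of 5, baseline survival of 0.5, cost of 0.1 per egg), laying 6 eggs yields the same expected lifetime output as laying 5, because the extra egg per season is offset by fewer expected seasons. A smaller cost per egg would favour the larger clutch.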

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We would like to thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “why don’t all birds lay at the population optimum?”, and it is central to the decades-long field of life-history theory. Why is variation maintained at such a level? As the reviewer outlines, it ranges massively, with some birds laying half of what other birds lay.

    1. How willing are we to acknowledge that our institutions, both their structures and cultures, have a history of, and may still in many ways be unsupportive and/or hostile to our students and their communities?

      I completely agree with this quote. I feel it relates closely to the education system over the 60+ years since POC started to receive education from schools. I believe there have been positive changes as a result of their attending school, but I also believe that the education system in the US continues to fail them, since POC are minorities in America and were previously the poorest people on the planet. To this day the education system fails to help underprivileged students succeed in social institutions such as schools, especially because I feel the system has not changed for years, is outdated, and has seen only a few recent changes. I think we need to reform policies to help underprivileged students attending schools in America: first, by acknowledging that institutions failed POC; second, by reforming the information; and lastly, by creating more welcoming groups at schools to help students succeed socially and in many other ways in the future.

    1. If I am understanding this right - I think the potential dilemma that arises from professional versus local archaeology is interesting. Local archaeology efforts could provide insight into the past that would've gone unresearched otherwise, but lower budgets and potentially greater mistakes (due to it not being 'professional quality' (?)) could harm the items in the dig site. Professional archaeology affords research in key places, motivated by economics, politics, or culture, and allows for the use of advanced techniques such as carbon dating. Unfortunately, this may take away the discovery that many of the locals of the area would've likely enjoyed performing (since it is their roots). How are we to weigh preservation, quality, and ethics together to form the idea of 'just' archaeology?

    Annotators

    1. Back in 1945, there was this guy, Vannevar Bush. He was working for the US government, and one of the ideas that he put forth was, "Wow, humans are creating so much information, and we can't keep track of all the books that we've read or the connections between important ideas." And he had this idea called the "memex," where you could put together a personal library of all of the books and articles that you have access to. And that idea of connecting sources captured people's imaginations.
      • for: memex, Vannevar Bush, Indyweb, Ted Nelson
    1. The Science Behind Hydrogen Rich Water Machine

      In the health and wellness world, a trend has emerged with the rise of hydrogen-infused water machines. These devices promise to deliver a beverage beyond ordinary hydration: hydrogen-rich water. The science behind these machines, and the potential health benefits claimed for them, sheds new light on how we think about water consumption and its impact on our well-being.

      Hydrogen: The Unsung Hero Of Molecules

      Before delving into the science of hydrogen-rich water machines, it's essential to understand the pivotal role of hydrogen itself. Hydrogen is the lightest and simplest element on the periodic table, consisting of a single proton and an electron. While hydrogen is generally known for its explosive nature, it has recently garnered attention for its potential health benefits when dissolved in water.

      The Power Of Hydrogen-Infused Water

      Hydrogen-infused water, often called hydrogen-rich water, is created when molecular hydrogen gas (H2) is dissolved into plain water. This process typically involves using advanced technologies found in hydrogen-rich water machines. The resulting beverage is touted for its potential antioxidant properties, which could contribute to various health improvements.

      Antioxidant Action: Hydrogen's Hidden Potential

      Antioxidants are essential for neutralizing dangerous chemicals known as free radicals, which may damage cells and contribute to a variety of health problems such as chronic illnesses and ageing. Molecular hydrogen is thought to have antioxidant characteristics that are more effective than well-known antioxidants such as vitamins C and E.

      Hydrogen's unique antioxidant potential lies in its ability to easily penetrate cell membranes and access cellular compartments, including the nucleus and mitochondria. This attribute gives hydrogen an edge in protecting cellular components from oxidative stress, potentially reducing the risk of oxidative damage.

      The Mechanism: How Hydrogen Works Its Magic

      The exact mechanism behind hydrogen's antioxidant effects is still an area of ongoing research, but several theories have been proposed. One prominent theory suggests that hydrogen is a selective scavenger of harmful free radicals, targeting the most reactive and damaging ones without affecting beneficial molecules like oxygen or nitric oxide.

      Another theory is that hydrogen has the power to modify signalling pathways within cells. By altering these pathways, hydrogen may elicit preventive responses that boost the body's natural defence systems against oxidative stress and inflammation.

      Hydrogen-Rich Water Machines: The Technology

      Hydrogen-rich water machines are designed to harness the power of molecular hydrogen by infusing it into plain drinking water. These devices commonly use electrolysis, which involves sending an electric current through water to divide it into hydrogen and oxygen gases. The hydrogen gas is subsequently dissolved in water, yielding a beverage high in this beneficial chemical.

      These machines are equipped with advanced membranes that allow only hydrogen molecules to pass through while preventing the escape of potentially harmful byproducts like ozone. This ensures the purity and safety of the resulting hydrogen-infused water.

      Potential Health Benefits

      While research on the health benefits of hydrogen-rich water is still in its infancy, preliminary studies have shown promising results. Some of the potential benefits include the following:

      • Antioxidant Defense: Hydrogen-rich water's antioxidant properties could help reduce oxidative stress and associated health risks.
      • Anti-Inflammatory Effects: Hydrogen may have anti-inflammatory effects that could benefit conditions like arthritis and other inflammatory disorders.
      • Cellular Health: Hydrogen might contribute to overall cellular health and function by protecting cellular components.
      • Exercise Performance: Some research suggests that hydrogen-rich water might enhance exercise performance and reduce muscle fatigue.

      Conclusion: A Glimpse Into The Future Of Hydration

      Hydrogen-rich water machines are ushering in a new era of hydration, where molecular hydrogen's benefits are harnessed to potentially enhance our well-being. While more research is needed to fully understand the extent of these benefits, the early findings are exciting and have sparked interest among health-conscious individuals.

      As technology advances, we can anticipate more refined hydrogen-infused water machines and a deeper understanding of how molecular hydrogen interacts with our bodies. Whether you're an early adopter or a cautious observer, the science behind these machines invites us to explore the intriguing potential of hydrogen-infused water and its impact on our health.

    1. Reviewer #1 (Public Review):

      Summary: This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths: The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses: As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      1. The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc. It would be important when comparing scene-specific neural patterns as templates for reinstatement across conditions that, at the time of scene presentation itself, the two conditions are equal (e.g., no difference in familiarity and so on); otherwise, we do not know which trial period (and therefore which underlying process) is driving the differences.

      c. For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well. (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach. (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1). (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.

      2. I did not see any compelling statistical evidence for the claim of less robust consolidation in children. Specifically in terms of the behavioural results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioural differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be taken into account when interpreting the findings.

      3. Please clarify which analyses were restricted to correct retrievals only. The univariate analysis section states that correct and incorrect trials were modelled separately, but does not say which were considered in the main contrast (I assume correct only?). The item-specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      4. To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered, that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      5. Some of the univariate results reporting is a bit strange, as the authors rely upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seems to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      6. Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      7. Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      8. There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      9. The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

      10. In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) paper reads as if the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement), while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

    1. Author Response

      We thank the reviewers for their work and their thoughtfulness. However, it seems to us that much (but not all) of the critique reflects a misunderstanding of the goals and methods of computational modeling. Details are below. We are grateful for the opportunity to include our views about this in the context of our replies to the Public Critiques of our paper. The comments of the reviewers were very helpful in allowing us to see what might not be clear to our readers.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. Details are below in the answer to reviewers.

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA.

      We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. In the revision, we will be more explicit about this.

      We thank Reviewer #1 for stressing our work's important contribution in providing concrete hypotheses that can be tested in vivo, and for highlighting the importance of future work examining the synergistic role of BLA interneurons in fear learning. The weaknesses reported by the Reviewer concern deviations of the model from the experimental literature. We describe below why we think those differences are minor in the context of the aims of our model. Specifically,

      1) Some connections among neurons in the BLA reported by (Krabbe et al., 2019) have not been taken into account in the model. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+).

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM+ to PV+ connection is shown to be small (Supp. Fig. 4, panel t). We also omitted PV+ to SOM+, PV+ to VIP+, SOM+ to VIP+, VIP+ to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. See below for comments on modeling strategies. We will explain this better in our revision.

      2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations. This point will be made clearer in our revision.

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show that the number of responding cells decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.
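
      To illustrate what an unstructured Poisson input looks like, the following is a minimal sketch, not the authors' implementation. It draws a homogeneous Poisson spike train by summing exponentially distributed inter-spike intervals. The function name, the 50 Hz rate, and the 40 s duration are assumptions chosen only for illustration.

```python
import random

def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) of a homogeneous Poisson process.

    Inter-spike intervals are drawn from an exponential distribution with
    mean 1 / rate_hz, so spike times carry no temporal structure.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)  # time of the first spike
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)  # add the next inter-spike interval
    return spikes

# Hypothetical CS input: the rate and duration are illustrative assumptions.
cs_spikes = poisson_spike_train(rate_hz=50.0, duration_s=40.0, seed=1)
```

      Because the intervals are independent, any rhythmic timing in the network output must come from the network itself rather than from the input.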

      Our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Although the literature includes paradigms with both overlapping (delay conditioning, where the US co-terminates with the CS (Lindquist et al., 2004) or immediately follows the CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs, we hypothesized that concomitant activity of CS- and US-encoding neurons should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature. We will discuss these points in more detail in our revision.
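
      The dependence on temporal overlap can be seen in a generic pair-based spike-timing-dependent plasticity rule. The sketch below is an illustration under stated assumptions, not the rule used in the paper: the exponential window, the 20 ms time constant, and the amplitudes (with depression slightly larger than potentiation) are all hypothetical values.

```python
import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one pre/post spike pair, with dt_ms = t_post - t_pre.

    Post-after-pre (dt_ms > 0) potentiates, pre-after-post (dt_ms < 0)
    depresses, and both effects decay exponentially with |dt_ms|.
    All parameter values are illustrative assumptions.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0

def total_dw(pre_times_ms, post_times_ms):
    """Sum the pairwise weight changes over all pre/post spike pairs."""
    return sum(stdp_dw(t_post - t_pre)
               for t_pre in pre_times_ms
               for t_post in post_times_ms)

# Spikes a few milliseconds apart change the weight noticeably ...
near = total_dw([0.0], [5.0])
# ... whereas spikes two seconds apart leave it effectively unchanged.
far = total_dw([0.0], [2000.0])
```

      Under such a rule, pre- and postsynaptic spikes separated by much more than the window contribute essentially nothing, which is why some overlap or memory trace of the US is needed for the weight to change.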

      We thank Reviewer #2 for their comments. Below, we reply to each of them:

      1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations. Moreover, to generate the classical theta, a laminar structure arrangement is needed (where neurons form layers like in the hippocampus and cortex) (Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Though, there can be BLA neurons, firing of which shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper, (Antonoudiou et al. 2021), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo slices from mice. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009, Kispersky et al., 2010, Chartove et al. 2020). We are not aware of any model description of the mechanisms of theta that does require layers.

      2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the above answer we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP+ neurons. Using intrinsic currents known to exist in VIP+ neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. We will emphasize these points in our revision; we note that, for any individual case, such as this one, the mechanism needs to be tested experimentally.

      3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a description of more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption. We will discuss these points in our revision.
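
      The sign argument above can be checked with a short calculation of the synaptic driving force, V_m minus E_rev. This is a minimal sketch: the membrane potential of -65 mV is an assumed value used only for illustration, and the function names are hypothetical. It shows only the direction of the GABA-A current at that potential, not the net effect on firing.

```python
def gaba_driving_force_mv(v_m_mv, e_rev_mv):
    """Driving force (mV) on a GABA-A synapse: membrane potential minus reversal."""
    return v_m_mv - e_rev_mv

def classify_gaba_current(v_m_mv, e_rev_mv):
    """Label the direction in which the synaptic current pushes the membrane."""
    df = gaba_driving_force_mv(v_m_mv, e_rev_mv)
    if df > 0:
        return "hyperpolarizing"  # membrane is pulled down toward E_rev
    if df < 0:
        return "depolarizing"     # membrane is pulled up toward E_rev
    return "none"

V_M = -65.0  # assumed membrane potential in mV (illustrative only)
for e_rev in (-80.0, -72.0, -54.0):  # reversal potentials discussed in the text
    print(e_rev, gaba_driving_force_mv(V_M, e_rev), classify_gaba_current(V_M, e_rev))
```

      At the assumed potential, reversal potentials of -80 mV and -72 mV both give a hyperpolarizing current that differs only in size, whereas -54 mV gives a depolarizing one.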

      5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      We do not think that VIP is necessarily more fundamental in fear learning, and certainly not for fear extinction. We will make this clear in the revision.

      We thank Reviewer #3 for their comments and for recognizing that we achieved our modeling aims. We reply to the criticisms below.

      Weaknesses:

      The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing. Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      Many of these issues were addressed in the previous responses.

      1) Our neurons were constrained by electrophysiological properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We chose the specific currents, known to be present in these neurons, to replicate those responses.

      2) Though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      3) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. As we described above, modeling requires the omission of many details to bring out the significance of other details.

      4) As described above, some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity. In modeling, one must be specific about hypotheses, and describe why they are plausible, if not proved; indeed, modeling can explain known phenomena by showing how they are consequences of some (plausible) hypotheses, which themselves are open to experimental verification.

      5) The 40 seconds is not necessary if there are multiple presentations.

      Other critiques:

      1) It is correct that PV+ and SOM+ preferentially target different parts of excitatory projection neurons and that the model relies on a strong inhibition from SOM+ and PV+ to silence the excitatory projection neurons. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      2) The SOM+ inhibition of the pyramidal cell firing can be seen as a hypothesis of our model. It is well known that VIP+ cells disinhibit pyramidal cells through inhibition of SOM+ and PV+ cells, which is all we are using in our model; hence this hypothesis is generally believed.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      We agree with the reviewer that it would be good to have a more detailed comparison with the classical Hebbian rule (non-depression-dominated rule). However, we demonstrated in Supplementary Materials that the non-depression-dominated rule is less robust and only operates within a limited window of PV+ excitation. We will have a more robust discussion of plasticity in the revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      The authors report the first use of the bacterial Tus-Ter replication block system in human cells. A single plasmid containing two divergently oriented five-fold TerB repeats was integrated on chromosome 12 of MCF7 cells. ChIP and PLA experiments convincingly demonstrate the occupancy of Tus at the Ter sites in cells. Using an elegant Single Molecule Analysis of Replicated DNA (SMARD) assay, convincing data demonstrate the replication block at Ter sites dependent on the presence of the protein. As an orthogonal method to demonstrate fork stalling, ChIP data show the accumulation of the replicative helicase component MCM3 and the repair protein FANCM around the Ter sites. It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein. The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed. Fork stalling led to a highly localized gammaH2AX response, as monitored by ChIP using primer pairs spread along the integrated plasmid carrying the Ter sites. This response was shown to be dependent on ATR using the ATR inhibitor VE-822. This contrasts with a single Cas9-induced DSB between the two Ter sites, which causes a more spread gammaH2AX response. While this was monitored only at a single distal site, the difference between the DSB and the Tus-induced stall is very significant. Interestingly, despite evidence for ATR activation through the gammaH2AX response, no evidence for phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 could be found under fork stalling conditions. The global replication inhibitor hydroxyurea (HU) elicited phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33. In this context, it would have been of interest to examine if a single DSB in the Ter region leads to phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 and cell cycle arrest. It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response. Overall, this is a well written manuscript, and the data provide convincing evidence that the Tus-Ter system poses a site-specific replication fork block in MCF7 cells leading to a localized ATR-dependent DNA damage checkpoint response that is distinct from the more global response to HU or DSBs.

      Author response to public review:

      “It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein.”

      -The lack of an effect of the TerB sequence alone on fork progression has been studied extensively in both Willis et al., 2014 and Larsen et al., 2014. Furthermore, as detection of the SMARD signal at the TerB sites depends on the 7.5 kb probe that spans the TerB sites (orange probe, Fig 2B & 2D), it would be impossible to study replication in this region with and without integration of the single-copy plasmid.

      “The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed.”

      -The percentage of forks stalling at the TerB sites, with and without Tus expression, has been quantified in Figure 2E & 2F. Essentially, 36% of forks stall at the TerB block when the Tus-TerB block is active, i.e. 18% stall in each of the 5’ to 3’ (orange) and 3’ to 5’ (blue) directions.

      “It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response.”

      -While we have not shown gH2AX accumulation via ChIP after HU treatment, Supplementary Figure 5A & 5B clearly show increased gH2AX foci when the cells are treated with HU, suggesting a global replication stress response that is in stark contrast to the response to Tus-TerB.

      Recommendations for the authors:

      Lines 78, 95: In the experimental set-up there are two divergent 5-TerB sites in the orientation that is non-permissive for the fork progression notwithstanding the direction. This raises an obvious question: How is an intervening (~1kb-long) DNA segment being replicated? Does it stay under-replicated and then break?

      -The reviewers pose an important question about how the intervening sequence flanked by the two TerB sites is replicated, and if this leads to formation of anaphase bridges resulting in breaks. We think this is very plausible and this very question is part of ongoing studies in the lab with the aim to understand how the cell resolves a site-specific block. Unfortunately, this falls outside the scope of the current study.

      Also, it is unclear what is meant by non-permissive orientation. This depends on the predominant replication direction. As the construct has Ter repeats in opposite orientation, any direction is non-permissive. These descriptions could be rephrased to avoid confusion.

      -The text has been edited to clarify this.

      Fig 1A: It would be helpful to annotate the map to show the position of each primer relative to the Ter array. Why is there no signal for pp52?

      -Figure 1A has the map of the locus with the annotated primer pairs and their relative positions to the TerB array.

      -pp52 is positioned beyond the TerB array so binding of the Tus-His protein there is unlikely, confirming the specificity of the Tus binding to only the TerB array and not to the adjacent chromatin.

      Figure 1B: Change Tus to Tus-His to make it easier to understand that the anti-His ChIP is targeting Tus. Provide information what normalization method was used in the ChIP experiments.

      -Figure 1B has been edited to reflect this change.

      Line 113: Willis et al. 2014 also worked with chromosomal Ter sites, which should be acknowledged here.

      -The text has been modified to indicate this. We apologize for the oversight.

      Line 126: Define pWB15 and its significance in text.

      -The text has been edited to clarify this and mentions pWB15.

      Figure 2E, F: Define legend (blue, orange boxes and arrow heads).

      -The figure legend corresponding to Figure 2 has a detailed description of the boxes and the arrows.

      Figure 3E, 4C: Add map of primers like in Figures 1 and 2.

      -The map has been added to Figures 3 & 4 and the text has been updated.

      Figure 4: Showing that the gammaH2AX response is spread like with the single DSB would bolster the conclusion about the difference between a local and global response. Fig 4A, Lane-3: A loading control for the chromatin fraction is missing.

      -Measuring gH2AX chromatin spread after global replication stress can be challenging. We have tried to address the question of global and local gH2AX response after replication stress by quantifying gH2AX foci in cells treated with and without hydroxyurea, and comparing them with cells that have a functional Tus-TerB block (Supplementary Figure 5A & 5B). A single fork block seems to elicit only a local response, while global replication stress leads to gH2AX accumulation throughout the cell.

      -Lamin A/C has been added to Fig 4A as a loading control for the chromatin fraction.

      Figure S4: Analyzing ATR, CHK1 and RPA phosphorylation as well as cell cycle profile under single DSB condition may reveal that different localized responses exist. I mention this because it was reported in yeast that a single DSB in G1 cells leads to a similarly localized Mec1 (ATR) -dependent response that does not elicit phosphorylation of Rad53 (CHK1) and other downstream targets, but leads to H2A phosphorylation as well as phosphorylation of RPA and the Rad51 paralog Rad55 (see PMCID: PMC2853130). It might be of interest to the reader to discuss this publication and the commonalities and differences between both localized checkpoint responses.

      -The reviewers raise an interesting question about the phosphorylation of ATR/CHK1/RPA and its effect on cell cycle after a single DSB. The aim of using the Cas9 break site in this study was merely to corroborate previously published observations pertaining to the spread of gH2AX after a DSB and to contrast that with the local response seen with Tus-TerB. Thus, while this is an intriguing question, we do not think this particular experiment will help in understanding the localized checkpoint response after a single replication fork block. However, we have included the observations previously published in the yeast system (PMC2853130) in our discussion, as they help to compare and contrast fork blocks and DSBs further. It is worth noting, though, that the yeast studies were looking at the cellular response to a DSB in G1.

      Lines 256-260: In the discussion of ATRIP, unpublished data are discussed that show no increase in ssDNA. What is the effect of ATRIP depletion? Maybe delete this mention of unpublished data, if no new data can be provided. The authors are aware that this makes the mechanism of ATR activation at the 5-TerB site elusive.

      -This statement has been deleted and the text has been modified.

      Another possibility discussed by the authors is fork reversal. Since Tus/Ter complex block the CMG progression, fork reversal would result in a chicken foot structure with the long single-stranded 3'-overhang of an Okazaki fragment site. Such a structure should be protected by BRCA2 or RAD52 proteins from degradation. Any role for these proteins in the checkpoint activation at the TerB site?

      -The reviewers suggest an interesting scenario where the reversed fork structure induced by the Tus-TerB block could be protected by the loading of known DNA repair proteins, and this in turn could lead to a signaling mechanism and checkpoint activation. While we have not tested this hypothesis, nor studied the temporal dynamics of the formation of the reversed fork with respect to gH2AX accumulation, we think the localized gH2AX signal observed in the vicinity of the block is what initiates the downstream DDR response, promoting fork stabilization, followed either by fork reversal and restart or fork collapse. If the reversed fork were responsible for the gH2AX signaling, one would expect the signal to be more widespread, perhaps decorating the entire stretch of DNA between the block and the reversed fork. However, further studies are warranted to tease out this mechanism and the spatio-temporal dynamics.

      Lines 292-294: The authors state that "unpublished work from our laboratory has demonstrated that replication forks are cleaved at or near the TerB site..." Unless the data are shown, it might be best to eliminate discussion of unpublished work, also because the occurrence of DNA ends at Ter sites was already described in Willis et al. 2017.

      -The statement has been deleted and Willis et al. 2017 has been referenced.

      Suppl Table 1: It would help to also show representative images of stretched fibers in addition to the summary data shown.

      -Since the data is negative, the fiber images do not show any discernible differences and we do not think it adds useful information.

      Suppl Fig 4. ChIP for gamma H2AX data. It would be helpful to show the distribution of the gamma H2AX signal along the chromosome for both the DSB response and the Tus/Ter response.

      -The gH2AX ChIP signal at PP0-2 and PP10 has been included in Supplementary Fig 4D. Though not significant for PP0-2, the data strongly suggest that there is increased spread of gH2AX along the chromosome after a DSB, in marked contrast to the response after a Tus-TerB block. The text has been modified to include both primer pairs.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewer comments

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      __Summary__

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      __Major comments__

      __This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K.__

      __An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.__

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      __In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent.__

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      __Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies.__

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 supplementary figure 4A, and a traitgram for 3’SS -3C to U ratio as Figure 2 supplementary figure 4B. We have included a short section of text in the Results section to describe these additional findings.

      __With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence.__

      __The powerful interspecies association mapping was then used to determine whether the presence or absence of a METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding.__

      We thank the reviewer for these comments on the importance of this study.

      __Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant across all the species that were compared.__

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 - figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K, or have lost only the methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’ SSs. This is now included as Figure 5 - figure supplement 1B.
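
      As an illustration only, and not the pipeline used in the manuscript, the filtering idea described above can be sketched in Python. The function name `variable_columns`, the `min_minor_count` threshold and the toy alignment are all assumptions made for this example. The sketch keeps only alignment columns in which enough species differ from the majority residue, since an invariant column cannot be tested for association with a phenotype.

```python
from collections import Counter


def variable_columns(alignment, min_minor_count=2):
    """Return indices of alignment columns whose non-majority residues
    occur at least `min_minor_count` times in total.

    `alignment` maps a species name to an aligned sequence; all sequences
    are assumed to have the same length.
    """
    n_cols = len(next(iter(alignment.values())))
    keep = []
    for i in range(n_cols):
        counts = Counter(seq[i] for seq in alignment.values())
        major = counts.most_common(1)[0][1]   # count of the commonest residue
        minor = sum(counts.values()) - major  # everything else in the column
        if minor >= min_minor_count:
            keep.append(i)
    return keep


# Toy, made-up alignment (hypothetical species and residues, not real data).
toy_alignment = {
    "sp1": "GMK",
    "sp2": "GMK",
    "sp3": "GTK",
    "sp4": "GLR",
    "sp5": "GMK",
}
testable = variable_columns(toy_alignment)
```

      In this toy example only the middle column would be carried forward for testing: the first column is invariant and the third differs in a single species.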

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      __The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Saccharomycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative.__

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict with genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that also incorporating U2 snRNA and branchpoint variation analyses into this manuscript could detract from the clarity of the conceptual advances that we make here.

      __Minor comments__

      __Should the Title include SNRNP27K?__

      We have included SNRNP27K in the revised title.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      __Figure 1C is not referred to in the text?__

      We apologise for this oversight. We have added references to Figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was used.

      __What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’?__

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.
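
      The logic of controlling for METTL16 presence/absence can be illustrated with a deliberately simple stratified comparison. This is a hedged sketch under stated assumptions, not the analysis performed in the manuscript: the helper names, the field names (`mettl16`, `u2a`, `phenotype`) and the toy values are hypothetical, and the sketch ignores the phylogenetic non-independence of species that a real analysis must model.

```python
from statistics import mean


def mean_difference(rows, factor):
    """Mean phenotype of species with `factor` minus mean of species without it.

    Returns None when one of the two groups is empty.
    """
    present = [r["phenotype"] for r in rows if r[factor]]
    absent = [r["phenotype"] for r in rows if not r[factor]]
    if not present or not absent:
        return None
    return mean(present) - mean(absent)


def stratified_difference(rows, factor, control):
    """Repeat the comparison separately within each level of `control`."""
    result = {}
    for level in (True, False):
        stratum = [r for r in rows if r[control] is level]
        result[level] = mean_difference(stratum, factor)
    return result


# Toy, made-up species table: the phenotype tracks METTL16 only, while the
# second gene (here called `u2a`) merely tends to be lost in the same species.
toy_species = [
    {"mettl16": True, "u2a": True, "phenotype": 2.0},
    {"mettl16": True, "u2a": True, "phenotype": 2.0},
    {"mettl16": True, "u2a": False, "phenotype": 2.0},
    {"mettl16": False, "u2a": False, "phenotype": -1.0},
    {"mettl16": False, "u2a": False, "phenotype": -1.0},
    {"mettl16": False, "u2a": True, "phenotype": -1.0},
]

marginal = mean_difference(toy_species, "u2a")
within_mettl16 = stratified_difference(toy_species, "u2a", "mettl16")
```

      In this toy table the second gene shows an apparent effect when considered alone, but no difference remains within either METTL16 stratum, which is the pattern expected when an association is explained by correlated gene loss.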

      __The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..”__

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance (Required)):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      __The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4?__

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype – only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins that correlate with the 5’SS +4 phenotype, and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8, for which there is extensive evidence of involvement in splice site selection and which is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), which also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (figure not available in the bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      __The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity.__

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. __Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.__

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      __Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.__

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      __Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.__

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      __In the text, there is no reference to Figure 1C.__

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      __In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.__

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance (Required)):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of the METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related to exon skipping, which is tightly linked with intron length, and that is likely a better predictor of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related to developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina have anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      __3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful.__

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      __4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to more explicitly distinguish these two patterns. Then, for the intron-poor species: is the correlation (or lack of) for species with a T or an A in position +4 different?__

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.
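
      For readers unfamiliar with loess, the following is a minimal pure-Python sketch of a locally weighted linear fit with tricube weights. It is an illustration under stated assumptions rather than the code used to draw the figure: `loess_point`, the `frac` span parameter, the slight widening of the bandwidth to avoid zero total weight, and the toy values are all choices made for this example.

```python
def loess_point(xs, ys, x0, frac=0.5):
    """Estimate y at x0 from a tricube-weighted linear fit to nearby points.

    `frac` is the fraction of points used in each local window. At least two
    data points are assumed.
    """
    n = len(xs)
    k = max(2, min(n, round(frac * n)))  # number of neighbours in the window
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, widened slightly (an
    # assumption of this sketch) so all k neighbours get non-zero weight.
    h = dists[k - 1] * 1.0001 + 1e-12
    weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
    return mean_y + slope * (x0 - mean_x)


# Toy, made-up values: intron counts and a correlation-like score per species.
toy_introns = [100, 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000]
toy_rho = [0.0, 0.05, -0.02, 0.01, 0.0, -0.1, -0.2, -0.3, -0.35, -0.4]
smoothed = [loess_point(toy_introns, toy_rho, x) for x in toy_introns]
```

      A single smoothed curve of this kind describes the overall trend across all species without first splitting them into classes, which is why it avoids implying that class membership drives the trend.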

      __Reviewer #3 (Significance (Required)):__

      __This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution.__

      We thank the reviewer for the kind and constructive comments on this study.

    2. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      Major comments

      This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K. 

      An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent.

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies.

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 supplementary figure 4A, and a traitgram for 3’SS -3C to U ratio as Figure 2 supplementary figure 4B. We have included a short section of text in the Results section to describe these additional findings.

      With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence. 

      The powerful interspecies association mapping was then used to determine whether the presence or absence of a METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding.

      We thank the reviewer for these comments on the importance of this study.

      Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant between all the species that were compared?

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 - figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K or have only lost the Methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’ SSs. This is now included as Figure 5 - figure supplement 1B.

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Saccharomycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative.

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict with genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that incorporating U2 snRNA and branchpoint variation analyses into this manuscript as well, could detract from the clarity of the conceptual advances that we make here.

      Minor comments

      Should the Title include SNRNP27K?

      There is certainly a case that the title should include SNRNP27K. Our aim was to make the title as short and informative as possible without too many acronyms that need explaining. Since the clearest correlation is with METTL16 and this has broader implications for understanding the role of this enzyme not only in splicing but in possibly modifying other RNA targets too, we think not including SNRNP27K is a suitable compromise. In addition, retaining the current title simplifies the tracking of the manuscript from pre-print through to journal publication.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      Figure 1C is not referred to in the text?

      We apologise for this oversight. We have added references to figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was made.

      What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’? 

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.
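
      To make the idea of "controlling for" METTL16 presence/absence concrete, the following is a deliberately simplified sketch. It is not the statistical model used in the manuscript and it ignores phylogenetic structure entirely. It swaps in a plain stratified comparison: the apparent effect of a second factor is recomputed separately within species that have METTL16 and within species that lack it. The field names (`mettl16`, `u2a`, `phenotype`) and all values are hypothetical.

```python
from statistics import mean


def mean_difference(records, factor):
    """Mean phenotype of species with `factor` minus mean of species without it."""
    present = [r["phenotype"] for r in records if r[factor]]
    absent = [r["phenotype"] for r in records if not r[factor]]
    if not present or not absent:
        return None  # no contrast available in this group
    return mean(present) - mean(absent)


def stratified_differences(records, factor, control):
    """Effect of `factor` computed separately within each level of `control`."""
    out = {}
    for level in (True, False):
        subset = [r for r in records if r[control] is level]
        out[level] = mean_difference(subset, factor)
    return out


def make(mettl16, u2a, phenotype, n):
    """Build n identical toy species records."""
    return [{"mettl16": mettl16, "u2a": u2a, "phenotype": phenotype}
            for _ in range(n)]


# Hypothetical toy data: the phenotype depends only on METTL16, but U2A'
# is mostly lost in the same species, so it looks associated when the
# species are pooled.
toy = (make(True, True, 2.0, 3) + make(True, False, 2.0, 1)
       + make(False, True, -1.0, 1) + make(False, False, -1.0, 3))

print(mean_difference(toy, "u2a"))                    # pooled difference
print(stratified_differences(toy, "u2a", "mettl16"))  # within METTL16 strata
```

      In the toy data the pooled difference is non-zero while both within-stratum differences are zero, which is the pattern expected when one correlation is explained by shared gene-loss history with another factor.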

      The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..”

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4?

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype – only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out whether there are changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins that correlate with the 5’SS +4 phenotype and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8 where there is extensive evidence for involvement in splice site selection and is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a Glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), which also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (Figure not available on bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity.

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      2. Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      3. Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      4. In the text, there is no reference to Figure 1C.

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      5. In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related to exon skipping, which is tightly linked with intron length, and that is likely a better predictor of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related to developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina have anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful. 

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to distinguish these two patterns more explicitly. Then, for the intron-poor species: is the correlation (or lack of) for species with a T or an A in position +4 different? 

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.

      Reviewer #3 (Significance):

      This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution. 

      We thank the reviewer for the kind and constructive comments on this study.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1. This is a good paper dealing with a gap in our knowledge of the reasons for ICB failures. The subject being difficult, it is expected that the design and content of such an experiment will be complex. But the authors forget the practicality of readers' attention and of making the paper appear interesting. They need to organise, and maybe classify, the varied information in such a way that the reader can find a rhythm in excavating data more easily. It appears confusing at times, so they may try to make it simpler. In this way they may concentrate more on methods and classify results too. A thorough revision is suggested, to make it concise.

      Author’s answer: We thank the Reviewer for their positive evaluation and constructive feedback. We appreciate the complexity of single-cell RNA-sequencing analyses. In order to simplify our manuscript, our revised manuscript now focuses on the transitional states of tumor-resident and circulating T cells found in ovarian cancer patients. Our study is timely as it is the first to report the developmental relationship of TILs in ovarian cancer. We substantially edited our manuscript to make it clear that our findings suggest a gradual acquisition of the exhaustion program initiated by effector-like cells (cluster CD8_GZMH) that eventually gives rise to more terminal states with features of tissue residency and chemotaxis (clusters CD8_CCL4, CD8_XCL1, and CD8_CXCL13). We also include new analyses revealing the presence and proportion of these T cell states in different cancer patients (New Fig. 4A-B), and how these T cell states associate with clinical responses to immune checkpoint blockade (ICB). We hope the Reviewer will find our revised manuscript easier to read.

      Reviewer #2. I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR184 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Author’s answer: Thank you for discussing the limitation of the signature employed. We agree with the reviewer’s comment. Old Figure 5 has been removed from the revised manuscript.

      Reviewer #2. Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Author’s answer: Thank you for your comment. As we previously discussed, we have removed Figure 5 from the revised manuscript. The method used to generate the signature was found to be inappropriate.

      Reviewer #2. The latter part is difficult to understand. To begin with, it is already known that ovarian cancer does not contribute much to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that will attempt to sort out the population that ICB is favored in.

      Author’s answer: Although immune checkpoint blockade has demonstrated limited effectiveness against ovarian cancer, subset analyses suggest superior efficacy for some patients and according to subtype. Combination anti-PD-1/CTLA-4 therapy, for instance, achieved response rates up to 31% (Zamarin et al., 2020), and superior benefit for single agent PD-1 blockade has been reported in clear cell ovarian cancer. Moreover, encouraging clinical results have recently been reported in studies exploring combinations with PARP and VEGF inhibitors. As an example, interim analysis of the phase 3 DUO-O trial (NCT03737643) showed a statistically significant and clinically meaningful improvement in PFS in patients with newly diagnosed advanced ovarian cancer without a BRCA1/2 mutation (Harter et al., 2023).

      Our study aimed to better understand how ovarian tumor-infiltrating T cells acquire their exhaustion program after migrating from the periphery and whether these mechanisms are unique or shared amongst cancer types. Recent studies in other cancer types had shown the dynamics of T cells, demonstrated the clonal replacement of intratumoral T cells after ICB, and emphasized the role of peripheral clones in this process (Wu et al., 2020; Yost et al., 2019). In lung cancer, a transitional state between precursor and terminally differentiated cells has been proposed (Gueguen et al., 2021). Our study demonstrates, for the first time in ovarian cancer, the presence of similar transitional states of CD8 T cells. Our revised manuscript also now includes new data revealing that pre-effector GZMK- and intermediary GZMH-expressing CD8 cells are better biomarkers of ICB response than terminally differentiated XCL1- and CXCL13-expressing CD8 T cells (New Figure 4). Altogether, our study provides important and novel insights on the development of tumor-infiltrating T cells in ovarian cancer patients, which may serve to better select ovarian cancer patients for ICB therapy.

      Reviewer #2. Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate (GZMH_CD8), including functional experiments. Also, as a clinician, we are very interested in the perspective of whether some of the fractions identified in this study are different in proportion in different patients and whether they correlate with the clinical course of the disease since the study only analyzed a sample of 5 patients.

      Author’s answer: We thank the reviewer for proposing to extend our analysis. As suggested, our revised manuscript now includes new analyses which reveal the different proportions of our identified T cell states in different cancer patients (New Figure 4). We further investigated whether these T cell states associate with clinical responses and observed that pre-effector GZMK- and intermediary GZMH-expressing CD8 T cells are better biomarkers of ICB response than terminally exhausted XCL1- and CXCL13-expressing CD8 T cells (New Figure 4).

      Reviewer #3. Question 1: Whether the distribution patterns of CD4+ and CD8+ T cell clusters in Figure 1B were comparable among the 5 patient samples? Whether the proportion of five types of clones in Figure 3C are comparable among the 5 patient samples?

      Author’s answer: Thank you for the question. We included the results to answer these questions in the supplementary material (fig. S1C-D). For each patient, we calculated the proportion of a cluster among T cells in the blood or tumor. As observed in the boxplot (fig. S1C), the proportion of some subsets was higher in certain patients, such as the higher proportion of CD8_GZMK in the tumor of patient p09454. A recent study classified patients’ tumors based on the spatial distribution of CD8 T cells and performed scRNA-seq to identify cell subsets enriched in the groups inflamed/infiltrated (characterized by the distribution of CD8 T cells within the tumor epithelium), excluded (infiltrating CD8 T cells are restricted to the tumor stroma) or desert (T cells are not present or have low frequency) (Hornburg et al., 2021). Interestingly, this subset of CD8_GZMK cells was enriched in desert tumors, suggesting that the difference we observed in our dataset might reflect the spatial distribution of CD8 T cells in patient p09454. Regarding the TCR-seq data, the frequency of the five types of clones was different among patients. To show this data, we included a barplot (fig. S2D), showing, for example, a higher proportion of tumor-expanded clones in patient p10329.
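
      The per-patient calculation described above (the proportion of each cluster among T cells in blood or tumor) can be sketched as follows. This is an illustrative outline only, not the analysis code behind fig. S1C: the function name `cluster_proportions`, the input layout, and the patient labels `pA` and `pB` are hypothetical, and the cell counts are made up.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Compute per-patient, per-tissue cluster proportions.

    cells: iterable of (patient, tissue, cluster) tuples, one per T cell.
    Returns {(patient, tissue): {cluster: fraction of that sample's T cells}}.
    """
    counts = defaultdict(Counter)
    for patient, tissue, cluster in cells:
        counts[(patient, tissue)][cluster] += 1
    return {
        key: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for key, ctr in counts.items()
    }


# Hypothetical toy input: cluster names follow the text, counts are invented.
toy_cells = (
    [("pA", "tumor", "CD8_GZMK")] * 3
    + [("pA", "tumor", "CD8_GZMH")] * 1
    + [("pA", "blood", "CD8_GZMH")] * 2
    + [("pB", "tumor", "CD8_GZMK")] * 1
    + [("pB", "tumor", "CD8_GZMH")] * 1
)

for key, fractions in cluster_proportions(toy_cells).items():
    print(key, fractions)
```

      Normalising within each patient and tissue, rather than across the pooled dataset, keeps a patient with many sequenced cells from dominating the comparison.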

      Reviewer #3. Question 2: In Figure S2C, only a very small number of cells in the CD8-GZMK K-22 population. Are these cells representative? Do they generally exist in multiple samples or only in one sample?

      Author’s answer: Thank you for your comment. The subcluster k_22 indeed has a smaller number of cells compared to other subclusters. Nevertheless, the K_22 cluster was found in every patient and in every healthy donor. To clarify, we edited our revised manuscript to include a statement that cluster k_22 was composed of fewer cells compared to other clusters.

      Reviewer #3. Question 3: In the Fig.S6 legend, the authors stated "Our results suggest the differentiation of cluster CD8-GZMK into the effector-like subset CD8-GZMH." However, there seems to be no corresponding analysis in the main text to support this conclusion.

      Author’s answer: We appreciate your attention to this statement. We agree that the results of our study do not support this statement, and so we have removed it from the revised manuscript.

      Reviewer #3. Question 4: Is there more detailed clinical information that can be provided for the 5 patients included in the study? Per the methods all patients were receiving debulking surgery and were treatment naïve, but did they differ in stage, age, comorbidities, etc.?

      Author’s answer: Thank you for your comment on this. We have included a table with clinical information on the stage, age, and menopause status of the five patients.

      Reviewer #3. Question 5: Were any cells included for sequencing from adjacent 'normal' tissue uninvolved with tumor (these samples are from surgical debulking of primary tumors, which may include such areas of non-involved tissue.) While shared TCR clonotypes between blood and intratumoral T cells strongly suggests the tumor-resident populations are recruited from the blood, the degree of sharing with normal tissue-resident T cells would be of interest as well.

      Author’s answer: Thank you for your comment. Samples were provided for sc-RNA-seq after pathology review and validation of tumor histology. We did not perform sc-RNA-seq on normal adjacent tissue (NAT). We agree this would be interesting as a follow-up study, since in other cancer types (renal, colon and lung) it has been demonstrated that T cell clones expanded in the tumor and NAT are also present in peripheral blood (Wu et al., 2020).

      Reviewer #3. Question 6: Very little is discussed about HGSOC itself in the main text (eg clinical background, prior literature on the composition of infiltrating immune populations and potential reasons for at best modest poor responses to IO) until the first sentence of the discussion. As the entirety of the new data produced in this study is from HGSOC tumors there should be more focus on this tumor type and conversation with the prior literature on it (mainly from prior studies on the immune environment of HGSOC). Further, how distinct do the authors suspect the cell populations found in their study to be to ovarian as opposed to other epithelial tumor types?

      Author’s answer: Thank you for the suggestion. We have now included more background information on immunotherapy of HGSOC. Specifically, we added the following paragraph in our introduction: “In ovarian cancer, the presence of both T and B cells improves patients' survival (Nelson, 2015; Nielsen et al, 2012). They are usually organized in lymphoid aggregates ranging from a small group of cells to a well-organized TLS (Kroeger et al, 2016). Organized TLSs correlate with better survival, such as observed in patients treated with ICB. Although immunotherapy has demonstrated limited effectiveness against ovarian cancer, subsets of patients may thus benefit from ICB. In support of this, combination anti-PD-1/CTLA-4 therapy can achieve response rates above 30% (Zamarin et al., 2020), and encouraging clinical results have recently been reported when combining ICB with PARP and VEGF inhibitors (Harter et al., 2023)”.

      Reviewer #3. Question 7: Were the signature genes used for analysis in figure 5 chosen in a formal, unbiased manner, or simply hand-picked as representative of the respective cell types? This information is not provided in the supplement.

      Author’s answer: Another reviewer has also expressed similar concerns. The genes selected to represent cell types were chosen manually, which we acknowledge is not the best method for defining a signature. As a result, we have decided to exclude Figure 5 from the manuscript under review. We believe an unbiased approach is more suitable for characterizing the cell network proposed in our study.

      Reviewer #3. Question 8: While the NicheNet analysis of potential interactions among lymphocyte populations raises some strong hypotheses, it would be interesting to extend the interaction analysis to all CD45+ populations, given the sequencing was done on CD45+ immune cells.

      Author’s answer: Thank you for suggesting this analysis. We have included the results of the cell interaction analysis including all CD45+ cells (fig. S3). We observed CD40L as one of the top predicted ligands highly expressed in the CD4_CXCL13 subset, mediating a response in subsets of antigen-presenting cells, such as B cells (cluster B), plasma cells (cluster PC_2), and plasmacytoid dendritic cells (cluster pDC). Interestingly, this result also supports the hypothesis of Tfh-like cells (cluster CD4_CXCL13) coordinating the action of intratumoral immune cells involved in the antitumor immune response.

      Reviewer #3. Question 9: A sample size of 5 patients is relatively small for current single cell RNAseq studies of human tumor patients.

      Author’s answer: We agree with the reviewer that a sample size of 5 patients is relatively small. Thus, to validate our results in other patients, we included in the revised manuscript the analysis of scRNA-seq of 47 patients across 10 cancer types (dataset from Zheng et al., 2021). As demonstrated in figure 3 and figure 5, we could identify subsets of CD8 and CD4 T cells from our ovarian cancer patients in that dataset of 10 cancer types.

      Reviewer #3. Minor

      1. In lines 96-97, "CD8-GZMB" was mentioned twice in the description.

      2. In line 126, this section did not discuss residency markers, yet a conclusion about residency was made in this sentence.

      Author’s answer: We appreciate you bringing these errors to our attention. We fixed them in the updated version of the manuscript.

      References:

      Gueguen, P., Metoikidou, C., Dupic, T., Lawand, M., Goudot, C., Baulande, S., … Amigorena, S. (2021). Contribution of resident and circulating precursors to tumor-infiltrating CD8 T cell populations in lung cancer. Science Immunology, Vol. 6, p. eabd5778. doi:10.1126/sciimmunol.abd5778

      Harter, P., Trillsch, F., Okamoto, A., Reuss, A., Kim, J.-W., Rubio-Pérez, M. J., … Aghajanian, C. (2023). Durvalumab with paclitaxel/carboplatin (PC) and bevacizumab (bev), followed by maintenance durvalumab, bev, and olaparib in patients (pts) with newly diagnosed advanced ovarian cancer (AOC) without a tumor BRCA1/2 mutation (non-tBRCAm): Results from the randomized, placebo (pbo)-controlled phase III DUO-O trial. Journal of Clinical Oncology, 41(17_suppl), LBA5506–LBA5506.

      Hornburg, M., Desbois, M., Lu, S., Guan, Y., Lo, A. A., Kaufman, S., … Wang, Y. (2021). Single-cell dissection of cellular components and interactions shaping the tumor immune phenotypes in ovarian cancer. Cancer Cell. doi:10.1016/j.ccell.2021.04.004

      Wu, T. D., Madireddi, S., de Almeida, P. E., Banchereau, R., Chen, Y.-J. J., Chitre, A. S., … Grogan, J. L. (2020). Peripheral T cell expansion predicts tumour infiltration and clinical response. Nature. doi:10.1038/s41586-020-2056-8

      Yost, K. E., Satpathy, A. T., Wells, D. K., Qi, Y., Wang, C., Kageyama, R., … Chang, H. Y. (2019). Clonal replacement of tumor-specific T cells following PD-1 blockade. Nature Medicine. doi:10.1038/s41591-019-0522-3

      Zamarin, D., Burger, R. A., Sill, M. W., Powell, D. J., Jr, Lankes, H. A., Feldman, M. D., … Aghajanian, C. (2020). Randomized Phase II Trial of Nivolumab Versus Nivolumab and Ipilimumab for Recurrent or Persistent Ovarian Cancer: An NRG Oncology Study. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 38(16), 1814–1823.

      Zheng, L., Qin, S., Si, W., Wang, A., Xing, B., Gao, R., … Zhang, Z. (2021). Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science, 374(6574), abe6474.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: This study used single-cell transcriptomics and T cell receptor profiling to identify the developmental relationships of T cell populations in ovarian cancer patients. The researchers proposed a model of differentiation pathway that showed how an intermediate GZMH-expressing CD8 T cell subset progressively reinforces exhaustion and tissue residency programs towards terminally exhausted cells. Then they also focus on the nature of TPEX, dual-expanded clone, which is considered an important indicator for the efficacy of ICB, and argue that it is strongly related to GPR183, 25-OHC, and IL21. Based on the analysis of clinical samples, they argue that their proposed gene signature may also be prognostically relevant and predictive of ICB efficacy.

      Major comment: I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR183 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Significance

      It is interesting to note that the authors simultaneously analyze immune cells in the blood and in the tumor, and examine in detail what is characteristic of the blood, what is characteristic of the tumor, and what is seen in both. And it is very interesting that they specifically propose an intermediate group that is recruited from the blood to the tumor and is in the process of becoming exhausted. I am sure there are many studies on TILs and TLSs, but this study would be helpful to understand how they are concentrated locally (near the tumor) in comparison with immune cells in the blood as well.

      However, the latter part is difficult to understand. To begin with, it is already known that ovarian cancer does not respond much to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that will attempt to sort out the population in which ICB is favored.

      Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate state, including functional experiments. Also, as clinicians, we are very interested in whether some of the fractions identified in this study differ in proportion between patients and whether they correlate with the clinical course of the disease, since the study only analyzed a sample of 5 patients.

    1. Reviewer #2 (Public Review):

      Kraus, Aurora et al. investigated the potential immune response of the olfactory bulb after exposure to the infectious hematopoietic necrosis virus (IHNV) via the olfactory epithelia. Specifically, they show a) viral-specific neuronal activation of "OSNs" (crypt cells), b) changes in behaviour of both adult and larval zebrafish after viral exposure, and c) that pituitary adenylate-cyclase-activating polypeptide (PACAP) was enriched when assayed by single cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV.

      Although the paper does have strengths in principle, the weaknesses of the manuscript are that these strengths are not directly demonstrated and the referencing of the manuscript omits many references important for the understanding of the questions and the results of the study. Furthermore, the data presented are not sufficient to fully support the key claims in the manuscript. In particular:

      a) Viral-specific neuronal activation of OSNs:

      What type of neurons? The authors are a bit elusive and do not clearly state that the neurons are crypt cells (Sepahi et al.: rainbow trout), which have a very specific axonal projection to the brain and whose response characteristics are not well characterized (see work of Korsching lab). Crypt cells are not present in the olfactory epithelia of mammals. Furthermore, in their previous work the crypt cells die; so how do they think the (inflammatory) virus response is transmitted to the olfactory bulbs in order to protect the brain?

      The authors state from previous work that they never detected virus in the brain, but why would they? Does IHNV move trans-synaptically?

      The neuronal activity was monitored using a pan-neuronal marker; thus these data are of limited use when trying to understand the role of neuronal activity (crypt cells) in the IHNV-triggered activity: the authors may be looking at a generalized inflammation response, and the image presented is not particularly informative, as it is difficult to decipher the results. The authors assume IHNV is an odorant without carefully ruling out the possibility of a generalized inflammation response.

      b) Changes in behaviour of both adult and larval zebrafish after viral exposure:

      What is the motivating question for looking at behaviour of the virus infected animals? Do we know the effects of crypt cell loss on the behaviour in any fish species? Authors need to build a better conceptual framework for the behaviour experiments.

      c) Pituitary adenylate-cyclase-activating polypeptide (PACAP) was enriched when assayed by single cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV. Authors draw many generous conclusions from limited data. Authors seem to have forgotten to cite papers previously published showing that PACAP-38 has anti-viral activities in fish (VHSV: trout) such as: Velasquez et al 2020, First in vivo evidence of pituitary adenylate cyclase-activating polypeptide antiviral activity in teleost.

      The histology for PACAP presented in the manuscript is not convincing. The antibody is against the human form of PACAP thus any labelling should be treated with caution (and called PACAP-38-like).

      Summary: The authors need to better develop their model (perhaps a diagram would be helpful) explaining exactly which neurons are transmitting the information. Because of the elusive nature of some referencing and the skirting of important issues, such as clearly stating which neurons are affected (crypt cells), what the point of the behaviour is (relate to neuronal type infected by virus), and the lack of an antibody specific to the zebrafish protein, the model appears to be built on an unstable base.

    1. Joint Public Review:

      In this manuscript, the authors challenge the fundamental concept that all neurons are derived from ectoderm. The key points of the authors' argument are as follows:

      1) Roughly half of the cells in the small intestinal longitudinal muscle-myenteric plexus (LM-MP) that express a pan-neuronal marker do not, by lineage tracing, appear to be derived from the neural crest.

      2) Lineage tracing and marker gene imaging suggest that these non-neural crest derived neurons originate in the mesoderm, leading to their designation as mesodermal-derived enteric neurons (MENs).

      3) Single-cell sequencing of LM-MP tissues confirms the mesodermal origin of MENs.

      4) MENs progressively replace neural crest derived enteric neurons as mice age, eventually representing the bulk of the EN population.

      There is broad agreement among the reviewers that the identification and description of this cell population is important, and that the failure of these cells to be labeled by neural crest lineage tracers is not artifactual. The work with transgenic lines is convincing that some presumptive neurons in the enteric nervous system (ENS) likely originate from an alternative source in the postnatal intestine and that this population increases in aging mice.

      There is, however, ongoing disagreement between the authors and reviewers about whether the authors' provocative and potentially paradigm-changing proposal that these are neurons of mesodermal origin has been established. While the authors believe they have addressed the reviewers' concerns in multiple rounds of review (much of this prior to submission), the reviewers remain unconvinced and continue to request additional data and analyses.

      A key premise of the preprint review system is that the best interests of science are not served by endlessly litigating disagreements around papers by either compelling the authors to do extensive and expensive additional experiments that they do not believe to be necessary or by treating the authors' claim as established in the face of continued skepticism. Accordingly the editor believes it is time to present this work, which everyone agrees contains important observations and valuable data, along with the following editor's synthesis of the reviewers' concerns and author responses about the question of these cells' origins. We encourage anyone interested in the details to review the already posted reviews and authors' response.

      The following key issues have been raised during review:

      * Is the lineage tracing and marker gene expression data definitive as to mesodermal origin?

      * Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments?

      * Does the genomic data establish that the sub-population of cells the authors focus on are of mesodermal origin?

      * Are there alternative explanations for the lineage tracing and genomic observations than a mesodermal origin?

      Is the lineage tracing and marker gene expression data definitive as to mesodermal origin?

      The proximal evidence that the authors present for a mesodermal origin of the non-NC derived cells is presented in Figure 2, which establishes the presence, via lineage tracing of Tek+ and Mesp1+ (and therefore mesoderm derived) and Hu+ (and therefore neuronal) cells. The fraction of lineage labeled cells in each case (~50%) corresponds roughly to the fraction of cells that do not appear to be NC derived.

      The reviewers raise several technical questions about the lineage tracing experiments, including issues of incomplete labeling, ectopic labeling and toxicity. The authors have addressed each of these with data and/or citations, and the editor believes they have demonstrated, subject to the broader limits of lineage tracing experiments, that there are Hu+ cells in the tissue that are derived from cells that do not express NC markers and that do express mesodermal markers.

      One reviewer raised the question of whether these cells are neurons. This appears to the editor to be a valid question, in that specific neuronal activity of these cells has not been established. But the authors' argument is persuasive that their Hu+ state would have led them to be designated neurons and that changing that designation based on not being derived from NC is circular. However the possibility that, despite this accepted designation, these cells are not functionally neurons should be noted by readers.

      Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments, and does this data establish mesodermal origin?

      To provide orthogonal evidence for the presence of mesodermally derived enteric neurons, the authors carried out single-cell sequencing of dissociated cells from hand-dissected longitudinal muscle - myenteric plexus (LM-MP) tissue. They use standard methods to identify clusters of cells with similar transcriptomes, and designate, based on marker gene expression, two clusters to be neural crest derived enteric neurons (NENs) and mesoderm derived enteric neurons (MENs). However the reviewers raised several issues about the designation of the cells MENs, and therefore their equation with the cells identified in lineage tracing.

      While the logic behind specific choices made in the single-cell analysis is not always clear in the manuscript, such as why genes not-specific to MENs were used to identify the MEN cluster and how genes were selected for subsequent analysis (although both issues are explained better in the authors' response to reviewers), they in the end identify a single large cluster that has the characteristics of MENs (it expresses both neuronal and mesodermal markers) that is (by immunohistochemistry) broadly associated with the previously described tissue MENs.

      The standard methods for the delineation of clusters in single-cell sequencing data (which the authors use) are stochastic and defy statistical interpretation, and the way these data and analyses are used is often subjective. The editor shares the reviewers' confusion about aspects of the analysis, but also finds the authors' assertions that they have described a cluster of cells that express both neuronal and mesodermal genes, and that this cluster corresponds to the tissue MENs described in lineage tracing, to be broadly sound.

      The biggest weakness in the single-cell data and analysis - identified by all reviewers - is the massive overrepresentation of MENs relative to NENs. The authors' explanation - that some cells are more sensitive to manipulations required to prepare cells for sequencing - is certainly well-represented in the literature and is therefore plausible. But it isn't fully satisfactory, especially because it undermines the notion that the MENs and NENs are functionally equivalent (though one could argue in response that increased fragility of NENs is why they are progressively replaced by MENs).

      There are many additional questions about the single cell analysis that are difficult to resolve with the data in hand. I think everyone would agree that an ideal analysis would have more cells, deeper sequencing, and comprehensive validation of the identity of each cluster of cells. But given the time and expense required to carry out such experiments, we cannot demand them, and must take the data for what they are rather than what they could be. And in the end, it is the editor's view that these data and analyses bolster the authors' claims, without conclusively establishing them. That is, these data should neither be dismissed nor, on their own, considered definitive.

      Are there alternative explanations for the data than that they are mesodermally derived neurons?

      As discussed above, the reviewers generally agree that the lineage tracing experiments are careful and well-executed, and the authors have provided data demonstrating that the results are highly unlikely to be due to either incomplete or ectopic lineage marking. The reviewers raise several possible alternative hypotheses, some based on the literature and some based on the genomic data. The authors discuss each in detail in their response. The editor would note that, at this stage in the history of single-cell analysis, the criteria for using single cell sequencing data to establish cell type and cell origin are not well established, and that neither the presence nor absence of specific sets of genes in single cells should, for both technical and biological reasons, be considered dispositive as to identity.

      Additional aspects of paper:

      There are additional intriguing aspects of the paper, especially the increase in the number of MENs relative to NENs over time, suggesting functional replacement of one population with the other, and some evidence for and speculation about what might be regulating this evolution. However these are somewhat secondary points relative to the central question at hand of whether the authors have discovered a population of mesodermally derived neurons.

      Editor's summary and comment:

      The editor believes it is a fair summary to say that the authors believe they have gone to great lengths to provide multiple lines of evidence that support their hypothesis, but that these reviewers, while appreciating the potential importance of the authors' discovery of an unusual cell type, are not yet convinced of its origin.

      In an ideal world, the authors, reviewers and editor would all ultimately agree on what claims the data presented in a paper supports, and indeed this is what the traditional journal publishing system tries to achieve. But the system fails in cases like this where no consensus between authors and reviewers can be reached, as it neither makes sense to "accept" the paper and imply that it has been endorsed by the reviewers, nor to "reject" it and keep the work in peer review limbo.

      There is certainly enough here to warrant the idea and the data and arguments behind it being digested and considered by people in the field. It may very well be that the authors - who have spent years working on this problem and likely know more about this population of cells than anyone on Earth - are right that they have discovered something that changes how we think about the development of the nervous system. To the extent the reviewers are representative, people are likely to need additional data to be convinced. But it is time to put that to the test.

    1. Reviewer #3 (Public Review):

      This manuscript aims to exploit experimental measurements of the extracellular voltages produced by colliding action potentials to adjust a simplified model of action potential propagation that is then used to predict the extracellular fields at axon terminals. The overall rationale is that when solving the cable equation (which forms the substrate for models of action potential propagation in axons), the solution for a cable with a closed end can be obtained by a technique of superposition: a spatially reflected solution is added to that for an infinite cable and this ensures by symmetry that no axial current flows at the closed boundary. By this method, the authors calculate the expected extracellular fields for axon terminals in different situations. These fields are of potential interest because, according to the authors, their magnitude can be larger than that of a propagating action potential and may be involved in ephaptic signalling. The authors perform direct measurements of colliding action potentials, in the earthworm giant axon, to parameterise and test their model.
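
      The symmetry argument summarised above can be written out explicitly. The notation here is assumed for illustration and is not taken from the manuscript: $V_{\infty}(x,t)$ denotes the solution on an infinite cable and the closed end sits at $x = L$. The first line is the superposition as the review describes it, which is exact for a linear (passive) cable; for an active axon the same boundary condition follows from the mirror symmetry of the collision solution itself.

```latex
% Assumed notation: V_inf is the infinite-cable solution, closed end at x = L.
\begin{align}
V_{\text{closed}}(x,t) &= V_{\infty}(x,t) + V_{\infty}(2L - x,\,t), \qquad x \le L, \\
\left.\frac{\partial V_{\text{closed}}}{\partial x}\right|_{x=L}
  &= \left.\frac{\partial V_{\infty}}{\partial x}\right|_{x=L}
   - \left.\frac{\partial V_{\infty}}{\partial x}\right|_{x=L} = 0 .
\end{align}
% More generally, any solution with V(x,t) = V(2L - x, t) has zero gradient,
% and hence zero axial current, at x = L.
```

      Because the axial current is proportional to $\partial V / \partial x$, a vanishing gradient at $x = L$ is exactly the sealed-end condition.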

      Although simplified models can be useful and the trick of exploiting the collision condition is interesting, I believe there are several significant problems with the rationale, presentation, and application, such that the validity and potential utility of the approach is not established.

      Simplified model vs. Hodgkin and Huxley

      The authors employ a simplified model that incorporates a two-state membrane (in essence resting and excited states) and adds a recovery mechanism. This generates a propagating wave of excitation and key observables such as propagation speed and action potential width (in space) can be adjusted using a small number of parameters. However, even if a Hodgkin-Huxley model does contain a much larger number of parameters that may be less easy to adjust directly, the basic formalism is known to be accurate and typical modifications of the kinetic parameters are very well understood, even if no direct characterisations already exist or cannot be obtained. I am therefore unconvinced by the utility of abandoning the Hodgkin-Huxley version.

      In several places in the manuscript, the simplified model fits the data well whereas the Hodgkin-Huxley model deviates strongly (e.g. Fig. 3CD). This is unsatisfying because it seems unlikely that the phenomenon could not be modelled accurately using the HH formulation. If the authors really wish to assert that it is "not suitable to predict the effects caused by AP [collision]" (p9) they need to provide a good deal more analysis to establish the mechanism of failure.

      (In)applicability of the superposition principle

      The reflecting boundary at the terminal is implemented using the symmetry of the collision of action potentials. However, at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate where the extracellular field is one objective of the modelling, as here. I believe this assumption is not problematic for the calculation of the intracellular voltage, because extracellular voltage gradients can usually be neglected, but the authors need to explain how the issue was dealt with for the calculation of the extracellular fields of terminals. I assume they were calculated from the membrane currents of one-half of the collision solution, but this does not seem to be explained. It might be worth showing a spatial profile of the calculated field.

      Missing demonstrations

      Central analytical results are stated rather brusquely, notably equations (3) and (4) and the relation between them. These merit an expanded explanation at the least. A better explanation of the need for the collision measurements in parameterising the models should also be provided.

      Adjusted parameters

      I am uncomfortable that the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately. With a variation of more than 20-fold reported between the different models in Appendix 2 we can be sure that some of the models are based upon quite unrealistic physical assumptions, which in turn undermines confidence in their generality.

      On p8, the values of both the extracellular (100 Ohm m) and intracellular resistivity (1 Ohm m) appear to be in error, especially the former.

      (In)applicability to axon terminals

      The rationale of the application of the collision formalism to axon terminals is somewhat undermined by the fact that they tend not to be excitable. There is experimental evidence for this in the Calyx of Held and the cerebellar pinceau. The solution found via collision is therefore not directly applicable in these cases.

      Comparison with experimental data

      More effort should be made to compare the modelling with the extracellular terminal fields that have been reported in the literature.

      Choice of term "annihilation"

      The term annihilation does not seem wholly appropriate to me. The dictionary definitions are something along the lines of complete destruction by an external force or mutual destruction, for example of an electron and a positron. I don't think either applies exactly here. I suggest retaining the notion of collision which is well understood in this context.

    1. Author Response

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript. We have made the aesthetic changes requested by Reviewer 2.

      Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Concern 1

      In paragraph 4.2, I found it unclear why the authors find it unsurprising that different experiments would correspond to different betas. I think that this point should be discussed, as beta and N appear in combination in determining the interaction strength. Otherwise, they could try to fit all distributions with the same beta, which would be more natural for me. I guess that the fits would be anyway good to the eye, though quantitatively suboptimal (which could be quantified with the distance introduced).

      The reviewer raises a valid concern. As shown in Fig 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are β = 0.18, 0.13, 0.12 and 0.64 for N = 5, 10, 15 and 20, respectively. We (RS, OM, and OP) find it intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. We acknowledge that we do not have an explanation for why the fitted parameter values are what they are, but note that the fitting curve is flat, implying that several beta values could achieve a satisfactory fit. While further agent-based simulations could explore these findings more systematically, we believe that investigating this matter is outside the scope of this paper. Instead, we have acknowledged these points explicitly in the revised discussion.

      Portion added to discussions: “As shown in Fig. 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. Perhaps it is intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. While we do not currently have an explanation for why the fitted parameter values are what they are, we note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. Further agent-based simulations could explore these findings more systematically, and provide useful insights.”

      Concern 2

      Citation of previous work on dynamical quorum sensing (lines 51 & 52), I think, misses two important points: first, these works (and others following them) deal with the appearance of collective oscillations at high density (therefore, the same general problem addressed here); second, Taylor et al. also studied a transition where the oscillators involved did not oscillate at low density, whereas above a density threshold they display coherent collective oscillations whose period decreases with density, similar to what is observed here. I do not think this takes anything away from the originality of this work, which refers to a different system, and models it with different equations, but the parallelism between integrate-and-fire dynamics with quenched noise and excitable dynamics in the presence of noise should in my opinion not be overlooked.

      We have explicitly mentioned this in the revised text.

      Concern 3

      As the authors stress in lines 105 and 132, the analytical model shows that all that really matters in this phenomenon is the fastest frequency of the system. This could be used as an argument to say that the actual frequency distribution of individual fireflies is not all that important, as long as their fastest frequency is comparable. The assumption that they are identical would then sound less radical. Ideally, one could use the numerical simulations to check this, as well as the fact that the phenomenon does not break down when the shortest individual interburst interval Tbmin is narrowly distributed (which could also explain why having a few individuals who can flash at a higher frequency does not affect the outcome).

      We thank the reviewer for these observations.

      Concern 4

      I still feel that the agreement between the model and observations is a bit overstated (line 120). At least, I think the authors may stress that whereas the model predicts that the frequency of the 7-14 minutes oscillations should increase a lot with N, this is not observed in the data. Maybe this mismatch would be reduced if inter-individual variability was added.

      Please see the last three paragraphs of the discussion section. In reality, as the swarm size increases, we expect that swarms will no longer be all-to-all connected, and the dynamics of the system will depend upon the speed of propagation of information across the swarm. Precisely how this happens is outside of the scope of the current experimental work and theoretical description presented here.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In the manuscript entitled "Aurora A mediated new phosphorylation of RAD51 is observed in Nuclear Speckles", the authors identify serine S97 as a novel phosphorylation site of the RAD51 recombinase and show, using a set of in vitro and in cellulo experiments, that this phosphorylation is mediated by the Aurora A kinase. The authors also describe this phosphorylation as occurring in the nucleus, specifically in nuclear speckles, where mRNA maturation and splicing occur, suggesting a role of RAD51 in the latter. The confocal microscopy images provided to test this hypothesis are convincing. However, also using confocal images, the authors claim that foci of RAD51 phosphorylated at S97 do not colocalize with the DNA damage marker γ-H2AX, and hence that its function is not related to DNA damage; the data provided do not fully support this statement. In this study, Alaouid et al. use mutants of RAD51 that alter S97 phosphorylation to further study its function and provide data that support RAD51 as an RNA binding protein. Overall, the manuscript shows some interesting observations that are worth pursuing; however, the in vitro and in cellulo results are not aligned, some controls are lacking, and many points should be reconsidered.

      Major comments:

      • Are the key conclusions convincing?

      Not as stated.

      Fig. 1A. The authors conclude that pS97-RAD51 favors RAD51 strand invasion capacity using the D-loop assay. Indeed, the S97D phosphomimic increased the D-loop activity 2.5-fold compared to WT RAD51. However, the S97A mutant, which is the non-phosphorylated form, also increased the D-loop activity by 2-fold compared to WT (figure 1C). So, both the phosphorylation and the absence of it seem to promote strand invasion. What, then, is the role of the phosphorylation? There is no discussion about this. Besides, no representative image of the D-loop assay is shown. This is very important, as these experiments need to be run with the relevant controls to be meaningful.

      Fig. 1D. The polymerization rate of RAD51 is probably irrelevant for its function in the absence of DNA. What do they want to get at with this assay?

      In figure 2B, the authors conclude that RAD51 phosphorylation at S97 is dynamically regulated throughout the cell cycle. Indeed, the pS97-RAD51 is well observed in asynchronous cells, and the double thymidine block time course experiment followed by PI staining shows the oscillation of the pS97-RAD51 from G1 to G2/M stage. The authors quantified the ratio of pS97-RAD51/total RAD51 to conclude this. However, it would be more accurate to also divide the above over the intensity of the loading control (tubulin) because in figure 3A for example, they quantified the ratio of pS97-RAD51/tubulin but did not consider the levels of RAD51 in their quantifications.

      In figure 3B, the authors state that pS97-RAD51 is decreased after CPT treatment and that the pS97-RAD51 foci do not localize with the DNA damage marker γ-H2AX. The signal of γ-H2AX is already odd, as it does not change from Ctrl to CPT conditions (especially in HCC1806 cells). A pre-extraction of soluble protein with CSK should be used before looking at the co-localization; with the pan-staining of the two signals it is difficult to draw any conclusions about colocalization. Nevertheless, the signal of RAD51 seems equal in all conditions in the images shown and it does not seem to be reduced after CPT.

      In figure 4A, the authors show that Aurora A is responsible for the S97-RAD51 phosphorylation in cellulo. Indeed, the use of an Aurora A inhibitor reduces the pS97-RAD51 signal, however, this is only true in one cell line (HCC1806) but no effect was observed in HeLa cells. Is this effect cell-specific?

      The authors find that RAD51 binds both DNA and RNA and measure the affinities of the RAD51 bearing the S97D and S97A mutations. S97D shows the highest affinity for ssDNA and RNA in Fig. 7A, B; however, the opposite is true for dsDNA in Fig 7C, D. All three forms of RAD51 bind RNA, although with different affinities; however, no error bars are shown. The description of the results does not seem accurate. Importantly, these data should somehow correlate/be discussed with respect to the D-loop assay performed in Fig. 1. The authors conclude that the binding to RNA is reduced in S97D-RAD51, suggesting that the pRAD51 that they observe at nuclear speckles would probably not be associated with RNA at these nuclear speckles, right? This goes against their idea of this phosphorylated form being related to RNA splicing...

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The manuscript seems to be in its early days and requires a lot of editing and rewriting to relate the in vitro and in-cell data and make a coherent story.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors performed chromatin fractionation to determine the correct localization of the pS97-RAD51 and looked for the phosphorylated form by western blots. But then they confirmed the finding using immunofluorescence. I think it would be more convincing and consistent if the authors did a pre-extraction before they use the antibody because, in that way, they would indeed be confirming the localization of the protein they are looking at, that is, specifically in the nucleus.

      As well, in order to test the specificity of the pS97-RAD51 antibody they generated, a simple treatment of the lysates with phosphatases would be a good control. These points and the criticisms mentioned above need to be addressed.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      This manuscript is not ready for submission.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes. However, the legends of the images are way too concise.

      • Are the experiments adequately replicated and statistical analysis adequate?

      In Fig. 2B, the authors performed a double thymidine block followed by a time course release to track cell cycle progression of the cells and phosphorylation of RAD51 at S97. They do not indicate the biological replicates they performed. There are no error bars in the estimated KD shown in Fig.7.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      The authors conclude that the S97 is specifically phosphorylated by the Aurora A kinase. How? Have they looked at other documented kinases known to phosphorylate RAD51?

      In figure 6 the authors overexpress HA-tagged RAD51 proteins corresponding to WT, S97D and S97A mutants in cells and label them for immunofluorescence. Maybe it would be better to downregulate the endogenous RAD51 to discard possible combined effects.

      In figure

      • Are prior studies referenced appropriately?

      The authors show in their manuscript that RAD51 protein CAN interact with RNA in vitro, a finding not previously described to my knowledge. However, a recent study entitled "RAD51-dependent recruitment of TERRA lncRNA to telomeres through R-loops, Nature, 2020" provides in vitro data showing the binding of RAD51 to TERRA, a lncRNA, which I think would be worth mentioning in their manuscript.

      The authors should mention previous contributions in the field especially when it comes to RAD51 in the HR pathway post DNA damage, which is quite documented and updated. For example, in this section of the introduction, "RAD51 is a recombinase protein implicated in the strand exchange mechanism during the DSB repair by the Homologous Recombination (HR) pathway. In the absence of DNA Damage (DD), RAD51 is predominantly cytoplasmic and translocates to the nucleus during the DNA Damage Response (DDR) to manage HR repair. As it needs the undamaged sister chromatid as a template, the HR repair pathway occurs mainly in the late S, G2 phases of the cell cycle. However, it has been documented that HR repair can also occur during G1 and early S phases, and in this case, the undamaged template used for the repair could be the homologous chromosome or an RNA transcript2". This statement is definitely worth more references.

      The same problem is recurrent in the rest of the introduction; therefore, it needs to be updated and better referenced.

      • Are the text and figures clear and accurate?

      The text needs a lot of editing to accurately describe the results, see for example: "The resulting KD evaluation shows that the S97D mutant had a dsDNA binding affinity lower to that of the WT (a KD of 2.26 μM for the S97D-RAD51 vs a KD of 0.38 μM for the WT RAD51). Concerning, the S97A mutant comparison to the WT RAD51, we observed modified association and dissociation curves that resulted in an identical affinity to dsDNA (a KD of 0.33 μM for the S97A-RAD51 vs a KD of 0.38 μM for the WT RAD51). We can conclude that in our in vitro conditions, the Ser97 phosphorylation has a high impact on RAD51 affinity for DNA by dividing its affinity by 5.8." Besides, the figures are of low quality and should be more carefully crafted and presented. Some experiments (such as the D-loop) are not represented in the figures.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Using a different representation for the graphs would be a plus (also see previous comments)

      Referees cross-commenting

      I think the other reviewers and I have raised very important and complementary points that will help the authors improve the quality of the manuscript substantially.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      The discovery of a new phosphorylation site in RAD51 (S97) by Aurora A is potentially interesting for the field of the maintenance of genome stability as it could broaden the understanding of how such an important recombinase may be regulating the maintenance of genome integrity throughout the cell cycle. Also, the idea of RAD51 being involved in splicing and mRNA maturation seems very attractive and a very important conceptual advance. However, given the premature status of the text and the figures, the manuscript falls short of showing convincing evidence.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Many works highlight the role of RNA binding proteins as an integral part of the DNA damage response. In addition, a wealth of evidence in the literature suggests that many DNA repair proteins are RNA binding proteins, and that RNA is an important player in the DDR. The possible finding that RAD51 interacts with RNA and localizes to nuclear speckles, possibly acting in splicing, is very interesting and attractive. How Aurora A is involved in this, what the trigger is, and whether RAD51 is binding RNA at these sites are still unclear.

      • State what audience might be interested in and influenced by the reported findings.

      Labs working in genome integrity mechanisms and the crosstalk between transcription and DNA repair would be interested.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view.

      Genome Instability, homologous recombination

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for their careful reading and the many thoughtful suggestions to improve our manuscript, as well as both the Editors and Reviewers for the generally positive evaluations and encouraging statements.

      Editorial assessment:

      This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides solid evidence for the role of the DG-CA3 loop in generating theta-time scale correlations and sequences, which would be reinforced through the clarification of the concepts introduced in the study, in particular the notion of intrinsic and extrinsic sequences. This study will be of interest for the hippocampus and neural coding fields.

      We appreciate that our work has been considered important. In our revision we made a considerable effort to improve on the presentation of our results and the justification of our model assumptions. Particularly we aimed to clarify the meaning of intrinsic and extrinsic sequences by additional figure panels as well as fleshing out their definition via spike-timing correlations being independent or dependent on the direction of the running trajectory, respectively. To address all the requests, we added 3 new Figures, multiple new Figure panels and simulated a new model variant.

      Reviewer #1 in their public review assessed “The manuscript has the potential to contribute to the way we interpret hippocampal temporal coding for navigation and memory.”

      They criticized

      • The findings generally relate to network models of phase precession (reviewed in e.g., Maurer and McNaughton, 2007, Jaramillo and Kempter, 2017). An important drawback of these models with respect to explaining specific experimentally observed features of phase precession, is that they cannot straightforwardly explain phase precession upon first exposure onto a novel track. This is because, specific connectivity in network models may require experience-dependent plasticity, which would not be possible upon first exposure. This is essential, given that the manuscript addresses the possible origin of phase precession in terms of network models and at minimum, this weakness should be discussed.

      We agree with Reviewer #1 (and also with Reviewer #2, who brought up a similar point) that models based on recurrence struggle to explain how the recurrent connectivity matrix should come about. While we feel that a full model of how the 2-d topology in the recurrent weights can be learned goes far beyond the scope of this paper (and to our knowledge has not been solved so far in any existing model), we added a new model variant (new Figure 6 and Supplementary Figure 1), which explains the basic phenomenology of extrinsic and intrinsic sequences without the need of recurrent connections, only using feed-forward synaptic facilitation. Thus, assuming recurrent connection is not necessary for our main findings. However, we would like to point out that this does not exclude the possibility that recurrent connections, if set up in an appropriate way, also contribute to phase precession and theta sequences.

      • An important and perhaps essential component of the manuscript, is the distinction between extrinsic and intrinsic models. However, the main concepts on which this hinges, namely extrinsic and intrinsic sequences (and the related extrinsicity and intrinsicity) could be better explained and illustrated. Along these lines, the result suggested by the title, namely, hippocampal theta correlations, may be important yet incidental in light of the new concepts (e.g., extrinsicity, intrinsicity) and computational models (e.g., DG-CA3 recurrent loop) that are put forward.

      We have added substantial new explanatory material to the figures, captions and text to more didactically introduce the concepts of intrinsicity and extrinsicity. We have also completely rewritten the abstract and added a subtitle: “extrinsic and intrinsic sequences”.

      • The study seems to put forward novel computational ideas related to neural coding. However, assessing novelty is challenging as this manuscript builds on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmadi et al., 2022. bioRxiv) work. For example, the interpretation of intrinsic sequences in terms of landmarks had been introduced in Leibold, 2020.

      We agree with the reviewer that this paper touches on many related ideas from previous papers (not only of our lab) and is supposed to tie loose ends. Thus, the novel contribution is a biologically plausible mechanistic model of how intrinsic sequences and 2-d place maps interact on the level of interconnected spiking neurons. Such a level of explanation has not yet been available in previous work. We have considerably extended the Discussion section in our revision detailing the bigger picture underlying this theory. Also our addition of the non-recurrent model variant (see above) adds considerable novelty, since it provides an account of phase precession and preplay in novel environments.

      • The significance of the readout tempotron neuron could be expanded on. In particular, there is room for interpretation of the output signal of that neuron (e.g., what is the significance for other neurons downstream? What is the rationale for this output being theta-modulated?)

      We have added an additional Figure 8 to better illustrate the inner workings of the tempotron. We also extended the discussion to better explain the potential use of the tempotron output (see above). In short, we consider the tempotron to signal a unique behaviorally important context that is independent of remapping induced by changes of sensory cues, which is a new prediction of the model. Since the context signal results from DG loops, it requires a stable code to also exist in the DG. Evidence for such long-term stability in DG has been found in Hainmüller & Bartos (2018).

      Reviewer #2 in their public review finds “this research topic to be both important and interesting” and appreciates “the clarity of the paper”, commending our “efforts to integrate previous theories into their model and conduct a systematic comparison”.

      We are very happy about these positive remarks and sincerely would like to thank the reviewer!

      Reviewer #1 made the following specific recommendations for changes:

      The abstract is somewhat difficult to parse. I have identified some words and/or sections that could be improved.

      • ‘....inherently 1 dimensional’. This statement seems to be related to an a priori interpretation of the authors. On the other hand, if offline sequences are trivially 1 dimensional because they are sequences (i.e., they constitute a vector), then online sequences would be 1-dimensional as well. What is the key difference between offline and online? Is it the omnidirectional place fields in two dimensions? Perhaps more importantly, how relevant is this fact with respect to the main results of the manuscript, which concern extrinsic and intrinsic sequences?

      We indeed meant that the sequences are trivially 1-dimensional. The main challenge that we would like to address in this paper is how a 2-d topology of place cells (and direction dependent theta sequences) and a 1-d sequence topology of intrinsic theta correlations and during (p)replay can be reconciled. We hope this has become clearer in the rewritten abstract.

      • The language in lines 36-38 is overly technical. I suggest modifying it; the language was less technical and more understandable in the body of the manuscript, which should also be reflected in the Abstract.

      We would like to apologize for making the abstract too technical. Also in response to Reviewer #2, we decided to rewrite the abstract entirely.

      The authors use a mixture of conductance based models and Izhikevich neurons, presumably for the spiking generating mechanism. The conductance component can be readily interpreted in terms of the underlying biophysics. The Izhikevich neuron model, however, is phenomenological. I suggest you address 1) the rationale for using the Izhikevich model, 2) its biophysical interpretation, and 3) its combination with conductance-based currents.

      The reviewer is correct that spike generation is modelled using Izhikevich’s model whereas synaptic integration is included in a conductance-based manner. As suggested by the reviewer, we have added further explanation in the Methods part, explaining that the Izhikevich approach allows burst firing properties to be adjusted with only a few parameters by efficiently emulating the bifurcation structure of spike generation in the full biophysical model (1&2) and otherwise has no effect on the integration of conductance-based synaptic currents in a subthreshold regime (3).
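
      The combination described here, Izhikevich spike generation driven by a conductance-based synaptic current, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function name, the time step, the excitatory reversal potential and the constant conductance drive are all assumptions, and the parameters a, b, c, d are the textbook "regular spiking" values of the Izhikevich model rather than the values used in the manuscript.

```python
def simulate_izhikevich(g_exc, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0, e_exc=0.0):
    """Euler-integrate one Izhikevich neuron driven by an excitatory conductance.

    g_exc : sequence of conductance values (model units), one per time step.
    dt    : time step in ms (assumed value).
    e_exc : reversal potential of the excitatory synapse in mV (assumed value).
    Returns the list of spike times in ms.
    """
    v = c          # membrane potential (mV)
    u = b * c      # recovery variable
    spike_times = []
    for step, g in enumerate(g_exc):
        # Conductance-based synaptic current: depends on the driving force (v - e_exc).
        i_syn = -g * (v - e_exc)
        # Izhikevich's phenomenological spike-generation dynamics.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_syn)
        u += dt * a * (b * v - u)
        if v >= 30.0:              # spike cutoff: record the spike and reset
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


# Hypothetical usage: 500 ms of constant excitatory conductance versus no input.
driven = simulate_izhikevich([0.2] * 1000)
silent = simulate_izhikevich([0.0] * 1000)
```

      In this sketch the synaptic term enters only through the driving force, so below threshold the neuron integrates conductance input as an ordinary conductance-based unit would, while the quadratic term and the reset rule take over spike generation.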

      Line 126: when you say preferred angle, do you mean preferred (heading) direction? If so, please maintain consistency throughout.

      We thank the reviewer for pointing out the inconsistency. We have added the word “heading” throughout the manuscript whenever appropriate. To further improve the consistency, we have clarified the meanings of “best” (or “worst”) direction and reserved the use of it solely for cases when trajectory direction is compared with the preferred heading direction, namely, “best” (“worst”) direction when trajectory is along (opposite) the preferred heading direction.

      Line 174: When discussing cross-correlation, sometimes you refer to a cross-correlation function between two place fields and sometimes to the histogram of all such correlations? Please clarify.

      We used histograms to empirically estimate the underlying cross-correlation function. For clarity, we have specified that it is a cross-correlation histogram in the revised manuscript whenever we refer to the empirical estimate.
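
      As an illustration of what such an empirical estimate looks like, a cross-correlation histogram of two spike trains can be built by binning all pairwise spike-time differences. The sketch below is a generic construction under assumed settings, not the authors' analysis code: the function name, the lag window and the bin width are hypothetical choices.

```python
import numpy as np


def cross_correlation_histogram(spikes_a, spikes_b, max_lag=0.15, bin_width=0.005):
    """Histogram of spike-time differences (t_b - t_a) within +/- max_lag seconds.

    spikes_a, spikes_b : spike times of the two cells in seconds.
    max_lag, bin_width : lag window and bin size in seconds (assumed values).
    Returns (bin_centers, counts).
    """
    spikes_a = np.asarray(spikes_a, dtype=float)
    spikes_b = np.asarray(spikes_b, dtype=float)
    # All pairwise lags; adequate for the short spike trains of a single field pass.
    lags = (spikes_b[None, :] - spikes_a[:, None]).ravel()
    lags = lags[np.abs(lags) <= max_lag]
    n_bins = int(round(2 * max_lag / bin_width))
    edges = np.linspace(-max_lag, max_lag, n_bins + 1)
    counts, _ = np.histogram(lags, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts


# Hypothetical usage: cell B fires 22 ms after cell A on every theta cycle.
a_times = np.array([0.0, 0.1, 0.2])
b_times = a_times + 0.022
centers, counts = cross_correlation_histogram(a_times, b_times)
peak_lag = centers[np.argmax(counts)]  # lag bin with the most spike pairs
```

      The position of the histogram peak relative to zero lag is the kind of quantity from which a pairwise theta correlation lag can be read off.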

      Figure 3:

      Understanding the difference between extrinsic and intrinsic sequences is fundamental for the manuscript. I suggest that in the section that refers to Figure 3 (or Figure 3 itself), you kindly provide an example depicting how extrinsic and intrinsic sequences can

      1) coexist yet be distinctly identified

      2) depend on trajectory

      3) depend on DG input

      By coexistence, we meant the heterogeneous population of extrinsic and intrinsic cell pairs and, hence, the extrinsic and intrinsic theta correlations, as shown in Figure 3J. To improve the clarity, we added the following sentence in the section that refers to Figure 3: “In our simulation, extrinsically and intrinsically driven cell pairs are both present in the population (Figure 3J), indicating a coexistence of extrinsic and intrinsic sequences.” To illustrate how extrinsic and intrinsic sequences depend on both trajectory and DG recurrence, we have also added annotations in Figure 3F to mark the extrinsic and intrinsic part of the sequence.

      Moreover, the caption of Figure 3 refers to the directionality of the theta sequences. How does this again relate to the extrinsic/intrinsic distinction?

      We hope the highlighting in panel F of Figure 3 has resolved this problem.

      Figure 5:

      • This is a crucial figure that should illustrate the differences between extrinsic and intrinsic sequences, as the figure caption suggests. Surprisingly, it is not at all clear where (i.e., in which panel) and how (i.e., methodologically) one should distinguish one type of sequence from another. I suggest that at least one such panel is dedicated to illustrating the difference and/or detection of these sequences in time and/or from phase precession plots. Moreover, there is significant visual crowding that makes the interpretation challenging (e.g., insert a space between G and E)

      We would like to apologize that the previous version of the manuscript seems to have given the impression that the difference between intrinsic and extrinsic sequences should be mainly illustrated in Figure 5. We hope that our revisions of Figures 1 and 3 have made this point sufficiently clear. The main purpose of Figure 5 was (and is) to illustrate how intrinsic sequences can lead to out-of-field firing. We have modified the figure caption (and text) accordingly. To address the visual crowding problem in Figure 5, we have inserted a space between panels and also removed repeated labels.

      Tempotron neuron and Figure 6:

      From the reviewer’s questions on Figure 6, we feel that our presentation caused considerable confusion about the motivation and interpretation of the tempotron simulations. We therefore rewrote parts of the associated text and Figure caption. We hope that the revised presentation clarifies the issues. We therefore only briefly respond to the reviewer’s points here, because we think they largely resulted from misunderstandings.

      • Intuitively, and as the manuscript results suggest, late phases are associated to extrinsic mechanisms while early phases are associated to intrinsic. Why not construct a simpler classifier readout based on this fact? How does it compare to a tempotron?

      Contrary to the reviewer’s comment, extrinsic mechanisms are visible at early phases (late in the field), intrinsic mechanisms at late phases (early in the field). In fact, what the tempotron does is learn to identify the intrinsic (late phase) part and to disregard the extrinsic (early phase) part.

      • What is the significance of theta-modulated output of the tempotron (readout) neuron?

      The theta modulation of the tempotron output is a trivial result of the theta-modulation of the input, i.e., the detection of the intrinsic sequence pattern is done once every cycle.

      Suggestion for Figure 6 related to Tempotron readout: Focus on ’with DG loop condition’, as the challenge and most important point here is to identify extrinsic and intrinsic sequences. The No-loop condition could be left as a supplementary figure or side panel.

      The no-loop condition is the essential control showing that the tempotron only responds to the previously learned intrinsic pattern and cannot identify spatial location based on the extrinsic pattern.

      Further work/predictions.

      Lines 196-198. “Since intrinsic sequences can also propagate outside the trajectory (Figure 5) and activate place cells non-locally, our model predicts direction-dependent expansion of place fields.” If remote activation is ‘sufficiently’ remote, wouldn’t this predict two separate place fields instead of an expansion?

      The reviewer is completely correct. Out-of-field spiking can also affect remote locations if the intrinsic sequences link to remote place fields. This would lead to double fields; however, the intrinsic part would only be active at late theta phases. For simplicity, we have not added such a case in our paper, but we would like to thank the reviewer for this comment, since it leads to a nice prediction of the model, which can be experimentally tested and was therefore included in the discussion.

      Lines 556-558. ”In our model, firing rate is determined by both low-phase spiking from sensory input and high-phase spike arrivals of DG-CA3 loops, both producing opposing effects on the phase distribution.” Is it possible to make a differential prediction based on lesions here, e.g., along the lines of reduced range phase precession, for either high phases or for low phases?

      We thank the reviewer for this great suggestion. Lesion of DG in the model does indeed reduce the phase range and mean spike phase. This further corroborates the effect of the DG-loop on theta compression and high-phase spiking. We have included a new panel D in Figure 4 and a corresponding mention in the Results section.

      Line 570. ”We speculate that the functional roles of intrinsic sequences may not be limited to spatial memories.”. Is there any relationship to replay and/or sleep-dependent memory consolidation? Some speculation in the Discussion section would be welcome and appropriate.

      We have added some further speculative ideas to the last section of the Discussion. We propose that replay and preplay reflect the intrinsic sequences that express the current expectation of the animal. We have not yet thought well enough about their relation to memory consolidation to phrase this in the manuscript, but would suggest that they could serve to signal multimodal context information to the neocortex where it can evoke retrieval of unimodal memory traces.

      The description of the results, as stated in the public review, can be improved. A key component is the definition and identification of extrinsic and intrinsic sequences.

      Some comments:

      • I think that the words ’extrinsic’ and ’intrinsic’ are problematic as both types of sequences/models rely on external (spatial) input, hence both are in some sense ’extrinsic’. On the other hand, both are network mechanisms, thus in some sense ’intrinsic’, where the asymmetry is either programmed directly onto the weights or due to synaptic depression. To add to the confusion, ’intrinsic’ mechanisms very often refer to cellular mechanisms in neurophysiology. I kindly ask you to, ideally, reconsider the terminology, or at the very least, be very thorough and precise when describing the mechanisms. For example, sometimes extrinsic (intrinsic) ’models’ are referred to, sometimes ’sequences’, sometimes ’factors’, sometimes ’pairs’, etc.

      We understand and appreciate the reviewer’s argument, but would like to stick to the terminology, since it was already used in our prior publication. We have made considerable effort to improve the explanation and illustration of extrinsic vs. intrinsic pairs in the main text and in Figures 1 and 3 to highlight our definition that is based on pair correlations: Extrinsic pairs flip the correlation lag with reversal of running direction, intrinsic pairs don’t. This is simply a functional definition and should not be confused with potential microscopic mechanisms. One of those (DG-loops) is suggested in our paper.
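
      The functional definition above can be illustrated with a minimal sketch. This is not the analysis used in the manuscript: the function names `wrap_phase` and `classify_pair`, the purely sign-based criterion, and the example lags are assumptions made for illustration, whereas the extrinsicity and intrinsicity measures in the paper are graded quantities.

```python
import math


def wrap_phase(lag):
    """Wrap a phase lag (radians) into the interval (-pi, pi]."""
    wrapped = (lag + math.pi) % (2 * math.pi) - math.pi
    return math.pi if wrapped == -math.pi else wrapped


def classify_pair(lag_forward, lag_backward):
    """Label a field pair from its correlation peak lag in both running directions.

    Hypothetical criterion (an assumption, not the paper's measure):
    - 'extrinsic' if the sign of the lag flips when the running direction reverses,
    - 'intrinsic' if the sign of the lag stays the same,
    - 'undetermined' if either lag is exactly zero.
    """
    a = wrap_phase(lag_forward)
    b = wrap_phase(lag_backward)
    if a * b < 0:
        return "extrinsic"
    if a * b > 0:
        return "intrinsic"
    return "undetermined"


if __name__ == "__main__":
    print(classify_pair(0.5, -0.5))  # lag flips with running direction
    print(classify_pair(0.5, 0.4))   # lag keeps its sign
```

      The criterion only inspects the pair correlation and makes no statement about the underlying mechanism, which is the point of a functional definition.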

      • As discussed in the public review, network mechanisms may require experience-dependent plasticity and hence cannot easily explain phase precession on the first pass. Please discuss why and/or how your model fits with this observation.

      We agree that the two models under consideration both require the recurrent network be set up appropriately and there is no theory so far that would explain how. The reason we chose these two models is because they are well known in the community and relatively similar. We reasoned that comparison between an intrinsic model and an extrinsic model would make most sense if the two are as similar as possible. Nevertheless, we extended the manuscript by a new set of simulations in which we do not use recurrent CA3 connections and obtain phase precession solely by feed-forward synaptic facilitation (new Figure 6 and supplementary Figure S1). The new simulations show that the basic phenomenology can also be obtained without using recurrent CA3 connections; however, as expected when removing one mechanism of phase precession, the phase range is somewhat reduced as compared to the full model.

      Along a similar vein, phase precession in Figure 1E only has a range of pi/2, which is about half of the typical range of phase precession for single runs. This should be characterized as a weakness of the intrinsic model.

      The precession range in spiking models is highly sensitive to a large number of parameters such that it is hard to make such definite claims (see also above response). In the original Tsodyks et al. 1996 paper the phase range went up to 270 degrees with a slightly different implementation to ours in terms of current vs. conductance-based synapses, an exponential instead of a Gaussian recurrent weight function, and 1-d (original) vs 2-d (ours). We chose conductance-based synapses, and a Gaussian weight profile for better comparison with the Romani and Tsodyks (2015) model. In the original non-spiking implementation by Romani and Tsodyks (2015), the phase range was hardly 70 degrees. Our model implementation of the Romani and Tsodyks (2015) model fits the experimentally reported phase ranges of about 70 to 180 degrees in CA3 (Harris et al., 2001).

      Lines 282-284: ”...since phase precession properties change in relation to running directions, nor are they solely intrinsic since reversal of correlation is still observed in most of the sequences (Huxter et al., 2008; Yiu et al., 2022).”. To which extent is this a consequence of the phase precession model (extrinsic vs intrinsic) or the fact that place fields are sometimes directional?

      The reversal of sequences with reversed running direction is how we define extrinsic correlation. We hope our changes in relation to Figure 1 have clarified this point.

      Figure 2: Is it i) directional input or ii) short-term facilitation that gives rise to lower phase? (or perhaps both?) Please clarify.

      It’s both. This is now clarified in the revised version of the Results sections related to Figure 2: higher depolarization always yields earlier phases in spiking models, however, pair correlations are not affected by either of the two mechanisms.

      Line 320. ”...onset of phase precession”. Do you mean in CA3/CA1/DG?

      Thank you for pointing this out. We have clarified that this statement refers to CA3.

      Line 323. ”....at a different location”. Please add rationale why it has to be at a different location and a reference to the appropriate equation.

      The sequence rationale as well as the equation number have been added.

      Line 384. ” ... predicting that loss of DG inputs is compensated for by the increase of release probability in the spared afferent synapses from the MEC.”. It wasn’t clear whether this was a ’homeostasis prediction’, or an implementation in the model. Please clarify.

      Since the model explained the experimental observations by implementing an increased probability of release, the model predicts that in animals with DG lesion the probability of release should be enhanced. We have modified the wording to avoid confusion.

      Line 428 ”...and near future locations) is obvious, the potential role of the lesser expressed intrinsic sequence contributions is not straightforward.”. Similar to my comments above regarding terminology, please clarify what are both contributions and why are intrinsic sequences ’lesser expressed’.

      We have rewritten this passage to avoid unclear wording.

      Line 474. ”...we showed that the trajectory-independent sequences”. Do you mean ’intrinsic sequences’?

      We thank the reviewer for careful reading! We have changed the wording to ”intrinsic sequences” in the revision.

      Line 482. ”...field pairs being extrinsic”. Please clarify, as the usage of extrinsic now refers to field pairs.

      Thank you for pointing this out. We went through the whole manuscript and clarified the terms.

      Line 245 (heading). Consider rewriting as ’Dependence of theta sequences on heading directions’. Extrinsic and Intrinsic models have not yet been introduced.

      Since the main purpose of the first Results section is to explain the difference between extrinsic and intrinsic sequences we kept these terms in the heading but modified it to ”Dependence of theta sequences on heading directions: Extrinsic and intrinsic sequences”. Additionally, we have put more emphasis on introducing the terms ”extrinsic” and ”intrinsic” in this section.

      Figure 1.

      • I suggest using the same font - C and D, and F and G are too close to each other, consider adding space. For example, the exponent, 10-2 makes reading cumbersome. Line 300. Phase tail means offset phase? Phase tail may be too informal. Line 325: DG loop. Do you mean CA3-DG projection?

      We thank the reviewer for the suggestions. In the revised manuscript, we have ensured that the same font is used in all of the figures. To improve the readability of Figure 1, we have added space between panels as suggested, removed repeated axis labels and downsized the text ”10-2”. Furthermore, we have rewritten the referenced line without using the word ”tail”, and also clarified the meaning of DG loop as the short form of CA3-DG projection.

      Figure 4 caption: ”DG lesion reduces temporal correlations...”. It is more precise to say that the lesion reduces the slope of the fitted lag vs distance. And how is this related to sequence compression?

      In the paragraph referring to Figure 4, we have elaborated on the meaning of theta compression and its relation with the lag-distance plot. However, we argue that ”reduces the slope of the fitted curve” is not comprehensive enough to express our summarized conclusion in a caption title. We have modified the wording to be ”DG lesion reduces theta compression”.

      In addition, we have changed the slope unit to be radians per cm rather than radians per maximum pair distance, in conformity to unit standards.

      General comment about terminology with regards to tuning and connectivity: it is not formally correct to compare connectivity with trajectories (e.g., lines 388-395, caption of Figure 5A, etc). Perhaps compare tuning to particular directions/preference or receptive field?

      We have corrected the wording such that the direction of DG-loop projection is compared to the direction of trajectory.

      Line 470. ’...fixed recursive loop.” Sentence is not clear, do you mean recurrent loops?

      The reviewer is correct. We corrected the wording.

      Reviewer #2 had the following recommendations.

      M1. The abstract focuses on the differences between online and offline hippocampal replays. However, the replay topic is not touched upon in the rest of the manuscript. I found this very confusing when I first read the paper. I suggest the authors reconsider the best way to approach the opening or at least discuss if and how their model would incorporate replay phenomena.

      Also in response to Reviewer #1, we have rewritten the abstract, focusing on the problem of how to generate 2-d topology from 1-d sequences. In addition, also in response to Reviewer #1, we added a paragraph in the Discussion detailing a hypothesis on how we think replay and intrinsic sequences work together.

      m2. On lines 89-91, the authors provide the selection of neuronal parameters for excitatory pyramidal cells and inhibitory cells in the Izhikevich model. While the choice of model is reasonable, it would be helpful to clarify the source of these neuronal parameters, especially for readers who are not familiar with the model.

      Again, also in response to reviewer # 1, we have added more motivation for the Izhikevich model.

      M3. On lines 94-98, the model considers a 2D sheet of CA3 neurons. One of the most significant assumptions is that each 2x2 tile of place cells is considered a unit with four directional angles. What is the basis for this assumption? Is there any experimental result supporting this, or is it a completely artificial design for the model? This is important since the organization of CA3 cells also affects the network architecture discussed later and impacts the realism of the model.

      This comment is related to Reviewer #1’s concern on experience-dependent plasticity: How is this connectivity pattern established? We fully agree that this is an open problem for the Tsodyks et al.-type networks. The main reason for choosing them (as argued in our response to reviewer #1) is to have two published models, representing one type of sequence each, that are similar enough for comparison. In addition, we added new simulations (new Figure 6 and Supplementary Figure S1), showing that the basic phenomenology can also be obtained in a model without recurrent connections (see also response to Reviewer #1).

      m4. Similarly, on lines 111 and 140, the model uses 500 ms for the timescales of short facilitation and short-term synaptic depression. The choices of these two timescales are vital for producing directionality in extrinsic and intrinsic sequences, yet their experimental sources are not clarified.

      In the Methods section of the revised manuscript, we have included the sources of previous experimental data and modelling work to support our choice of the time constants.

      M5. On line 126, the authors assume that the synaptic strengths between CA3 cells, Wij, are given by the distances between neurons and the similarity between their directional preferences. While this assumption seems reasonable in the sensory cortex, I am unsure if this is also the case in the hippocampus, and the authors should clarify the basis for this assumption.

      The distance dependence simply reflects the original Romani and Tsodyks 2015 model (see response to M3) and we share the concern of the reviewers. The increased connectivity for neurons with the same directional preference was necessary to recover the direction dependent phase precession properties (Figure 2) in the realm of the Romani and Tsodyks 2015 model. Please also see our new Figure 6 showing simulations without the recurrent matrix.

      More importantly, the existing connections within CA3 and DG cells completely determine the ”intrinsic” sequences. But wouldn’t this be fragile when place cells undergo global remapping, which can take place within only a few seconds? The author should comment on this in the discussion.

      We would like to thank the reviewer for bringing up this interesting point. In our thinking, the DG-CA3 connectivity is fixed (multiple 1-d trajectories, not necessarily requiring 2-d topology), i.e., the same intrinsic sequence should show up in multiple environments (and should not remap, although it may just not be active in some environments). This is a prediction of our model and we have added it to the Discussion.

      M6. I found the setup of DG place cells unreasonable. DG place cells are found to be granule cells rather than pyramidal cells. Moreover, the model does not consider recurrent connections between DG cells (These setups are closer to CA1 place cells).

      We agree with the reviewer, DG granule cells should rather be modelled as high-input resistance EIF neurons. However, the feedback loop via the dentate is not a direct one. It involves hilar mossy cells plus multiple hierarchies of feedback inhibition (this is probably what the reviewer means with recurrent connections between DG neurons, because granule cells are not recurrently connected in the non-pathological state). To our knowledge a biologically realistic model of the hilar-DG network does not exist and it would be far beyond the scope of this paper to develop one. We therefore see our DG feedback model rather as phenomenological. The discussion paragraph on the anatomy of the dentate gyrus touches on these points.

      Therefore, a significant concern is: Why should it be the DG feedback projection to CA3 responsible for the ”intrinsic” sequences instead of projections from other brain areas?

      The reviewer is generally correct: any brain structure which implements fixed sequences via a loop would do. The reason why we suggest the DG to be the best candidate is purely empirical, referring to papers with dentate lesions: Sasaki et al. 2018 and Ahmadi et al. 2022. We have added a similar argument to the Discussion.

      m7. On line 166, the authors claim that there are no connections between inhibitory cells at all. While I understand that this is for simplification of the model, the lack of recurrent inhibition between interneurons may have limited the model’s ability to produce gamma-band dynamics (referring to PING and ING mechanisms), which are robust rhythms produced in CA3. I am very curious if the model can incorporate theta-gamma coupling by introducing connections between CA3 inhibitory cells.

      We have omitted the gamma oscillation for simplicity, because we do not have a hypothesis for a functional role in the context of distinguishing extrinsic from intrinsic sequences (Occam’s razor) and, as the reviewer correctly anticipates, they unavoidably show up when inhibitory interneurons connect to each other (e.g. Thurley et al. 2013). Of course, one could envision situations in which gamma for intrinsic sequences may have a different frequency than for extrinsic ones, by differentially manipulating the CA3 and DG basket cell networks, but, as long as there is no experimental data, it would be pure speculation and thus we have not included it in the model.

      m8. The authors should clarify the source of parameters in Table 1, especially the synaptic strengths. These values are vital for extrinsic and intrinsic theta sequences.

      The weight values have been chosen to allow for large theta phase precession range, coexistence of extrinsic and intrinsic sequences, and stability of the network activity. A similar statement has been added to the manuscript.

      M9. I have another concern regarding the measurements of ”extrinsicity” and ”intrinsicity” defined on lines 185-196. Are they the best measures? To distinguish the cause of spike correlations, the ”extrinsicity” and ”intrinsicity” of a pair of spikes should not be high at the same time. However, this is clearly not the case in the model, according to Figs 3 and 5. Moreover, in the data analysis carried out later, spike pairs are considered extrinsic or intrinsic merely by comparing the two measurements. I suggest the authors consider counterfactual methods in causal inference. For example, would a spike pair (cell1, cell2) still exist if we change the sensorimotor inputs or the DG-CA3 projections? If this is difficult to implement, the authors should at least discuss how different choices of measurements would impact the conclusions of the paper.

      The problem the reviewer has identified arises from the fundamental symmetry of theta phase quantification: if spikes of a pair of place fields have a phase difference of 180° one cannot say which cell leads and which cell follows, hence, the phase difference is both intrinsic (because the peak doesn’t flip) and extrinsic (because the peak flips and ends up at the same phase). The fact that in some cases extrinsicity as well as intrinsicity are high simply means that the field pair has a correlation peak lag close to 180°. Since in the experimental data set in (Yiu et al. 2022) only field pairs were available, we have not been able to use a different quantification then and decided to apply the same quantification in our model for comparison. Moreover, Figure 5F nicely shows that the measures are able to retrieve the ground-truth intrinsic DG-loop structure when considered on the population level.

      In our model, though, we can go beyond 2nd order statistics and derive sequence similarity measures including multiple cells, e.g., Chenani et al. 2019. However, since we already know the ground truth by construction, we decided to not use these methods. We added a paragraph in the discussion elaborating on beyond 2nd order sequence quantification.
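
      The 180° symmetry described in this response can be checked with a short sketch. It assumes that phase lags are compared on the circle, i.e., modulo 360°; the helper names `wrap_phase` and `flip_is_ambiguous` and the tolerance are illustrative assumptions, not part of the manuscript.

```python
import math


def wrap_phase(lag):
    """Wrap a phase lag (radians) into the interval (-pi, pi]."""
    wrapped = (lag + math.pi) % (2 * math.pi) - math.pi
    return math.pi if wrapped == -math.pi else wrapped


def flip_is_ambiguous(lag, tol=1e-9):
    """Return True if reversing the sign of the lag gives the same circular lag.

    On the circle this happens at 180 degrees (and trivially at 0), so a
    flipped peak cannot be told apart from a peak that did not flip.
    """
    diff = wrap_phase(wrap_phase(-lag) - wrap_phase(lag))
    return abs(diff) < tol


if __name__ == "__main__":
    for degrees in (45, 90, 180):
        print(degrees, flip_is_ambiguous(math.radians(degrees)))
```

      For a lag of 180° the flipped and unflipped peaks coincide, so a purely pairwise measure can score such a pair as both extrinsic and intrinsic.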

      m10. The authors begin discussing ”intrinsic sequences” from line 316. However, it is not defined before that (and in the rest of the paper as well), causing confusion when reading the paper. The exact definitions of extrinsic and intrinsic sequences should come earlier.

      We hope that our changes to the beginning of the results section (Figure 1), also asked for by Reviewer # 1 could clarify the confusion.

      m11. On lines 345-347, the authors claim that ”the intrinsic sequences are played out backward as determined by the direction of fixed recurrence (Figure 3F),” which is vague. If such sequences are present in that panel, it should be more explicitly indicated graphically.

      Also in response to Reviewer #1, we have graphically high- lighted the two types of sequences.

      M12. On lines 309, 356, 484, 495, 515, and possibly other instances, the authors repeatedly claim that the model simulations are in ”quantitative agreement” with their previous experimental paper. However, no experimental data or comparison with the simulations are presented in this paper. The authors should at least create one figure to demonstrate the degree of consistency between them, instead of merely asking the reader to refer back to their previous paper.

      We agree with the reviewer that the experimental data of our previous paper should be presented in the manuscript. However, creating more panels or figures is likely to clutter the already crowded visuals and obscure our main message. We therefore decided to give numerical comparisons with the previous findings in the main text whenever appropriate, specifically, in the sections referring to Figures 2, 3 and in the Discussion.

    1. Author response

      Reviewer #1 (Public Review):

      This careful study reports the importance of Rab12 for Parkinson's disease associated LRRK2 kinase activity in cells. The authors carried out a targeted siRNA screen of Rab substrates and found lower pRab10 levels in cells depleted of Rab12. It has previously been reported that LLOMe treatment of cells breaks lysosomes and with time, leads to major activation of LRRK2 kinase. Here they show that LLOMe-induced kinase activation requires Rab12 and does not require Rab12 phosphorylation to show the effect.

      We thank the reviewer for their comments regarding the carefulness and importance of our work and for their specific feedback which has substantially improved our revised manuscript.

      1) Throughout the text, the authors claim that "Rab12 is required for LRRK2 dependent phosphorylation" (Page 4 line 78; Page 9 line 153; Page 22 line 421). This is not correct according to Figure 1 Figure Supp 1B - there is still pRab10. It is correct only in relation to the LLOMe activation. Please correct this error.

      We appreciate the reviewer’s comment around the requirement of Rab12 for LRRK2-dependent phosphorylation of Rab10 and question regarding whether this is relevant under baseline conditions or only in relation to LLOMe activation. Using our MSD-based assay to quantify pT73 Rab10 levels under basal conditions, we observed a similar reduction in Rab10 phosphorylation when we knock down Rab12 as we also observed with LRRK2 knockdown (Figure 1A). Further, we see comparable reduction in Rab10 phosphorylation in RAB12 KO cells as that observed in LRRK2 KO cells using our MSD-based assay (Figure 2A and B). Based on this data, we believe Rab12 is a key regulator of LRRK2 activation under basal conditions without additional lysosomal damage. However, as the reviewer noted, we do observe some residual Rab10 phosphorylation upon Rab12 knockdown when assessed by western blot analysis (Figure 1D and Figure 1- figure supplement 1). A similar signal is observed upon LRRK2 knockdown, which may suggest that some small amount of Rab10 phosphorylation may be mediated by another kinase in this cell model. Nevertheless, we appreciate this reviewer’s point and have therefore modified the text to remove any reference to Rab12 being required for LRRK2-dependent Rab phosphorylation and now instead refer to Rab12 as a regulator of LRRK2 activity.

      As noted by the reviewer, our data does suggest that Rab12 is required for the increase in Rab10 phosphorylation observed following LLOMe treatment to elicit lysosomal damage, and we now refer to this appropriately throughout the text.

      2) The authors conclude that Rab12 recruitment precedes that of LRRK2 but the rate of recruitment (slopes of curves in 3F and G) is actually faster for LRRK2 than for Rab12 with no proof that Rab12 is faster-please modify the text-it looks more like coordinated recruitment.

      The reviewer raises an excellent point regarding our ability to delineate whether Rab12 recruitment precedes that of LRRK2 on lysosomes following LLOMe treatment. As noted by the reviewer, we do see both the recruitment of Rab12 and LRRK2 to lysosomes increase on a similar timescale, so we cannot truly resolve whether Rab12 recruitment precedes LRRK2 recruitment in our studies. Based on this, we have modified the text to emphasize that this data supports coordinated recruitment, as suggested, and we have further removed any mention of Rab12 preceding LRRK2. The specific change is as follows “Rab12 colocalization with LRRK2 increased over time following LLOMe treatment, supporting potential coordinated recruitment of these proteins to lysosomes upon damage (Figure 3I). Together, these data demonstrate that Rab12 and LRRK2 both associate with lysosomes following membrane rupture.” and can be found on lines 460-463 of the updated manuscript.

      3) The title is misleading because the authors do not show that Rab12 promotes LRRK2 membrane association. This would require Rab12 to be sufficient to localize LRRK2 to a mislocalized Rab12. The authors DO show that Rab12 is needed for the massive LLOME activation at lysosomes. Please re-word the title.

      To address the reviewer’s concern regarding the title of our manuscript, we have modified the title from “Rab12 regulates LRRK2 activity by promoting its localization to lysosomes” to “Rab12 regulates LRRK2 activity by facilitating its localization to lysosomes” to soften the language around the sufficiency of Rab12 in regulating the localization of LRRK2 to lysosomes. We show that Rab12 deletion significantly reduces LRRK2 activity (as assessed by Rab10 phosphorylation on lysosomes) and significantly reduces the localization of LRRK2 to lysosomes upon lysosomal damage. The updated title better reflects the regulatory role of Rab12 in modulating LRRK2 activity, and we thank the reviewer for their suggestion to modify this accordingly.

      Reviewer #2 (Public Review):

      This study shows that rab12 has a role in the phosphorylation of rab10 by LRRK2. Many publications have previously focused on the phosphorylation targets of LRRK2 and the significance of many remains unclear, but the study of LRRK2 activation has mostly focused on the role of disease-associated mutations (in LRRK2 and VPS35) and rab29. The work is performed entirely in an alveolar lung cell line, limiting relevance for the nervous system. Nonetheless, the authors take advantage of this simplified system to explore the mechanism by which rab12 activates LRRK2. In general, the work is performed very carefully with appropriate controls, excluding trivial explanations for the results, but there are several serious problems with the experiments and in particular the interpretation.

      We appreciate the reviewer’s comments regarding the rigor of our work and the potential impact of our studies to address a key unanswered question in the field regarding the mechanisms by which LRRK2 activation is mediated. Our studies focused on the A549 cell model given its high endogenous expression of LRRK2 and Rab10, and this cell line provided a simple system to investigate the mechanism and impact of Rab12-dependent regulation of LRRK2 activity. We agree with the reviewer that future studies are warranted to understand whether similar Rab12-dependent regulation of LRRK2 occurs in relevant CNS cell types.

      First, the authors note that rab29 appears to have a smaller or no effect when knocked down in these cells. However, the quantitation (Fig1-S1A) shows a much less significant knockdown of rab29 than rab12, so it would be important to repeat this with better knockdown or preferably a KO (by CRISPR) before making this conclusion. And the relationship to rab29 is important, so if a better KD or KO shows an effect, it would be important to assess by knocking down rab12 in the rab29 KO background.

      The reviewer raises a good point regarding the importance of confirming that loss of Rab29 has no effect on Rab10 phosphorylation. To address potential concerns about insufficient Rab29 knockdown, we measured the levels of pT73 Rab10 in RAB29 KO A549 cells by MSD-based analysis. RAB29 deletion had no effect on Rab10 phosphorylation, confirming findings from our RAB siRNA screen and the observations of Dario Alessi’s group reported previously (Kalogeropulou et al Biochem J 2020; PMID: 33135724). We have included this new data into our updated manuscript in Figure 1- figure supplement 1 and comment on it on page 6 in the updated Results section.

      Secondly, the knockdown of rab12 generally has a strong effect on the phosphorylation of the LRRK2 substrate rab10 but I could not find an experiment that shows whether rab12 has any effect on the residual phosphorylation of rab10 in the LRRK2 KO. There is not much phosphorylation left in the absence of LRRK2 but maybe this depends on rab12 just as much as in cells with LRRK2 and rab12 is operating independently of LRRK2, either through a different kinase or simply by making rab10 more available for phosphorylation. The epistasis experiment is crucial to address this possibility. To establish the connection to LRRK2, it would also help to compare the effect of rab12 KD on the phosphorylation of selected rabs that do or do not depend on LRRK2.

      The reviewer raises an interesting question regarding whether Rab12 can further reduce Rab10 phosphorylation independently of LRRK2. Using our quantitative MSD-based assay, we observe that pRab10 levels are at the lower limits of detection of the assay in LRRK2 KO A549 cells. Unfortunately, this means that we are unable to detect whether there might be any additional minor reduction in Rab10 phosphorylation with Rab12 knockdown in LRRK2 KO cells. We cannot rule out that Rab12 may play a LRRK2-independent role in regulating Rab10 phosphorylation in other cell lines, and future studies are warranted to explore whether Rab12 knockdown can further reduce Rab10 phosphorylation in other systems, including in CNS cells.

      Regarding exploring the effects of RAB12 knockdown on the phosphorylation of other Rabs, we also assessed the impact of RAB12 KO on phosphorylation of another LRRK2-Rab substrate, Rab8a. We observed a strong reduction in pT72 Rab8a levels in RAB12 KO cells compared to wildtype cells, suggesting the impact of RAB12 deletion extends beyond Rab10 (see representative western blot in Author response image 1). Due to potential concerns with the selectivity of the pT72 Rab8a antibody (potentially detecting the phosphorylation of other LRRK2-Rabs), we cannot definitively demonstrate that Rab12 mediates the phosphorylation of other Rabs. This question should be revisited when additional phospho-Rab antibodies become available that enable us to selectively detect LRRK2-dependent phosphorylation of additional Rab substrates under endogenous expression conditions.

      Author response image 1.

      A strength of the work is the demonstration of p-rab10 recruitment to lysosomes by biochemistry and imaging. The demonstration that LRRK2 is required for this by biochemistry (Fig 4A) is very important but it would also be good to determine whether the requirement for LRRK2 extends to imaging. In support of a causal relationship, the authors also state that lysosomal accumulation of rab12 precedes LRRK2 but the data do not show this. Imaging with and without LRRK2 would provide more compelling evidence for a causative role.

      We thank the reviewer for their suggestion to assess Rab12 recruitment to damaged lysosomes with and without LRRK2 using imaging-based analyses to add confidence to our findings from biochemical approaches. To address this comment, we have imaged the recruitment of mCherry-tagged Rab12 to lysosomes (as assessed using an antibody against endogenous LAMP1) and observed a significant increase in Rab12 levels on lysosomes following LLOMe treatment. This occurs to a similar extent in LRRK2 KO A549 cells, suggesting that Rab12 is an upstream regulator of LRRK2 activity. This new data has been incorporated into the revised manuscript (Figure 3E) and is presented on page 20 of the updated manuscript.

      Our conclusions on this are further strengthened by new data assessing Rab12 recruitment to lysosomes using an orthogonal biochemical analysis of isolated lysosomes. Using the Lyso-IP method, we observed a strong increase in the levels of Rab12 on lysosomes following LLOMe treatment that was maintained in LRRK2 KO cells. These data have been added to the updated manuscript (new data added to Figure 3-figure supplement 1).

      Together, these data support our hypothesis that Rab12 recruitment to damaged lysosomes is upstream, and independent, of LRRK2.

      The authors also touch base with PD mutations, showing that loss of rab12 reduces the phosphorylation of rab10. However, it is interesting that loss of rab12 has the same effect with R1441G LRRK2 and D620N VPS35 as it does in controls. This suggests that the effect of rab12 does not depend on the extent of LRRK2 activation. It is also surprising that R1441G LRRK2 does not increase p-rab10 phosphorylation (Fig 2G) as suggested in the literature and stated in the text.

      We agree with the reviewer that it is quite interesting that RAB12 knockdown significantly attenuates Rab10 phosphorylation in the context of PD-linked variants in addition to that observed in wildtype cells basally and after LLOMe treatment. As noted by the reviewer, we did not observe increased levels of phospho-Rab10 in LRRK2 R1441G KI A549 cells at the whole cell level (Figure 2G). However, we observed a significant increase in Rab10 phosphorylation on isolated lysosomes from LRRK2 R1441G KI cells compared to WT cells (Figure 4B). This may suggest that the LRRK2 R1441G variant leads to a more modest increase in LRRK2 activity in this cell model. Previous studies in MEFs from LRRK2 R1441G KI mice or neutrophils from human subjects that carry the LRRK2 R1441G variant showed a 3-4 fold increase in Rab10 phosphorylation (Fan et al Acta Neuropathol 2021 PMID: 34125248 and Karaye et al Mol Cell Proteomics 2020 PMID: 32601174), supporting that this variant does lead to increased Rab10 phosphorylation and that the extent of LRRK2 activation may vary across different cell types.

      Most important, the final figure suggests that PD-associated mutations in LRRK2 and VPS35 occlude the effect of lysosomal disruption on lysosomal recruitment of LRRK2 (Fig 4D) but do not impair the phosphorylation of rab10 also triggered by lysosomal disruption (4A-C). Phosphorylation of this target thus appears to be regulated independently of LRRK2 recruitment to the lysosome, suggesting another level of control (perhaps of kinase activity rather than localization) that has not been considered.

      The reviewer suggests an interesting hypothesis around the existence of additional levels of control, beyond the lysosomal levels of LRRK2, that lead to increased Rab10 phosphorylation on lysosomes. Given the variability we have observed in measuring endogenous LRRK2 levels on lysosomes, we performed two additional replicates to assess lysosomal LRRK2 levels in LRRK2 R1441G KI and VPS35 D620N KI cells at baseline and after treatment with LLOMe. We observed a significant increase in LRRK2 levels on lysosomes in cells expressing either PD-linked variant and a trend toward a further increase in the levels of LRRK2 on lysosomes after LLOMe treatment in these cells (Figure 4D in the updated manuscript). We have updated the text on page 24 to reflect this change, suggesting that the PD-linked variants do not fully occlude the effect of lysosomal disruption on the lysosomal recruitment of LRRK2.

      LLOMe treatment leads to a stronger increase in Rab10 phosphorylation on lysosomes from LRRK2 R1441G and VPS35 D620N cells compared to the modest increase in LRRK2 levels observed. This could suggest that, as the reviewer noted, additional mechanisms beyond increased lysosomal localization of LRRK2 may be driving the robust increase in Rab10 phosphorylation observed. We have modified the results section on lines 548-551 to highlight this possibility: “Rab10 phosphorylation showed a more significant increase in response to LLOMe treatment than LRRK2 on lysosomes from LRRK2 R1441G and VPS35 D620N KI cells, suggesting that there may be more regulation beyond the enhanced proximity between LRRK2 and Rab that contribute to LRRK2 activation in response to lysosomal damage.”

      Reviewer #3 (Public Review):

      Increased LRRK2 kinase activity is known to confer Parkinson's disease risk. While much is known about disease-causing LRRK2 mutations that increase LRRK2 kinase activity, the normal cellular mechanisms of LRRK2 activation are less well understood. Rab GTPases are known to play a role in LRRK2 activation and to be substrates for the kinase activity of LRRK2. However, much of the data on Rabs in LRRK2 activation comes from over-expression studies and the contributions of endogenously expressed Rabs to LRRK2 activation are less clear. To address this problem, Bondar and colleagues tested the impact of systematically depleting candidate Rab GTPases on LRRK2 activity as measured by its ability to phosphorylate Rab10 in the human A549 type 2 pneumocyte cell line. This resulted in the identification of a major role for Rab12 in controlling LRRK2 activity towards Rab10 in this model system. Follow-up studies show that this role for Rab12 is of particular importance for the phosphorylation of Rab10 by LRRK2 at damaged lysosomes. Increases in LRRK2 activity in cells harboring disease-causing mutants of LRRK2 and VPS35 also depend (at least partially) on Rab12. Confidence in the role of Rab12 in supporting LRRK2 activity is strengthened by parallel experiments showing that either siRNA-mediated depletion of Rab12 or CRISPR-mediated Rab12 KO both have similar effects on LRRK2 activity. Collectively, these results demonstrate a novel role for Rab12 in supporting LRRK2 activation in A549 cells. It is likely that this effect is generalizable to other cell types. However, this remains to be established. It is also likely that lysosomes are the subcellular site where Rab12-dependent activation of LRRK2 occurs. 
      Independent validation with additional experiments would strengthen these conclusions and help to address some concerns that much of the data supporting a lysosome localization for Rab12-dependent activation of LRRK2 comes from a single method (LysoIP). Furthermore, there is a discrepancy between panel 4A versus 4D in the effect of LLOMe-induced lysosome damage on LRRK2 recruitment to lysosomes that will need to be addressed to strengthen confidence in conclusions about lysosomes as sites of LRRK2 activation by Rab12.

      We thank the reviewer for their comments regarding our work that identifies Rab12 as a novel regulator of LRRK2 activation and the appreciation of the parallel approaches we employed to add confidence in this effect.

      As suggested by the reviewer, we have updated our manuscript to now include independent validation of our conclusions using imaging-based analyses to complement our data from biochemical analyses using the Lyso-IP method. Specifically, we have included new imaging data that confirms that Rab12 levels are increased on lysosomes following membrane permeabilization with LLOMe treatment and demonstrates that this occurs independent of LRRK2, providing additional support that Rab12 is an upstream regulator of LRRK2 activity (Figure 3E in the updated manuscript).

      Regarding the reviewer’s comment on a discrepancy between our findings in Figure 4A and Figure 4D, we have performed additional independent replicates in Figure 4D to assess the impact of lysosomal damage on the lysosomal levels of LRRK2 at baseline or upon the expression of genetic variants. We observed a significant increase in LRRK2 levels on lysosomes following LLOMe treatment in our set of experiments included in Figure 4A and a non-significant trend toward an increase in LRRK2 levels on isolated lysosomes in Figure 4D. As described in more detail below (in response to the second point raised by this reviewer), we think this variability arises because of a combination of low levels of LRRK2 on lysosomes with endogenous expression and variability across experiments in the efficiency of lysosomal isolation. Our observations of increased recruitment of LRRK2 to lysosomes upon damage are further supported by parallel imaging-based studies (Figure 3F-I) and are consistent with previous studies using overexpression systems.

      We thank the reviewer for all of the suggestions which have added further confidence to our conclusions and substantially improved the manuscript.

    1. Author response

      Reviewer #1 (Public Review):

      The potential role of the CaMKII holoenzyme in synaptic information processing, storage, and spread has fascinated neuroscientists ever since it has been described that self-phosphorylation of CaMKII at T286 (pT286) can maintain the kinase in an activated state beyond the initial Ca2+ stimulus that induced kinase activation and pT286. The current study by Lučić et al utilizes biochemical and biophysical methods to re-examine two pT286 mechanisms and finds:

      (1) that a previously proposed activation-induced subunit exchange within the holoenzyme cannot provide pT286 maintenance or propagation; and

      (2) that pT286 can occur not only within a holoenzyme but also between two holoenzymes, at least at sufficiently high concentrations.

      For the observation regarding the subunit exchange, the authors go above and beyond to demonstrate that a previously proposed activation-induced subunit exchange does not actually occur in their hands and that the previous appearance of such a subunit exchange may instead be due to activation-induced interactions between the kinase domains of separate holoenzymes. This provides important clarification, as the imagination about the possible functions of this subunit exchange has been running wild in the literature.

      By contrast, pT286 between holoenzymes at sufficiently high concentrations was largely predicted by the previously reported concentration-dependence of pT286 between monomeric truncated CaMKII (although these previous experiments did not rule out that such pT286 could have been excluded for intact full-length holoenzymes). Notably, the reaction rate reported here for pT286 between two holoenzymes is more than two orders of magnitude slower compared to the previously described rate of the pT286 reaction within a holoenzyme.

      The only point on which we disagree (and we think it’s unarguable) is that the current consensus is that inter-holoenzyme phosphorylation simply doesn’t happen (whether or not monomers can phosphorylate each other). The reviewer is of course right that this view seems now less and less likely. We now performed new experiments to investigate this critical point further (see below).

      The probable reason for the discrepancy in reported half-time of phosphorylation measured in earlier reports and in our paper is the fact that earlier reports (for example Bradshaw et al., 2002) measured autophosphorylation rate of wild-type CaMKII holoenzymes, at catalytically-competent enzyme concentrations of 0.1-5 µM. We are reporting the phosphorylation rate of 4 µM kinase-dead CaMKII, which is only a substrate, by 10 nM catalytically competent enzyme (CaMKII wild-type). There is up to 500 times less catalytically competent enzyme in our reactions, which is probably the reason why the reaction itself is several orders of magnitude slower.
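      The "up to 500 times" figure follows directly from the concentrations quoted above. A minimal sketch of that arithmetic is given here; the variable and function names are illustrative only and do not come from the manuscript.

```python
# Fold difference in catalytically competent enzyme between the two setups.
# Concentrations are those quoted in the paragraph above; names are illustrative.
NM_PER_UM = 1000.0

earlier_range_um = (0.1, 5.0)  # wild-type CaMKII in earlier reports, in µM
active_enzyme_nm = 10.0        # wild-type CaMKII in the reactions reported here, in nM


def fold_less_enzyme(earlier_um: float, current_nm: float) -> float:
    """Return how many times less active enzyme the current reaction contains."""
    return earlier_um * NM_PER_UM / current_nm


low, high = (fold_less_enzyme(c, active_enzyme_nm) for c in earlier_range_um)
print(f"{low:.0f}- to {high:.0f}-fold less active enzyme")
```

      The upper end of the range (5 µM versus 10 nM) gives the 500-fold difference mentioned above; the lower end (0.1 µM versus 10 nM) gives a 10-fold difference.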

      In summary, this study contains two somewhat disparate parts: (1) one technical tour-de-force to provide evidence that argues against activation-induced subunit exchange, which was a tremendous effort that provides influential novel information, and (2) another set of experiments showing the somewhat predictable potential for pT286 between holoenzymes, but without indication for the functional relevance of this rather slow reaction. Unfortunately, in the current/initial title of the manuscript, the authors chose to emphasize the weaker part of their findings.

      We agree with the reviewer that the title should be modified to emphasize both findings of our study. We also hope that our new experiments do bolster our findings with regard to pT286 between holoenzymes, as the reviewer puts it.

      The seemingly slow inter-holoenzyme phosphorylation is only slow under conditions in which one of the proteins is kinase-dead. In a situation in which all CaMKII holoenzymes are wild-type, and therefore capable of performing phosphorylation (both intra- and inter-holoenzyme), the reaction rates for pT286 are expected to be orders of magnitude faster than those reported here for the phosphorylation of T286 on kinase-dead protein.

      Reviewer #2 (Public Review):

      This well-written manuscript provides a technical tour-de-force to provide a novel mechanism for sustaining CaMKII autophosphorylation through an interholoenzyme reaction mechanism the authors term inter-holoenzyme phosphorylation (IHP). The authors use molecular engineering to create designer molecules that permit detailed testing of the proposed interholoenzyme reaction mechanism. By catalytically inactivating one population of enzymes, they show using standard assays that the inactive enzyme can be phosphorylated by active holoenzymes. They go on to show that in cells, the inactive enzyme is phosphorylated only in the presence of co-expressed active CaMKII and that this does not appear to be due to active and inactive subunits mixing within the same holoenzyme. The authors suggest reasons for why previous experiments failed to expose IHP and in some experiments provide evidence that reproduces and then extends earlier studies. Some noted differences from earlier experiments are the reaction temperature, the time course of the reactions, and that significantly higher concentrations of the inactive (substrate) kinase in the present study amplify the IHP. These are plausible reasons for earlier studies not finding significant evidence for IHP and the presented data is well-controlled and of high quality.

      The authors then take on the idea of subunit exchange employing multiple strategies. Using genetic expansion, they engineer an unnatural amino acid into the hub domain of the kinase (residue 384). In the presence of the photoactivatable crosslinker BZF and UV illumination, a ladder of subunits was generated indicating intraholoenzyme crosslinks were established. Using this cross-linked enzyme, presumably incapable of subunit exchange, the authors show significant phosphorylation of the kinase-dead mutant. This further supports that IHP is the cause of phosphorylation and not subunit exchange. Extending these experiments, when CaMKIIF394BZF was mixed with the kinase-dead mutant and exposed to UV light, they could not find evidence that the kinase-dead subunits had exchanged into CaMKIIF394 (active) enzymes.

      Just a note, instead of residue 384, this should read 394.

      With an entirely different approach, the authors use isotopic labeling of different pools of wt CaMKII (N14 or N15) followed by bifunctional cross-linking and mass spec to assess potential intra- and interholoenzyme contacts. Several interesting findings came of these studies detailed in Figure 4, mapped in detail in Figure 5, and extensively documented in supplementary tables. Critically, numerous crosslinks were found between different domains of the enzyme (catalytic, regulatory, hub) that are themselves a nice database of proximity measurements, but critical to the hypothesis, no heterotypic cross-links were found in the hub domains at any activated state or time point of incubation. This data supports two findings, that catalytic domains come into close proximity between holoenzymes when activated, supporting the potential for IHP, but that no subunit exchange occurs.

      The authors then pursue the approach used originally to provide evidence of subunit mixing, single molecule-based fluorescence imaging. Using pools of CaMKII labeled with spectrally separable dyes, the authors reproduce the earlier findings (Stratton et al, 2016) showing that under activating conditions, but not basal conditions, colocalized spots were detected. Numerous controls were done that confirm the need for full activation (Ca2+/CaM + Mg2+/ATP) to visualize co-localized CaMKII holoenzymes. Extending these studies, the authors mix holoenzymes, fully activate them, and after sufficient time for subunit exchange (if it occurs), the reactions were quenched, and then samples were analyzed. The result was that no evidence of dual-colored holoenzymes was present; if subunits had mixed between holoenzymes, dual-colored spots should have been evident after quenching the reactions. This was not the case. Further, experiments repeated with pools of differentially labeled kinase dead enzymes produced no colocalization, as predicted, if activation of the catalytic domains is necessary to establish IHP.

      Finally, the authors employ mass photometry to investigate the potential for interholoenzyme interactions. At basal conditions, only a mass peak consistent with CaMKII dodecamers was evident. Upon activation, a small fraction of dimeric complexes was evident (with Ca2+/CaM bound) but the majority of the peak was a dodecamer with 12 associated CaM molecules, and importantly, a significant fraction of a mass population was found consistent with a pair of holoenzymes with associated CaM. As an aside, the holoenzyme population appeared to be modestly destabilized as evidence of a minor fraction of dimers appeared as the authors diluted the enzyme, but the pools of holoenzyme and pairs of holoenzymes (with CaM) remained the dominant species when activated under all three enzyme concentrations assessed. Supporting the importance of activation for interactions between holoenzymes, the catalytically dead kinase even under activating conditions, shows no evidence of dimers of holoenzymes.

      Each of the approaches is well-controlled, the data is of uniformly high quality, and the authors' interpretations are generally well-supported.

      We are very grateful for these supportive comments.

      Reviewer #3 (Public Review):

      CaMKII is a multimeric kinase of great biologic interest due to its crucial roles in long-term memory, cardiac pacemaking, and fertilization. CaMKII subunits organize into holoenzymes comprised of 12-14 subunits, adopting a donut-like, double-ringed structure. In this manuscript, Lucic et al challenge two models in the CaMKII field, which are somewhat related. The first is a longstanding topic in the field about whether a crucial residue, Thr286, can be phosphorylated between intact holoenzymes (inter-holoenzyme phosphorylation). The second is a more recent biochemical finding, which tested the long-running theory that CaMKII exchanges subunits between holoenzymes to create mixed oligomers. These two models are connected by the idea that subunit exchange could facilitate phosphorylation between subunits of different holoenzymes by allowing subunits to integrate into a different holoenzyme and driving transphosphorylation within the CaMKII ring. Here, the authors attempt to show that one intact holoenzyme phosphorylates another intact holoenzyme at Thr286. The authors also provide evidence suggesting that subunit exchange is not occurring under their conditions, and therefore not driving this phosphorylation event. The authors propose a model where instead of exchanging subunits, two holoenzymes interact via their kinase domains to enable transphosphorylation at Thr286 without integrating into the holoenzyme structure. In order for the authors to successfully convince readers of all three facets of this new model, they need to provide evidence that 1) transphosphorylation at Thr286 happens when subunit exchange is blocked, 2) subunit exchange does not occur under their conditions, and 3) there are interactions between kinases of different holoenzymes that lead to productive autophosphorylation at Thr286.

      Strengths:

      The authors have designed and performed a battery of cleverly designed and orthogonal experiments to test these models. Using mutagenesis, they mixed a kinase-dead mutant with an active kinase to ask whether transphosphorylation occurs. They observe phosphorylation of the kinase-dead variant in this experiment, which indicates that the active kinase must have phosphorylated it. A few key questions arise here: 1) whether this phosphorylation occurred within a single CaMKII holoenzyme ring (which is the canonical mechanism for Thr286 phosphorylation), 2) whether the phosphorylation occurred between two separate holoenzyme rings, and 3) why was this not observed in previous literature? To address questions 1 and 2, the authors implemented an innovative strategy introducing a genetically encoded photocrosslinker in the oligomerization domain, which when crosslinked using UV light, should lock the holoenzyme in place. The rate of phosphorylation was the same when comparing uncrosslinked and crosslinked CaMKII variants, indicating that phosphorylation is occurring between holoenzymes, rather than through a subunit exchange mechanism that would require some type of disassembly and reassembly (presumably blocked by crosslinking). The 3rd question remains as to why this has not been previously observed, as it has not been for lack of effort. The authors mention low temperature and low concentration as culprits, however, Bradshaw et al, JBC v. 277, 2002 carry out a series of careful experiments that indicated that autophosphorylation at T286 is not concentration-dependent (meaning that the majority of phosphorylation occurs via intra-holoenzyme), and this is done over a concentration and temperature range. It is possible that the mutants used in the current manuscript allow for different behavior of the kinase-dead domains, which will have an empty nucleotide-binding pocket. Further studies will need to elucidate these details, and importantly, understand what physiological conditions facilitate this mechanism.

      We thank the reviewer for their assessment of our work.

      The paper cited by the reviewer (Bradshaw et al, JBC v. 277, 2002) is indeed a carefully designed biochemical investigation of CaMKII activity. As the reviewer pointed out, one of the conclusions of the paper is that the autophosphorylation of CaMKII is not concentration dependent, implying that it has to occur exclusively intra-holoenzyme. However, there are some limitations which colour the interpretation of this classic paper. Bradshaw and colleagues used only CaMKII wild-type protein, so the autophosphorylation which is taking place in their reactions is possible both within holoenzymes and between holoenzymes, but this is impossible to distinguish. The authors of the cited paper then used “Autonomous activity assay” (not any measurement of pT286 on CaMKII itself) in which they first stopped the initial autophosphorylation reaction at T286 by adding a quench solution which contained a mixture of EDTA and EGTA, and then measured phosphorylation of the peptide-substrate of CaMKII (autocamtide-2), in the absence of Calmodulin binding (autonomous activity). They also diluted the autophosphorylation reaction to 10 nM CaMKII before adding it to the “Autonomous activity assay”.

      As a side point, each reaction was quenched and diluted to the same final CaMKII concentration of 10 nM. They measured the activity of this dilution with phosphorylation of a peptide-substrate (autocamtide-2), in the absence of CaM binding. The authors contend that autonomous activity reported in this way reflects the amount of pT286, which is not impossible, but it is not a direct measure of pT286.

      All this adds up to allowing the autophosphorylation of wild-type CaMKII at various concentrations ranging from 0.1 to 4.6 µM in the presence of 10 µM Ca/CaM and 500 µM Mg/ATP. This is a very fast reaction: concentrations of enzyme (CaMKII wild-type), activator (Ca/CaM) and ATP/Mg are all high at the beginning of the autophosphorylation reaction and would be expected to allow maximal autophosphorylation in very short times (seconds). Most importantly, this experiment does not exclude an inter-holoenzyme reaction slower than the intra-holoenzyme one. It certainly could not detect it.

      In any case, to relate these concepts to our experiments and current understanding of CaMKII, we performed a new set of experiments modelled on the Bradshaw paper. Critically, we used CaMKII wild-type as the enzyme, and CaMKII kinase-dead as the substrate. Intra-holoenzyme phosphorylation cannot occur in this reaction, which was designed to detect a concentration-dependent phosphorylation reaction. We used a fixed concentration of the substrate kinase (4 µM), and 4 different concentrations of CaMKIIWT ranging from 0.5 to 100 nM. In our assay, the level of phosphorylation on substrate CaMKII (CaMKIIKD) was dependent on the concentration of enzyme CaMKII (CaMKIIWT) (Figure 1-figure supplement 3), adding more evidence to the hypothesis that CaMKII autophosphorylation can occur inter-holoenzyme.

      The possibility that an empty nucleotide-binding pocket influences the phosphorylation status of T286 in the regulatory domain of kinase-dead CaMKII is highly unlikely. One could perhaps envision that an empty nucleotide-binding pocket might expose the regulatory domain in kinase-dead CaMKII for phosphorylation, which would be prevented in CaMKIIWT, but in all available structures of CaMKII (Chao et al, 2011; Myers et al., 2017, Buonarati et al., 2021), the regulatory domain is docked to the kinase domain of CaMKII, although the nucleotide-binding pocket is empty (either by mutation of residue K42 and/or simply by not adding the ATP/Mg to reduce chemical dispersity of the sample). The only time the regulatory domain was not docked on the kinase domain is when CaMKII was in complex with Calmodulin (Rellos et al., 2010). Finally, in our crosslinking mass spectrometry experiments, we used both heavy and light forms of CaMKII wild-type, and there we can clearly see interactions between kinase/regulatory domains of two different species of CaMKIIWT, which are dependent on activation.

      The most convincing data that subunit exchange does not occur is from the crosslinking mass spectrometry experiment. The authors created mixtures of 'light' and 'heavy' CaMKII holoenzymes, either activated or not and then used a Lys-Lys crosslinker (DSS) to trap the enzyme in its final state. The results of this experiment indicate that subunit exchange is not occurring under their conditions. A caveat here is that there are not many lysines at hub-hub interfaces, which is the crux of this experiment. If there is no subunit exchange under their conditions, how does transphosphorylation occur between holoenzymes? The authors show very nice mass photometry data indicating that there are populations of 24-mers, which corresponds to a double-holoenzyme. Paired with the data from their crosslinking mass spectrometry which shows crosslinks between kinase domains of different holoenzymes, this indicates that perhaps kinases between holoenzymes do interact, and they do so in a competent manner to allow transphosphorylation to occur.

      It is true that there are “only” 6 Lysines in the hub domain of CaMKII. However, it is clear from our crosslinking mass spectrometry data that we can detect hub:hub peptides coming from the same holoenzymes (homocrosslinks, either 14N: 14N or 15N: 15N species), but never between holoenzymes (14N with 15N). The fact that peptides can be detected in the homocrosslinks speaks to the validity of using Lysine crosslinkers in this experiment.
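      The logic of this readout can be sketched in a few lines. The snippet below is only an illustration of how cross-linked peptide pairs could be sorted by isotope label; it is not the analysis pipeline used in the study, the function name is hypothetical, and the example pairs are invented placeholders rather than data.

```python
from collections import Counter


def classify_crosslink(label_a: str, label_b: str) -> str:
    """Classify a cross-linked peptide pair by the isotope labels of its two peptides."""
    if label_a == label_b:
        return "homo"   # both peptides come from the same isotopic pool
    return "hetero"     # peptides come from differently labelled pools


# Hypothetical hub:hub identifications, NOT data from the study.
hub_pairs = [("14N", "14N"), ("15N", "15N"), ("14N", "14N")]
counts = Counter(classify_crosslink(a, b) for a, b in hub_pairs)
print(counts["homo"], counts["hetero"])
```

      Under this scheme, subunit exchange would be expected to show up as a non-zero "hetero" count among hub:hub pairs, whereas detectable "homo" hub:hub pairs show that the crosslinker can report on hub contacts at all.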

      Weaknesses:

      The authors should be commended for performing three orthogonal experiments to test whether CaMKII holoenzymes exchange subunits to form heterooligomers. However, there are technical issues that dampen the strength of the results shown here. For simplicity, let's consider that CaMKII holoenzymes are comprised of two stacked hexameric rings. It has been proposed that the stable unit of CaMKII assembly and perhaps also disassembly and subunit exchange is a vertical dimer unit (comprised of one subunit from each hexameric ring). In the UV crosslinking data shown in this paper, the authors have a significant number of monomers, some crosslinked dimers (of which there are two populations), and fewer higher-order oligomers. To effectively block subunit exchange, robust crosslinking into hexamers is necessary, which the authors have not done. Incomplete crosslinking results in smaller species that can still exchange (and/or dissociate), confounding the results of this experiment. In addition, Figure 3 shows a trapping experiment, where if the exchange was occurring, there would be an oligomeric band in Lane 8, which is visible and highlighted with a blue arrow by the authors. This result is explained by nonspecific UV effects, however by eye it is not clear if there is an equivalent band in lane 10. The overall issue here is inefficient crosslinking.

      We agree with the reviewer that the robustness of the UV-induced crosslinking is not extremely high. However, we do observe higher order oligomers on the gel (Figure 2 and Figure 3B, pT286 blot), which indicates that at least a portion of the holoenzymes is crosslinked. On the other hand, the UV-induced crosslinking is not slowing down the trans-phosphorylation reaction, which would be expected if subunit exchange were the prevailing mechanism for spread of kinase activity between holoenzymes.

      In Figure 3, lanes 8 and 10 show a small portion of dimers (less than 5% by densitometry), at the absolute limit of detection. This dimer band is most likely due to unspecific UV-induced disulfide bridging; we already lessened it by adding 50 mM TCEP prior to UV treatment (Figure 3-figure supplement 1B and C). Previous reviewers of this manuscript criticized the small dimer band in lane 8, and we wanted to address this transparently in the submission to eLife.

      Unfortunately, if we absolutely crank up the contrast to see this band in lane 10, we start to see other features in the noise as well. We have now edited the image in Figure 3B to highlight these minor bands more clearly, but this is also not ideal.

      With regard to the trapping experiment, the overall problem is not inefficient crosslinking, because we see that the P-T286 signal is quite nicely represented in higher order bands from F394BzF protein, but kinase-dead protein (Avi-tagged signal in Figure 3) is almost entirely absent. Any crosslinking of Avi-tagged protein (possibly corresponding to subunit exchange) is a minor process at the limit of detection on WB.

      Unfortunately, we have not yet found any better crosslinking sites than the two we report (we have tried about 10). But the results we did obtain encouraged us to employ other techniques to probe subunit exchange (for example, the MS X-linking).

      The authors also employ a single-molecule TIRF experiment to further interrogate subunit exchange. Upon inspection of the TIRF images, it is not clear that the authors are achieving single molecule resolution (there are evident overlapping and distorted particles). The analysis employed here is Pearson's correlation coefficient, which is not sufficient for single molecule analysis and would not account for particle overlap, particles that are too bright, and/or particles that are too dim. For example, an alternative explanation for the authors' results is that activation results in aggregation (high correlation), and subsequent EGTA treatment leads to dissociation at these low concentrations (low correlation). However, further experimentation and analysis are necessary.

      In the manuscript we present raw images, not processed ones. As we wrote in the Materials and Methods, we thresholded the images for further processing. All colocalization methods have drawbacks, but we found that our thresholding combined with the Pearson coefficient was highly reproducible. We also looked at Manders' coefficients, but these are less straightforward to understand, whilst still giving the same answer in our hands. We agree that there are more experiments that can be done, with particular predictions based on our new mechanism. We are doing them and will report them when they are ready.

      At the risk of repeating ourselves, the reversible loss of overlap of the two labelled populations is the key result and cannot be explained by spurious dim or bright particles, or by a few overlapping profiles.
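
      As a rough illustration of the two colocalization measures discussed in this response, the following Python sketch computes a Pearson coefficient and Manders-style overlap fractions from thresholded two-channel intensities. It is not the analysis code used for the manuscript: the function names, the flat-list image representation, and the rule of keeping pixels that exceed the threshold in at least one channel are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def colocalization(ch1, ch2, thr1, thr2):
    """Colocalization summary for two channels given as flat intensity lists.

    Hypothetical helper: thr1 and thr2 are per-channel intensity thresholds.
    """
    # Assumption: keep pixels that are above threshold in at least one channel.
    pairs = [(a, b) for a, b in zip(ch1, ch2) if a > thr1 or b > thr2]
    if not pairs:
        return {"pearson": 0.0, "m1": 0.0, "m2": 0.0}
    r = pearson([a for a, _ in pairs], [b for _, b in pairs])

    # Manders-style fractions: share of each channel's above-threshold
    # intensity that sits in pixels where the other channel is also above
    # its threshold.
    total1 = sum(a for a in ch1 if a > thr1)
    total2 = sum(b for b in ch2 if b > thr2)
    both1 = sum(a for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)
    both2 = sum(b for a, b in zip(ch1, ch2) if a > thr1 and b > thr2)
    m1 = both1 / total1 if total1 else 0.0
    m2 = both2 / total2 if total2 else 0.0
    return {"pearson": r, "m1": m1, "m2": m2}


# Toy data only: four "pixels" per channel.
overlapping = colocalization([0, 5, 3, 0], [0, 5, 3, 0], 1, 1)
separated = colocalization([0, 5, 0, 0], [0, 0, 5, 0], 1, 1)
```

      On these toy inputs, two identical channels give full overlap and a Pearson coefficient of 1, whereas two channels with no shared bright pixels give zero overlap and a negative coefficient. Real images would of course need background handling and particle-level controls that this sketch leaves out.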

      Taken together, the authors have provided important food for thought regarding inter-holoenzyme phosphorylation and subunit exchange. However, given the shortcomings discussed here, it remains unclear exactly what mechanisms are at play within and between CaMKII holoenzymes once activated.

      We thank the reviewer for their critical assessment of our manuscript. We will continue to investigate the relevant points and refine the overall picture of CaMKII, to better clarify the mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Sučević and Schapiro investigated a neurobiologically inspired model of human hippocampal structure and computation in category learning. In three separate simulations, the model (C-HORSE) is presented with learning tasks defined by various category structures from prior work and evaluated for its ability to learn the category structure, generalize categorization to novel stimuli, and accurately recognize previously encountered stimuli. Although originally conceived of as a computational model of associative memory, C-HORSE is demonstrated to quite naturally account for human-like learning of the three category tasks. Notably, the authors characterize the mechanisms underlying the model's learning by way of additional simulations in which "lesions" to the model's monosynaptic pathway (MSP; direct connections between ERC and CA1) are contrasted with lesions to its trisynaptic pathway (TSP; pathway connecting ERC-DG-CA3-CA1). These in silico lesions offer key insight into the computational principles underlying theorized hippocampal functions in category learning: whereas MSP provides incremental learning of shared features diagnostic to category membership that are important for category generalization, TSP learns item-specific information that drives recognition behaviour. The authors propose that C-HORSE's successful account of a broad set of category learning datasets provides clear support for the role of complementary hippocampal functions mediated by MSP and TSP in category learning. This work adds compelling computational evidence to a growing literature linking hippocampus to a broader role in cognition that extends beyond declarative memory.

      The model simulations are clear and properly conducted. The three datasets examined offer a relatively broad set of findings from the category learning literature; that the models provide reasonable accounts of human performance in all three speaks to the model's generalizability. Overall, I find this work exciting and an important step in linking longstanding well-established formal learning theories of psychology with neurobiological mechanism. Several weaknesses dampen this excitement, each of which is detailed below:

      1) C-HORSE is presented as a new entry into a rich field of formal computational models of category learning. As noted above, the datasets examined span a broad range of learning contexts and structures and the model's ability to account for learning behaviour is compelling. However, no other models are leveraged to perform a direct evaluation. In other words, C-HORSE's predictions are compelling, but is it better than other competing models in the literature? To be clear, C-HORSE offers a novel alternative with its fundamental mechanisms originating from anatomical structure and connectivity. As such, a proof-of-concept showing that such a neurobiologically inspired framework can account for category learning behaviour is a worthwhile contribution in its own right and a clear strength of this paper. However, how to consider this model relative to existing theoretical frameworks is not well described in the manuscript.

      We very much appreciate this point — see response to Editor summary point #3 above.

      2) Relatedly, C-HORSE is evaluated in terms of qualitative fit to behaviour measures from prior studies and in all three simulations restricted to measure of end of learning performance. Again, an appeal to the proof-of-concept nature of the current work may provide an appropriate context for this paper. But, a hallmark of well-established category learning models (e.g., SUSTAIN, DIVA, EBRW, SEA, etc.) is their ability to account for both end of learning generalization (and in some cases, recognition) and behaviour throughout the learning process. C-HORSE does provide predictions of how learning unfolds over time, but how well this compares to human measures is not considered in the current manuscript. Such comparisons would strengthen the support for C-HORSE as a viable model of category learning and help position it in the busy field of related formal models.

      We completely agree about the value of this, and we have added empirical timecourse data for comparison with all simulations, as described in response to Editor summary point #7, above.

      3) A consistent finding across all three simulations is that the TSP provides item-specific encoding. Evidence for this can be inferred by contrasting categorization and recognition performance across the TSP- and MSP-only model variants. In the discussion, the authors draw a parallel between exemplar theories of category learning and the TSP, which is a compelling theoretical position. However, as noted by the authors, unlike exemplar theories, the TSP-only model was notably impaired at categorization. The authors' suggestions for extensions to C-HORSE that would enable better TSP-based categorization are interesting. But, I think it would be helpful to understand something about the nature of the representations being formed in the TSP-only model. For example, are they truly item-specific, are the shared category features simply lost to heightened encoding of item-unique features, are category members organized similarly to the intact model just with more variability, and so on. Characterizing the nature of these representations to understand the limitations of the TSP-only model seems important to understanding the representational dynamics of C-HORSE, but such analyses are not included in the current manuscript.

      The RSA results, now included for Simulations 2 and 3 in addition to Simulation 1, provide the information needed to characterize the nature of the TSP representations. Generally speaking, they are truly item specific, meaning that each item is represented by its own distinct set of units. This is a demonstration of the classic pattern separation function of this pathway, taking similar inputs and projecting them to orthogonal populations of neurons. Simulation 1 is the clearest example of this, where there is virtually no similarity and very low variability in the item similarity structure in DG and CA3. The new Simulation 3 RSA shows us where the limit is to this pattern separation ability of the TSP, with highly typical items being represented by somewhat overlapping populations of neurons in DG and CA3. To the extent that the TSP can succeed in generalization, it seems to involve this pattern separation failure.

      We have made these points more explicit in new discussion of the RSA results:

      • Simulation 1: “In the initial response, there was no sensitivity at all to category structure in DG and CA3 — items were represented with distinct sets of units. This is a demonstration of the classic pattern separation function of the TSP, applied to this domain of category learning, where it is able to take overlapping inputs and project them to separate populations of units in DG and CA3.”

      • Simulation 3: “As in the prior simulations, DG and CA3 represented the items more distinctly than CA1, and settled activity after big-loop recurrence increased similarity, especially in CA1. This simulation was unique, however, in that DG and CA3 showed clear similarity structure for the prototype and highly prototypical items. There is a limit to the pattern separation abilities of the TSP, and these highly similar items exceeded that limit. This explains why, at high typicality levels, the TSP could be quite successful on its own in generalization (Figure 5e), and why it struggled with atypical feature recognition for these items (Figure 5f).”
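
      To make the logic of such a representational similarity analysis concrete, here is a minimal Python sketch that compares mean within-category and between-category pattern correlations for a layer. It is an illustration under stated assumptions rather than the analysis used in the paper: the function names are hypothetical, and the two sets of binary activity patterns are invented toy data standing in for a pattern-separated layer (DG/CA3-like) and a layer with overlapping codes (CA1-like).

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def similarity_structure(patterns, categories):
    """Return (mean within-category r, mean between-category r) for a layer."""
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson(patterns[i], patterns[j])
        if categories[i] == categories[j]:
            within.append(r)
        else:
            between.append(r)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(within), mean(between)


# Invented toy data: four items, two per category, eight units per layer.
categories = ["A", "A", "B", "B"]

# Each item activates its own units (no sensitivity to category).
separated_layer = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]

# Items from the same category share most of their active units.
overlapping_layer = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
]

sep_within, sep_between = similarity_structure(separated_layer, categories)
ovl_within, ovl_between = similarity_structure(overlapping_layer, categories)
```

      In the toy pattern-separated layer, within- and between-category similarity are identical, so the layer carries no category structure; in the overlapping layer, within-category similarity exceeds between-category similarity. The gap between the two values is one simple way to summarize how much category information a layer's representations contain.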

      4) In general, a detailed description that links model mechanisms and analyses to the learning constructs of interest for the different simulations is lacking. For example, RSA results for simulation 1 are contrasted for initial and settled representations, but what is meaningful about these two timepoints is not directly stated (moreover, what initial and settled response mean in terms of the current model is not explained). The authors do briefly suggest that differences between initial and settled representations may reflect encoding dynamics before and after big-loop recurrence, but this is not established as a key metric for evaluating the nature of the model representations. In general, more motivation is needed to understand what the chosen analyses reveal about the nature of the model's learning process and representations.

      We have added more description of the motivation for our analyses. See response to Editor summary point #6 above.

      5) I appreciate the comparison in the discussion to extant models of categorization. Certainly, the exemplar and prototype models are fixtures of the category learning literature and they somewhat align with the type of learning that TSP and MSP, respectively, provide. REMERGE and SUSTAIN are also briefly mentioned, but their discussion is limited which is unfortunate as they are actually more functionally equivalent to C-HORSE. I think, however, that the authors are missing an opportunity to discuss how C-HORSE offers a means for bridging levels of analysis to connect neurobiological mechanisms with these notably successful psychological models of category learning. Rather than framing C-HORSE as a competitor to existing models, it should be viewed as an account existing on a different level of analysis. In this sense, it complements existing approaches and potentially extends a theoretical olive branch between the psychology and neuroscience of category learning.

      We love this point about bridging levels of analysis and have added it to our discussion of the model’s relationship to other models, see Editor summary point #3 above.

      6) The discussion takes a broad perspective on covering evidence concerning hippocampal contributions to category learning. Although comprehensive, some sections are not well connected back to the main thrust of the paper. For example, a section on neuropsychological accounts of the hippocampus and category learning summarizes central aspects of this literature but is never reflected on through the lens of the current findings. I do think this prior work is relevant, especially since a central theme of it is that the hippocampus is not necessary for category/concept learning, but its connection back to the current study is not well argued. Similarly, the section on consolidation and sleep is relevant, but in its current form does not seem to fit with the rest of the paper.

      We have implemented these suggestions through very significant revisions to the Discussion. We now better connect the sections to the main argument of the paper and have made cuts throughout, including removing the section on consolidation and sleep.

      Reviewer #2 (Public Review):

      The authors present a model of the hippocampal region that incorporates both the (indirect) trisynaptic and (direct) mono-synaptic pathways from entorhinal cortex (EC) to CA1 - the former incorporating projections from EC to dentate gyrus (DG), DG to CA3, and CA3 to CA1, and exhibiting a higher learning rate. They demonstrate that exposing this network to stimuli consistent with standard empirical tests of category learning (e.g. where within-category exemplars share a set of common features) allows the network to reliably assign both novel and previously encountered stimuli to the correct category (e.g. the network can learn to classify stimuli and generalise this knowledge to new examples). They show that the tri-synaptic pathway (TSP) preferentially supports the encoding of individual exemplars (e.g. analogous to episodic memory) while the mono-synaptic pathway (MSP) preferentially supports category learning.

      The manuscript is well written, the simulation details appear sound, and the results are clearly and accurately presented. This model builds on a long tradition of computational modelling of hippocampal contributions to human memory function, strongly grounded in anatomical and electrophysiology data from both rodents and humans, and is therefore able to link phenomena at the level of individual cells and circuits to emergent behaviour - a major strength of this, and similar, work. However, I have two major concerns relating to the relationship between these findings and previously published work by the same and other authors.

      First, it is not clear to me - from the manuscript - whether these results represent a significant novel advance on previous publications from the same senior author. Figures 1 and 3D are almost identical to figures published in Schapiro et al. (2017) Phil Trans B, and the take-home message (that the MSP might support statistical learning) is the same. In brief, it seems that the authors have subjected an identical network to some new (but related) tasks and reached the same set of conclusions. I see no distinction between learning to extract 'statistical regularities' (in previous work) and learning 'the structure of new categories' (described here). As an aside, demonstrating that an autoencoder network can learn stimulus categories and generalise to new exemplars is also well established.

      We appreciate the opportunity to better articulate the novelty and importance of applying the model to the domain of category learning. There are crucial differences between statistical learning and category learning that make these simulations nontrivial (it did not have to be the case that the results would replicate for these category learning paradigms), and, importantly, many of the insights in the current work are category-learning specific (e.g., the effects of atypical features, trade-offs between generalization and recognition of exemplar-specific features). On the other hand, we of course agree that there are principles in common between statistical learning and category learning that are leading to the consistent findings. We added new material to the Introduction to explain the importance of these new simulations in the domain of category learning, and the value we see in demonstrating convergence across domains. See response to Editor point #1 above.

      Second, I have some concerns with the relationship between the properties of this hippocampal network model and well described properties of single cells in the rodent and human hippocampus. In particular, the CA1 units in this model (and to some extent, also the CA3 units) come to respond strongly to all exemplars from within each category (e.g. as shown in Figure 3D, bottom right panel). This appears to be at odds with the known properties of place and concept cells from the rodent and human hippocampus, respectively, which show little generalisation across related concepts (i.e. the Jennifer Aniston neuron does not fire in response to other actors from Friends, for example). If the emergent properties of this model are not consistent with existing data, then it is not a valid model.

      We appreciate the opportunity to discuss connections to the physiology literature. See response to Editor summary point #2 above.

      More generally, the authors are clear that this model is "a microcosm of [the] hippocampus-neocortex relationship" and that the properties of the MSP "mirror those of neocortex". Why not assume that category learning is supported by an interaction between hippocampus and neocortex, then, as in the complementary learning systems (CLS) model? Aside from some correlational fMRI data and partial deficits in hippocampal amnesics - either of which could have a myriad of different explanations - what empirical data is better accounted for by this model than CLS? Put differently, what grounds are there for rejecting the CLS model? To some extent, this model appears to account for less empirical data than CLS, with the exception of a few recent neuroimaging studies (which are hard to interpret at the level of single cells).

      This is an important point for us to clarify, so we very much appreciate this comment. The crucial issue with CLS that motivated the microcosm theory is that the neocortex in the CLS framework learns far too slowly to support the kind of category learning studied in these paradigms, which unfolds over the course of minutes or hours. The neocortex in CLS was proposed to learn novel structure across days, months, and years.

      We have added the following to the Introduction:

      • “Despite its analogous properties, the MSP is not redundant with neocortex in this framework: the MSP allows rapid structure learning, on the timescale of minutes to hours, whereas the neocortex learns more slowly, across days, months, and years. The learning rate in the MSP is intermediate between the TSP (which operates as rapidly as one shot) and neocortex. The proposal is thus that the MSP is crucial to the extent that structure must be learned rapidly.”

      We also have this description in the Discussion:

      • “The MSP in our model has properties similar to the neocortex in that framework, with relatively more overlapping representations and a relatively slower learning rate, allowing it to behave as a miniature semantic memory system. The TSP and MSP in our model are thus a microcosm of the broader Complementary Learning Systems dynamic, with the MSP playing the role of a rapid learner of novel semantics, relative to the slower learning of neocortex.”

      Reviewer #3 (Public Review):

      The current work aimed to determine how the hippocampus may be able to detect regularities across experiences and how such a mechanism may serve to support category learning and generalization. Rapid learning in the hippocampus is critical for episodic memory and encoding of individual episodes. However, the rapid binding of arbitrary associations and one-shot learning was long thought suboptimal for finding regularities across experiences to support generalization, which was instead ascribed to other, slower-learning memory systems. More recent work has started to highlight a hippocampal role in generalization, renewing the question of how generalization can be accomplished alongside memory for episodic details within a single memory structure. The current paper offers a reconciliation, presenting a biologically-inspired model of the hippocampus that is able to learn categories alongside stimulus-specific information comparably to human performance. The results convincingly demonstrate how distinct pathways within the hippocampus may differentially serve these complementary memory functions, enabling the single structure to support both episodic memory and categorization.

      Major strengths and contributions

      The paper includes simulation of three distinct categorization tasks, with a clear explanation of the unique aspects of each task. The key results are consistent across tasks, lending further support to the main conclusions of the role of distinct hippocampal pathways in learning specific details vs. regularities. Together with prior work on how the same architecture can support statistical learning in other types of tasks, this work provides important evidence of the broad role of the hippocampus in rapid integration of related information to serve many forms of cognition.

      Throughout the paper, the authors nicely explain in conceptual terms how the same underlying computations may serve all three categorization tasks as well as statistical learning and episodic inference tasks. Thus, the paper will be of broad interest, beyond researchers focused on modeling and/or categorization.

      On a conceptual level, this work provides a fruitful framework for understanding hippocampal functions, representations and computations. It provides a highly plausible mechanistic explanation of how category learning and generalization can be accomplished in the hippocampus and how distinct types of representations may emerge in distinct hippocampal subfields. The framework can be used to derive new testable predictions, some of which the authors themselves introduced. It also provides new insights into how the outputs of different pathways influence each other, providing a more nuanced view of the division of labor and interactions between hippocampal subfields. For example, the big loop recurrence would eventually lead to category influences even on the initially sparse, pattern separated representations in the CA3, which is an idea consistent with empirical observations.

      The presented computational model of the hippocampus is currently the most detailed and biologically plausible hippocampal model easily applicable in the area of cognitive neuroscience and behavioral simulations. The commonalities and differences with other related models (conceptual and computational) are well explained. Both the conceptual and technical descriptions of the model are exceptionally clear and detailed. The model is also publicly available for download for any researcher to use with their own task and data. All these aspects make it likely that other researchers may adopt the model in a wider range of tasks, stimulating new discoveries.

      The autoencoder nature of the model and the use of categorization tasks meant that some measures of interest, like recognition of exemplar-specific information, could not be evaluated by direct reading of the output layer to compare with some label (like old/new). The authors, however, came up with clever ways to evaluate recognition performance in each task that were sensible and highlighted the multiple ways one may think about information contained in neural representations in each layer. This approach can also be utilized by others for evaluating item-specific and category information in activation patterns, for example in analyses of fMRI.

      Finally, I thought the current paper and provided model may also serve as an excellent introduction to computational modeling for those new to this approach. The exceptional clarity of the conceptual and technical description of this model and the clear logic of how one may model a cognitive task and interpret results made this paper fairly accessible. Furthermore, the paper offered new insights and predictions based on analyzing the model's hidden layers, lesion performance, and/or noting some patterns of behavior unique to specific tasks. This was also instructive for highlighting the distinctive contributions that the computational modeling approach can have for furthering our understanding of cognition and the brain.

      We are extremely appreciative of the value the Reviewer sees in this work.

      Weaknesses

      The paper's strengths far outnumbered the weaknesses, which are minor. For one, the selected categorization tasks nicely complemented each other, but only covered stimuli with discrete-value dimensions (features like color, shape, symbol, etc). The degree to which the results generalize (or not) to continuous-value stimuli and different category structures (for instance information-integration or rule-based in COVIS framework) is not clear. How the model could be adjusted for continuous-value stimuli was not specified.

      We agree that the simulation of only discrete-valued dimensions is a limitation. We chose to do this simply because it is easier to use discrete values in the model as currently implemented, but future work will certainly need to test whether the model can simulate the various paradigms that make use of continuous-valued dimensions. We have added an explicit acknowledgement of this issue in the Methods:

      • “The inhibition simulates the action of inhibitory interneurons and is implemented using a set-point inhibitory current with k-winner-take-all dynamics (O’Reilly, Munakata, Frank, Hazy, & Contributors, 2014). All simulations involved tasks with discrete-valued dimensions, as these are more easily amenable to implementation across input/output units whose activity tends to become binarized as a result of these inhibition dynamics. It will be important for future work to extend to implementations of category learning tasks with continuous-valued dimensions.”
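
      The binarizing effect of winner-take-all inhibition mentioned in this passage can be illustrated with a deliberately simplified Python sketch. This is a hard top-k rule with binary outputs, used here only as a stand-in: it is not the set-point inhibitory current of the cited framework, and the function name, the tie-breaking rule, and the example inputs are assumptions made for the illustration.

```python
def k_winner_take_all(net_inputs, k):
    """Simplified hard k-winner-take-all.

    The k units with the largest net input are set to 1.0 and all others
    to 0.0. Ties are broken in favour of the lower unit index (assumption).
    """
    n = len(net_inputs)
    if k <= 0 or n == 0:
        return [0.0] * n
    k = min(k, n)
    # Rank unit indices by net input, largest first; sorted() is stable,
    # so earlier indices win ties.
    ranked = sorted(range(n), key=lambda i: net_inputs[i], reverse=True)
    winners = set(ranked[:k])
    return [1.0 if i in winners else 0.0 for i in range(n)]


# Toy example: graded inputs to four units, two winners allowed.
graded_inputs = [0.2, 0.9, 0.1, 0.7]
layer_activity = k_winner_take_all(graded_inputs, 2)
```

      Because every unit ends up either fully on or fully off in this sketch, graded differences among the inputs are discarded, which gives some intuition for why discrete-valued stimulus dimensions map more naturally onto such units than continuous-valued ones.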

      There is compelling evidence for the dissociation between different hippocampal pathways and subfields (CA1 vs. CA3) that the model is based on. As the authors noted, there is also compelling evidence for functional dissociations along the long hippocampal axis, with anterior portions more geared towards coarse, generalized representations while posterior towards more detailed, specific representations. The authors nicely pointed out that these proposals of within-hippocampus division of labor are less orthogonal than they may first appear, as there is a greater proportion of CA1 in the anterior hippocampus. However, it is premature to imply that this resolves the CA1/CA3 vs. anterior/posterior question; the idea that existing anterior findings may be simply CA1 findings is currently only speculation. Furthermore, first studies indicating that anterior/posterior representational gradients may exist within each subfield are beginning to emerge.

      We completely agree that this is speculative at this point, which needed acknowledgment. See response to Editor summary point #2 above.

    1. Under Our Team, can "Maternity Leave" be replaced with just "on leave"? I don't think it's necessary to share why they are on leave. May as well add Jazz Cook as well. Also, should we not include our pronouns here?

    1. Courses and programs within the “academic” curriculum emphasize subject-matter knowledge and the development of broadly applicable skills—think history, science, language studies, etc.

      Trades are academic programs. In Hairstyling alone we learn history: how has the trade evolved? Which cultures developed certain styles, and why? Science: formulating colours is chemistry, and mixing disinfectants is math and science! Technical terminology is language. My trade may not identify as one single category, but it includes several dimensions of learning. It offers students the opportunity to indulge in a variety of aspects, and perhaps that's why it has become increasingly interesting to students who possess multiple intelligences.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors generate a Drosophila model to assess disease-linked allelic variants in the UBA5 gene. In humans, variants in UBA5 have been associated with DEE44, characterized by developmental delay, seizures, and encephalopathy. Here, the authors set out to characterize the relationship between 12 disease-linked variants in UBA5 using a variety of assays in their Drosophila Uba5 model. They first show that human UBA5 can substitute all essential functions of the Drosophila Uba5 ortholog, and then assess phenotypes in flies expressing the various disease variants. Using these assays, the authors classify the alleles into mild, intermediate, and severe loss-of-function alleles. Further, the authors establish several important in vitro assays to determine the impacts of the disease alleles on Uba5 stability and function. Together, they find a relatively close correlation between in vivo and in vitro relationships between Uba5 alleles and establish a new Drosophila model to probe the etiology of Uba5-related disorders.

      Strengths:

      Overall, this is a convincing and well-executed study. There is clearly a need to assess disease-associated allelic variants to better understand human disorders, particularly for rare diseases, and this humanized fly model of Uba5 is a powerful system to rapidly evaluate variants and relationships to various phenotypes. The manuscript is well written, and the experiments are appropriately controlled.

      Reviewer #2 (Public Review):

      Relative simplicity and genetic accessibility of the fly brain make it a premier model system for studying the function of genes linked to various diseases in humans. Here, Pan et al. show that human UBA5, whose mutations cause developmental and epileptic encephalopathy, can functionally replace the fly homolog Uba5. The authors then systematically express in flies the different versions of the gene carrying clinically relevant SNPs and perform extensive phenotypic characterization such as survival rate, developmental timing, lifespan, locomotor and seizure activity, as well as in vitro biochemical characterization (stability, ATP binding, UFM-1 activation) of the corresponding recombinant proteins. The biochemical effects are well predicted by (or at least consistent with) the location of affected amino acids in the previously described Uba5 protein structure. Most strikingly, the severity of biochemical defects appears to closely track the severity of phenotypic defects observed in vivo in flies. While the paper does not provide many novel insights into the function of Uba5, it convincingly establishes the fly nervous system as a powerful model for future mechanistic studies.

      One potential limitation is the design of the expression system in this work. Even though the authors state that "human cDNA is expressed under the control of the endogenous Uba5 enhancer and promoter", it is in fact the Gal4 gene that is expressed from the endogenous locus, meaning that the cDNA expression level would inevitably be amplified in comparison. The fact that different effects were observed when some experiments were performed at different temperatures (18 vs. 25 °C) is also consistent with this. While I do not think this caveat weakens the conclusions of this paper, it may impact the interpretation of future experiments that use these tools, and thus should be clearly discussed in the paper. Especially considering the authors argue that most disease variants of UBA5 are partial loss-of-functions, the amplification effect could potentially mask the phenotypes of milder hypomorphic alleles. If the authors could also show that the T2A-Gal4 expression pattern in the brain matches well with that of endogenous RNA or protein (e.g. using HCR-FISH or antibody), it would help to alleviate this concern.

      We thank the reviewer for pointing out this limitation.

      Regarding the humanization strategy we used in the study, we agree that this is a binary system which may lead to overexpression of the target protein. However, as the reviewer also points out, this temperature-sensitive system also enables us to flexibly adjust the expression level of the target protein, which is especially useful to study partial LoF variants such as the UBA5 variants in this study. In our study we have successfully compared the relevant allelic strength of most of the variants, which supports the use of our system in future studies. However, we do agree that the gene dosage effect could vary widely, so it is difficult to directly predict the effects of one variant in humans based upon results obtained in a model organism.

      We agree with the reviewer that a masking effect may exist in our system due to its gene overexpression nature. However, we cannot conclude that this masking effect really affects the interpretation of Group IA variants in our tests. The three variants are mild LoF, which is also supported by the biochemical assays. Hence, the variants may not cause any phenotype even when they are expressed at a physiological level.

      Regarding the temporal and spatial expression pattern of the T2A-GAL4, the Bellen lab has generated T2A-GAL4 lines for more than 3,000 genes. The expression pattern of the vast majority of these GAL4 lines faithfully reflects the expression pattern of the endogenous genes, which has been documented in our previous publications (PMIDs 25824290, 29565247, 31674908, 35723254).

      Reviewer #3 (Public Review):

      Summary: Variants in the UBA5 gene are associated with rare developmental and epileptic encephalopathy, DEE44. This research developed a system to assess in vivo and in vitro genotype-phenotype relationships between UBA5 allele series by humanized UBA5 fly models and biochemical activity assays. This study provides a basis for evaluating current and future individuals afflicted with this rare disease.

      Strengths: The authors developed a method to measure the enzymatic reaction activity of UBA5 mutants over time by applying the UbiReal method, which can monitor each reaction step of ubiquitination in real time using fluorescence polarization. They also classified fruit flies carrying humanized UBA5 variants into groups based on phenotype. They found a correlation between biochemical UBA5 activity and phenotype severity.

      Weaknesses: In the case of human DEE44, compound heterozygotes with both loss-of-function and hypomorphic forms (e.g., p.Ala371Thr, p.Asp389Gly, p.Asp389Tyr) may cause disease states. The presented models have failed to evaluate such cases.

      We agree with the reviewer that our model did not reflect the situation of the individuals who are compound heterozygous for a Group IA variant (p.Ala371Thr, p.Asp389Gly, or p.Asp389Tyr) and a strong LoF variant. However, we argue that our results do show that the Group IA variants alone do not cause disease. As discussed in the manuscript, individuals homozygous for the p.Ala371Thr variant are healthy and do not present with an obvious phenotype. This is consistent with our findings in flies, and shows that the p.Ala371Thr variant is a mild LoF variant.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the thoughtful suggestions made by the Reviewers. We have addressed all of their comments below, with our responses bulleted and in italics. We believe these changes have helped clarify the manuscript and strengthen it overall.

      Reviewer 1

      1) Figures 1B and Supp. Figure 1A: It would be worth mentioning that the wave-form in the 129 strain in response to QLA starts out like AJ and B6, but transitions to looking like the wild-derived strain. So, although not quite as drastic as the NZO and NOD strains, it is not quite like the other classical inbred strains.

      • We thank the reviewer for pointing this out. We have added further language to clarify the point:

      “Additionally, even with the clear separation between the clusters, inter-strain variation was still observed within the clusters (e.g. more 129 islets had plateau responses to 8G/QLA than the B6 or AJ).”

      2) The figures are generally excellent and really help to clarify the work in the paper. For Figure 2A, it would help even further if you could number the six different Ca++ parameters that are measured. They're all there, but it takes a bit of time to find them on the figure and numbering will make it easier on your reader.

      • We appreciate this suggestion and have implemented it in our revised Figure 2A. The Ca2+ parameters are now numbered, and the description of this figure has been adjusted accordingly in the results section.

      We added the revised text in the results section:

      “To elucidate strain differences in Ca2+ dynamics, we focused on six parameters of the Ca2+ waveform (Figure 2A): 1) peak Ca2+ (the top of each oscillation); 2) period (the length of time between two peaks); 3) active duration (the length of time for each Ca2+ oscillation measured at half of the peak height, also known as the oxidative “secretory” phase, or “MitoOx” (8)); 4) pulse duration (active duration plus extra time for Ca2+ extrusion); 5) silent duration (the electrically-silent “triggering” phase, also known as “MitoCat” (8), which culminates in KATP closure and membrane depolarization); and 6) plateau fraction (the active duration divided by the period, or the fraction of time spent in the active “secretory” phase).”
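
A minimal sketch of how three of the quoted parameters (peak, period, active duration) and the derived plateau fraction could be computed from a sampled trace. It is not the authors' analysis code. The function name, the half-height threshold taken midway between the trace minimum and maximum, and the use of threshold crossings instead of peak times to measure the period are all assumptions made for illustration.

```python
def waveform_parameters(trace, dt):
    """Estimate peak, period, active duration and plateau fraction.

    trace: list of Ca2+ signal samples; dt: sampling interval (same time unit as the outputs).
    Assumption: "half of the peak height" is read as halfway between min and max of the trace.
    """
    baseline = min(trace)
    peak = max(trace)
    half = baseline + 0.5 * (peak - baseline)
    above = [value >= half for value in trace]

    # Indices where the trace crosses the half-height threshold upward / downward.
    rises = [i for i in range(1, len(above)) if above[i] and not above[i - 1]]
    falls = [i for i in range(1, len(above)) if not above[i] and above[i - 1]]

    # Active duration: time from each upward crossing to the next downward crossing.
    durations = []
    for r in rises:
        later = [f for f in falls if f > r]
        if later:
            durations.append((later[0] - r) * dt)

    # Period: time between consecutive upward crossings (assumed equal to peak-to-peak time).
    periods = [(b - a) * dt for a, b in zip(rises, rises[1:])]
    if not durations or not periods:
        raise ValueError("need at least two complete oscillations")

    active = sum(durations) / len(durations)
    period = sum(periods) / len(periods)
    return {
        "peak": peak,
        "period": period,
        "active_duration": active,
        "plateau_fraction": active / period,  # definition quoted above
    }
```

For an idealized square wave that spends 4 of every 10 samples above threshold, the plateau fraction is 0.4. Real traces would need smoothing and baseline handling that this sketch omits.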

      3) Figure 4A, B: I was expecting to see Ca++ vs insulin parameters in the different strains/sexes. In addition to the heat maps, it would be useful to see the regression plots, showing where each strain and sex falls for the insulin and Ca++ parameters.

      • This is an excellent suggestion, and we have added a new Supplemental Figure 5 to provide examples of various strain/sex patterns that drive the correlations used for the heatmap and histogram in Figure 4A and B.

      We added text in the results section referring to this point:

      “Clustering the Ca2+ responses into distinct groups based on our observations of the waveforms (Figure 1B, Figure 4C-E, and Supplemental Figures 1 and 2) also occurs when correlating individual Ca2+ parameters to ex vivo secretion and clinical data (Supplemental Figure 5). For example, the anticorrelation between the 1st frequency component in 8G and percent insulin secreted in 8.3G/QLA (Supplemental Figure 5A) separates the classic inbred, wild-derived, and diabetes-susceptible strains into distinct groups despite the variability in the trait. The correlation between the silent duration in 8G/QLA and insulin secretion in 8.3G/QLA likewise groups by strain (Supplemental Figure 5B). Finally, some correlations, such as that between 8G/QLA/GIP silent duration and plasma insulin at sacrifice (Supplemental Figure 5C), can be strongly influenced by outlier strains; e.g., NZO. Collectively, these data demonstrate that genetics has a profound influence on key parameters of islet Ca2+ oscillations.”

      4) Please include methods for the insulin measurements collected in Fig. 4.

      • Thank you for pointing out this missing information. We have clarified that prior insulin measurements (plasma insulin and ex vivo static insulin secretion that were used in Figure 4 for correlation analysis) were completed in another previously published cohort of mice (reference 17: Mitok KA, Freiberger EC, Schueler KL, Rabaglia ME, Stapleton DS, Kwiecien NW, et al. Islet proteomics reveals genetic variation in dopamine production resulting in altered insulin secretion. The Journal of biological chemistry. 2018;293(16):5860-77).

      We added this new text (highlighted) to the results section to help clarify this point:

      “Fasting blood glucose and insulin levels were measured in mice at 19 weeks of age, except for the NZO males which were measured at 12 weeks of age. Glucose was analyzed by the glucose oxidase method using a commercially available kit (TR15221, Thermo Fisher Scientific), and insulin was measured by radioimmunoassay (RIA; SRI13K, Millipore). This is the same assay that was used to measure plasma insulin for the previously published cohort used for the correlation analysis in Figure 4 (17).”

      5) In the methods, please include details on the four conditions used for Ca++ imaging of the islets, and the timing for each condition.

      • We appreciate this guidance in clarifying our manuscript, and we have now included the conditions and timing for each condition in the methods section.

      We added the following text to the results section to help clarify this:

      “The solutions included 8 mM glucose (8G), 8 mM glucose + 2 mM glutamine, 0.5 mM leucine, and 1.5 mM alanine (8G/QLA), 8G/QLA + 10 nM glucose-dependent insulinotropic polypeptide (8G/QLA/GIP), and 2 mM glucose (2G), each of which was kept in a 37°C water bath.”

      Reviewer 2

      One major critique is that the authors studied "the human orthologues of the correlated mouse proteins that are proximal to the glycemia-associated SNPs in human GWAS". This implies two assumptions - (1) human and mouse proteins do not differ in terms of islet physiology and calcium signaling; (2) the proteins proximal to the SNPs are the causal factors for functional differences, though the SNPs could affect protein/gene function distant from the SNPs.

      • Thank you very much for highlighting this limitation in our study. We think this is very important to address, and we have done so in our discussion section.

      We have added the following text to discuss this important issue:

      “Our approach to merge human GWAS with our findings in mouse assumes that the glycemic-related SNPs we nominated alter the abundance or function of the human orthologues. Most SNPs that are strongly associated with phenotypes in human GWAS are noncoding, residing within introns, promoters, 3’UTRs, or intergenic regions (e.g. Figure 6). Therefore, a limitation of our approach is the assumption that SNPs regulate the gene they are proximal to, which is not always accurate (76-78). To infer a more direct link between SNPs and potential target genes, we incorporated human islet chromatin data (37). Physical contact between a region containing SNPs and a distal gene supports a regulatory role, as for ACP1 (Figure 6B). Additionally, SNPs within regions of open chromatin (ATAC-seq) and actively transcribed regions (histone markers) suggest a higher likelihood of regulating transcription factor access. While this approach does not conclusively show a link between the SNPs and expression of the orthologue for our candidate proteins, these chromatin data more strongly suggest that the orthologue expression may be regulated by the candidates’ SNPs.”

    1. Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      The main problem with the interpretation of the data is that the authors have NOT used a so-called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant: they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example, when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (which are present in this dataset) are taken care of by the delayed response, which is a nice feature and is not the case for the above example set-up.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      The best solution at this point seems to be to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanath et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passingham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were at detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.
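
For readers unfamiliar with the measure requested here, a minimal sketch of d' under the standard equal-variance signal detection model, where d' = z(hit rate) - z(false-alarm rate). The function name and the log-linear correction (adding 0.5 to each count so that rates of 0 or 1 do not give infinite z-scores) are illustrative assumptions, not something specified in the review or the paper.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' from the four trial counts of a detection task."""
    # Log-linear correction (an assumption of this sketch): keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A d' near 0 means grating-present and grating-absent trials are not being distinguished. Values well above 0 would support the reviewer's reading that the "unaware" trials are subjectively, not objectively, unaware.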

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to subselect trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.
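
One way the suggested contrast matching could be sketched is shown below. It is a stratified variant, not necessarily what the reviewer or the authors have in mind: at each contrast level it keeps the same number of trials from both conditions, so trials may be dropped from either condition, not only from one. The trial representation as `(trial_id, contrast)` tuples and the function name are hypothetical.

```python
import random
from collections import defaultdict


def match_contrast(aware, unaware, seed=0):
    """Subsample two lists of (trial_id, contrast) so their contrast distributions are identical."""
    rng = random.Random(seed)  # fixed seed so the subselection is reproducible
    groups = defaultdict(lambda: {"aware": [], "unaware": []})
    for trial in aware:
        groups[trial[1]]["aware"].append(trial)
    for trial in unaware:
        groups[trial[1]]["unaware"].append(trial)

    matched_aware, matched_unaware = [], []
    for contrast in sorted(groups):
        a = groups[contrast]["aware"]
        u = groups[contrast]["unaware"]
        n = min(len(a), len(u))  # keep as many trials as the sparser condition has
        matched_aware.extend(rng.sample(a, n))
        matched_unaware.extend(rng.sample(u, n))
    return matched_aware, matched_unaware
```

Contrast levels that occur in only one condition are dropped entirely, so the matched sets can be much smaller than the originals. Whether enough trials survive for the downstream analyses would have to be checked per subject.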

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a factor of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible this may give interesting insights in the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al refer to this issue as well in their paper, and this issue is sometimes discussed with regard to Global workspace theory. This would add novelty to the findings of the paper in my opinion.
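
As a reference point for the PLV mentioned here, a minimal sketch of the usual across-trial definition: the magnitude of the mean unit vector of the phase difference between two channels. How the authors actually extracted phases and computed PLV is not stated in this review, so the function name and inputs (per-trial phases in radians at one time-frequency point) are assumptions.

```python
import cmath


def phase_locking_value(phases_a, phases_b):
    """PLV across trials for two channels, given per-trial phases in radians."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two equally long, non-empty phase lists")
    # Average the unit vectors exp(i * phase difference); the length of the mean is the PLV.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)
```

For the lateralized comparison, the same function could be applied separately to electrode pairs contralateral and ipsilateral to the stimulus, and the two resulting time courses compared.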

    1. We’ve chosen to keep highlights private to avoid pages being cluttered by highlights that have no surrounding discussion. We understand that people may want to share highlights with others, and we think there are effective ways we can address that in the future.

      You would imagine that by now you would be able to share some of your highlights without having to add some weird annotation to it especially when you are trying to share it on a private group.

      It is also quite worrisome that the last comment about this was in 2019. It is almost like no work or effort has been put into this.

      So as much as there's a comment about "thinking of effective ways", there's no clear indication that any work is going into it.

    1. This is the main concern raised by the public, a risk of large-scale or even unprecedented impacts on public health or the biosphere. This is one example of many: I am extremely concerned that this proposed action could potentially contaminate native life forms on Mars and/or bring back alien virus, bacteria, or other life forms from Mars to Earth. I understand that there are planetary protection protocols. However, Murphy's Law says that if something horrible could happen, it eventually may indeed occur. History is filled with examples where Acts of God and/or human arrogance caused otherwise unforeseen disasters. .... The Earth is already dealing with increasingly serious problems from invasive or alien species being transported to new locations, and viruses mutating and causing deadly pandemics. We have not been able to solve many of these problems. What happens if a Mars life form escapes containment and, without evolving in Earth's ecosystems, spreads uncontrollably and devastates Earth's species including us humans? There might be no way to reverse or even mitigate for that devastation. I support scientific research when it is safe and in the public interest. However, I oppose research when there is no absolute guarantee of safety and when the risks outweigh the potential benefits. (Spotts, 2022) I provide direct links to all the comments submitted in the final round of public comments with a brief summary of the level of concern for each one here: Most public comments share Sagan's priority that NASA can't take a risk of large-scale harm NASA's response to Spotts was: "Refer to the previous response for HS-002" (NASA, 2023 : B-5) HS-002 is their answer to another similar question: Granger:Are you certain that in any way, this mission won’t end with the total annihilation of the entire planet, or force us to live in biomes for the rest of time? 
NASA: As discussed in Section 3.2 of the PEIS, the exact nature of the Mars sample constituents regarding biosignatures and potential biological activity is currently unknown. The PEIS cites several sources supporting the position that contamination of Earth by Martian microorganisms is extremely unlikely to pose a risk of significant harmful effects. However, the risk cannot be demonstrated to be zero (see Response ID HS-001 for information regarding containment measures). As a result, a comprehensive quantitative analysis of the potential impacts of a sample release in the event of an off-nominal landing and the effects of Mars samples on Earth’s environment cannot be accomplished with current data; any such analysis would be theoretical at best, involving substantial speculation and supposition. For this reason, the emphasis of the MSR approach is on sample containment (NASA, 2023 : B-43) So even in response to a concern by a member of the public who asked NASA if it is possible that one consequence would be that we have to live in biomes on Earth for the rest of time or total annihilation of the planet (presumably meaning extinction of all terrestrial life) NASA were not able to rule this out as a possible consequence of their mission. Instead NASA responds by saying that the emphasis is on sample containment, since they can't predict consequences if the samples are not contained. As we saw at the start expert opinion is that the risk of such scenarios is very low, and the analogy of a house fire and a smoke detector fits them well. But we take great care to protect our houses from the very low risk scenario of a house fire. Smoke detector analogy for the low risk of large-scale harm to human health and Earth's biosphere Later in this paper we look at a couple of examples of a likely very low risk but of unprecedented harm. 
The mirror life scenario in worst case where we can't engineer microbes to stop it could be incompatible with our ecosystems and take over the soils, and then we'd need to maintain the terrestrial ecosystems in biomes and keep out mirror life. It wouldn't happen instantly but as it radiates and spreads through the ecosystems we'd then need to work to rescue them and the only solution might be large dome-like biomes covering them and barriers in the soil and then measures within to sterilize them of mirror life and to keep it out. Detailed scenarios of mirror life and a novel fungal genus to motivate biosafety planning This doesn't fit their conclusion in the PEIS itself that any environmental effects would not be significant. A non zero risk of large-scale harm to Earth's biosphere that could lead to humans having to live in biomes for the rest of time is NOT identical to NASA's conclusion in the PEIS of no risk of global harm. Chester Everline, the expert on probabilistic risk assurance who commented on the last day of public comments put it like this: Given our lack of scientific insight into possible life on Mars, relics of life we may return from Mars, or simply organic substances from Mars that could interact with certain life forms on Earth, how can we possibly assert with confidence that MSR poses an acceptable risk to Earth's biosphere, even if the incredibly difficult target of a 99.9999% target for successful containment is satisfied? Given that sample return missions of the type proposed for MSR have never been attempted before, is it even feasible to do enough testing to assure that a 99.9999% target can be achieved? (NASA, 2023 : B38) NASA's response: Please see Response IDs HS-001 and HS-002 regarding risks to Earth’s biosphere and NASA’s approach to addressing that. With regards to the assurance case (HS- 017), no outcome in science and engineering processes can be predicted with 100% certainty. 
NASA’s extensive testing activities serve to support the assurance case (NASA, 2023 : B38) NASA's statement there "no outcome in science and engineering processes can be predicted with 100% certainty" is not valid. It is frequently the case that we can predict outcomes in science and in engineering with 100% certainty. In this case, for instance, we can predict with 100% certainty that if NASA doesn't return these samples, there is no risk to Earth's biosphere or inhabitants from the samples that Perseverance is currently caching on Mars. We can also achieve the very high level of "no appreciable risk" or essentially 100% safety by sterilizing all samples returned to Earth with a sufficiently high level of ionizing radiation. We are not required to take ANY risks with Earth's biosphere. Whether to take such a risk is an ethical decision and not a decision that can be mandated by scientists or engineers. Chester Everline continues: Does NASA intend to impose a threshold for acceptable risk (i.e., a value above which the mission is considered too risky to proceed)? A possible consequence of unsuccessful containment is an ecological catastrophe. Although such an occurrence is unlikely, NASA should at least be clear regarding what level of risk it is willing to assume (for the biosphere of the entire planet)

      I think there is no mention of experts already having problems with the way NASA are dealing with this. If there is, ignore this comment. But I feel this should be mentioned near the top, with a pointer to this section for more detail.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes that simultaneous inhibition of LOXL2 and BRD4 reduces proliferation of TNBC in vitro and reduces growth in vivo.

      This observation is followed by extensive mechanistic studies that suggest physical interaction between LOXL2 and short isoform of BRD4-MED1. Inferences from Chip-seq analyses suggest that this interaction is involved in regulation of multiple transcriptional programs. Authors focus on differential activation of DREAM complex, to claim that this interaction "is fundamental for proliferation of TNBC". The manuscript is very well written and mechanistic inferences are based on a set of sophisticated epigenetic analyses and bioinformatical inferences. The phenotypic effects from LoxL2 inhibition by itself, or in combination with BRD4 inhibition are relatively modest. These modest effects, as well as many of the reported changes in gene expression are clearly inconsistent with the frequently used adjectives as "dramatic", "fundamental", "deeply affected", "drastically hampered" etc. Given the modest phenotypic effects, many of the key claims and conclusions are not supported by the data.

      We thank the reviewer for appreciating our work, defining the manuscript as well-written, and saying that it comprises extensive mechanistic studies as well as sophisticated epigenetic analysis.

      We apologize if some of our statements seemed exaggerated. In this revised version, we revisited some of our conclusions to moderate them.

      Moreover, we took the reviewer's criticism as an opportunity to strengthen our findings. In the revised version of the manuscript, we included an additional TNBC PDX (PDX-127), and results from this experiment clearly reinforce our claims (Fig. 6D and Fig. EV9E-F). In this new in vivo experiment, we selected a PDX model in which the expression of BRD4L is not detectable, while BRD4S is clearly expressed. Therefore, the treatment with JQ1 would specifically affect the activity of BRD4S, making the treatment selective. Additionally, we halved the administered dose of JQ1 to limit the effect of BRD4S inhibition alone on tumor growth. The combinatorial treatment (JQ1+PXS) induced a clearly superior effect in this setting compared with single-agent treatments. In addition, we ruled out that the observed growth reduction results solely from the inhibition of LOXL2, which could affect FAK/Src activity or extracellular collagen crosslinking. In conclusion, our data show that the combinatorial inhibition of LOXL2 and BRD4S is effective in reducing tumor proliferation in TNBC in vivo models, independently of the inhibition of BRD4S and of other pathways known to be regulated by LOXL2.

      Specifically:

      1) It is unclear why authors generalize their conclusions to TNBC. Figure 1B demonstrates synergy for 1/3 cell lines, which is chosen for the follow up study. Even for MDA231, the synergy is confined to low concentrations of BRD4i (S1c). While MDA231 cell line is frequently used in experimental studies of TNBC, it is quite dissimilar to majority of clinical TNBC, and contains mutant RAS, which is rare in this disease.

      The synergistic effect is observed in MDA-MB-231 cells because only this cell line expresses both BRD4S and LOXL2. Indeed, in Fig. 1C we show that MDA-MB-468 cells do not express LOXL2, while BT549 cells express only minimal BRD4 levels.

      To corroborate this hypothesis, in the revised version of the manuscript we added:

      1. A new cell line (Cal51) expressing the same LOXL2 and BRD4 levels (Fig. EV8C) but showing greater resistance to JQ1 than MDA-MB-231 (Fig. EV8D). In this cell line too, we could show that the combinatorial treatment had a greater effect on cell viability than the single-agent treatments (Fig. EV8E).
      2. A western blot panel of different TNBC PDXs, showing that the majority of them express medium to high levels of both BRD4S and LOXL2 proteins, as is the case for MDA-MB-231 (Fig. EV9E) and Cal51 (Fig. EV8C). This result suggests that the combinatorial treatment could be used in the majority of TNBC patients, as they are expected to express both BRD4S and LOXL2.
      3. Finally, as explained above, another in vivo experiment, for which we chose a PDX that expresses BRD4S (but not BRD4L) and LOXL2 (PDX-127) (Fig. 6D and Fig. EV9E-F). In this new model too, we observed that the combinatorial inhibition had a greater effect than the single treatments.

      2) In vivo, the effect appears to be modest even in the MDA231 model, selected for evidence of synergy in vitro. In vivo, the combination appears to have an additive effect. Tumor growth rates are reduced, but no shrinkage is occurring. In the PDX model, LOXL2i does not have an effect as a monotherapy, while modestly enhancing the impact of BRD4i. These results are at odds with the claim of the interaction being fundamental for proliferation.

      We agree with the reviewer that the combinatorial inhibition appears to have an additive effect in vivo using the MDA-MB-231 model.

      1. For that reason, we have now performed the in vivo PDX experiment mentioned above (PDX-127; Fig. 6D and Fig. EV9E-F), in which we halved the dose of JQ1 to avoid a strong effect on tumor growth from BRD4 inhibition alone. In this new experiment, the synergistic effect is evident. While single-agent treatment showed a very moderate effect (0% or 20% tumor growth reduction for LOXL2 inhibition and JQ1, respectively), the combinatorial treatment showed a 50% reduction in tumor volume, further supporting our conclusions.
      2. We also performed either BRD4 or MED1 pull-down experiments in the presence of PXS and JQ1. We show that upon PXS treatment, the interaction between LOXL2 and BRD4S is maintained while the interaction with MED1 is reduced (Fig. 5A-C). However, in the presence of JQ1, the interaction between LOXL2 and MED1 is maintained while the BRD4S-LOXL2 and BRD4S-MED1 interactions are impaired (Fig. 5D-F). These new results explain why monotherapy does not have a sufficient effect in vivo and provide the rationale for the combinatorial treatment. We believe that these new results corroborate our initial findings, and we hope to have addressed the reviewer's comments.
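      One way to check whether the percentages quoted for PDX-127 exceed simple additivity is to compare them with a reference model. The sketch below uses Bliss independence, which is one common choice and is not stated in the response as the authors' method. It also assumes that the quoted growth and volume reductions can be treated as comparable fractional effects. The function names are illustrative.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined fractional effect if the two agents act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a: float, effect_b: float, observed: float) -> float:
    """Observed minus expected effect; a positive value is suggestive of synergy."""
    return observed - bliss_expected(effect_a, effect_b)


# Fractional reductions quoted in the response for PDX-127
# (assumed here to be on a comparable scale).
pxs_alone = 0.0      # LOXL2 inhibition alone
jq1_alone = 0.20     # low-dose JQ1 alone
combination = 0.50   # JQ1 + PXS

expected = bliss_expected(pxs_alone, jq1_alone)
excess = bliss_excess(pxs_alone, jq1_alone, combination)
print(f"expected={expected:.2f}, excess={excess:.2f}")
```

      Under these assumptions the expected combined effect equals the JQ1 effect alone, so any additional reduction in the combination arm counts as excess over the Bliss reference. A formal claim of synergy would still need replicate-level data and a statistical test.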

      3) No analysis of cell proliferation was shown in vivo. Authors should have performed BrdU or KI67 staining to support the claim. For in vitro analyses, authors also used indirect assays for proliferation. PI staining by itself does not have sufficient resolution to clearly capture modest effects that authors demonstrate. BrdU-PI double staining would have been much more useful.

      We appreciate the reviewer’s comment. In the revised manuscript we have added Ki67 and H3S10p staining of the tumor samples from the new in vivo PDX experiment (Fig. 6E and Fig. EV10A-C). We show that the combinatorial treatment significantly reduces both proliferation markers, in agreement with the reduced tumor volume. Regarding the in vitro analysis, we used not only PI staining to show a reduced proliferation state but also H3S10p staining (Fig. 4B) and an SLBP1 fluorescent reporter MDA-MB-231 cell line (Fig. 4D, Fig. EV6B, E, and Movie EV). In the revised version of the manuscript, we included a new FACS-PI analysis (Fig. 4A, C) to better represent the effects we see on the cell cycle.

      Minor points:

      Dose dependent decrease in phosphorylated H3 is not at all obvious from eyeballing the data in S1A; the only effect that I see is a modest reduction at the highest concentration of the inhibitor. Authors need to quantify the results to support the claim.

      We agree with the reviewer and we apologize for the misinterpretation. We have changed the revised manuscript as follows: “The selective LOXL2 inhibitor PXS-538224 (hereafter, PXS) efficiently reduced the levels of oxidized histone H3 (H3K4ox) in MDA-MB-231 cells at 40 μM (Fig. EV6C), indicating an efficient inhibition of LOXL2 catalytic activity in the nucleus.”

      Most of breast cancer cell lines are derived from metastatic disease, including pleural effusion, thus the point that because MDA231 cell line is derived from pleural effusion, it is metastatic does not have sufficient logical foundation.

      Many publications have shown the high metastatic capacity of MDA-MB-231 cells (e.g. https://doi.org/10.1016/j.bbabio.2011.04.015, doi: 10.1038/s41467-017-01829-1), which are therefore used as a metastatic TNBC model. The purpose of the analysis reported in Fig. 6C was simply to show whether any of the treatments used could reduce the metastatic capacity of this cell line. We believe we do not overstate the results but just report them as they are.

      How is loss of cell-cell junction in vitro consistent with LOXL2 role in modulating ECM? There is no evidence of ECM production in MDA231 in vitro. On the other hand, this loss is associated with EMT.

      We thank the reviewer for identifying this mistake. In the revised manuscript we changed the text as follows: “Gene set enrichment analysis (GSEA) revealed that LOXL2 KD induced upregulation of processes involved in cell morphology, secretion, membrane trafficking, and cell differentiation, with cell-cell junction being one of the most significantly affected pathways (Fig. EV5E). These results agree with the role of LOXL2 in regulating epithelial-to-mesenchymal transition, corroborating the high quality of our dataset.”

      Reviewer #1 (Significance (Required)):

      Discovery and characterization of LOXL2-BRD4 interaction is advancing the ever-deepening understanding of molecular mechanisms of regulation of gene expression. The studies and analyses appear to be sufficiently rigorous and reported with clarity, and the claimed discovery of the biological interaction between LOXL2 and BRD4 is well supported. However, given the magnitude of the reported (rather than claimed) effects of this interaction, and concerns about generalizability of the authors' conclusions, it is not clear how these results are promising for the development of new therapies in TNBC. Moreover, in contrast to luminal BC, there is no clear evidence for utility of cytostatic drugs in constraining TNBC. Therefore, biological and clinical significance of the authors' discovery is unclear and claims in this regard appear to be overblown.

      We thank the reviewer for stating that our analysis is rigorous and reported with clarity. We really took the criticisms as an opportunity to strengthen our findings, as explained above.

      For the newly presented in vivo PDX model, we performed immunohistochemistry for Ki67, H3S10p and Cleaved Caspase 3 to check whether the reduction in tumor volume observed with the combinatorial treatment resulted from a cytotoxic and/or a cytostatic effect (Fig. 6E and Fig. EV10A-C). As shown in the figure, the combination of the two inhibitors induced a greater decrease in Ki67 and H3S10p, and a clear increase in Cleaved Caspase 3. Therefore, these new data indicate that the combinatorial treatment has not only a cytostatic but also a cytotoxic effect, suggesting that it could be exploited clinically for the treatment of TNBC patients.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their study, Pascual-Reguant et al. show that combined inhibition of BRD4 and LOXL2 can synergize to restrict triple-negative breast cancer (TNBC) proliferation. BRD4 and LOXL2 are transcription regulators that can read and write epigenetic information, respectively. The authors employ three distinct breast cancer cell lines and mouse models with cell line-derived xenografts, and they show that combined inhibition of BRD4 and LOXL2 can be superior to single BRD4/LOXL2 inhibition in these model systems. In an attempt to identify a connection between BRD4 and LOXL2, the authors find that the two proteins can bind to each other. The authors performed most of the experiments in the breast cancer cell line MDA-MB-231. To assess the impact of LOXL2-inhibition on transcription, the authors assessed changes of the transcriptome in MDA-MB-231 cells following LOXL2 knockdown. They found that genes related to cell differentiation and morphology were upregulated, while genes related to the cell cycle were downregulated. ChIP-seq data of BRD4 showed that BRD4 can bind to cell cycle gene promoters and that this binding was enhanced upon loss of LOXL2. The authors found that LOXL2 and BRD4 interacted with the transcriptional cell cycle regulators B-MYB, FOXM1, and LIN9, which are components of the MYB-MuvB-FOXM1 (MMB-FOXM1) complex that is known to promote the expression of late cell cycle genes with important functions during mitosis. The authors conclude that LOXL2/BRD4 interact with each other and with the MMB-FOXM1 complex to drive the expression of cell cycle genes and cell proliferation. Vice versa, they conclude that inhibition of LOXL2/BRD4 reduces cell proliferation through inhibiting the expression of cell cycle genes.

      Major:

      • The data and methods are presented well. The experiments are adequately replicated and analyzed. However, except for the first section, all experiments were performed using only one cell line. It is important to validate key findings in at least a second cell line.

      We thank the reviewer for valuing our work.

      To address the reviewer’s comment, in the revised manuscript we added a cell line (Cal-51) that expresses levels of LOXL2 and BRD4 similar to those of MDA-MB-231 (Fig. EV8C). Even though this cell line is clearly more resistant to JQ1 than the MDA-MB-231 cell line (Fig. EV8D), the combinatorial treatment is significantly more effective than the single-agent treatments (Fig. EV8E).

      Moreover, we have also performed an additional in vivo experiment using another TNBC PDX (PDX-127) that expresses LOXL2 and BRD4S, but not BRD4L. Given that JQ1 can inhibit both BRD4 isoforms, this in vivo system allowed us to demonstrate that the tumor antiproliferative capacity of the combinatorial treatment is due to the simultaneous inhibition of LOXL2 and BRD4S (rather than BRD4S and L) (Fig. 6D and Fig. EV9E-F).

      • There appears to be a misunderstanding of the concept of cell cycle-dependent gene regulation by the DREAM complex and its related factors. Early (G1/S) cell cycle genes contain E2F promoter motifs, while late (G2/M) cell cycle genes contain CHR promoter motifs. The DREAM complex can bind both, while RB-E2F and MuvB recognize only E2F and CHR motifs, respectively. B-MYB and FOXM1 bind to MuvB and regulate late cell cycle genes, but they do not bind to early cell cycle genes. Given this concept, the authors' rationale to connect BRD4/LOXL2 through MuvB/B-MYB/FOXM1 with E2F promoter sequences and early cell cycle genes and the subsequent conclusions must be corrected.

      We thank the reviewer for their expert explanation. We corrected our conclusion in the revised version of the manuscript following the reviewer’s comment.

      • I felt that the suggested functional connection between LOXL2/BRD4 and DREAM is not strongly supported by the authors' data. Figure S6E: A similarity score of

      Fig. EV6E: We agree with the reviewer that a similarity score of

      Fig. 4E: We thank the reviewer for this comment. The pulldown showed that BRD4S, LOXL2, and MED1 interact with Lin9 and B-Myb, but not with FOXM1; thus FOXM1 itself is an internal negative control of the pulldown. Additionally, BRD4L does not show the same interaction pattern as BRD4S, LOXL2, and MED1, again acting as an internal negative control. We therefore believe that the pulldown is properly controlled and that the observed interaction is reliable. We furthermore agree with the reviewer that it would be interesting to characterize the interactions between the DREAM complex and BRD4S, LOXL2, and MED1. However, we believe that dissecting these interactions at the mechanistic level would require a deeper study, which can be a project in itself that we aim to explore in the future. For example, it would be interesting to investigate whether either the inhibition or the downregulation of LOXL2 and/or BRD4S specifically impairs the formation of the DREAM complex or the recruitment of specific DREAM complex subunits, as well as how these effects impair DREAM complex chromatin binding. We are afraid that the suggested pulldowns would not be sufficient to answer these questions, which would require extensive cross-interaction studies under BRD4/LOXL2 and BRD4+LOXL2 inhibition or downregulation, followed by ChIP-seq and transcriptomics for all the conditions. We believe that the provided data, together with the functional characterization (both in vitro and in vivo) of the phenotypes triggered by BRD4S and LOXL2 inhibition, make a strong case for our manuscript and place the suggested experiments beyond its scope. We hope the reviewer will understand our explanation and will appreciate that we are planning to pursue this further in the future.

      Fig. 3: We thank the reviewer for this important comment. The ChIP-seq technique very often does not provide exhaustive results due to sequencing depth limits and antibody performance. We believe that the fraction of DREAM target genes found in our dataset as bound by BRD4S is not exhaustive and that the analysis proposed by the reviewer would not lead to clear, conclusive results. However, we understand the importance of verifying that DREAM target genes whose promoter is bound by BRD4 are indeed downregulated when LOXL2 is inhibited. To address this question, in the revised manuscript we added gene expression analysis of selected DREAM target genes upon treatment with JQ1, PXS, or their combination. We could show that both JQ1 and PXS treatment impair the transcription of the selected DREAM target genes, whereas the combinatorial treatment almost shut down their expression, in agreement with our hypothesis (Fig. 5J).

      • The authors state that it is surprising to find that LOXL2 can promote target gene transcription because it is rather known as a transcriptional repressor. To this point, the authors should perform standard analyses using their RNA-seq and ChIP-seq data. Compare differential expression of genes that are bound by BRD4S/L/S+L and genes not bound by BRD4. Perform motif search and enrichment analyses for transcription factor and co-factor binding data (public ChIP-seq repositories). Such analyses may suggest what gene sets are up- and downregulated by LOXL2 through BRD4S/L and what other factors could be involved in LOXL2-dependent up- and downregulation of gene transcription.

      We thank the reviewer for this valuable comment that certainly provides the rationale for a follow-up project. However, we believe that the proposed study goes beyond the scope of our work at this moment.

      Minor:

      • I felt that background information on the BRD4 isoforms was missing. The short and long isoforms of BRD4 should be introduced briefly.

      We agree with the reviewer. In the revised manuscript, we addressed this by presenting BRD4 isoforms in the introduction part of the manuscript.

      • Given that BRD4 inhibition is known to activate p53 (e.g., PMID 23317504 and 33431824) and p21 (PMID 31265875), the authors should discuss the p53 status of their cell lines (largely mutant). In general, I felt that the authors could better cite and discuss the current literature on BRD4 and LOXL2.

      We appreciate the reviewer's comment regarding p53. Given that p53 is mutant in MDA-MB-231, we believe that the proliferation defect observed with the combinatorial treatment may be due to the activation of alternative cytostatic or cytotoxic signaling cascades, independently of p53 activation. We have now briefly mentioned this point in the manuscript discussion.

      • It was unclear to me why the authors did not actually test experimentally whether their predicted interaction models 2 or 4 are likely true (Figure 2E+G).

      We understand the reviewer’s comment. The fact that JQ1 treatment almost abrogates the interaction between LOXL2 and BRD4S strongly suggests that models 1 and 3 are likely wrong, therefore pointing towards models 2 and 4 as the correct ones. To test whether models 2 and 4 are indeed correct, we are now performing extensive mutagenesis studies, whose preliminary results suggest that they are. We did not include this study in the current manuscript because we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign, and we hope that the reviewer will understand our constraints.

      • The transcription of cell cycle genes depends on the cell cycle (i.e., reduced cell cycle entry correlates with reduced cell cycle gene expression). Given that the authors showed LOXL2 inhibition reduces MDA-MB-231 cell proliferation, they should note that reduced expression of cell cycle-related genes is expected upon LOXL2 knockdown.

      We understand the reviewer’s comment. We believe that we provide sufficient data supporting our hypothesis that LOXL2 controls the expression of cell cycle genes at the transcriptional level together with BRD4S. In addition, the sole inhibition of LOXL2 has practically no effect on tumor proliferation in vivo but largely enhances the antiproliferative effect of low-dose JQ1 (Fig. 6D). We hope these clarifications will satisfy the reviewer.

      • The authors specify in their discussion that their data show a function of LOXL2/BRD4 in the cell cycle interphase, while there were no experiments that support that specific conclusion. At least it is unclear to me why the authors rule out a function in mitosis?

      We thank the reviewer for this comment. We referred to interphase genes because these are the early cell cycle genes, while mitotic genes are the late ones. We do not rule out a possible function for BRD4S and LOXL2 in regulating mitotic progression; however, we believe this would be a consequence of dysregulated G1-S-G2 gene expression rather than a direct transcriptional effect. This conclusion derives from the fact that while we observe interactions of LOXL2, BRD4S, and MED1 with Lin9 and B-Myb, these are not fully conserved with FOXM1, which is typically required for the transcription of mitotic genes. To avoid confusion, we have nevertheless removed the word “interphase” from the text.

      • I felt that the first part of the manuscript (combination of BRD4 and LOXL2 inhibitors in TNBC) was a bit uncoupled from the functional studies on LOXL2 and its connection to BRD4. The transition between these parts and the final discussion on why the joint control of cell cycle genes by LOXL2/BRD4 may be important for the synergistic effect of LOXL2/BRD4 inhibitors. To this point, the authors' model was not clear to me.

      We really appreciate the reviewer’s comment. To better connect the functional studies with the clinical significance of the proposed combinatorial treatment, we restructured the manuscript. In the revised version, the use of the combinatorial treatment is shown in Figure 6. Moreover, to better explain why we focused all the studies on BRD4 and LOXL2, we also included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition, suggesting a functional interaction between BRD4 and LOXL2 and the possibility to exploit it for therapeutical purposes. We believe that these data set the rationale to further explore the connection between LOXL2 and BRD4, both at the mechanistic and functional levels.

      Reviewer #2 (Significance (Required)):

      The study by Pascual-Reguant et al. shows that inhibitors of BRD4 and LOXL2 can be combined to achieve better efficacy in reducing proliferation of breast cancer cell lines and breast tumor growth in xenograft models. They provide strong evidence for a functional interaction between LOXL2 and BRD4 and investigate their common transcriptional targets. Intriguingly, some evidence points towards a direct regulation of the DREAM complex and its cell cycle gene targets.

      The findings are novel and can be the basis for further research on TNBC combination therapy using BRD4 and LOXL2 inhibitors. The link to the DREAM complex is preliminary.

      The study is of interest for a basic research audience with some translational aspects.

      I reviewed this manuscript as a researcher in gene regulatory mechanisms, with cell cycle genes as one focus area. I have no expertise in the computational modeling of protein-protein interactions and I am no expert for breast cancer.

      We thank the reviewer for the positive comments. We also would really like to thank the reviewer for their criticism, which, we believe, contributed to a new and improved manuscript version.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Laura Pascual-Reguant et al. identified a novel role of the LOXL2 oxidase in sustaining cell cycle progression through a so far uncharacterized gene-activating function that is mediated by the BRD4S epigenetic reader and exerted on key DREAM-target genes in TNBC. Moreover, the authors showed that combinatorial treatment of TNBC with LOXL2- and BRD4-specific inhibitors results in a tremendous anti-tumorigenic effect. For all findings, they leveraged in vitro and in vivo settings as well as high-throughput sequencing approaches. However, the following points should be addressed and explained.

      Major points:

      -The authors, in their working hypothesis, propose that dual inhibition of BRD4 and LOXL2 is a novel strategy for curing TNBC. For my taste, just because both targets are quite promising for TNBC, the jump to this combinatorial treatment is kind of abrupt. Knowing the difficulty and time-/financial- investment, authors could optionally perform a mass spectrometry analysis on nuclei lysates with LOXL2 pull down to identify physical interactors. Due to the augmented resources and analysis of raw data, authors may need a generous revision period (approx. 4 months for starters). This can provide a more unbiased approach to look at nucleus-specific gene-regulatory functions and particularly at epigenetic readers. It would also be interesting to see if LOXL2 interacts with other members of the BRD family. Selecting BRD4 and no other members of the bromodomain family cannot be the only choice given that other BRD members can also interact with several of these mediator subunits.

      We thank the reviewer for the suggestion and we agree that the rationale for combining BRD4 and LOXL2 inhibitors was not sufficiently argued in the first version of the manuscript. For that reason, in the revised manuscript, we added new data to explain why we explored this topic. In particular, to better explain why we focused all the studies on BRD4 and LOXL2, we included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition (but not to other approved chemotherapeutic drugs), suggesting a functional interaction between BRD4 and LOXL2 and the possibility of exploiting it for therapeutic purposes. Moreover, we restructured the manuscript to make the story more linear, explaining first the functionality of the BRD4S-LOXL2 interaction at the molecular and cellular levels, and then presenting the in vivo systems in the last part of the manuscript.

      We agree with the reviewer that it may be interesting to explore whether LOXL2 interacts with other BRD family members. However, given the prominent role of BRD4 in promoting cancer proliferation, we believe that understanding the relevance of the BRD4S-LOXL2 interaction in TNBC is, per se, of great interest and provides a novel mechanistic understanding of how TNBC proliferation is controlled at the transcriptional level. In the specific case of TNBC, it has been shown that BRD4S has an oncogenic effect, while BRD4L is an oncosuppressor. In the manuscript, we now show that LOXL2 downregulation sensitizes cells to JQ1 treatment (Fig. 1D). Additionally, while the downregulation of BRD4L does not have any additional effect on cells treated with PXS, the downregulation of BRD4S sensitizes them to LOXL2 inhibition (Fig. EV8B). These results, once again, indicate the relevance of studying the functional interaction between BRD4S and LOXL2.

      -LOXL enzymes have been shown to promote collagen and fibronectin assembly, thereby sustaining the pro-survival effect of the ITG5A/FN1/FAK/SRC signaling cascade and shielding TNBC cells against chemotherapy treatment (32415208). Did authors observe if LOXL2 loss or inhibition decreased the active status of FAK and SRC, which are well known to promote G1-S transition (25381661)?

      Probably the cell cycle defects upon LOXL2 loss may also partially arise from the impairment of this cascade.

      We really appreciate the reviewer’s suggestions. In the revised version of the manuscript, we checked FAK and Src activation status in tumor samples from one of our in vivo experiments (Fig. EV10D). We did not observe any difference in phospho-FAK or phospho-Src upon treatment with PXS, JQ1, or their combination, suggesting that alterations in the activity of these factors were not driving the observed proliferation defects.

      -Authors exclusively use JQ1 as a BRD4 inhibitor. As JQ1 may have an unspecific effect on BRD2 as well, authors should consider reproducing key experiments with siControl- and siBRD4-treated cells and increasing doses of PXS as well as repeating the JQ1 dose response assay in Figure 1B using siRNA-mediated silencing of LOXL2. Given that both players are part of the same complex, silencing of one and inhibition of the other should sensitize cells compared to their control counterparts.

      We agree with the reviewer and we addressed this comment in the revised manuscript. In particular, we have added two additional experiments:

      • We transduced MDA-MB-231 cells with isoform-specific shBRD4s (shBRD4L and shBRD4S) (Fig. EV5H) and checked cell sensitivity to PXS treatment (Fig. EV8B). As explained also above, we observed that only when the short isoform of BRD4 was downregulated cells displayed higher sensitivity to PXS treatment. This result corroborates that BRD4S and LOXL2 are required for TNBC proliferation.

      • We transduced MDA-MB-231 cells with shLOXL2 and assessed JQ1 sensitivity (Fig. 1D). We showed that upon LOXL2 downregulation, cells became more sensitive to JQ1 treatment, again corroborating the fact that TNBC proliferation requires BRD4S and LOXL2.

      -Moreover, in Figures 1G and S3D the differential sensitivity of low and high LOXL2 cell lines is unclear. Do authors know if any of these growth kinetic lines represent one of the tested cell lines in Figure 1A-B? Authors should provide respective legends. In addition, authors should take advantage of their homemade data given that they have already selected a panel of TNBC cell lines with various LOXL2 expression at basal state (Figure 1A) for which dose response assays have been performed (Figure 1B). Therefore, I would perform an IC50 graph for JQ1 (without PXS treatment) using the existing data from Figure 1B.

      We apologize if our representation was confusing. In the revised manuscript we have changed the sensitivity plots (Fig. 1A and Fig. EV1) to make them easier to grasp. Additionally, in Figure 1A we included the analysis of CCLE cell lines stratified by their LOXL2 expression levels. This analysis showed that LOXL2 expression levels could overall predict the response to BETi treatment. As suggested by the reviewer, we also plotted the IC50 of the 3 cell lines tested. However, their JQ1 sensitivity curves did not show any difference that could be attributed to their different LOXL2 levels. Our speculation is that 3 cell lines do not provide a sufficient sample size to reach a meaningful conclusion, which, in contrast, can be achieved by comparing the CCLE BETi sensitivity.
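      For readers unfamiliar with how an IC50 is read off a dose-response curve, the following is a minimal sketch that interpolates on log10(dose) between the two measurements bracketing 50% viability. It is not the fitting procedure used in the manuscript, which the response does not describe. The function name and the numbers are made up for illustration.

```python
import math
from typing import Optional, Sequence


def ic50_by_interpolation(doses: Sequence[float],
                          viability: Sequence[float]) -> Optional[float]:
    """Estimate IC50 by log-linear interpolation across the 50% crossing.

    doses must be positive and sorted ascending; viability is a fraction of
    the untreated control. Returns None if the curve never crosses 50%.
    """
    points = list(zip(doses, viability))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower to the higher dose (log scale).
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


# Made-up illustrative numbers, NOT data from the manuscript.
example_doses = [0.01, 0.1, 1.0, 10.0]         # micromolar
example_viability = [0.95, 0.80, 0.40, 0.10]   # fraction of untreated control

estimate = ic50_by_interpolation(example_doses, example_viability)
print(estimate)
```

      A four-parameter logistic fit is the more usual choice when enough dose points are available. Interpolation is shown here only because it needs no external libraries.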

      -In Figure 2D, the pull-down assay is inconclusive, as the molecular weight for each construct is not mentioned. I would probably add this information also in all performed western blots. Also, the overexpression of the BD1/BD2-mutated and especially the BD1/BD2-lacking construct is unclear if it still interacts with LOXL2, probably because of the lack of molecular weight reference of each band. Therefore, the authors should make this pull-down assay more descriptive regarding the size of the bands. Also, BD1 mutagenesis at N140 was shown to dislodge the binding of JQ1 to BRD4 (24497639), which implies that BD1 mutagenesis or overexpression of the BD1-deficient construct should abrogate the interaction of LOXL2 with BRD4, reminiscent to the abrogated interaction of BRD4/LOXL2 upon JQ1 that binds to both BDs (Figure 2F). And, what happens if a BD2-deficient construct is expressed?

      We thank the reviewer for spotting this oversight, for which we apologize. In the revised version of the manuscript we included molecular weights for all western blots.

      We acknowledge that BD1 mutagenesis displaces JQ1 binding; however, we respectfully disagree that the BD1-N140 mutant should therefore not bind to LOXL2. Our docking analysis indeed showed that none of the poses is impaired by either BD1 or BD2 mutagenesis (Fig. EV4D). JQ1 disrupts the interaction between BRD4S and LOXL2 (Fig. 2F, G) not because the two compete for the same binding residue, but because the space occupied by JQ1 inside the AcK binding pocket of either BD1 or BD2 impedes proper binding to LOXL2. Our pulldown data indeed showed that mutant BD1 and BD2 retain the ability to bind to LOXL2 (Fig. 2C), as predicted by the docking.

      We did not try to express constructs lacking either BD1 or BD2, and we cannot speculate about what would happen to the BRD4S-LOXL2 interaction in this scenario. Even though this experiment could help dissect the interaction between LOXL2 and BRD4S, we decided instead to perform mutagenesis of specific residues that have been predicted to be important for the interaction. We did not include this study in the current manuscript because we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign, and we hope that the reviewer will understand our constraints.

      -If authors support that BRD4S is the predominant isoform driving the expression of DREAM-targets, this means that DREAM-targets are mainly bound by BRD4S, relying on Figure 3E-F. However, based on the authors' ChIPseq tracks in Figure 3H, DREAM targets such as EZH2 and HMGB2 are co-occupied by both BRD4 isoforms at the basal state on their promoter region. Also, especially for EZH2 and PLK4, authors should set to 'group auto-scale' both conditions in a smaller scale range for ChIPseq- and RNAseq tracks, although I do not consider these two genes good candidates for representing your analysis. Therefore, authors should initially show all genes (e.g., in a table format) that enrich the 'DREAM-targets' signature and select for a greater panel of genes (like for AURKB and HMGB2) demonstrating a preferential occupancy of the BRD4S at their promoter region. Finally, authors are recommended to perform a ChIP-qPCR on these genomic regions at basal state (no LOXL2 silencing) to validate the predominant occupancy of BRD4S and the low/absent occupancy of BRD4L at these genomic sites.

      We apologize for the confusion. To make the figure more understandable, we now scaled all the panels to the same scale and highlighted in grey the promoter region of each selected DREAM target gene. As the reviewer can appreciate, none of these genes is bound by BRD4L in basal conditions (Fig. 3F).

      To better characterize the differential binding, following the reviewer’s suggestion, we performed ChIP-qPCR using Ab2 (which recognizes both BRD4 isoforms), in cells either downregulated for BRD4L or BRD4S with isoform-specific shRNAs (Fig. EV5H). Results showed that only the downregulation of BRD4S reduced the binding of Ab2 to the promoter of the selected DREAM target genes (Fig. 3D), corroborating our hypothesis and validating our ChIPseq strategy.

      -Authors in Figure 3G should select an equal-sized population of randomly chosen non-DREAM-target genes, otherwise, the comparison of log2FC difference between these two gene cohorts is unreliable and difficult to make. Mann-Whitney test should also be performed.

      We thank the reviewer for this suggestion, which was added to the revised version of the manuscript (Fig. 3E, lower panel).
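
      To illustrate the comparison the reviewer asks for, the following is a minimal sketch, not the analysis used in the manuscript. It draws a random non-DREAM control set of the same size as the DREAM set and compares the two with a Mann-Whitney U test. All log2FC values are synthetic, the names `dream`, `background`, and `control` are hypothetical, and the p-value uses the normal approximation without tie correction.

```python
import math
import random


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value).

    The p-value comes from the normal approximation without tie
    correction, which is an assumption suited to large samples only.
    """
    pooled = sorted((value, index) for index, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    k = 0
    while k < len(pooled):
        # Find the run of tied values starting at position k.
        j = k
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[k][0]:
            j += 1
        midrank = (k + j) / 2 + 1  # average of the 1-based ranks k+1 .. j+1
        for t in range(k, j + 1):
            ranks[pooled[t][1]] = midrank
        k = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided


rng = random.Random(1)
# Hypothetical log2FC values: DREAM targets shifted downward, other genes centred on 0.
dream = [rng.gauss(-1.0, 0.5) for _ in range(50)]
background = [rng.gauss(0.0, 0.5) for _ in range(2000)]

# Equal-sized random control set drawn from the non-DREAM genes.
control = rng.sample(background, len(dream))

u_stat, p_value = mann_whitney_u(dream, control)
print(f"U = {u_stat:.1f}, p = {p_value:.3g}")
```

      With real data, `dream` and `background` would hold the measured log2FC values. Repeating the random draw several times would show whether the result depends on the particular control sample.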

      -Authors should repeat the cell cycle analysis (Figure 4A) as the number of cells subjected to flow cytometry is quite discrepant between the conditions. Also, it is not clear if the experiment was performed in at least biological triplicates (although in the respective legend, it is stated so). If performed in biological triplicates, authors should make a new graph where each cell cycle phase cell population differs between the two conditions. Moreover, the difference in cell cycle defects in LOXL2-inhibited cells (Figure 4C) is indifferent compared to their control counterpart. Therefore, authors should address these inconsistencies.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we represent the cell cycle also as a bar plot with statistical analysis (Fig. 4A, C). Even though the number of cells was the same across conditions, the sub-G1 population of the LOXL2 KD cells may have distorted the profile of the cell cycle. To avoid misinterpretations, we repeated the analysis in the revised version of the manuscript. Statistical analysis supports that LOXL2 inhibition or downregulation has a significant effect on cell cycle progression (Fig. 4A, C, right panel).

      -Furthermore, authors should explain what was the rationale for selecting a mediator subunit and specifically MED1 as a possible interacting partner of LOXL2 and BRD4S, since MED12 and MED24 were also highly essential (Figure 4F).

      We selected MED1 as a Mediator Complex proxy. In our essentiality analysis MED 1, 9, 10, 12, 15, 16, 19, 23, 24, 25 score as significant, suggesting a functional interaction between LOXL2 and the Mediator Complex, rather than a specific subunit. MED1 has been previously described as a BRD4 partner and it is often used in immunofluorescence to visualize transcriptional foci, which made it the best candidate for follow-up study in our project.

      -Moreover, do authors also observe this functional relationship of LOXL2 and BRD4S in cell cycle progression in other breast cancer subtypes presenting a high proliferation index e.g HER2+?

      Presumably, the authors' proposed mechanism applies to a wide panel of breast cancer entities, for which only key experiments could be performed.

      We thank the reviewer for the suggestion. We hypothesized that other cancer types expressing LOXL2 and BRD4S could also benefit from the combinatorial treatment. Indeed, the CCLE drug sensitivity panel in Fig. 1A comprises cancer cell lines of different origins, not just TNBC, and corroborates that the relationship between LOXL2 expression levels and BRD4 sensitivity also exists beyond TNBC. Even though it is important to experimentally verify this hypothesis, we decided to pursue it in the future to broaden the applicability of the proposed strategy in preclinical settings.

      -Authors in Figure 5H represent LOXL2 and BRD4s as integral chromatin looping factors together with MED1 at promoter and enhancer regions. However, this illustration is an overrepresentation of their finding because authors did not address the differential occupancy of BRD4S upon LOXL2 loss in DREAM-target-specific enhancer regions. If they wish to do so, they may use the RANK ORDERING OF SUPER-ENHANCERS (ROSE) package to call for super-enhancer regions in the proximity of DREAM-targets and confirm similar results as for their TSS-proximal sites.

      We thank the reviewer for the useful suggestion. In the new version of the manuscript, we have simplified the representation, which now does not show super-enhancers. However, following the reviewer’s suggestion, we performed super enhancer analysis using ROSE. Results showed that BRD4S binds to super-enhancers more than BRD4L, including DREAM target gene super-enhancers. Additionally, while LOXL2 KD did not alter the binding of LOXL2 to DREAM target gene super-enhancers, it decreased the binding of BRD4S to them (Fig. EV7D, E). Overall, these data are in agreement with our hypothesis that BRD4S together with LOXL2 controls the expression of DREAM target genes.

      -In the current manuscript, authors did not address the translational relevance of their proposed mechanism in the context of conventional therapies. Knowing that several BRD-specific compounds currently undergo clinical trials, authors should address if LOXL2 low (MDAMB468) and high (BT549) cells demonstrate a differential sensitivity to increasing doses of chemotherapy, in the presence or absence of BRD4. By doing that, LOXL2 apart from being a therapeutic target could be also used as a prognostic marker to stratify patients and achieve better response to standard therapies.

      We really appreciate the reviewer’s suggestion and we think this is a fundamental point. In the new version of the manuscript, we have performed further analysis using a greater panel of chemotherapeutic agents from the CCLE sensitivity database. We now show that LOXL2 low-expressing cells show significantly more sensitivity to BETi treatments, but not to conventional chemotherapeutic agents (e.g. doxorubicin, Olaparib, 5-fluorouracil, paclitaxel, etc.) (Fig. 1A and Fig. EV1), which set the rationale to further explore the functional relationship between BRD4 and LOXL2.

      Minor points:

      -In Figure 1D, the authors should convert the y-axis to a logarithmic scale to better represent the differences between JQ1, PXS, and combo. Also, One-way Anova should be performed between JQ1, PXS and combo.

      We don’t understand the reviewer’s suggestion since Fig. 1D (Fig. 6B, right panel in the revised version) is a tumor picture for which the y-axis cannot be converted to a logarithmic scale.

      -In Figure S6F, authors did not show the sensitivity of LOXL2 low and high cell lines for BRD4 KO. If LOXL2-proficient cells are less sensitive to JQ1, based on Figure 1B, authors should consider showing something similar from the gene essentiality database.

      We agree with the reviewer and we apologize for this mistake. We have included the sensitivity of LOXL2 low and high cell lines for BRD4 KO and also for MYC KO (Fig. EV6G).

      -Authors failed to discuss the work from Ozge Saatci et al (PMID: 32415208) regarding LOXL2 in TNBC and ECM reorganization as well as in other cancer entities (PMID: 35428659) in the context of ECM remodeling. Authors should realize that these published works and the current ones are not conflicting but complement each other.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we discussed this work.

      Reviewer #3 (Significance (Required)):

      SIGNIFICANCE

      The conception and findings are of enlightening significance for TNBC therapy, especially given the lack of targeted therapies in this particularly aggressive breast cancer subtype. Hence, I posit this work as highly relevant for the cancer epigenetics research community interested in characterizing unknown factors that facilitate the gene-activating function of epigenetic readers in health and disease.

      My field of expertise is to uncover epigenetic vulnerabilities responsible for transcriptional plasticity driving drug tolerance in aggressive forms of breast cancer.

      We would like to take the opportunity to thank the reviewer for the relevant suggestions. We strongly believe the revised version of the manuscript has been substantially improved by addressing the comments the reviewer made.

    1. Author Response

      Many thanks for the detailed and sometimes sharp, yet appropriate criticism of our study. It was an incentive for us to carry out additional analyses and to devote more effort to an elaboration of concepts. The outcome is that the results have changed slightly and that we now give more space to a discussion of concepts. We first address here the points raised by more than one reviewer before responding to comments contributed by individual reviewers.

      The points raised can be divided into three thematic groups: 1) conceptual issues, 2) experimental and analytical questions, and 3) comments challenging the novelty of our results. On the first theme, we think it is essential to make a clear distinction between the conceptual and observational domains. As such, the criteria defining a “mirror neuron” and what is meant by the term "mirror mechanism" belong to the conceptual domain. This understanding of terms requires agreement among scientists, but is not experimentally testable. Unfortunately, there is no agreement on how to define a “mirror neuron” and what is meant by “mirror mechanism”. Thus, for the present work, the only option is to refer to specific definitions or to use our own definitions, which try to capture what others, and here most importantly Rizzolatti and colleagues, probably meant. We have adjusted the introduction in an attempt to convey our understanding and usage of the two terms in a hopefully comprehensible manner. Briefly, we use a definition for "mirror neuron" that we take from the first paragraph of the results section of Gallese et al. (Brain, 1996). We do not consider the "properties of mirror neurons" described in that paper as defining a mirror neuron (MN). Classifying neurons as MNs only on the basis of the presence of a modulation of discharge rate during an executed and an observed action compared with a baseline is a common practice also in other single neuron studies on MNs, consistent with this definition. Regarding "mirror mechanism", we refer to Rizzolatti and Sinigaglia (2016) and make a distinction between a broad and a strict definition.
      Given our finding that there are almost no F5 MNs whose activity during observation is a motor representation according to our strict definition of a mirror mechanism, and also given the problem that the term “mirror mechanism” itself is not uniformly understood, the question arises whether and how the term "mirror neuron" should be used in the future. The answer to this may vary and belongs to the conceptual domain. We briefly address this question at the end of the discussion of the revised manuscript.

      From that understanding of terms, conceptual hypotheses are to be distinguished, which of course must allow experimental predictions, i.e., must be falsifiable. We now distinguish more clearly between a "representation hypothesis" and an "understanding hypothesis". Both hypotheses focus on F5 MNs and are based on the strictly defined mirror mechanism. We test the “representation hypothesis” in our study, and just because it is the basis for the “understanding hypothesis”, falsifying the “representation hypothesis” would allow us to conclude that the “understanding hypothesis” is not valid. In contrast, confirmation of the “representation hypothesis” would not, of course, allow us to conclude that the “understanding hypothesis” holds. That would really be circular reasoning (this conclusion was drawn by some and rightly criticized). However, support for the “representation hypothesis” would be the necessary prerequisite for the “understanding hypothesis” to be true. These two hypotheses take up the original argument that a certain understanding of observed actions could follow from an equality of action-specific F5 MN activity during execution and observation. Because we considered the data on equality of action-specific F5 MN activity to be insufficient, we designed this study. Since our result largely argues against the "representation hypothesis" and thus against the "understanding hypothesis," we now discuss alternative concepts for the function of F5 MNs in more detail. It should be noted here that our fourth concept ("goal-pursuit-by-actor") could well represent the observed action without contradiction to our broad definition of a mirror mechanism, which in principle could also serve a subjective experience (which could be conceived as a kind of understanding). The way we structure the concepts in the discussion of this revised manuscript is, in our opinion, a useful overview of the concepts. The third concept is new in this context.
      We would like to emphasize that we focus on F5 MNs and intentionally avoid a discussion of mirror neurons beyond F5 in this paper. With the data from this study, we cannot say anything about MNs outside of F5.

      Regarding the key question of how the "understanding hypothesis" is testable, or whether it may not be testable at all, we agree, of course, that for the conclusion of whether F5 MNs contribute to perception, only a manipulation of F5 MNs can clarify it. We now say that explicitly in the introduction. We agree with reviewer #2 that "understanding" here is not limited to "action recognition" or "action categorization”, which in principle could be implemented by purely sensory processing. Therefore, we also do not believe that the approach proposed by reviewer #3, which builds on the distinction of actions, would allow for a critical examination of the "understanding hypothesis”. But we disagree that the "understanding hypothesis" is not testable at all. Operationalization is necessary. If we accept that we can measure certain visual or auditory perceptions of an animal by operationalization (e.g., the subjective visual vertical, see for example Khazali et al., PNAS, 2020), then we must also accept that we can, in principle, measure other subjective experiences by operationalization, such as pain or aiming at a goal or even the co-experience of pain. An example of how to approach this is the study by Carrillo et al. (Curr Biol, 2019), which reviewer #2 and colleagues discussed in a recent review article (Bonini et al., TCS, 2022).

      With regard to the second theme, experimental and analytical questions, we noticed while reading the comments that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. Therefore, we now clearly separate single neuron analysis and population code analysis in the structure of the article. In view of the fact that statements about mirror neurons in the literature mostly refer to single neurons, we added extensive single neuron analyses, so that only now statistically reliable statements about single neurons are possible. This has led to the realization that the number of neurons with exclusively shared code is so small that these neurons should be considered a rare exception. Given the small number of time periods with shared code, we additionally tested against a hypothesis already rightly proposed as an alternative explanation by G. Csibra in 2005 (Mirror neurons and action observation: Is simulation involved? In: What do mirror neurons mean? Interdisciplines Web Forum 2005). We were able to reject this hypothesis based on two of three methods for testing for a shared code. This is the second piece of evidence besides the clustering of time periods with shared code already described in the first version that time periods with shared code cannot be considered random.

      We discuss in more detail the question of whether neurons that exhibit a shared code at least at times support the representation hypothesis. To this end, we additionally examined whether certain action segments are more frequently represented with a shared than with a non-shared code, whether neurons with shared code differ from those with non-shared code in anatomical location, and whether population cross-task classifiers restricted to a time-bin-wise selection of neurons with shared code can achieve an accuracy comparable to that of within-task classifiers applied to the whole population.

      Another issue was how to test for shared code and how to decide if a code has enough sharing. To answer the question, the exact hypothesis we intended to test here is crucial. The representation hypothesis states that the representation of the observed actions in F5 MNs corresponds to the representation as it occurs during the execution of the same actions. Therefore, the relationship between discharge rate and actions that holds during execution should also hold during observation, which is measurable with a classifier trained on execution trials and tested on observation trials. Moreover, the actions should not be more distinguishable during observation with a classifier other than the execution-trained classifier, because if that were so, it would mean that the representation of observed actions is different from that of executed actions. The detection of a cluster of time bins for which both conditions are satisfied confirms that it is possible to discover in this way the shared codes postulated by the representation hypothesis.
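
      The two conditions described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: a nearest-centroid classifier stands in for the linear discriminant classifier, the firing rates are synthetic Gaussian draws, and all function names and parameters are hypothetical.

```python
import random


def simulate_trials(means, n_trials, noise, rng):
    """Draw synthetic firing-rate vectors: one Gaussian cloud per action."""
    X, y = [], []
    for action, mu in means.items():
        for _ in range(n_trials):
            X.append([rng.gauss(m, noise) for m in mu])
            y.append(action)
    return X, y


def fit_centroids(X, y):
    """Train the stand-in classifier: the mean rate vector of each action."""
    grouped = {}
    for x, label in zip(X, y):
        grouped.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the action whose centroid is closest (squared distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return hits / len(y)


def cross_and_within(exec_means, obs_means, n_trials=40, noise=1.0, seed=0):
    """Return (cross-task accuracy, within-task accuracy) on observation trials."""
    rng = random.Random(seed)
    X_exec, y_exec = simulate_trials(exec_means, n_trials, noise, rng)
    X_obs_train, y_obs_train = simulate_trials(obs_means, n_trials, noise, rng)
    X_obs_test, y_obs_test = simulate_trials(obs_means, n_trials, noise, rng)
    # Condition 1: a classifier trained on execution decodes observed actions.
    cross = accuracy(fit_centroids(X_exec, y_exec), X_obs_test, y_obs_test)
    # Condition 2: a classifier trained on observation should not do better.
    within = accuracy(fit_centroids(X_obs_train, y_obs_train), X_obs_test, y_obs_test)
    return cross, within


# Hypothetical mean rates of three neurons for the three actions.
shared_means = {"lift": [10, 2, 2], "twist": [2, 10, 2], "shift": [2, 2, 10]}
# Non-shared code: during observation, the lift and twist patterns are swapped.
swapped_means = {
    "lift": shared_means["twist"],
    "twist": shared_means["lift"],
    "shift": shared_means["shift"],
}

shared_cross, shared_within = cross_and_within(shared_means, shared_means)
swapped_cross, swapped_within = cross_and_within(shared_means, swapped_means)
print("shared code:     cross", shared_cross, "within", shared_within)
print("non-shared code: cross", swapped_cross, "within", swapped_within)
```

      In the shared scenario the execution-trained classifier performs as well as the observation-trained one, which is what the representation hypothesis predicts. In the swapped scenario the observed actions remain discriminable, but only with a code that differs from the execution code.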

      With respect to concerns that the monkey may not have used the cue at all when the action was executed, we added a comparison with control trials with a non-informative cue and also compared the duration of the approach phase between the three actions. Regarding oculomotor behavior, we verified that the monkey had actually directed his gaze toward the action during action observation for all three actions.

      On the third issue, concerning the novelty of our results, we have now explained in more detail in the introduction why we felt it necessary to conduct a study we considered fundamental. As a result of our study, it can be clearly stated now that representations of observed actions as predicted by the strictly defined mirror mechanism are rare in F5 MNs, but nevertheless cannot be dismissed as random. This dispels the objection rightly raised by Csibra in 2005 and contradicts the currently prevailing view that such a representation can only be found at a population level. Even if these representations are ultimately explained by a concept other than the strictly defined mirror mechanism, their existence must be accounted for by any theory of the function of F5 neurons. Moreover, it is also shown that the observed actions are well discriminated with a non-shared code, at times even optimally. This contradicts the notion – which has been widespread for a long time since the work of Gallese et al. (Brain, 1996) – that mapping to motor representations in terms of broad congruence is simply not perfect. The applied cross-task decoding approach seems promising to test also in the future for a shared action code. Finally, reconsideration of alternative concepts has led us to highlight the possibility of a representation of a goal pursuit by the observer.

      Reviewer #1 (Public Review):

      The authors set out to investigate the hypothesis that mirror neurons in ventral premotor area F5 code actions in a common motor representation framework. To achieve this, they trained a linear discriminant classifier on the neural discharge of three types of action trials and test whether the thus trained classifier could decode the same categories of actions when observed. They showed that codes were fully matched for a small subset of neurons during the action epoch, while a wider set of "mirror neurons" showed only poorly matched codes for different epochs.

      This is one of the descriptions of our results, where we realized that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. This prompted us to perform a detailed single neuron analysis.

      The authors controlled for potential visual object confounds by having identical objects be manipulated in three different ways and by having the animal carry out the motor execution in the dark. The main strength of the study lies in the clever decoding approach testing the matched tuning to behavioural categories in a model-free way. The central result is in the identification of the small sub-group of mirror neurons that show true matching during the execution epoch, which can dissociate the three types of action almost perfectly. This aligns well with some previous work while offering a novel avenue to identify and investigate those neurons. The underlying neuronal mechanism and behavioural relevance of these neurons remain an open question. It would have been interesting to understand better whether the specific motor representations at a recording site, for instance identified through microstimulation prior to recording (see Methods), the reaction times on individual trials or the specific gaze targets (object/hand) had a bearing on the decoding performance for a neuron/trial.

      We agree that these are interesting questions.

      In this study, the focus is on testing for a shared code according to a strictly defined mirror mechanism. We have now compared the anatomical locations of neurons with only time bins in which observed actions were discriminated with a shared code (according to one of the methods) to the locations of neurons with only time bins with non-shared code (see last paragraph in Results). We did not find any relevant difference and this is why one cannot expect topographically specific effects of microstimulation.

      We do not expect the reaction time (i.e., the time interval between LED onset and start button release, or the duration of the approach epoch) during execution or observation to have any effect on our results on shared coding as the analysis was based on relative time bins. The observed actions were predominantly distinguished late in the approach epoch, but especially in the manipulation epoch. At this time, reaction time is not expected to have a relevant influence.

      The relationship between gaze/eye position and the activity of mirror neurons, during execution or observation, is an interesting topic in itself. However, for testing for a shared code according to a strictly defined mirror mechanism, it is only relevant that the observing monkey actually observes the action. We have ensured this in our experiment by a fixation window and have now also confirmed that the monkey actually looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes).

      Ultimately, the uncovered matched mirror representations should in future experiments be tested with causal interventions and linked trial-by-trial to action selection performance.

      The authors put the focus of their discussion on the wider, less well-matched neuronal pool to support an action selection framework, which is of course a valid view and well established in motor representations. From a sensory perspective, sparse coding, as suggested by the small group of "true" mirror neurons identified with the decoding approach, should also be considered as the basis for a possible neuronal mechanism. A particular strength of the paper is that it could give new data and impetus to the important discussion about how motor and sensory coding frameworks come together in cortical processing.

      We have expanded the discussion considerably and also address the possibility of sparse coding.  

      Reviewer #2 (Public Review):

      The paper by Pomper and coworkers is an elegant neurophysiological study, generally sound from a methodological point of view, which presents extremely relevant data of considerable interest for a broad audience of neuroscientists. Indeed, they shed new light on the mirror mechanism in the primate brain, trying to approach its study with a novel paradigm that successfully controls for some important factors that are known to impact mirror neuron response, particularly the target object. In this work, a rotating device is used to present the very same object to the monkey or the experimenter, in different trials, and neurons are recorded while the monkey (motor response) or the experimenter (visual response) performed a different action (twist, shift, lift) cued by a colored LED.

      The results show that there is a small set of neurons with congruent visual and motor selectivity for the observed actions, in line with classical mirror neuron studies, whereas many more cells showed temporally unstable matched or even completely non-matched tuning for the observed and executed actions. Importantly, the population codes allow to accurately decode both executed and observed actions and, to some extent, even to cross-decode observed actions based on the coding principles of the executed ones.

      In my view, however, the original hypothesis that an observer understands the actions of others by the activation of his/her motor representations of the observed actions constitutes circular reasoning that cannot be challenged or falsified, as the author may want to claim. Indeed, 1) there is no causal evidence in the paper favoring or ruling out this hypothesis (and there couldn't be), 2) there is no independent definition (neither in this paper nor in the literature) of what "action understanding" should mean (or how it should be measured). Instead, the findings provide important and compelling evidence to the recently proposed hypothesis that observed actions are remapped onto (rather than matched with) motor substrates, and this recruitment may primarily serve, as coherently hypothesized by the authors, to select behavioral responses to others (at least in monkeys).

      1) One of the main problems of this manuscript is, in my view, a theoretical one. The authors follow a misleading, though very influential, proposal, advanced since the discovery of mirror neurons: if there are (mirror) neurons in the brain of a subject with an action tuning that is matched between observation and execution contexts, then the subject "understands" the observed action. This is clearly circular reasoning because the "understanding" hypothesis uniquely derives from the neuron firing features, which are what the hypothesis should explain. In fact, there is no independent, operational definition of the term "understanding". Not surprisingly there is no causal evidence about the role of mirror neurons in the monkey, and the human studies that have claimed to provide causal evidence of "action understanding" ended up using, practically, operational definitions of "recognition", "match-to-sample", "categorization", etc. Thus, "action understanding" is a theoretical flaw, and there is no way "to challenge" a theoretical flaw with any methodologically sound experiment, especially when the flaw consists of circular reasoning. It cannot be falsified, by definition: it must simply be abandoned. On these bases, I strongly encourage the authors to rework the manuscript, from the title to the discussion, by removing any useless attempt to falsify or challenge a circular concept and, instead, constructively shed new light on how mirror neurons may work and which may be their functional role.

      Please see the response to all.

      2) An important point to be stressed, strictly related to the previous one, concerns the definition of "mirror neuron". I premise that I am perfectly fine with the definition used by the authors, which is in line with the very permissive one adopted in most studies of the last 20 years in this field. However, it does not at all fulfill the very restrictive original criteria of the study in which "action understanding" concept was proposed (see Gallese et al. 1996 Brain): no response to object, no response to pantomimed action or tool actions, activation during execution in the dark and during the observation of another's action.

      We do not agree that the enumerated "very restrictive original criteria" emerge from the Gallese et al. (Brain, 1996) study. Except for the first paragraph in the results section, there is no clear statement on how mirror neurons should be defined.

      If the idea (which I strongly disagree with) was to simply challenge a (very restrictive) definition of mirroring (a very out-of-date one, indeed, and different from the additional implication of "action understanding"), the original definition of this concept should be at least rigorously applied. In the absence of additional control conditions, only the example neuron in Figure 2A could be considered a mirror neuron according to Gallese et al. 1996.

      We have the impression that the question does not distinguish clearly enough between the definition of "mirror neuron" and the definition of "mirror mechanism". In defining "mirror mechanism", we refer to the work of Rizzolatti and Sinigaglia (Nat Rev Neurosci, 2016). We do not think that this definition is out-of-date (see for example the 2018 article by Rizzolatti and Rozzi in Handbook of Clinical Neurology). If the term "mirror mechanism" is to be defined differently, then another term should be used for a new definition or an annotation should be added (such as "version 2"). This would be necessary to avoid unnecessary confusion resulting from unclear terms.

      Permissive criteria implies that more "non-mirror" neurons are accepted as "mirror": simply because they are permissively named "mirror", does not imply they are mirroring anything as initially hypothesized

      Even for a neuron that would be classified as a "mirror neuron" according to your previously stated "very restrictive original criteria”, it does not follow that it "mirrors” according to a mirror mechanism. And, of course, it is quite possible that more neurons do not "mirror” according to a mirror mechanism if one tests more neurons.

      (Example neuron in Fig 2B, for example, could be related to mouth, rather than hand, movements, since it responds strongly and similarly around the reward delivery also during the observation task, when the monkey should be otherwise still).

      We agree, it is not excluded that this neuron has a relation to mouth movements. However, since the neuron meets the conditions to be classified as a "mirror neuron", an additional relation to mouth movements would not be relevant. If mouth movements are to be an exclusion criterion, then this would have to be included and justified in the definition of a "mirror neuron".

      Clearly, these concerns impact all the action preference analyses. To practically clarify what I mean, it should be sufficient to note that 74% (reported in this study) is the highest percentage ever reported so far in a study of neurons with "mirror" properties in F5 (see Kilner and Lemon 2013, Curr Biol) and it is similar to the 68% recently reported by these same authors (Pomper et al. 2020 J Neurophysiol) with very similar criteria. Clearly, there is a bias in the classification criteria relative to the original studies: again, no surprise if by rendering most of the recorded neurons "mirror by definition" then they don't "mirror" so much. I suggest keeping the authors' definition but removing the pervasive idea to challenge the (misleading) concept of understanding.

      We think that it is very important to clearly separate "mirror neuron" from "mirror mechanism". And the question arises whether one should not include a mirroring criterion, which is derived from a definition of a mirror mechanism, in the definition of mirror neurons. We address this briefly in the discussion. Ultimately, the point of our study is to find out how many of the - if you want to put it that way - "permissively defined" mirror neurons actually “mirror”. And the answer depends on how one defines “mirror mechanism”. We provide an answer by resorting to a “strictly defined mirror mechanism”. We have now also given throughout the results section the percentages of neurons with certain properties with respect to all measured F5 neurons. This is a reference that allows comparisons among studies, provided that no neurons were directly discarded during recording, which we avoided in our study.

      3) It would be useful to provide more information on the task. Panel B in Figure 1 is the unique information concerning the type of actions performed by the monkey and the experimenter. Although I am quite convinced of the generally low visuomotor congruence, there are no kinematics data nor any other evidence of the statement "the experimental monkey was asked to pay attention to the same actions carried out by a human actor". First, although the objects were the same, the same object cannot be grasped or manipulated in the same way by a human and a macaque, even just because of the considerable difference in the size of their hands; this certainly changes the way in which monkeys' and experimenter's hands interact with the same object, and this is a quantifiable (but not quantified) source of visuomotor difference between observed and executed actions and a potential source of reduced congruency.

      We agree, of course, that there are kinematic differences in how a monkey and how a human manipulate the same object. We have not measured the kinematics and thus cannot make a systematic statement about this. We now report in the results section the rather incidental observation that already the reaching trajectories for the three actions differed and show corresponding differences in the timing of the approach epoch. However, for the question of this study, how many neurons are eligible to represent observed actions according to a strictly defined mirror mechanism, the kinematic repertoire of the observed actor is irrelevant. The reference is the F5 mirror neuron activity during the monkey's own action, i.e., how the monkey approaches the object with his hand, how he grasps it, and how he brings it to a certain target position and holds it there. The observed action, according to the strictly defined mirror mechanism, is to be mapped to this reference. Therefore, we did not collect kinematic data. But it is of course a possible explanation for a non-shared code if the strictly defined mirror mechanism does not apply.

      Second, there is little information about the monkey's oculomotor behavior in the two conditions, which is known to affect mirror neuron activity when exploratory eye movements are allowed (Maranesi et al. 2013 Eur J Neurosci), potentially influencing the present findings: a ±7 (vertical) and ±5 (horizontal) window at 49 cm implies that the monkey could explore a space larger than 10 cm horizontally and 14 cm vertically, which is fine, but certainly leaves considerable freedom to perform different exploratory eye movements, potentially different among observed actions and hence capable of accounting for different "attention" paid by the monkey to different conditions and hence a source of neural variability, in addition to action tuning.

      We agree that the topic of the relationship between F5 MN activity and eye movements is interesting. And we know from the work of Maranesi et al. (2013) that at least larger eye movements during action observation are related to the activity of F5 MNs. In our study, we ensured that the observing monkey was actually observing the action. For this purpose, we used a fixation window. We now additionally verified that the monkey really looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes). In our study, the fixation window was so small that the monkey could not see the face of the human actor, in contrast to the study of Maranesi et al. (2013). It was mainly the face that attracted the monkey's attention in that study (measured by gaze position). In our study, the risk that the gaze of the observing monkey was out of the fixation window was high when he looked at the human actor's hand above the wrist. The execution of the action by the monkey took place in darkness. We did not use a fixation window during execution because the monkey's own execution of the action can be assumed to direct his attention to the action.

      We cannot rule out the possibility that smaller eye movements during observation, larger eye movements during execution in darkness, covert shifts of spatial attention, or more generally attentional fluctuations have an influence on F5 MNs that might have counteracted a shared action code in our study. However, if this were the case, then the investigated hypothesis that the activity of F5 MNs during action observation is a motor representation according to the strictly defined mirror mechanism would also have to be rejected.

      4) Information about error trials and their relationship with action planning. The monkey cannot really "make errors" because, despite the cue, each object can be handled in a unique way. The monkey may not pay attention to the cue and adjust the movement based on what the object permits once grasped, depending on online object feedback. From the behavioral events and the times reported in Table 1, I initially thought that "shift" action was certainly planned in advance, whereas "lift" and "twist" could in principle be obtained by online adjustments based on object feedback; nonetheless, from the Methods section it appears that these times are not at all informative because they seem to depend on an explicit constraint imposed by the experimenters (in a totally unpredictable way). Indeed, it is stated that "to motivate the monkey even more to use the LED in the execution task, another timeout was active in 30% (rarely up to 100%) of trials for the time period between touch of object to start moving the object: 0.15 (rarely 0.1) for a twist and shift, 0.35 (rarely 0.3s) for a lift". This is totally confusing to me; I don't understand 1) why the monkey needed to be motivated, 2) how can the authors be sure/evaluate that the monkeys were actually "motivated" in this way, and 3) what kind of motor errors the monkey could actually do if any. If there is any doubt that the monkeys did actually select and plan the action in advance based on the cue, there is no way to study whether the activity during action execution truly reflects the planned action goal or a variety of other undetermined factors, that may potentially change during the trials. Please clarify.

      It is true that the three actions could in principle be performed without using the LED as an informative cue. While this is unlikely under the assumption that a monkey prefers the easiest and fastest way to get reward, it remains a possibility. For this reason, we introduced time constraints in a part of the trials. The selection of time constraints and the proportion of trials in which they were applied, was a pragmatic compromise between a time limit, at which the LED must be used as an informative cue for action selection in order to comply with the task, and a time span that allows the task to be completed even when overall motivation is low. The latter takes into account the general experimental experience that a monkey's engagement or motivation in such experiments varies across trials, sessions, and days. To evaluate whether the LED color was, indeed, used as a cue for action planning in the execution task, we randomly interleaved trials with a different LED, non-informative regarding the type of object, as a control in 5% of the trials. We compared the behavioral responses in trials with informative cues and those with a non-informative cue. The behavioral analysis established that both monkeys indeed used the informative cues to guide their choices (see Fig. 1D).

      Further evidence that the monkey used the cue for action selection and planning is the finding that the type of action was encoded before the release of the start button and then further during the approach phase, i.e., much earlier than somatosensory feedback about the manipulability of the object was available (see Fig. 3A and Fig. 6A).

      Regarding the question, which "motor errors" were possible: The answer can be found in the description of the cases in which a trial was aborted (see Material and methods): releasing the start button too early (< 100 ms after turning on the LED), manipulating the object too slowly after touching it (the time constraints mentioned), not holding the object until the reward was given, or not performing the task at all (10 s timeout).

      5) Classification analysis. There seems to be no statistical criterion to establish where and when the decoding is significantly higher than chance: the classifier performance should be formally analyzed statistically. I would expect that, in this way, both the exe-obs and the obs-exe decoding may be significant. Together with the considerations of the previous point 2 about the permissive inclusion criteria for mirror neurons, this is a remarkable (even quite unexpected) result, which would prove somehow contrary to what the authors claim in the title of the paper. The fact that in any classification the "within task" performance is significantly better than the "between task" performance does not appear in any way surprising, considering both the inclusive selection criteria for "mirror neurons" and the unavoidably huge different sources of input (e.g. proprioceptive, tactile, top-down, etc. afferences) between execution and observation. So, please add a statistical criterion to establish and show in the figures when and where the classifications are significantly above chance.

      We have added - in addition to the statistics already performed in the first version (Fig. 3A in the previous version, now Fig. 6A) - a number of analyses including statistics. This mainly concerns the analyses regarding a shared code at the single neuron level, in which we additionally tested against the null hypothesis proposed by Csibra in 2005 using permutation tests. And we have now also calculated confidence intervals for the population classifications that allow the comparison with chance level. We re-performed the classification analyses using eight-fold cross-validation. We also added a statistical analysis to the finding of clustering of time periods with shared code (Fig. 4). In Figure 5, we additionally compared the frequency of action segments with shared and non-shared codes, which is a descriptive, exploratory analysis. For this reason, it does not make sense to perform inferential statistics. Overall, these analyses represent a significant expansion of the analyses in the first version. We have done this primarily to arrive at statistically sound conclusions at the single neuron level.
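
      The comparison of a classifier's accuracy with chance level via a confidence interval can be sketched as follows. This is only a minimal illustration, not the analysis used in the study: the Wilson score interval, the function names `wilson_interval` and `above_chance`, and the trial counts are assumptions chosen for the example, and the interval treats test trials as independent, which cross-validated predictions only approximate.

```python
import math

def wilson_interval(n_correct, n_trials, z=1.96):
    """Approximate 95% Wilson score interval for a classification accuracy."""
    p = n_correct / n_trials
    denom = 1.0 + z**2 / n_trials
    center = (p + z**2 / (2 * n_trials)) / denom
    half = z * math.sqrt(p * (1 - p) / n_trials + z**2 / (4 * n_trials**2)) / denom
    return center - half, center + half

def above_chance(n_correct, n_trials, chance=1 / 3):
    """True if the lower interval bound exceeds chance (1/3 for three actions)."""
    low, _ = wilson_interval(n_correct, n_trials)
    return low > chance

# Hypothetical counts: 60 of 120 test trials classified correctly.
print(wilson_interval(60, 120), above_chance(60, 120))
```

      With three actions, chance level is 1/3, so a classifier is called above chance in this sketch only when the whole interval lies above that value.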

      Regarding the comparison between within-task classification (o2o) and cross-task classification (e2o), it is important to keep in mind that the goal was to test the hypothesis that the activity of F5 MNs during action observation is a motor representation of the observed action according to the strictly defined mirror mechanism. This hypothesis requires both, 1) an above chance level accuracy of the e2o classifier and 2) no better accuracy of the o2o classifier as compared to the e2o classifier. If the o2o classifier were better, then the actions would not be represented as they are executed. And the reference in this hypothesis is the motor representation, that is, the code at execution. Thus, the direction e2o classification is the crucial one, not the reverse direction (o2e). One explanation for the fact that o2o shows better accuracy in the population may be the different sensory inputs mentioned above. In this case, the tested hypothesis has to be rejected and replaced by another one, which should then have a different name.
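
      The logic of comparing the e2o and o2o classifiers can be sketched on simulated data. Everything here is hypothetical: the nearest-centroid readout stands in for a linear readout and is not claimed to be the classifier used in the study, and `simulate` generates artificial firing rates in which the observation code either equals the execution code or is the execution code with the action labels rotated (a deliberately extreme non-shared case).

```python
import numpy as np

ACTIONS = 3  # e.g. lift, twist, shift

def simulate(shared, n_trials=40, n_neurons=30, seed=0):
    """Hypothetical firing rates: rows are trials, columns are neurons."""
    rng = np.random.default_rng(seed)
    tuning_exe = 3.0 * rng.normal(size=(ACTIONS, n_neurons))
    # Non-shared case: same tuning vectors, but assigned to different actions.
    tuning_obs = tuning_exe if shared else np.roll(tuning_exe, 1, axis=0)
    y = np.repeat(np.arange(ACTIONS), n_trials)
    X_exe = tuning_exe[y] + rng.normal(size=(y.size, n_neurons))
    X_obs = tuning_obs[y] + rng.normal(size=(y.size, n_neurons))
    return X_exe, y, X_obs, y

def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def e2o_accuracy(X_exe, y_exe, X_obs, y_obs):
    """Cross-task: train on execution, test on observation."""
    model = fit_centroids(X_exe, y_exe)
    return float((predict(model, X_obs) == y_obs).mean())

def o2o_accuracy(X_obs, y_obs, n_folds=8, seed=0):
    """Within-task: eight-fold cross-validation on observation trials."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(y_obs.size), n_folds)
    correct = 0
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit_centroids(X_obs[train], y_obs[train])
        correct += int((predict(model, X_obs[folds[k]]) == y_obs[folds[k]]).sum())
    return correct / y_obs.size

for shared in (True, False):
    X_exe, y_exe, X_obs, y_obs = simulate(shared)
    print(shared, e2o_accuracy(X_exe, y_exe, X_obs, y_obs), o2o_accuracy(X_obs, y_obs))
```

      In the shared case both accuracies are high and similar, which is what the tested hypothesis requires. In the non-shared case the o2o classifier still performs well while the e2o classifier fails, which is the pattern that would lead to rejecting the hypothesis.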

      Nevertheless, we also show the result of the o2e cross-task classification in Fig. 6 (yellow curve), which was already included in Fig. 3 of the first version. However, we do not address it in more detail in the main text because it is not relevant for the hypothesis to be tested. It is only a reportable additional result.

      6) "As the concept of a mirror mechanism posits that the observation performance can be led back to an activation of a motor representation, we restricted this analytical step to a comparison of the exe-obs and the obs-obs discrimination performance". I don't understand the rationale of this choice. The so-called "concept" of mirror mechanism in classical terms posits that mirror neurons have a motor nature and hence their functioning during observation should follow the same principle as during action execution. But this logical consideration has never been demonstrated directly (it is indeed contested by several papers), and when motor neurons are concerned (e.g. pyramidal tract neurons, see Kraskov et al. 2009) their behavior during action observation is by far more complex (e.g. suppression vs facilitation) than that hypothesized for classical "mirror neurons". Furthermore, when across-task decoding for execution and observation code has been used, both in neurophysiological (e.g. Livi et al. 2019, PNAS) and neuroimaging (Fiave et al. 2018 Neuroimage) data, the visual-to-motor direction typically produces better performance than the opposite one. Thus, I don't see any good reason not to show also (if not even just) the obs-exe results. Furthermore, I wonder whether the authors considered the possible impact of a rescaling in the single neuron firing rate across contexts, as the observation response is typically less strong than the execution response in basically all brain areas hosting neurons with mirror properties, and this should not impact on the matching if the tuning for the three actions remains the same (e.g. see Lanzilotto et al. 2020 PNAS). The analysis shown in Figures 4 and 5 is, for the rest, elegant and very convincing - somehow surprising to me, as the total number of "congruent" neurons (7.5%) is even greater than in the original study by Gallese et al. (5.4%).

      As to the rationale of our approach, please see our response to the previous point.

      On the issue of rescaling: the hypothesis tested here requires that the F5 MNs activity on observation is a motor representation of the observed action. Hence, from the activity during observation the action should be just as readable as from the execution-related activity. If we had to use rescaling to find a shared code, then observed actions would not be represented in F5 MNs in the same way as on execution. Additional information on whether the action is being executed or observed would be needed. This would of course be possible in principle, but would contradict the hypothesis. And we then not only have the difficulty of which readout is the physiological one (here we make a parsimonious assumption with a linear readout), but we would have to make an additional assumption about rescaling. For this study, we have now chosen the solution of performing the action preference analysis on a single neuron level in a statistically clean way. This represents a very liberal form of rescaling, as it only tests whether the action with the highest or lowest discharge rate is the same when executed and observed. That is, if the result here is not fundamentally different, which is the case, then it can also be assumed that one does not get qualitatively different results for other forms of rescaling.
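
      Why an action preference comparison is insensitive to rescaling can be illustrated with a short sketch. It is a simplified stand-in for the single-neuron analysis, without the statistical testing described above; the function names and the firing rates are hypothetical. The action with the highest and the action with the lowest mean rate are unchanged by any positive gain and offset applied to the observation response.

```python
import numpy as np

def preferred_actions(rates, labels):
    """Return (action with highest mean rate, action with lowest mean rate)."""
    actions = np.unique(labels)
    means = np.array([rates[labels == a].mean() for a in actions])
    return int(actions[means.argmax()]), int(actions[means.argmin()])

def same_preference(exe_rates, exe_labels, obs_rates, obs_labels):
    """True if the most and least preferred actions match across tasks."""
    return preferred_actions(exe_rates, exe_labels) == preferred_actions(obs_rates, obs_labels)

# Hypothetical neuron: 10 trials per action (0 = lift, 1 = twist, 2 = shift).
labels = np.repeat(np.arange(3), 10)
exe = np.repeat([20.0, 10.0, 5.0], 10)   # spikes/s during execution
obs_scaled = 0.3 * exe + 2.0             # weaker response, same tuning
obs_other = np.repeat([4.0, 9.0, 6.0], 10)  # different tuning

print(same_preference(exe, labels, obs_scaled, labels))
print(same_preference(exe, labels, obs_other, labels))
```

      The rescaled observation response keeps the same preference as execution, whereas the response with different tuning does not, regardless of its overall firing rate.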

      7) The discussion may need quite deep revision depending on the authors' responses and changes following the comments; for sure it should consider more extensively the numerous recent papers on mirror neurons that are relevant to frame this work and are not even mentioned.

      The discussion has been thoroughly revised considering the comments raised and suggestions of this and the other two reviewers.

      Reviewer #3 (Public Review):

      Mirror neurons are a big deal in the neuroscience literature and have been for thirty years. I (and many others) remain skeptical of whether they serve the functions often attributed to them - specifically, whether they are motor planning neurons that contribute to understanding the actions of others. Testing their functions, therefore, is of great interest and importance. The present study, however, is not a cogent or convincing test. I do not think this study helps to answer the questions surrounding mirror neurons. It purports to provide a crucial test, that comes out mostly against the mirror neuron hypothesis, but the test has too many weaknesses to be convincing.

      Thank you for the clear words. We take from it, first of all, that in the first version of the manuscript we failed to convey the relevance of our study for the discussion of mirror neuron function. The concerns of this reviewer are in line with those of the others and are addressed in our response to all three reviewers.

      First, consider that the motor tuning and the visual tuning match "poorly." How poor or good must the match be before the mirror neuron hypothesis is rejected? I do not know, and the study does not help here. Even a "poor" match could contribute significantly to a social perception function.

      The specific hypothesis tested here assumes that an action-specific activity of F5 MNs evoked by observed actions corresponds to an action-specific activity of these actions if executed. The approach taken here to compare cross-task classification accuracy (execution-trained, tested in observation) with within-task classification accuracy (observation-trained, tested in observation) tests this hypothesis. The fact that we found a cluster of time periods of single neurons in which both accuracies are almost equal supports this approach and also the hypothesis for these time periods. In principle, of course, the decision for the presence of a difference or equality is always only a statistical statement and contains assumptions. For example, the assumption that a linear readout has physiological relevance enters here. But this problem exists in all studies that ultimately try to understand biological neuronal networks in order to explain perceptions and behavior. However, it is such studies that attempt to elucidate what information is contained in which neurons that set the stage for experiments that, in the optimal case, manipulate certain neurons in a particular way in order to then measure the behavior of an animal that is just right for those neurons.

      Second, the results remind me in some ways of other multi-modal responses in the brain. For example, in the visual area MST, neurons are tuned to optic flow fields that imply specific directions of self-motion. Many of the same neurons are tuned to vestibular signals that also imply specific directions of self-motion. But the optic flow tuning and the vestibular tuning are not perfectly matched. There is considerable slop and complexity in how the two tunings compare within individual neurons. That complexity is not evidence against multi-modal tuning. Instead, it suggests a hidden-layer complexity that is simply not fully understood yet. Just so here, the fact that the apparent motor tuning and apparent visual tuning match "poorly" is not evidence against both a motor planning and a visual encoding function.

      We hope that it is now clearer, in contrast to the first version, that we tested a specific hypothesis that is only a prerequisite for the hypothesis of a very specific form of understanding. Referring to the example, the hypothesis analogous to ours would be that the representation of self-motion direction due to optic flow ("observation") corresponds to the representation of self-motion direction due to vestibular stimulation ("execution"). If it were then found that the self-motion direction due to optic flow cannot be predicted from a classifier trained on vestibular stimulation, and that another classifier trained on optic flow performs better, then the hypothesis would have to be rejected. This is then a reason to realize that "everything is a bit more complex" and to search for better explanations.

      Third, the animals are massively over-trained in three actions. They perform these actions and see them performed thousands of times toward the same object. Surely, if I were in the place of the monkey, every time I saw the object, I'd mentally imagine all three actions. As I saw a person act on the object, I'd mentally imagine the alternative two actions at the same time. Even if the mirror neuron hypothesis is strictly correct, this experiment might still find a confusion of signals, in which neurons that normally might respond mainly to one action begin to respond in a less predictable way during all three trial types.

      In our study, we tested a specific hypothesis related to the time an action is observed. Here, you suggest an alternative hypothesis. The question is whether this alternative hypothesis better explains the result of our study. The alternative hypothesis can be formulated as follows: the F5 MNs activity elicited by an observed action in this experiment corresponds to a mixture of the activities that occur when the other two actions are executed. This hypothesis is to be rejected because it fails to explain why a shared code occurs in single neurons and why cross-task population classifiers show an accuracy above chance level. A modified alternative hypothesis, which states that what is represented in the experiment during observation is a mixture of all three actions, cannot explain why the three actions are very well represented in the population and are optimally represented exactly when the target position of the object is reached.

      Fourth, the experiment relies on a colored LED that acts as an instructional cue, telling the monkey which action to perform. What is to stop the neurons from developing a cue-sensitive response, as in classic studies from Steve Wise and others in the premotor cortex? Perhaps the neuronal signal that the experimenters are trying to measure is partly obscured by other, complex responses influenced in some manner by the instructional cue?

      In principle, there is the possibility that purely sensory information is also represented in area F5, at least in some neurons or at certain points in time. We take your suggestion and discuss this as one of the alternative concepts (we call it "sensory concept"). However, several findings argue against this concept. For example, neural responses to cues usually represent the subsequent action, but not sensory information of the cue such as the color of the cue. In our study, it is evident from Figure 3A, 6A and 6B that during action execution, actions are discriminated even before the start button is released. Since this discrimination of actions occurs with a time delay after the cue and then increases continuously, this is evidence that the action to be executed is represented, but not the cue itself.

      Fifth, finally, and most importantly, the fundamental problem with this study is that it is correlational. Studies that purport to test the function of a set of neurons, and do so by use of correlational measurements, cannot provide strong answers. There are always half a dozen different interpretations and caveats, such as the ones I raised here. Both sides of a debate can always spin the results, and the arguments are never resolved. To test the mirror neuron hypothesis properly would require a causal study. For example, lesion area F5 and test if the monkey is less able to discriminate the actions of others. Or, electrically microstimulate in area F5 and test if the stimulation interferes (either constructively or destructively) with the task of discriminating the actions of others. Only in this way will it be possible to answer the question: do mirror neurons functionally participate in understanding the actions of others? The present study does not answer that question.

      We would like to reiterate that studies aimed at elucidating what information is contained in which neurons or areas are necessary to understand neural network processes and are a prerequisite for conducting well-considered experiments that measure behavioral effects through specific manipulation of the neural network. Without the work of Gallese, Rizzolatti and colleagues, the idea of associating F5 neurons with action understanding would not have occurred in the first place. The current tricky question is whether at all, and if so, to what understanding, to what perception, to what behavior that uses information about mental states of another, F5 MNs might be able to contribute. And for this, it helps to have a clearer idea of what information is contained in F5 MNs during action observation.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question: how do epithelial tissues respond to loss of barrier function in vivo? They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state-of-the-art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRT-III vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

      1. The Abstract states: "We report that the weakening of SJ integrity, caused by the depletion of bi- or tricellular SJ components, reduces ESCRT-III/Vps32/Shrub-dependent degradation and promotes instead Retromer-dependent recycling of SJ components." This is too strong, as the role of the retromer, while plausible, is not directly tested. It's fine to speculate about this in the Discussion but drawing a conclusion like this in the Abstract is unwarranted.
      2. Similarly, the title suggests that "ESCRT-III-dependent adhesive and mechanical changes are triggered by a mechanism sensing paracellular diffusion barrier alteration". They show that knocking down septate junctions alters localization of vesicle trafficking machinery, and that it leads to alterations in apparent recycling of cargo, but do they ever really assess whether these changes are ESCRT-III-dependent? Wouldn't this require knocking down ESCRT-III in cells with defects in septate junctions? There was a lot of data in this paper and perhaps I missed it but was this experiment done? I am not suggesting they do it, but that they temper this conclusion if not.
      3. The authors assessed "poly-ubiquitinylated proteins aggregates appearance, marked using anti-FK2". They need to define FK2: what does it detect?
      4. Fig 4: is this a clone, and are we far from the boundary? Make this clearer.
      5. The authors state: "Despite these apparent similarities, we noticed that, in contrast to Shrub depletion, NrxIV did not accumulate in enlarged intracellular compartments upon Cora depletion" Could the authors reference a Figure here?
      6. The authors state: "Hence, if both Shrub and bSJ/tSJ defects lead to Crumb enhanced signals" It might be better to say "altered" as they then point out the differences.
      7. I found the Discussion challenging to follow. Rather than focusing on the core observations, it addresses many, not very well-connected speculative possibilities, and in my opinion, will be challenging for most readers to follow. I would encourage the authors to revisit it from top-to-bottom.

      Referees cross-commenting

      I think we largely agree that the authors present important data, but that certain points need to be better explained or more clearly documented. While Reviewer 1 is correct that adding context about the basolateral polarity proteins would be helpful, I do not feel as strongly about this as a deficit. The authors did not manipulate Scrib, Dlg or Lgl, and I think their polarity functions may be distinct from those of the more "structural" septate junction proteins analyzed here.

      Significance

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question: how do epithelial tissues respond to loss of barrier function in vivo? They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state-of-the-art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRT-III vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this paper, using the triploid biotype of the planarian Schmidtea polychroa, the first half presents an analysis of genome structure and the second half shows that (de novo) mutations in individuals that undergo regeneration are passed on to the next generation.

      While I think this paper contains interesting biological findings, I am skeptical about its novelty. I was convinced by the results and discussion of the analysis of genome structure, but the results and discussion of the analysis of (de novo) mutations were very confusing. This may be due to my lack of knowledge in this field. But even so, the author needs to improve this manuscript so that the general reader will better understand it.

      Major comments:

      1. The author mentions that it is important to note that this study was conducted using a parthenogenetic triploid biotype. However, I think that the parthenogenesis undergone by a triploid biotype of S. polychroa is very unusual. It is not typical apomictic parthenogenesis. Triploid oocytes arise by meiosis from hexaploid oocytes derived from triploid adult somatic stem cells called neoblasts. On the other hand, haploid sperm arise by meiosis from diploid spermatogonia derived from neoblasts. Embryogenesis of triploid eggs then occurs by pseudogamy. Occasional sex is also known to occur even if the offspring's chromosome number remains triploid. I think this background is important information to give the reader. Also, don't the authors need to treat the results in this paper with this complex phenomenon also taken into account?
      2. Fig. 4B-C: Analysis by lineage-specific mutations of parental controls. The authors do not specifically mention or discuss this result. What about the accumulation of mutations within such populations in typical parthenogenesis (Daphnia and aphids)? In other words, are the results in Fig. 4B-C due to the special mechanism for parthenogenesis in the triploid S. polychroa as described above?
      3. Throughout this paper, the authors show that regeneration increases de novo mutations in the progeny. The authors conclude that many of the mutations occurred in neoblasts during regeneration. However, I would like you to explain the biological significance of this result in S. polychroa, which naturally does not reproduce by fission and regeneration. There are already reports of mutations accumulating in neoblasts in Dugesia japonica, which reproduces asexually by fission. For these reasons, I do not think this paper presents extremely novel results.
      4. p15, Discussion: "Tissue regeneration is best seen in the liver of mammals, and the regrowth of relapsed tumours following surgery can also be considered an example of a regenerative process. Mutagenesis accompanying these processes is relevant to subsequent tumorigenesis or the development of resistance, and the planarian system can provide a useful model for the mutagenic effect of tissue regeneration."

      Isn't it an overstatement to associate the regenerative system of planaria with the liver regeneration of mammals?

      5. p10, Results: "We compared the two de novo spectra to the spectrum of germline heterozygous SNPs, present in all animals, and found that the pattern of germline substitutions resembled more closely the de novo spectrum of the control group (Fig 5D, Fig S3), implying that regeneration has a minor contribution to germline mutations in S. polychroa populations."
      p14, Discussion: "The high similarity of the spectrum of heterozygous SNPs and de novo mutations of control animals suggests that the species primarily reproduces in a non-regenerative manner. The increased mutation rate and the altered mutation spectrum upon regeneration confirmed our hypothesis that regeneration is a mutagenic process."

      I was very confused by these sentences and it took me some time to understand them. Triploid S. polychroa naturally does not reproduce by fission and regeneration; that is, it reproduces in a non-regenerative manner. I do not understand why the author insists on this. Please explain the results for the regenerated case in Fig. 5D (0.88) in a way that is also easy to understand. Also, what is the biological significance of asserting here that de novo mutation by regeneration increases in a species that does not reproduce by regeneration and fission in the first place?

      Minor comments:

      1. The author should add a schematic diagram showing the distribution of reproductive organs in Fig.1 to help the reader understand that the ovaries are not included in the regenerative fragment.
      2. P12, line12: Fig 6D-E, it's F, not E, right?
      3. P9, line 8: "these mutations were missing from the original egg but were present in the egg laid by the parent and thus represent the total mutation load of a generation."

      The author mentions that the de novo mutation found in offspring derived from parents that do not undergo regeneration was already present in the eggs, but I can find no evidence of this. Can you rule out the possibility that these mutations occurred between hatching and adulthood?

      4. p10, Results: "Interestingly, the majority of mutations were shared in the siblings F4A and F4B. This suggests that the germ cells of these animals were descendants of the same stem cell, which underwent a high number of cell divisions early during the regeneration process prior to oocyte differentiation. The same finding also confirms that the detected clonal filial mutations were present in the respective oocyte and were not generated by embryonic cell divisions."

      The shared de novo mutations detected in the siblings (F4A and F4B) derived from the parent that underwent regeneration in Fig. 5A suggest that the germ cells of these siblings are descended from the same stem cell. The authors say that these mutations occurred in a large number of cell divisions early in the regenerative process prior to oocyte differentiation.

      So why is there no shared de novo mutation in the siblings (Fc4A and Fc4B) derived from the non-regenerating parent in Fig. 5A? As mentioned in Minor comment 3, the author states that the de novo mutations were already present in the parent-laid eggs, but when did these mutations, which are not shared, arise?

      5. p11, Results: "Interestingly, in the case of FR4A-FR4B sibling pair, shared de novo mutations present in both were subclonal in R4 in a proportion comparable to the other samples (7/15 by WGS, 46.7%), while the three unique mutations could not be detected in R4 by the PCR approach, indicating again that the unique mutations, which amounted to approximately 10% of total clonal filial mutations in these two animals, arose late during germ cell regeneration."

      The expression "during germ cell regeneration" is too vague to know which stage you are referring to. In relation to minor comment 4, why not create a new chart to clearly show when the expected mutations occurred?

      6. p12, Results: "Altogether 7/30 regenerant mutations were detected in PR animals, and these included those with the highest AF in the regenerants (Fig. 6C). This suggests that parental animals, even before regeneration, contained a diverse set of stem cells, and some of the detected de novo mutations in the filial generation resulted from the expansion of mutation-containing stem cell clones contributing also to germ cells in the regenerant animals."

      If the mutation in the offspring is derived from the parent (PR) prior to the time of tail amputation, wouldn't it be strange to assume that it is a de novo mutation?

      7. p12, Results: "The remaining 23/30 R- subclonal mutations may have arisen during regeneration. On average, ~250 dividing neoblasts were detected in cut tails of animals from the same population as the sequenced individuals, as determined by immunofluorescence of phosphorylated H3 histone (Fig 6D-E). However, the high proportions of body cells carrying regenerant-specific mutations suggest that certain stem cells contribute to disproportionately large parts of the regenerated body, including the germline."

      I did not quite understand the relevance of this discussion to the photos of M phase shown here (Fig. 6E).

      Significance

      General assessment: This paper contains important biological information. The finding that mutations in planarian stem cells cause diversity in the next generation of parthenogenesis is very interesting. However, I think that the authors need to carefully explain and revise their argument, for example, that the mutations were caused by regeneration, which does not naturally occur in the species used.

      Advance: The finding that accumulation of mutations is occurring in planarian stem cells has already been reported in Dugesia japonica. Please cite the papers and clarify what is the key finding in this paper.

      Audience: Basic Research_Evolutionary Ecology, Developmental Biology (Stem Cells), Reproductive Biology

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      My field of study is reproductive biology. I am familiar with the transcriptome but unfamiliar with genome analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for carefully reading our manuscript and for all considerations, ideas, suggestions, and comments. These were all very helpful for us to strengthen the scientific statements of our manuscript. Please, note that all changes are marked in red in the manuscript and supplement. Below you will find, point by point, our responses to all questions and comments.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Overall, this is an exciting work. There are, however, several open questions that the authors could address to facilitate understanding of their work. These points are:

      1.) On page 5, lines 113ff, the authors mention the membrane bulges that they analyse in figure 1. They show these deformations by light (confocal) and electron microscopy. However, the bulges seen by confocal microscopy seem to be bigger than those seen by electron microscopy. The authors could quantify the sizes of the bulges for clarification.

      We quantified the size of the membrane bulges. By confocal microscopy, the identified bulges measured 750nm on average (n=12), with 650nm as the minimal and 890nm as the maximal size. By TEM, we measured a mean value of ~243nm (n=61), with a range between 62nm as the minimum and 442nm as the maximum value. These measurements are shown in Figure 1E.

      Please note that measurements of TEM images do not always capture the three-dimensionality of bulges and may show only parts of them. In addition, ultrastructure is more sensitive and can easily detect small membrane changes that we cannot observe with confocal and airyscan microscopy. In contrast, even with our high-quality objective (63x Zeiss Plan Neofluar, Glycerin, 1.3 NA), standard confocal analysis is limited to ~200nm on the XY axis (airyscan ~110nm) and ~450nm on the Z-axis. Therefore, TEM analysis detects smaller bulges than confocal analysis, and consequently, this method detected a large range of bulge lengths between 63nm and 441nm. In contrast, the airyscan method detected a range of bulge length between 0.65 and 0.83 µm. Nevertheless, both confocal and TEM analyses provide evidence of membrane bulges in pio mutant embryos. Please note that we extended our studies and now show membrane bulges in two different pio mutant alleles (17C and 5M) with airyscan microscopy.
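
      The resolution argument above can be illustrated with a minimal sketch. It only restates the approximate XY limits and size ranges quoted in this reply; the function name and the simple threshold rule are assumptions made for illustration and are not part of the authors' analysis.

```python
# Illustrative sketch only: values are the approximate figures quoted in the reply.
# The threshold rule (size >= lateral resolution limit) is a simplifying assumption.
XY_LIMIT_NM = {"confocal": 200, "airyscan": 110}

def is_resolvable(size_nm: float, method: str) -> bool:
    """Return True if a bulge of this size is at or above the method's XY limit."""
    return size_nm >= XY_LIMIT_NM[method]

tem_range_nm = (62, 442)         # min and max bulge size reported for TEM
confocal_range_nm = (650, 890)   # min and max bulge size reported for confocal

for size in (*tem_range_nm, *confocal_range_nm):
    flags = {method: is_resolvable(size, method) for method in XY_LIMIT_NM}
    print(f"{size} nm -> {flags}")
```

      Under this simplified rule, the smallest bulges seen by TEM fall below both light-microscopy limits, which is consistent with the explanation that TEM reports a lower size range.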

      2.) The subject of the manuscript is rather complicated; presentation of data from Figure 1C and D on lines 113ff and 169ff is confusing.

      We apologize and thank the reviewer for careful reading. We revised both paragraphs (lines 108 – 123 and lines 166 - 174) and are confident that the descriptions are now much more understandable. All changes are marked in red.

      3.) The quality of the sub-images of Figure 2E differs. Especially, the phenotype of the wurst, pio transheterozygous embryo is not well visible.

      We apologize for this. We repeated the experiment with wurst;pio transheterozygotes, and generated wurst;pio double mutant embryos to improve the quality. The gas filling assay is shown in Fig. 3, with brightfield microscopy in overview images (10x air objective) and close-ups of the dorsal trunks (25x Glycerin objective). Both show the gas-filling defects of dorsal trunk tubes. In a subsequent confocal analysis of chitin stainings in late-stage 17 embryos, we found that tracheal tube lumens are collapsed in the transheterozygotes and double mutant embryos.

      4.) Lines 246ff: the protein sizes are given for the mCherry:chimeric proteins; an estimate of the native Pio portions should be given.

      The endogenous Pio protein has a calculated mass of about 50.82 kDa. We now state this in the corresponding legend of Fig. 6.

      5.) In Figure 6A, the appearance of chitin in the wildtype tube is different compared to the Np mutant situation, more filamentous. Can the authors comment on that?

      The reviewer is correct. The chitin cable formation in Np mutant embryos is normal but lacks the condensation process, and, therefore, the fiber structure of the chitin matrix differs from control embryos in late stage 16 and stage 17 embryos (see Drees et al., PLOS Genetics, 2019).

      6.) In the discussion section, I would appreciate if the timing of events was discussed or even shown in a model. The central question is: how are the functions of Pio and Np coordinated in time? As I understand, Np should not cleave Pio before morphogenesis is completed. Is there any example in the literature for how such an interaction could be controlled? The overexpression of Np shows that either the ratio between Np and Pio is important, or the btl promoter expresses Np at the "wrong" time point.

      We thank the reviewer for this interesting comment.

      Of course, we did not measure forces, but it has been published that axial forces appear at the apical cell membrane during stage 16 tube expansion. Our data show that Np cleaves the Pio ZP domain and that the subsequent release increases during stage 16. The cleaved and released Pio is enriched in the lumen during stage 16, from where cleaved Pio is internalized during stage 17 with the help of Wurst-mediated endocytosis. This is supported by several in vivo studies, video microscopy, antibody stainings and biochemical data, such as the interaction of Pio and Dumpy as well as the identification of different Pio products with and without Np cleavage. Moreover, we found membrane bulges that increase in size during stage 16 and identified a subsequent tear-off of the chitin matrix in Np mutant embryos. Thus, we propose that Np is required to cleave Pio-Dumpy linkages at the membrane-matrix when tubes elongate and postulated forces appear at the cell membrane during tube elongation in stage 16 embryos.

      We stated this in the discussion as follows:

      “The membrane defects observed in both Pio and Np mutants indicate errors in the coupling of the membrane matrix due to the involvement of Pio (Figs. 1,7). ..., the large membrane bulges in Np mutants affect the membrane and the apical matrix (Fig. 7). Since apical Pio is not cleaved in Np mutants (Fig. 7D), the matrix is not uncoupled from the membrane as in pio mutant embryos but is likely more intensely coupled, which leads to tearing of the matrix axially along the membrane bulges (Figs. 7, 9), when the tube expands in length.”

      How could Np be regulated at the membrane? Np is a zymogen that very likely undergoes ectodomain shedding for activation, similar to what has been described for matriptases. Additionally, human matriptase requires transient interaction of the stem region with its cognate inhibitor HAI-2, which Drosophila lacks (see Drees et al., PLOS Gen, 2019). Thus, the regulation of Np activation is not known.

      Further, we observed that Dumpy is not degraded in Np mutant embryos during stage 17. Nevertheless, in a previous publication, we showed that btl-G4 driven Np expression rescues Np mutant phenotypes in a time-specific manner. We used the btl-G4 driver line for these rescue experiments to express Np in tracheal cells. This restored tracheal Dumpy degradation in Np mutant embryos. Thus, btl-G4 driven Np overexpression is able to rescue Np mutant tracheal phenotypes in a time-specific manner, although Gal4 is expressed from early tracheal development onwards. Further, btl-Gal4 driven Np expression mimics the endogenous Np, which is expressed from stage 11 onwards in all tracheal cells throughout embryogenesis (see Drees et al., PLOS Gen, 2019).

      Based on these experiments, we conclude that the btl-G4-driven Np overexpression can cleave Pio ZP domain in stage 16 embryos at the correct time.

      However, the ratio between Np and Pio is essential, in that btl-Gal4-driven Np overexpression may cause cleavage of a higher number of Pio proteins and the release of critical Pio-Dumpy linkages at the cell membrane and matrix. Thus, increased Pio shedding into the lumen reduces Pio linkages at the membrane, resulting in a pio mutant-like tracheal overexpansion upon btl-Gal4-driven Np overexpression.

      Finally, we were able to address the reviewer’s question in a new experiment. We used btl-Gal4 driven UAS-Np embryos for Pio antibody staining. This revealed Pio enrichment at the tracheal chitin cable in stage 14 and 15 embryos. In contrast, stage 16 embryos showed numerous Pio puncta appearing across the entire tube lumen, indicating that Np mediates Pio shedding specifically in stage 16 embryos and not before. This Np-controlled Pio release modifies tube length control.

      Therefore, we stated this in the manuscript as follows:

      Results:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      Discussion:

      “The btl-Gal4-driven Np expression mimics the endogenous Np from stage 11 onwards in all tracheal cells throughout embryogenesis (Drees et al., 2019), suggesting that Np is not expressed at a wrong time point. However, the ratio between Np and Pio is essential. We assume that Np overexpression increases Pio shedding, resulting in a pio loss-of-function phenotype. Thus, the tube length overexpansion upon Np overexpression indicates that Pio cleavage is required for tube length control.

      Our observation that the membrane deformations are maintained in Np mutant embryos supports our postulated Np function to redistribute and deregulate membrane-matrix associations in stage 16 embryos when tracheal tube length expands. In contrast, Np overexpression potentially uncouples the Pio-Dpy ZP matrix membrane linkages resulting very likely in unbalanced forces causing sinusoidal tubes.”

      7.) Also for the discussion: We have two situations where Pio amounts/density are enhanced at the apical plasma membrane. The wurst experiments on lines 136ff show that Pio amount and density depends on endocytosis; is the wurst phenotype (Figure 2), at least partially, due to over-presentation of Pio? Likewise, in Figure 2C, there is more Pio in Cht2 overexpressing tracheae (but there is overall more Pio in these tracheae) - is actually endocytosis reduced in chitin-less luminal matrices? First: does the Pio signal at the apical plasma membrane correspond to membrane-Pio or free-Pio? Second, as in the case of wurst: would more Pio on the membrane (density) affect tracheal dimensions in Cht2 over expressing tracheae? Or are the consequences of Pio accumulation in the apical plasma membrane different in Cht2 and wurst backgrounds? Maybe cleavage of Pio and its endocytosis are dependent on its interaction with the chitin matrix. These questions connect to the question immediately above: how are the functions of the different players coordinated in space and time? We need a discussion on this issue.

      We thank the reviewer for this very important suggestion to discuss how the functions of the different players are coordinated in space and time, and we apologize that we had not done so before.

      As this is an important point, we tried to address all questions raised by the reviewer and discuss them in several new paragraphs in the discussion:

      "Indeed, the anti-Pio antibody, which can detect all different Pio variants, showed a punctuate Pio pattern overlapping with the apical cell membrane marker Uif at the dorsal trunk cells of stage 16 embryos. Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents.

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.

      Furthermore, we show that Np-mediates Pio ZP domain cleavage for luminal release of the short Pio variant during ongoing tube length expansion. The luminal cleaved mCherry::Pio is enriched at the end of stage 16 and finally internalized by the subsequent airway clearance process during stage 17 after tube length expansion. Such rapid luminal Pio internalization is consistent with a sharp pulse of endocytosis rapidly internalizing the luminal contents during stage 17 (Tsarouhas et al., 2007). Wurst is required to mediate the internalization of proteins in the airways (Behr et al., 2007; Stümpges and Behr, 2011). In consistence, during stage 17, luminal Pio antibody staining fades in control embryos but not in Wurst deficient embryos.

      Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctuate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused the apical Pio density enrichment.

      Nevertheless, oversized tube length due to the misregulation of the apical cell membrane and adjacent chitin matrix may cause changes to local Pio set linkages and the need for Np-mediated cleavage. Strikingly, we observe a lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np function. The molecular mechanisms that coordinate the Np-mediated Pio cleavage are unknown and will be necessary for understanding how tubes resist forces that impact cell membranes and matrices. On the other hand, Pio is required for the extracellular secretion of its interaction partner Dpy. At the same time, Dpy is needed for Pio localization at the cell membrane and its distribution into the tube lumen. Consistently, in vivo, mCherry::Pio and Dpy::eYFP localization patterns overlap at the apical cell surface and within the tube lumen. These observations support our model that Pio and Dpy interact at the cell surface where Np-mediates Pio cleavage to support luminal Pio release by the large and stretchable matrix protein Dpy (Fig. 9).

      Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that proceed to the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitinase synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      8.) The sentence on line 242ff should be rephrased: "dynamic" and "elastic" are not opposites.

      We thank the reviewer for careful reading. We revised the sentence as follows:

      “Our FRAP data suggest that Pio is the dynamic part of the tracheal ZP-matrix, while the static Dpy modulates mechanical tension within the matrix”

      9.) A central question to me is the amounts and the density of factors in different genetic backgrounds as mentioned above. Is there any mechanism adjusting the amounts or the density of the players according to the size of the apical plasma membrane or the tracheal lumen? Pio seemingly responds to these changes.

      We would like to know the molecular mechanisms that control the density of players at the apical membrane. This question is important and could be the starting point for novel scientific investigations. Mechanisms of protein trafficking, such as exocytosis, recycling and endocytosis, regulate delivery and internalization of proteins at the apical cell membrane. Furthermore, protein junctions at the lateral membrane may recognize, and therefore respond to, low and high mechanical stresses between cells that appear during tube length expansion. However, we did not observe any hint of misregulation of Pio expression levels in the different mutants which affect endocytosis, SJs and luminal ECM. But we observed a shift of Pio levels between apical cell membrane/matrix and lumen in wurst and mega mutants and upon Cht2 overexpression. This shift was analyzed with diverse ZEN tools and quantified (Fig. 2D-F; Fig. S4B). As discussed in the new paragraph, this shift is very likely caused by changes at the apical cell membrane and chitin matrix which impact Pio shedding. Moreover, we observe the lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np-mediated cleavage. As discussed above, how Np is activated at the apical cell membrane to cleave Pio is not known.

      10.) The connection of Pio and taenidia is mentioned in the results section (page 7) but not discussed.

      We appreciate the careful reading and comments of the reviewer very much. We included the connection of Pio and taenidia in the discussion section as follows:

      “Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that proceed to the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitin synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might also be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      11.) Dp remains cytoplasmic in pio mutant background - is the pio mutant phenotype due to defects by lack of Pio AND Dp function? What is the tracheal phenotype of dp mutants?

      It has been discussed that dumpyolvr and pio mutants show similar phenotypes in early tracheal development (Jazwinska, 2003) and that dumpyolvr mutant embryos compromise tube size in combination with shrub mutants. The additional quantifications of the dumpyolvr mutant showed significantly increased tube length (Dong 2014). We used the dumpyolvr mutant [In(2L)dpyolvr], an X-ray-induced mutation of the dumpy gene locus (Wilkin 2000). dumpyolvr mutants resemble pio null mutant tracheal phenotypes, including detached dorsal and ventral branches and an oversized tracheal dorsal trunk with curly appearance in late embryos. We included chitin and Uif stainings of stage 16 dumpy mutant embryos (Fig. S10).

      These data suggest that the pio mutant phenotype is due to a lack of Pio and Dumpy, which would support our model of Pio and Dumpy protein interaction in the extracellular space of the tube lumen.

      In wt embryos Pio is predominantly found in the luminal chitin cable, whereas in dumpy mutant embryos most Pio is not at the luminal chitin cable. Reduced luminal Pio staining together with apical Pio accumulation in dumpy mutant embryos shows that Dumpy is required for luminal Pio release in stage 16 embryos. This supports our model that the Pio and Dumpy interaction may link membrane and matrix, and that this link responds to mechanical stress during tube expansion by Np-mediated cleavage of Pio and its accompanying luminal release due to linked Dumpy.

      12.) Lines 374ff: the reduced dorsal trunk in Np mutants is not significant; the respective statement should be formulated carefully. If we believe the statistics (no significance), this would mean that attachment of the apical plasma membrane to the luminal chitin via Pio is needed to restrict axial extension; release of Pio is needed for differentiation (taenidia formation, luminal clearance) beyond morphogenesis.

      We agree with the reviewer that the reduction of the dorsal trunks in Np mutants is statistically not significant. However, the mean value is clearly below that of WT. Therefore, we revised our statement as follows: “In Np mutant embryos, tracheal dorsal trunk length tends to be reduced compared to wt embryos.” Further, the btlG4-driven overexpression of UAS-Np suggests strong Pio release from the apical membrane and therefore resembles the pio mutant tube length overexpansion (Fig. 8A,B; Fig S13). Thus, our current observations indicate that Np-mediated Pio release at the cell membrane enables precise tube length elongation.

      We thank the referee for discussing that Pio is needed for taenidial fold formation, which would fit our findings in pio null mutant embryos. Pio mutant embryos show the appearance of taenidial folds in stage 16 embryos (airyscan) and stage 17 embryos (TEM images). However, TEM images also show chitin matrix reduction in pio mutant stage 17 embryos. Further, co-stainings of Pio with Crb and Uif, as well as co-stainings of mCherry::Pio with Dpy-GFP and cbp, confirm that Pio localizes at the apical cell membrane where taenidial folds form in late stage 16 embryos. Thus, our observations suggest that Pio and Dumpy are required at the apical membrane and matrix to stabilize taenidial folds and tube lumen during stage 17. This also includes the Np-mediated Pio release at the apical cell membrane. As requested by the referee we summarized Pio function during late tracheal development in our simplified model (see Fig. 9).

      However, it is of note that Np-mediated Pio release increases at late stage 16 (Fig. 5A, 6D; Fig. S13) but is strongly reduced in stage 17 embryos. In contrast, thin taenidial folds are formed at late stage 16, become thicker and form at fusion points during stage 17, and reach their most mature form when the intraluminal chitin cable is cleared (Öztürk-Colak et al., elife, 2016). Thus, the patterns of Pio release and taenidial fold differentiation do not fully match. Moreover, in preliminary experiments we observe Pio antibody staining in stage 17 embryos at the apical cell membrane of dorsal trunks (data not shown). Furthermore, lumen clearance of Obst-A, Knk, Serp and Verm is not affected in pio mutant embryos, but unknown luminal ECM contents remained (Fig. 1D). Therefore, we will follow this very interesting idea in future experiments.

      Nonetheless, we state in the results that Pio shedding is essential:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      13.) Why don't we see the apical Pio signal in Figure 4B?

      The red arrowhead points to apical mCherry::Pio punctate staining in Fig. 5B (previously Fig. 4B) in the close-up of the “bleached area” before bleaching and 56 min post bleaching. However, in vivo bleaching experiments do not allow additional antibody stainings to detect the apical cell membrane precisely. Further, Dpy::eYFP marks the tube lumen and the apical cell surface. The latter showed adjacent mCherry::Pio punctate staining. However, due to bleaching, the Dpy signal was not detectable in the area.

      14.) The Strep signals in the merges in Figure 7C are not well visible.

      We are not sure which Strep signal the reviewer is referring to in Fig. 7C, which is now Fig. 8C. The top panel shows the Strep signal (right panel) overlapping with GFP in cells that do not express Np or human matriptase. Thus, the TGFB3 ZP domain is not cleaved, and the intracellular GFP and also the extracellular Strep signals are maintained and overlap.

      In contrast, when Np or human matriptase is added, the TGFB3 ZP domain is cleaved and only the intracellular GFP signal is retained, whereas the extracellular Strep signal is released from the cell surface. This explains why the Strep signal is barely detectable in the middle and lower panels of Fig. 8C.

      Reviewer #1 (Significance):

      This work brings together several factors (Pio, Dp, Np, Wst etc) already known to be needed for tracheal morphogenesis and differentiation in the embryo of D. melanogaster. Having worked myself with some of these factors, however, I recognize that the interaction between these factors is novel and very exciting. The experiments strongly indicate a new mechanism of cell-ECM connection that seems to be conserved to some extent (as they provide preliminary data on an example from humans). By integrating the functions of different factors, the work provides ample opportunity for future projects to elucidate this mechanism in detail. Therefore, I expect that it will have a significant impact not only on the field of developmental cell biology but also, due to the conserved proteins involved (ZP proteins, Matriptase), on the field of cell biology of human diseases.

      Reviewer #2 (Evidence, reproducibility and clarity):

      _The figures are clear, and the questions well addressed. However, I find that some of the claims are not completely backed by the data presented and have some suggestions that will hopefully make some points clearer.

      Major comments

      1.) In the abstract and at the end of the introduction the authors claim that they show that Pio, Dpy and Np support the balancing of mechanical stresses during tracheal tube elongation. However, this is not shown in this manuscript, where tension or mechanical stress were not measured and it is therefore speculative._

      As requested by the reviewer, we deleted “support balancing of” in the final sentence of the Introduction. Please note that we did not use the term “balancing of mechanical stresses” in the abstract.

      However, we revised the abstract.

      It has been shown previously that forces and mechanical tension rise when the apical membrane expands and that the elastic extracellular matrix, which is anchored to the membrane, balances these forces (Dong et al., 2014). Furthermore, it has been shown that the gigantic and elastic Dumpy protein modulates mechanical tension (Wilkin et al., 2000). Thus, these previous publications state that mechanical tension rises at the apical cell membrane and matrix when tubes expand during stage 16 and that Dpy is part of that molecular process, which we included in the abstract as essential background information.

      “The apical membrane is anchored to the apical extracellular matrix (aECM) and causes expansion forces that elongate the tracheal tubes. The aECM provides a mechanical tension that balances the resulting expansion forces, with Dumpy being an elastic molecule that modulates the mechanical stress on the matrix during tracheal tube expansion.”

      Nonetheless, our results show that Np-mediated Pio cleavage increases during stage 16 as a response to tube length expansion, which is accompanied by forces as postulated by others (see above). We further observe that the membrane bulges and the chitin matrix tears off when Pio cleavage does not occur in Np mutant embryos. Our data further show that Pio and Dumpy interact and that Pio release is prevented in Dpy mutant embryos. Altogether this suggests that the Np-mediated Pio cleavage responds to tube expansion and requires Dpy for luminal Pio release.

      We therefore claim in the final sentence of the introduction that “…ZP domain proteins Pio and Dumpy, as well as the protease Np, respond to mechanical stresses when tracheal tubes elongate”. The corresponding changes are marked in red.

      2.) The authors state that all pio CRISPR/Cas9 generated mutants display identical tracheal phenotypes, however these data are not shown. Tracheal phenotypes, in particular DT phenotypes, of all mutants generated should be shown in supplementary materials.

      As requested by the reviewer, we included the data in the supplement. The pio5M and pio11R alleles showed embryonic lethality and a 100% gas filling defect resembling the pio17C allele. Additionally, we extended the tracheal analysis with the pio5M allele and identified tube size defects, irregular pattern of taenidial folds and apical membrane deformation, altogether resembling the pio17C allele. These new data are shown in the supplement Fig. S1.

      We clarify this in the results section as follows:

      “The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17c allele.”

      3.) At stage 16, pio null mutants display DT overelongation phenotypes (Fig. 1). The authors should quantify this phenotype.

      As requested by the reviewer, we quantified the DT overelongation phenotypes for pio5M (Fig. S1). The quantification of pio17C was shown already in Fig. 6B, now Fig 8B.

      4.) The authors analyse Pio distribution under tubular stress, using mega mutants and Chitinase overexpression. Pio localization changes in these genetic backgrounds and this is shown in Figure 2 only in a qualitative manner. The authors should measure Pio localization at the lumen and at the membrane and provide quantitative data.

      As requested by the referee, we measured Pio localization recognized by the anti-Pio antibody at the lumen and at the membrane to provide quantitative data. These are shown in Fig. 2E.

      All images were taken with a Zeiss Airyscan. For statistical analysis we used the profile tool of the Zeiss ZEN 2.3 black software. This tool allows the measurement and comparison of fluorescence pixel intensities of individual channels. We determined the fluorescence intensity profile across the tube to identify values at the apical membrane and tube lumen at a minimum of 10 different positions of DTs (metameres 5 to 6) of two distinct embryos for each genetic background. The maximum values at the membranes were divided by the maximum values in the tube lumen, and the resulting ratios were compared between control, mega mutant and Cht2 overexpression. The control embryos showed a ratio below 0.4, the Cht2 overexpression a ratio of 1.2 and mega mutants a ratio of about 0.9. These quantitative data confirm the statement that Pio localization increases at and near the apical cell membrane with respect to the lumen in mega mutants and in Cht2 overexpression embryos.
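
      The ratio described above can be illustrated with a short sketch. This is not the ZEN workflow itself: the function name, the index lists marking membrane and lumen positions, and all intensity values are hypothetical and serve only to show the arithmetic (peak membrane intensity divided by peak luminal intensity, averaged over several line profiles).

```python
from statistics import mean


def membrane_lumen_ratio(profile, membrane_idx, lumen_idx):
    """Return peak membrane intensity divided by peak luminal intensity.

    profile      -- intensities sampled along one line across the tube
    membrane_idx -- positions assumed to lie on the apical membranes
    lumen_idx    -- positions assumed to lie inside the tube lumen
    """
    membrane_peak = max(profile[i] for i in membrane_idx)
    lumen_peak = max(profile[i] for i in lumen_idx)
    if lumen_peak <= 0:
        raise ValueError("luminal peak must be positive")
    return membrane_peak / lumen_peak


# Hypothetical line profiles (arbitrary units), not measured data.
profiles = [
    [5, 40, 12, 90, 100, 85, 10, 35, 4],
    [6, 30, 10, 95, 100, 80, 12, 20, 5],
]
membrane_idx = [1, 7]    # assumed membrane positions on either side of the lumen
lumen_idx = [3, 4, 5]    # assumed luminal positions

ratios = [membrane_lumen_ratio(p, membrane_idx, lumen_idx) for p in profiles]
mean_ratio = mean(ratios)
print(ratios, mean_ratio)
```

      A ratio well below 1 indicates a signal dominated by the lumen, whereas a ratio near or above 1 indicates enrichment at the membrane relative to the lumen.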

      5.) Surprisingly and interestingly, wurst;pio transheterozygotes display very strong tracheal defects. The authors say they observe gas filling defects; however it is not clear from figure 2E if this indeed the case. From the panel in the figure, it looks like these embryos suffer from strong tracheal morphogenetic defects. It would be necessary to have a better analysis of these embryos. What is the penetrance of this phenotype. If this is 100% penetrant, one would expect it to be lethal. Therefore, double mutant balanced stocks are not viable? Having analyzed the phenotypes and confirmed which morphogenetic defects the transheterozygote embryos present, how does this genetic interaction fit with the model presented?

      We are thankful to the reviewer for this interesting point of view suggesting that the wurst;pio embryos display tracheal morphogenetic defects. First, our data show that only 11.6% of the wurst;pio transheterozygous embryos completed gas filling and survived until adulthood. In contrast, 88.4% of transheterozygous wurst;pio mutant embryos did not complete gas filling, which is now presented in Fig. 3B. The corresponding quantification is presented in Fig. 3D. Importantly, the 88.4% of wurst;pio transheterozygous embryos which show gas filling defects do not hatch as larvae and die.

      As requested, we performed a better morphogenetic analysis, which is presented in Fig. 3C. Analysis of the gas filling defects with light microscopy was repeated with a better objective (Zeiss Apochromat 25x Gly; 0.8 NA). Indeed, this analysis revealed a strongly compromised tube lumen morphology with irregular tube lumen pattern as if tubes twist and bend. This tube lumen deformation was further confirmed with the confocal analysis of chitin staining (cbp). The tube lumen of stage 17 transheterozygous wurst;pio mutant embryos showed irregular lumen pattern with unusual twists and even partially collapsed tubes.

      Furthermore, as asked by the referee, we generated the wurst,pio double mutation. All wurst,pio double mutant embryos lacked gas filling. In a more in-depth analysis of the tube lumen with a high-performance objective we could not identify any normal tube lumen in stage 17 embryos. Instead the double mutant embryos revealed completely collapsed tracheal tubes. This was confirmed by the chitin staining and confocal analysis. All new data are presented in the supplement.

      As shown in our manuscript and in previous publications, neither pio nor wurst mutant embryos affect cell polarity or gross organization of the actin and tubulin cytoskeleton. However, we found that wurst mutant embryos showed irregular apical membrane expansion at the tube lumen (Behr et al., 2007; legend Fig. 4), irregular chitin fiber organization and, to some extent, collapsed tube lumen. In pio mutant embryos we found deformed apical membrane of DTs, irregular pattern of taenidial folds and, to some extent, collapsed tube lumen. Thus, the apical membrane is the common target of both proteins in late embryonic development, suggesting that pio functions provide stability and wurst functions the internalization of proteins at the apical membrane.

      We discussed it as follows:

      “Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused by the apical Pio density enrichment.”

      “Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      6.) mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments is very interesting. However, it is not clear to which degree bleaching occurs in the tracheal lumen. The authors claim that recovery is very fast and can be seen from minute 2, however, frame-by-frame analysis of Movie S2 does not show a clear different between luminal Pio from minute 0 to minute 2. Rough comparison with the luminal area surrounding the bleached area, does not show a clear difference in luminal Pio before and after photobleaching. To claim fast recovery of luminal Pio after photobleaching, the authors should quantify luminal Pio, before and after bleaching.

      We agree with the reviewer and deleted “fast”. Video S2 shows intracellular mCherry::Pio recovery within 2 min after photobleaching. It also shows extracellular (luminal) recovery within 6 min after photobleaching, when the first large mCherry::Pio puncta appear at the apical surface of the bleached area. Nonetheless, mCherry::Pio puncta appear in the lumen, indicating recovery, whereas Dpy::eYFP did not recover.

      We state this in the Results section as follows:

      “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      In addition, in figure 4D, the normalized mCherry::Pio fluorescence in the graph what does it refer to? Intracellular Pio?

      Figure 4D, now 5D, shows Western Blot signals. We assume that you refer to Fig. 4B, which is now Fig. 5B.

      We apologize for the confusion and have now named it Fig. 5B’.

      We stated in the Material section:

      “The bleaching was performed with 405nm full laser power (50mW) at the ROI for 20 seconds. A Z-stack covering the whole depth of the tracheal tubes in the ROI was taken at each imaging step. Fluorescence intensity in the bleached ROIs was measured after correction for embryonic movements using Fiji.”

      Thus, to clarify this point, we added to the legends:

      “Fluorescence intensities refer to the bleached ROIs as indicated with the frame in the corresponding Movie S2 and were measured after correction for embryonic movements.”
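
      For readers unfamiliar with normalized FRAP traces, the following sketch shows one common convention, in which the pre-bleach mean is scaled to 1 and the first post-bleach value to 0. The response does not state which normalization was applied to the data in Fig. 5B’, so this convention, the function name and the example intensities are assumptions for illustration only.

```python
def normalize_frap(pre_bleach, post_bleach):
    """Scale a FRAP trace so the pre-bleach mean is 1 and the first
    post-bleach value is 0 (one common convention, assumed here)."""
    if not pre_bleach or not post_bleach:
        raise ValueError("both traces must be non-empty")
    f_pre = sum(pre_bleach) / len(pre_bleach)  # mean intensity before bleaching
    f_0 = post_bleach[0]                       # intensity right after bleaching
    span = f_pre - f_0
    if span <= 0:
        raise ValueError("bleaching did not reduce the signal")
    return [(f - f_0) / span for f in post_bleach]


# Hypothetical ROI intensities (arbitrary units), not measured data.
pre = [200.0, 200.0]
post = [50.0, 80.0, 125.0]
recovery = normalize_frap(pre, post)
print(recovery)
```

      With this scaling, a value of 0.5 would mean that half of the bleached signal in the ROI has returned at that time point.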

      7.) When mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments was done in an Np mutant background, the authors describe lack of Pio recovery within the lumen (Movie S3). However, when comparing control and Np mutant background embryos, Pio is not properly released into the lumen of Np mutants (as stated by the authors and seen by comparing movies S1 and S4). Furthermore, on minute 0 of the FRAP experiment in Np embryos, there is no detectable Pio in the DT lumen. Therefore, recovery was not expected in Np mutants and should not be claimed as a conclusion for this experiment.

      We thank the reviewer for the careful reading and apologize for our incorrect description. We changed it accordingly as follows:

      “In contrast to the control, extracellular mCherry::Pio is not released into the tube lumen within 56 min after bleaching in Np mutant embryos (Fig. 6C, Video S3).”

      8.) Brodu et al (Dev Cell 2010) have shown that Pio is important for cytoskeletal modulation during tracheal maturation. Pio is important for non-centrosomal microtubule (MT) arrays anchored at the tracheal cell apical membranes. In addition, MT disruption in tracheal cells leads to lumen formation defects (Brodu et al, Dev Cell 2010). In the absence of Pio, the tracheal cytoskeleton is altered, and this could explain some of the results observed. Ideally, the work should be complemented with a basic cytoskeletal analysis, but if this is not possible, the authors should discuss some of the phenotypes in light of this Pio function.

      Dear reviewer, this is a great idea. Therefore, we analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross irregularities can be detected. Confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of pio stage 16 mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in the supplement Fig. S7.

      Minor comments<br /> The model should not be in supplementary materials and should be moved to the main manuscript.

      We thank the reviewer for this suggestion and moved the model to the main part, where it is now Fig. 9. As requested by reviewer 1, we extended the model to show the timing of events in Pio function.

      Throughout the manuscript embryonic stages are described using different nomenclature (stage X, stX and st X). Either way is correct, but the same nomenclature should be used throughout.

      We apologize for the different nomenclature and use "stage X" in the manuscript and "stX" in the figures for space reasons. Legend 1 clarifies the abbreviation.

      In Fig. S1 B and C the authors should specify which pio allele is being analysed (as in Fig. 7). The same should be done in the text.

      That is a good point. To be clear from the beginning, we now state the following in the first paragraph of the results:

      “The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show phenotypes of the pio17c allele.”

      Line 131, it is not correct to say that WGA visualizes cell membranes. WGA marks/stains cell membranes.

      Thanks for finding this mistake, it’s now corrected.

      Line 165 "leads to excessive tube dilation and length expansion due to strongly reduced luminal chitin" is not correct. Chitin reduction leads to excessive tube dilation but not to length expansion, as reported in the papers cited at the end of the sentence.

      Thanks very much for careful reading, we deleted “and length expansion” from the sentence.

      Line 220-221, what do authors refer to as "stage 16 wt-like control embryos"?

      Thanks for finding this mistake. We corrected it as follows:

      “In stage 16 embryos mCherry::Pio puncta….”

      Line 221, "some minutes" should be replaced by a specific number of minutes. According to Movie S2 reappearance of tracheal cell Pio happens from minute 16.

      We agree with the reviewer that we should state the time when mCherry::Pio puncta reappear. We observe the first large puncta within two minutes after bleaching in tracheal cells at the ROI (Video S2, lower cell row in the movie). We further observe the reappearance of the first large puncta at the ROI within 6 minutes in the tracheal tube lumen.

      We corrected it as follows: “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      Line 291 "time laps" should be lapse.

      Thanks for finding the typo, it is corrected now.

      Line 302, "Pio was not shedded into the lumen but remained at the cell" should be "Pio was not shed into the lumen but remained in the cell".

      Thanks for finding the typo, it is corrected now.

      _Referees cross-commenting

      I agree. Taken together, all the comments will improve the quality of the work and of a future manuscript. Also, everything seems quite doable and will not present any problems._

      Reviewer #2 (Significance):

      _The findings shown in this manuscript shed light on the regulation of tubulogenesis by ZP proteins and how their interaction with the ECM can be regulated by proteolysis. It was known that Pio is involved in tracheal development, is secreted into the lumen, regulating tube elongation (Jaźwińska et al., Nat.Cell Biol., 2003) and anchoring MTs to the apical membrane during tubulogenesis (Brodu et al, Dev. Cell 2010). This work provides additional molecular insights into Pio dynamics and regulation during tube maturation.<br /> This work will be of interest to a broad cell and developmental biology community as they provide a mechanistic advance in ZP proteins involved in morphogenesis. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Field of expertise:<br /> Drosophila, morphogenesis, tracheal tubulogenesis, cytoskeleton_

      Reviewer #3 (Evidence, reproducibility and clarity):

      _Summary<br /> In this manuscript, Drees and colleagues analysed, during the formation and growth of tubular systems, how cells combine forces at the cell membranes while maintaining tubular network integrity. A fundamental question is to understand how cells manage to integrate the axial forces to stabilise the cell membrane and the apical extracellular matrix (aECM).<br /> To address this question, the authors study the formation of the tracheal system in Drosophila embryos, a well-established and detailed model system to investigate formation of tubular networks. In particular, they focused on the formation of the larger tube of the tracheal network, the dorsal trunk. The formation of this tube depends in part of axial extension along the antero-posterior axis.<br /> They concentrated their work on the function of Piopio (Pio), a Zona-Pellucida (ZP)-domain protein. They showed that Pio together with the protease Notopleural (Np) contribute the sense and support mechanical stresses when tracheal tubes elongate, thus ensuring normal membrane -aECM morphology.

      Major Comments

      In a previous work, Drees et al. (PLOS Genetics 2019) showed that the matriptase-prostasin proteolytic cascade (MPPC) is conserved and essential for both Drosophila ECM morphogenesis and physiology.<br /> The functionally conserved components of the MPPC mediate cleavage of zona pellucida-domain (ZP-domain) proteins, which play crucial roles in organizing apical structures of the ECM in both vertebrates and invertebrates. They showed that ZP-proteins are molecular targets of the conserved MPPC and that cleavage within the ZP-domains is a conserved mechanism of ECM development and differentiation.<br /> Here, Drees et al. investigate further how the coupling between membrane and matrix takes place to ensure proper tube growth.<br /> Pio distribution and phenotypes<br /> They first focused on the tracheal phenotypes observed in a pio null mutant context. So far, the only pio mutant characterised was a point mutation in the ZP domain. Using CRISPR/Cas9, they generated new alleles of pio which are lack-of-function alleles. In this context, Drees and colleagues observed over-elongated dorsal trunk tubes, with bulges appearing at stage 16 between the apical domain of tracheal cells and adjacent extra-luminal matrix.<br /> Additionally, pio mutant embryos showed impaired tube lumen clearance of some of the aECM components, which prevents gas-filling of the airways.<br /> To detect Pio distribution, the authors used either an anti-Pio antibody directed toward a short stretch within the Pio ZP domain or a CRISPR/Cas9-generated piomCherry::pio line.

      _

      1.) The Pio antibody shows a strong luminal staining as already published. But the authors reported an apical membrane signal in tracheal cells. I find this apical membrane signal really difficult to observe in panel Fig. 2B. The overlap between the Pio dots and the apical membrane labelled with Uif shown in Fig 2C can be due to the 3D projection. It is only when endocytosis is impaired (Suppl Fig. 2) that the data are more convincing.

      We thank the reviewer for this important point. We are sorry for the unconvincing presentation and are grateful for the chance to improve it.

      We show the 3D image of Pio puncta as voxels overlapping with Uif at the apical cell membrane. The number of Pio voxels overlapping with the Uif-marked apical cell membrane increased in mega mutant embryos and upon tracheal Cht2 overexpression. This result is indicated by a representative region (frame) and white arrows and is now shown in Fig. 2C.

      We further used orthogonal projections across the tracheal tube of the airyscan Z-stacks. Projections taken at random positions confirmed that puncta of Pio antibody staining overlap with Uif at the tube lumen. We observed overlap in controls, but increased overlap in mega mutant and Cht2-overexpressing embryos. This result is now shown in Fig. 2E.

      However, to overcome any misinterpretations of projections, we used single images of the original airyscan Z-stacks for co-localization analysis with the Zeiss ZEN software (black, 2.3, sp1). We used two available and independent standard methods to compare fluorescence pixel intensities of different channels namely the ZEN co-localization and the ZEN profile tool. Both are described in the Materials section.

      a.) With the co-localization tool we directly compared fluorescence pixel intensities of Pio and Uif. Pixels with the highest overlap of the intensities, shown in the ZEN tool as the third quadrant, were set to white for better visualization in the images. These new images are included as Fig. 2D and show recurrent overlap of Pio and Uif antibody stainings (punctate pattern) along the apical cell membrane at the dorsal trunk of stage 16 control embryos. This overlap pattern increased in mega mutant and Cht2 overexpression embryos.

      b.) A second approach for comparing fluorescence intensities is the ZEN “profile” tool. Drawing a line across the tube allowed us to compare peak fluorescence pixel intensities of the different channels at distinct regions, such as the apical cell membrane and the tube lumen including the cbp-marked chitin cable. This tool detected overlap of peak fluorescence intensities of Uif and Pio antibody stainings, confirming that Pio is located together with Uif at the apical membrane of dorsal trunk tracheal cells. These new intensity profiles and the corresponding images are presented in the supplement as Fig. S4B-D. Quantifications with this method, comparing the ratio of Pio peak intensities between the apical cell membrane and the tube lumen, are presented as Fig. 2F (as requested by Reviewer 2).

      2.) When the authors used their CRISPR/Cas9 piomCherry::pio line to characterise Pio distribution (Fig.4), Pio is localised at the apical plasma membrane before stage 16. Only at stage 16 is Pio detected within the lumen. This timing of Pio release in the lumen is critical for the model proposed by Drees et al. This is an important point to assess the difference between the use of the antibody (which mostly labels the lumen) and the piomCherry::pio line, which is mostly at the membrane.

      We agree with the reviewer that the Pio antibody shows a different pattern within the tube lumen of earlier stages. The Pio antibody shows intense extracellular staining from early stage 12 onwards, presumably due to its early function at dorsal and ventral branches, as shown by Anna Jaźwińska (Jaźwińska et al., 2003). The intense luminal Pio antibody staining, predominantly at the chitin cable, persists until its disappearance due to airway protein clearance during stage 17. Unfortunately, this strong luminal Pio staining made it impossible to examine the Pio distribution pattern in more detail during stage 16. Nevertheless, Np overexpression experiments indicate that luminal Pio release occurs specifically in stage 16 embryos (Fig. S13), which was tested with the Pio antibody; see results, second-to-last paragraph:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13).”

      We further agree with the reviewer that mCherry::Pio was used to characterize in vivo Pio distribution within the dorsal trunk cells and tube lumen during stage 16. Fig. 5A shows the apical mCherry::Pio distribution pattern in early and late stage 16 embryos. Importantly, luminal mCherry::Pio increased during stage 16 and was mainly enriched at late stage 16. See Figure 5A: red arrowheads point to apical Pio and red arrows to luminal Pio staining.

      Furthermore, as discussed above and shown by different ZEN tools, such as the co-localization and fluorescence intensity profile tools, Pio antibody stainings revealed a punctate pattern at the apical cell membrane of dorsal trunk cells in stage 16 embryos, which is also reflected by the appearance of apical mCherry::Pio puncta at the membrane surface. Additionally, we observed mCherry::Pio puncta within the tube lumen (see the new Figures S4B & S8). Thus, subcellular Pio distribution at the apical cell membrane and in the lumen was observed for both the Pio antibody staining and the mCherry::Pio pattern.

      Nonetheless, the luminal appearance differs between the Pio antibody staining and mCherry::Pio. The Pio antibody detects a short stretch at the ZP domain and thus detects all possible Pio variants, uncleaved and cleaved. Due to early tracheal Pio function, Pio enriches within the tube lumen in an intense core-like structure, which is recognized by the Pio antibody and is comparable with the Dpy::eYFP pattern. mCherry::Pio also labels all Pio variants, uncleaved and cleaved. The spatiotemporal mCherry::Pio expression pattern (Fig. S5) is comparable with the Pio antibody pattern and the staining at the membrane in stage 16 embryos. However, mCherry::Pio did not enrich in the lumen in a core-like structure, but nonetheless shows overlap with luminal Dpy::eYFP.

      Jaźwińska showed that Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003; Fig 2d). To understand more about the specificity of the antibody, we performed stainings in the null mutant embryos. In contrast to the high number of intracellular Pio puncta in pio2R-16 point mutation embryos, Pio stainings were strongly reduced in pio5m and pio17c mutants, but a low number of Pio puncta were still detectable in the embryos (Fig. S1G,H). It is of note that dpy mutants also showed strongly reduced Pio antibody staining (Fig. S10E). Thus, any discussion of the underlying causes of enriched (Pio antibody) versus non-enriched (mCherry::Pio) luminal staining is speculative. However, observations by Jaźwińska et al. (2003) and our new observations, investigating the Pio antibody stainings in pio null mutants, dpy mutants, eYFP::Dpy embryos and Np overexpression, may hint at the possibility of cross-reactivity of the Pio antibody to other ZP domains, which may intensify the appearance of luminal Pio antibody staining in control embryos.

      Anyway, we clarify the difference in luminal Pio pattern in the discussion as follows:

      “Indeed, the anti-Pio antibody, which detects all different Pio variants, showed a punctuate Pio pattern overlapping with the apical cell membrane markers Crb and Uif at the dorsal trunk cells of stage 16 embryos (Fig. 2; Fig. S3,S4). Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio antibody staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents. In the pio5m and pio17c mutants Pio stainings were strongly reduced although some puncta were still detectable in the trachea (Fig. S1G,H). Similarly, Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003). Interestingly, also dpy mutants showed strongly reduced and intracellular Pio antibody staining (Fig. S10E).

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.”

      3.) Another important point is to explain the discrepancy between the pio mutant alleles. The allele containing a point mutation in the ZP domain shows no over-elongated tubes (Dong et al 2014, Jaźwińska et al. 2003) while the lack-of-function alleles do.

      The reviewer is correct that the pio2R-16 mutation shows only a disintegration phenotype whereas our pio null mutations show in addition tube length defects. However, Dong et al. showed significantly increased dorsal trunk length in shrub; pio2R-16 double mutant embryos when compared with shrub mutant embryos (Supplemental Fig. S4A). Also, the shrub;dpyolvR double mutant embryos revealed increased tube length expansion when compared with shrub mutant embryos. Moreover, their quantifications show that the dpyolvR mutant embryos also revealed significantly increased tube expansion when compared with wt. Altogether these previous findings suggest that Pio and Dpy are involved in controlling tube length during stage 16.

      Furthermore, we generated three independent pio null mutation alleles, which lost all the essential Pio protein domains, and all caused embryonic lethality, gas-filling defects, the branch disintegration phenotype and tube length defects (quantifications are shown in Figs. 9 and S1). In addition, pio null mutations prevent Dpy::eYFP secretion. Thus, we are confident that the observed tube length defects as well as the air-filling defects are due to the loss of Pio, in particular since these defects could be rescued by Pio expression in the pio null mutation background, as shown in Fig. 3B.

      So, what could make the difference?

      The described pio2R-16 mutation allele contains an X-ray-induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain. It is not clear how the amino acid exchange affects the protein and the ZP domain. It may hamper pio function, and this amino acid replacement may be problematic for the early tracheal function but not during stage 16. As stated by Jaźwińska et al. 2003 (Fig. 2 legend), Pio antibody staining is intracellular in the mutants and extracellular in the trachea of wt at stage 13.

      They further speculate that the mutant Pio protein may be retained in the secretory pathway, but this is not confirmed with co-markers. As luminal Pio function is required to provide a barrier for autocellular AJ formation, this fails in the pio2R-16 mutation. In contrast, it is still possible that Pio interacts with Dpy and supports its secretion in the pio2R-16 mutation, and it is additionally conceivable that intracellular Pio may reach the apical cell membrane to some extent in stage 16 pio2R-16 mutant embryos and thus can support tube size control. But these assumptions are speculations.

      Nevertheless, to clarify this point we explain the discrepancy between the pio2R-16 mutation and the pio null mutation alleles as follows:

      “Using CRISPR/Cas9, we generated three pio lack of function alleles (Fig. S1A), all exhibiting embryonic lethality and identical tracheal mutant phenotypes. The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17c allele. The pio17c and pio5m null mutant embryos revealed the dorsal and ventral branch disintegration phenotype known from a previously described pio2R-16 mutation allele which contains a X-ray induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain (Jaźwińska et al., 2003). Additionally, the late stage 16 pio17c and pio5m null mutant embryos showed over-elongated tracheal dorsal trunk tubes (see below).”

      4.) A minor point: the authors should provide a hypothesis to explain why only the clearance of CBP, Obstructor-A and Knickkopf is affected in a pio mutant background and not that of Serpentine and Vermiform.

      We thank the reviewer for careful reading and the comment on this point. We would be happy to see such a scenario, which could give us a hint about Pio interaction partners at the chitinous matrix. However, we stated that luminal material, such as Obst-A and Knk, is removed from the lumen (see Fig. S5A). We further describe that in pio mutant embryos, luminal Serp and Verm staining appeared reduced but showed wt-like distribution (see Fig. S6) in stage 16 embryos. We do not show Serp and Verm in stage 17 embryos, but they are removed from the tube lumen (not shown). These data were obtained from immunostainings and confocal analysis.

      Nevertheless, we also state that pio mutant embryos revealed lumen clearance defects in TEM analysis, of undefined material in the tube lumen (see Fig. 1D and Fig. S2B).

      To clarify this point we state in the results as follows:

      “Fourth, ultrastructure TEM images revealed aECM remnants in the airway lumen of pio mutant stage 17 embryos, while control embryos cleared their airways (Fig. S2B). Consistently, the in vivo analysis of airways in stage 17 pio mutant embryos revealed lack of tracheal air-filling (Fig 3B). The pan-tracheal expression of Pio in pio mutant embryos rescued the lack of gas filling (Fig 3B). Thus, TEM images suggest that pio mutant embryos showed impaired tube lumen clearance of aECM, which prevented subsequent airway gas-filling. “

      And

      “Also, the pio mutant embryos showed tracheal lumen clearance defects of chitin fibers in ultrastructure (TEM) analysis (Figs. 1D, S2B). In contrast, confocal analysis revealed that well-known chitin matrix proteins, such as Obstructor-A (Obst-A) and Knickkopf (Knk), are removed from the lumen of pio mutants (Fig. S5A). These results suggest that the Pio function did not affect airway clearance of Obst-A and Knk and therefore did not play a central role in airway clearance like Wurst. Nevertheless, airway clearance defects observed in TEM images in pio null mutant embryos and, in addition, defective tube lumen morphology in wurst;pio transheterozygous mutant embryos explain the occurrence of airway gas filling defects.”

      5.) Pio and Dumpy. Dumpy (Dpy) is another ZP domain protein secreted by the tracheal cells and detected in the lumen. To follow Dpy distribution, Drees and colleagues used a Dpy::eYFP protein trap line, the same used in Dong et al. However, in this latter paper, Dong et al. stated, using a Crb staining, that Dpy is not at the apical cell surface but only in the lumen. However, Drees and colleagues reported (line 227 and Fig. 4C) that Dpy appears both at the apical cell surface and in the lumen of the tracheal system. But they did not show a co-localisation with an apical marker. Furthermore, in their previous work (Drees et al. 2019) they called the apical staining a "peripheral shell" layer. In addition, in S2R+ cell culture, it is only when Pio and Dpy are co-expressed that Dpy is detected at the cell membrane. The in vivo localisation of Dpy is an important point that needs to be clarified as it is of importance for the final model proposed in Supp Fig. 9.<br /> Drees et al. also performed FRAP experiments on Dpy::eYFP protein trap embryos. As expected, and as already shown by Dong et al.

      The referee is correct; we state “In stage 16 embryos Dpy::eYFP (Lye et al., 2014) appears at the tracheal apical cell surface and predominantly within the lumen (Fig. 4C).” The corresponding Fig. 4C reveals Dumpy::eYFP staining overlapping with chitin at two subcellular regions: Dpy is enriched as a core-like structure within the lumen overlapping with the chitin cable of the control embryos. Additionally, Dpy::eYFP overlaps with the chitin part that might be part of the apical cell surface. But this observation is hard to see in the images in Fig. 4C and we apologize for it. We therefore repeated the Dpy::eYFP localization analysis and analyzed it in more detail with the ZEN profile tool, which shows peak fluorescence pixel intensities of different channels and makes it possible to test whether they overlap along the XY axes.

      We first asked whether cbp (chitin) appears at the apical surface of dorsal trunk cells when Pio becomes cleaved and released. In mid stage 16 embryos cbp staining appeared in the luminal chitin cable and additionally in a distinctive pattern, which fits the pattern of taenidial folds that start to form. We therefore used the apical cell membrane marker Crumbs to co-stain with cbp. Airyscan microscopy fluorescence intensity profile analysis and corresponding close-up images confirmed the overlap of Crb and cbp stainings at this distinctive pattern, indicating that it shows the chitin matrix at the apical cell surface (Fig. S8A). But there was no overlap of cbp and Crb at the chitin cable structure. Thus, knowing the localization of the apical cell surface chitin matrix, we performed co-stainings of cbp with mCherry::Pio (RFP antibody). This revealed, as expected, overlap of cbp and RFP antibody staining at the apical cell surface chitin matrix (distinct pattern) and with the luminal chitin cable (Fig. S8B,C). Finally, we repeated the stainings and analysis with cbp, mCherry::Pio (RFP antibody) and Dpy::eYFP (GFP antibody). First, these results revealed overlap of Dpy::eYFP and cbp at the apical cell surface and in the tube lumen (Fig. S8D) and second, overlap of punctate staining of Dpy::eYFP, cbp and mCherry::Pio at the apical cell surface chitin matrix and also at the luminal chitin cable (Fig. S8E).

      Very obvious from the images and Z-projection in Fig. 4C is the lack of extracellular Dpy::eYFP staining in pio mutant embryos. Dpy::eYFP was enriched intracellularly, and thus the Dpy::eYFP mis-expression caused by the pio mutation fits well with our results from S2R+ cell culture. As the reviewer notes, it is only when Pio and Dpy are co-expressed that Dpy is detected at the cell membrane.

      Altogether, Fig. 4C, cell culture experiments and our new stainings support our model, that Pio and Dumpy interact and are co-secreted at the apical cell membrane/surface, where Np mediates Pio cleavage. As requested by reviewer 2, we moved the model to Fig. 9. As requested by reviewer 1, we extended the model for timing events.

      A minor point, the Dpy::eYFP protein trap line used in this study is not listed in the Materials and Methods section of the supplementary data.

      Thanks, we included it into the List of sources (Supplement). This YFP-trap line (called CPTI lines) was published by Claire M. Lye et al., Development, 141, 2014. We cite it in our manuscript.

      6.) The serine protease NP and Pio release. Drees and colleagues have previously shown, performing in vitro studies, that the protease Notopleural (Np) cleaves the Pio ZP domain (Drees et al. 2019). Here the authors went a step further in demonstrating that it is also true in vivo at stage 17. In addition, they showed that, in Np mutant embryos, mCherry::Pio is mostly detected within tracheal cells and the luminal staining is strongly reduced. In this mutant context, the authors conducted a FRAP experiment on the mCherry::Pio signal, even though it is very weak in the lumen. They showed hardly any recovery after photobleaching.<br /> In Drosophila S2 cells, Drees and colleagues showed that co-expression of the catalytically inactive NpS990A with mCherry::Pio showed, as a prominent signal, the 90 kDa mCherry::Pio variant in the cell lysate (Fig. 5B), and live imaging revealed mCherry::Pio localisation at the cell surface (Fig. S6B). However, in this inactive form context, a strong signal is also detected at 60 kDa corresponding to a cleaved form of the Pio ZP domain (Fig. 5B), and Pio localisation at the cell surface appears weaker than in controls. The authors did not consider that another protease could be at play.<br /> On the other hand, in their previous work, Drees et al. identified a mutant form of Pio (PioR196A) which is resistant to NP cleavage in vitro. It would be a step forward to establish by CRISPR/Cas9, as the authors seem to be successful with this technique, a mutant line carrying this point mutation. It will be important to determine whether the observed phenotype resembles that of a mutant Np phenotype.<br /> In their previous work (PLOS Genetics 2019), in Np mutant embryos, Drees et al. did not report "bulge-like" deformations from stage 16 onwards leading to the detachment of the tracheal cells from their adjacent aECM. Either the alleles or the allelic combination is different between the two studies, which could explain this difference, or it is a new phenotype that has not been previously described. In the latter case, it becomes important to quantify the proportion of segments showing these bubbles. Is this a rare phenotype to observe?

      We thank the reviewer for the very interesting comments, the careful reading of our manuscript and the very useful suggestions. We agree that we cannot exclude the possibility that another protease is involved in the cleavage of Pio. Therefore, we included this important point in the discussion section as follows:

      “Unknown proteases may likely be involved in Pio processing since cleaved mCherry::Pio is also detectable in inactive NpS990A cells.”

      We think the generation of the pioR196A mutant to address Pio localization and tracheal phenotypes is a great idea, which we would like to address in future experiments. Unfortunately, the production of this fly line with such a specific point mutation at this position will take several months, not including the subsequent evaluation and phenotypic analysis of this fly line and mutants. Therefore, we apologize that we cannot pursue this question experimentally. Nevertheless, mentioning the possibility and the requirement of such an experiment is important and we discuss it as follows:

      “Previously we identified a mutation at the Pio ZP domain (R196A) resistant to NP cleavage in cell culture experiments (Drees et al., 2019). Establishing a corresponding mutant fly line would be essential in determining whether the observed phenotype resembles the phenotype of the Np mutant embryos.”

      However, knowing that we are not able to provide a new mutant fly line to evaluate the formation of the dorsal tube when an NP non-cleavable form of Pio is expressed, we sought to use an alternative approach by overexpressing Np in the trachea with btl-Gal4. This shows a clear pairing of Np overexpression and Pio release specifically at stage 16 dorsal trunk and associated tube overexpansion.

      Finally, the reviewer is correct: we did not mention the appearance of bulges in Np mutant tracheal dorsal trunk cells in our previous publication. We used the same Np alleles in 2019, and a closer look at the 2019 publication likewise shows the appearance of bulges in Np mutant embryos, e.g. Fig. 1B (red-dextran, left part of the tracheal lumen shows bulges) and even the Dpy::YFP matrix tearing off at the site of bulges (Fig. 4F’’, above the arrowhead). But at the time we did not know the link with Pio and Dumpy.

      However, we agree that it is important to know more about the appearance of the phenotype by means of quantification. The quantification of bulges per dorsal trunk (n=16) is shown in Fig. 7B.

      7.) Minor point: I don't understand what the authors are trying to show in supplementary Figure 8. Tracheal cells detach and are found in the lumen?

      We are sorry for the unclear description in the legend. We corrected it as follows in the legend of Fig. S12:

      “This indicates disintegration of apical cell membrane at bulges and subsequent leaking of cellular content into the lumen.”

      8.) Np function conserved matriptase.<br /> In this work, Drees and colleagues showed that Np controls in vivo the cleavage of the Pio ZP domain.<br /> Dumpy and Piopio are not conserved in vertebrates but they both contain a ZP domain which is conserved. The authors tested if other ZP proteins can be cleaved by Np or the human homolog Matriptase. The authors tested in cell culture the ability of the type III Transforming growth factor-β receptor, which contains a ZP domain, to be cleaved either by Np or Matriptase.<br /> This could be a general mechanism that needs to be extended to other ZP domain proteins and that could be at play to structure the matrix and give it its physical properties.<br /> However, as it is all speculative, I find the discussion section related to these data far too long, and it does not help to better understand the work done on the formation of the tracheal tubes of the Drosophila embryo.

      We show that Np mediates cleavage of the Pio ZP domain in vitro and in vivo in Drosophila embryos. We further showed that human matriptase was also able to cleave the Pio ZP domain. To understand if this is a more general mechanism, we extended our studies to the human TβRIII and its ZP domain. These data show that both Drosophila and human matriptases are able to cleave ZP domains of different proteins from different species. These data suggest that Matriptase-mediated ZP domain cleavage is not a Drosophila-specific mechanism. We cannot follow the referee's argument that it is all speculative. Nevertheless, we agree that follow-up studies will be needed to show that the mechanism is more general than two different species and ZP domain proteins. As requested by the referee, we deleted the following sentences of the paragraph, since they are speculative in the context of our manuscript and do not directly describe a potential matriptase and ZP domain function:

      “Matriptase degrades receptors and ECM in pulmonary fibrinogenesis in squamous cell carcinoma (Bardou et al., 2016; Martin and List, 2019). TβRIII is a membrane-bound proteoglycan that generates a soluble form upon shedding (López-Casillas et al., 1991), a potent neutralizing agent of TGF-β. Expression of the soluble TβRIII inhibits tumor growth due to the inhibition of angiogenesis (Bandyopadhyay et al., 2002). Idiopathic pulmonary fibrosis (IPF) is associated with a progressive loss of lung function due to fibroblast accumulation and relentless ECM deposition (King et al., 2011; Loomis-King et al., 2013). “

      However, the comparison of the tubular organ and the phenotypic expressions of the bulging membrane and the aortic aneurysm appears to us to be an important element of the article. In both cases, the cell membrane loses its integrity and can break in tubular networks. Thus, with our findings on the modification of extracellular ZP proteins, we offer a potential new molecular approach even for clinical investigation.

      9.) Minor points: Pio and cytoskeleton organisation.<br /> Line 78-79, the authors wrongly quoted a work from Brodu et al (2010). Pio does not anchor the microtubule severing enzyme Spastin. Instead, Spastin releases the microtubule-organising centre from its centrosomal location, then Pio contributes to its apical membrane anchoring. It can therefore be assumed that the organisation of the microtubule network is affected in a pio null mutant. In addition, ZP proteins have been shown to link the aECM to the actin cytoskeleton. Therefore, it would be interesting to look at the organisation of the actin and microtubule cytoskeletons in a pio mutant context in which enlarged apical cell surface areas are observed.

      We are very thankful for finding this mistake in the introduction. We corrected it as follows:

      “Further, Pio is involved in relocating microtubule organizing center components γ-TuRC (γ-tubulin and Grips; gamma-tubulin ring proteins). This requires Spastin-mediated release from the centrosome and Pio-mediated γ-TuRC anchoring in the apical membrane.”

      Studying the cytoskeleton in pio mutant embryos is a helpful idea. Therefore, we analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross changes can be detected. The confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of stage 16 pio mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in supplementary Fig. S7.

      _Referees cross-commenting

      I have just read the comments of the other two reviewers, who like me are specialists in the formation of the tracheal system in the Drosophila embryo.<br /> I find the comments very fair and balanced. They are in the same spirit as my comments and are very complementary. I hope that all our comments will be constructive for the authors and will improve the quality of their work._

      Reviewer #3 (Significance):

      _Overall, the methodology is sound, the quality of the data is good and the paper is very well written. The authors combine in vivo and in vitro studies as well as a cell culture approach. Using CRISPR/Cas9, they generated a large number of new tools allowing in vivo studies.<br /> Drees and colleagues generated new alleles of pio which are lack-of-function alleles. They described a new phenotype for pio mutant embryos, namely over-elongated tubes. But the authors do not comment on why these new alleles reveal a new phenotype. Furthermore, using their piomCherry::pio line, the authors state that Pio is localised to the plasma membrane. This location is very difficult to assess. Both new results require clarification.<br /> The authors had already demonstrated that Np cleaves the ZP domain of Pio in vitro. Here they demonstrate this in vivo. It appears important to evaluate the formation of the dorsal tube when an NP non-cleavable form of Pio is expressed.<br /> Finally, the model proposing a coupling between the extracellular matrix and the membrane of tracheal cells is very interesting. The demonstration that cleavage of Pio by Np could participate in this coupling is very interesting for those interested in the integration of mechanical stress and cellular deformation. However, such a model has already been discussed in Dong et al (2014). In this article, Dong et al. proposed that a "coupling of the apical membrane and Dpy matrix core is essential for tube length regulation".

      The audience for this article should be specialised and oriented towards basic research. It may be of interest to people working on tubular systems or working on ZP proteins.

      My field of expertise is cell biology and developmental biology in Drosophila and formation of tubular networks._

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their constructive criticism that helped us to improve the paper. We modified Fig.6I and Fig.7, replaced Fig.8, and added supplementary Figs. 3-5 and supplementary Tables S1-2. The manuscript was extensively re-written. A new paragraph was added in the Discussion section where relative adhesiveness was related to absolute adhesion strength and the cadherin knockdown result to earlier findings.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: This work examines the relationship between cell-cell contacts and pericellular matrix in Xenopus chordamesoderm, which is a tissue actively involved in convergent extension during gastrulation. By lanthanum staining of pericellular materials, the authors found that different types of pericellular matrix are present in cell-cell contacts in the chordamesoderm, which may mediate cell-cell adhesion. Knockdown of C-cadherin, Syndecan-4, fibronectin, and hyaluronic acid leads to the reduced abundance of cell contacts and cell packing density, but this does not seem to affect convergent extension. Based on these observations, the authors propose a model in which cell-cell contacts involve the interdigitation of distinct pericellular matrix units.<br /> Major points:

      1. Knockdown of adhesion molecules separates cells and leads to wide contacts with large interstitial spaces. Data in figure 1 show loosely packed morphant chordamesoderm cells. Intuitively, these should reduce cell-cell adhesion. However, a main conclusion from this manuscript is that reduced abundance of narrower contacts does not decrease adhesiveness. Although depletion of adhesion molecules modifies but does not abolish a contact, non-attached free surfaces increase significantly in morphant cells. It is therefore not easy to understand how reduced cell contacts have no effect on cell adhesion.

      We added a section to the Discussion to address this issue (p.11ff). We show in the Results section (modified Fig.7) that relative adhesiveness is indeed significantly reduced in the morphants (Syn-4 always being the exception) when compared in the contact width range of normal chordamesoderm. However, contact width is strongly increased in the morphants, and adhesiveness increases linearly with width. We argue that these effects compensate for the initial lowering of adhesiveness. In other words, adhesive contacts become shorter (more gap surface) but wider (see Fig.6I), and become more adhesive the wider they are. As in the original version of this paper, we then propose a model that explains the empirically observed increase of adhesiveness with width. How the abundance of cell-cell contacts is reduced is as yet less clear. Pericellular matrix deployment and structure are strongly affected by adhesion factor knockdown, and contact types are altered. Some contact types seem to widen but remain adhesive, others become non-adhesive, and still others may disappear without being replaced (see last paragraph of Discussion). Adding detail to these notions and clarifying this important issue satisfactorily will require future research.

      Importantly, the adhesiveness was not experimentally tested.

      Due to external circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to interpret our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a new section of the Discussion (p.11ff).

      1. It is surprising that reduced cell contacts, at least narrower cell contacts, do not affect convergent extension. Does this mean that active cell behavior changes in the chordamesoderm, which are required for convergent extension, are independent of cell contact types?

      We actually claimed that all treatments inhibited convergent extension, except for Syn-4 (Barua et al. 2021, and this manuscript, p.3, Fig.1B,C). Syn-4 knockdown had a dramatic effect on cell contacts, cell density and cell shape but none on convergent extension, at least up to the middle gastrula stage. This is surprising and does not fit easily with current views of cell intercalation during convergent extension, but analysing the underlying cell behaviors is beyond the scope of this article.

      1. Although the formation and localization of pericellular materials are differentially affected after knockdown of adhesion molecules, there is no clear evidence showing that different types of pericellular matrix mediate cell-cell adhesion in the chordamesoderm. It is possible that the disrupted distribution of pericellular materials in morphants only represents a secondary consequence of changed cell contacts. This may be supported by the fact that knockdown of adhesion molecules reduces narrow contacts and increases LSM-free gaps.
      2. The relationship between contact width spectra and LSM is also very elusive. Again, changes in contact width or abundance and distribution of LSM may be indirectly caused by loss of adhesion molecules. Therefore, although knockdown of adhesion molecules leads to changes of LSM localization, it cannot be concluded that cell-cell contacts in chordamesoderm are mediated by different types of pericellular matrix.

      We find it difficult to interpret, for example, Fig.5A-F other than by assuming an adhesive role for the pericellular matrix, in this case LSM, in normal and morphant tissue. What else would hold two cells together here between two gaps? The contacts are often much too wide for cadherin-cadherin binding. We indeed believe that changes in contact width or abundance are caused by the loss of adhesion molecules, directly or indirectly. Our LSM images show that, remarkably, modified contacts (e.g. Fig.3D,F; Fig.5B,C) are still able to keep cells together over some distance, between interstitial gaps, and our quantitative data similarly indicate that, for example, contact widening is consistent with continued adhesion. However, some of the contacts may become non-adhesive, or be lost without being replaced, increasing non-adhesive gap surface. This is now discussed on p.11, middle paragraph.

      1. In contrast to the present observations, works by others using the same morpholinos have shown that Cadherin-dependent cell adhesion, fibronectin-rich extracellular matrix, and Syndecan-4-regulated non-canonical Wnt signaling are required for convergent extension. These discrepancies need to be appropriately addressed.

      As mentioned above, we found that all treatments affected convergent extension, as expected from the work of others and our own, except for Syn-4 depletion. We noticed that in the paper by Munoz et al. on Syn-4 overexpression and knockdown, only late gastrula/early neurula stages were evaluated. Syn-4 knockdown produced moderately strong axis defects, perhaps in part related to impaired neural plate closure. Unfortunately, we did not follow our morphants to these later stages to see whether defects developed then. But our main interest here is cell-cell contacts.

      1. If LSM and LSM-free contacts are similarly adhesive, what is the role of LSM in cell adhesion, and how is cell adhesion established in these LSM-free contacts?

      We discuss now more explicitly the notion that gastrula non-epithelial cell adhesion is mediated by a mosaic of pericellular matrix patches of different composition, some containing LSM in different configurations, others not, but each similarly adhesive.

      Minor points:<br /> 1. It may be helpful to clearly define the pericellular matrix in this particular context and its relationship with LSM. It is also necessary to clarify whether the adhesion molecules examined in this work are considered as components of the pericellular matrix.

      We explain the use of these terms at the end of the first paragraph of the Introduction. The most general term is pericellular matrix; part of it is La3+ labeled – LSM; and some of the LSM can be compared to structures which in other systems are termed glycocalyx. We consider the adhesion molecules examined to be part of the pericellular matrix but are aware of other putative functions, like in cell signaling, which may indirectly affect contacts and thus contribute nevertheless to the phenomena studied here.

      1. In figure 1B, it appears that the Cadherin morphant has defects in chordamesoderm elongation and archenteron formation, suggesting impaired convergent extension.

      We find, in agreement with the work of others, that C-cad knockdown impairs convergent extension, and mention this when we describe Fig.1B.

      1. In figure 1C, the Syndecan-4 morphant gastrula clearly shows enhanced anteroposterior elongation of chordamesoderm and archenteron in comparison with the wild-type embryo. This seems to suggest that loss of Syndecan-4 promotes the movements of convergent extension. However, previous studies indicate that both gain and loss of Syndecan-4 impair convergent extension.

      As mentioned above, late gastrula/early neurula stages were evaluated in the Munoz et al. paper, mid-gastrula stages in our work. One possible explanation would be that mild axis defects develop later, partly in connection with neural tube elongation and closure.

      1. Ideally, in knockdown experiments, control embryos should be injected with corresponding mismatch morpholinos.

      We explain in the Methods section that we only used morpholinos that were extensively characterized in previous publications.

      1. In figure 1E, it is unclear what type of cell contacts the light green arrowheads indicate.

      This is explained now in the figure legend.

      1. Figure 1 legend, "(wt) is from Barua et al. 2021". I am not sure it is appropriate to use previously published data.

      The present data were derived by further evaluations of the same samples and TEM sections as used in Barua et al. 2021. We show the previously published data (acknowledged in the legends) here for easy comparison (instead of citing the previous paper).

      1. There is no light blue arrowhead in figure 2, and in figure 3B and 3I, it seems that the same colored arrows are used to indicate different structures.

      This has been corrected.

      1. Triple-layered contacts are not clearly defined.

      We define this term now repeatedly, as consisting of two LSM layers enclosing a non-labeled layer between them.

      1. Page 2, "based on driven by" should be either "based on" or "driven by".

      Has been corrected.

      1. Page 8, "selectin" should be "selecting".

      Has been corrected.

      Reviewer #1 (Significance):

      Strengths:<br /> Demonstrated the effects of several adhesion molecules on the formation of cell contacts and pericellular matrix in Xenopus chordamesoderm.<br /> Limitations:<br /> The significance of chordamesoderm cell contact changes in convergent extension or gastrulation is not clear;

      Effects on gastrulation of PCM or membrane adhesion molecule depletion have very often been described as mediated by effects on cell signaling. Without excluding such possibilities, we wished to redirect attention here to other putative mechanisms by describing basic effects of treatments on cell-cell contacts including PCM deployment and structure. Future work must relate the specific, often dramatic, contact changes upon depletion of a specific factor to cell behavior during convergent extension and other tissue movements.

      there is no direct evidence showing the functional link between pericellular matrix, cell contacts and cell adhesion;

      Please see our response to main points 3 and 4 above.

      the absence of effects on convergent extension after depletion of several adhesion molecules is not fully consistent with previous reports.

      Please see our response to main points 2 and 5 and minor point 3 above.

      Advance: This work likely provides some fundamental and methodological advances for studying cell-cell adhesion. It shows promise for elucidating mechanisms underlying the regulation of cell contact changes in tissues involved in morphogenetic movements.<br /> Audience:<br /> This work is likely to interest readers studying embryonic cell adhesion in the fields of developmental biology and cell biology. It may also be of interest to people working on glycocalyx pericellular matrix in adult tissues.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: During gastrulation, cells within vertebrate embryos require the ability to both adhere to one another and rearrange with their neighbors to shape the emerging body plan. These authors posit that such flexible adhesive contacts are mediated in part by the pericellular matrix (PCM), including multiple types of glycocalyces containing molecules such as fibronectin, hyaluronic acid, and syndecans, which they previously characterized in multiple embryonic tissues (Barua et al, PNAS, 2021). Here, in a follow-up to their 2021 study, the authors use electron microscopy to characterize the pericellular matrix within the chordamesoderm of Xenopus gastrulae. They identify several types of adhesive contacts within the chordamesoderm and assess how they are altered in the absence of key PCM molecules via morpholino knock-down. They conclude that syndecan-4 and hyaluronic acid comprise and promote assembly of PCM plaques whereas fibronectin and C-cadherin anchor them to cell surfaces. Cell packing density is decreased upon loss of all 4 of these molecules, which the authors attribute to a decrease in the number of cell contacts without affecting the strength of the remaining contacts. They further conclude that adhesiveness increases linearly with contact width, and that this relationship is unaffected by loss of any aforementioned adhesive/ PCM molecules.

      Major comments:<br /> Many conclusions in this manuscript are based on measurements of cell contact angles, which indicate the reduction of tension at cell contacts vs. free cell surfaces and thus relative adhesive strength. While this lab previously applied the same approach to live tissues (David et al, 2014), it is not clear to what extent such measurements accurately reflect adhesive strength in fixed tissues and/or electron micrographs, especially given the issue of random sectioning planes, which distort contact angles. Although a correction was applied, the authors note this is not theoretically derived because the heterogeneity of gap sizes made such calculations too difficult. Indeed, it appears that the large gaps between cells within morphant embryos affect contact angle measurements, but if this is corrected for in any way, it is not mentioned.

      Geometrically determined contact angle distortion should affect angle or relative adhesiveness distributions similarly in all conditions or treatments, and thus should affect comparisons of distribution peaks, averages, etc. little or not at all. Beyond this effect of random sectioning planes, we don’t see how large contact width should by itself affect measurements of angles.

      Because this is the sole measure of cell adhesion provided in the study, this reviewer is not convinced of the conclusion that loss of PCM components does not affect adhesive strength.

      In response to this criticism, we re-evaluated our adhesiveness-width data (Fig.7A-E). We noticed that there is indeed a reduction of relative adhesiveness when morphants are compared to normal chordamesoderm within the width range of the latter. But the addition of increased widths in the morphants and the linear increase of adhesiveness with width compensated or overcompensated the initial reduction of adhesiveness.

      Could such measurements not be made from live cells/tissues after manipulating PCM components, as the lab has done previously? Because the lab already has the necessary reagents and expertise for such experiments, the time and resources needed for such measurements shouldn't be prohibitive.

      Due to circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to analyze our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a section added to the Discussion (p.11ff).

      • As mentioned above, these authors previously measured adhesive strength in live Xenopus cells and tissues (David et al, 2014). In that study, they found that C-cadherin MO reduced relative adhesiveness whereas the current study found that relative adhesiveness actually increases in this condition. What explains this discrepancy?

      We explain now in the new Discussion section (p.11ff) and with the help of supplementary Figure S5 how adhesion strength and relative adhesiveness are related overall (tissue surface vs. cell contacts) and at gaps within a tissue (gap free cell surface vs. cell contacts). In the previous study (David et al, 2014), we discussed relative adhesiveness in relation to overall adhesion strength, and both are decreased upon C-cad knockdown. Here we examined these parameters at interstitial gaps, where we find a small increase of relative adhesiveness, due to overcompensation caused by a strong increase of adhesiveness with contact width. Using our David et al, 2014 data we quantitated the effects. We previously found a similar increase of relative adhesiveness at gaps in C-cad morphant ectoderm (Barua et al. 2017) which we could not explain at the time, but explain now by analogy to our chordamesoderm results.

      • No control morpholinos are used, and for the morpholinos that are used, the doses are very large. An equally high dose of control MO should be used to ensure that all observed phenotypes are specific.

      We detail in the Methods section that we used here and in previous publications only previously characterized morpholinos.

      • It appears that all the images analyzed were collected in the sagittal plane, and the analyses don't seem to consider the intrinsic polarity of the chordamesoderm. For example: cells in different positions within the tissue (basal vs. apical), or that WT chordamesoderm cells are mediolaterally polarized and actively intercalating whereas disruption of PCM components like fibronectin disrupts cell intercalation and randomizes cell polarity. It is possible that 1) cell-matrix (in basal cells) and 2) cell-cell (during intercalation) interactions may affect the measurements made in this study. In other words, that cell contacts could differ by position within the embryo and intercalation/polarity status... have such effects been accounted for in the current analysis?

      Here we only analyzed cell contacts deep in the chordamesoderm. Basal contacts were examined to some extent in Barua and Winklbauer, 2022, apical contacts not yet. Our present analysis is based on sagittal sections. The cells in the chordamesoderm are elongated and aligned mediolaterally but not in register, i.e. they are randomly wedged between each other. Thus, all mediolateral positions in cells should be present in our samples. Nevertheless, trends in the occurrence of contacts related to medial-to-lateral positions on cells (e.g. recognizable in spindle-shaped cells as wide vs narrow cell cross-sections) may have escaped our attention, and in particular, the protrusion-bearing medial and lateral ends of cells may develop special contacts. However, our goal in this study was to analyse basic properties of cell-cell contacts in this tissue, as a foundation for further detailed studies.

      • In this study, the authors state that chordamesoderm movements are preserved in syndecan-4 morphants, and in their 2021 article (Barua et al) they state that convergent extension movements are accelerated. But another study describing this MO found that it causes severe convergent extension defects (Munoz et al, NCB, 2006). What explains this discrepancy?

      In their knockdown experiments, Munoz et al. find relatively mild axis defects in late gastrula/early neurula stage embryos while we studied the mid-gastrula. Perhaps defects develop during later stages in Syn-4 morphant embryos.

      Also, the syn-4 morphant shown in Fig. 1 appears more developmentally advanced than the other embryo... if the embryos are not stage matched it could affect the measurements and conclusions drawn from them.

      Stage matching was not possible since C-cad and FN morphants did not involute or engage in convergent extension (i.e. were arrested at the initial gastrula stage), whereas Syn-4 morphants appeared to gastrulate faster than normal. Therefore, embryos were strictly time matched. A limitation remains: the time course of cell contact development over gastrulation was considered low priority in this initial study and was thus not determined.

      • In figure 7, the authors plot relative adhesion (measured from contact angles) vs. contact width, then fit regression lines to the lower boundaries of these scatter plots. It is not clear why this analysis is focused only on the lower boundaries rather than considering the full spread of the data. Particularly for syn-4 morphants, whose values do not appear to be concentrated along the lower boundary. This analysis is further confused by the introduction of alpha*, which represents relative adhesiveness relative to the regression.

      The lower boundary line is most convenient to extract (Fig.7A’-E’). But we agree that the “interior” of the scatter plot distribution should also be analyzed. Using average adhesiveness gives rise to artifacts since the density of data points decreases strongly with contact width but also with distance from the lower boundary, leading to the preferential disappearance of large adhesiveness values for higher widths. Instead, we constructed a line tracing the highest density in the scatter plot near the lower boundary (Fig.7B’’-E’’), by determining the positions of adhesiveness distribution peaks in consecutive width brackets (new Fig.8, Fig.S3). We abstained from introducing alpha*.
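The peak-tracing step described in this reply can be illustrated with a minimal sketch. It assumes that each contact is reduced to a (width, relative adhesiveness) pair, that width brackets are consecutive and of equal size, and that the peak of a bracket is taken as the centre of its most populated adhesiveness bin. The function name `peak_by_width_bracket` and the parameters `bracket_size` and `alpha_bin` are hypothetical and are not taken from the manuscript; the authors' actual procedure may differ.

```python
from collections import Counter


def peak_by_width_bracket(widths, alphas, bracket_size, alpha_bin):
    """Locate the adhesiveness distribution peak in consecutive width brackets.

    Hypothetical helper: widths and alphas are parallel sequences, one entry
    per contact. Returns {bracket_start: peak_alpha}, where peak_alpha is the
    centre of the most populated adhesiveness bin within that bracket.
    """
    # Group adhesiveness values by the width bracket their contact falls into.
    brackets = {}
    for width, alpha in zip(widths, alphas):
        start = int(width // bracket_size) * bracket_size
        brackets.setdefault(start, []).append(alpha)

    peaks = {}
    for start in sorted(brackets):
        # Histogram the adhesiveness values of this bracket.
        counts = Counter(int(alpha // alpha_bin) for alpha in brackets[start])
        # Most populated bin; ties resolve to the lowest adhesiveness bin.
        best_bin = max(sorted(counts), key=lambda b: counts[b])
        peaks[start] = (best_bin + 0.5) * alpha_bin
    return peaks


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    example_widths = [10, 15, 18, 60, 70, 75]
    example_alphas = [0.11, 0.12, 0.31, 0.41, 0.43, 0.72]
    print(peak_by_width_bracket(example_widths, example_alphas, 50, 0.1))
```

Tracing the mode instead of the mean is one way to sidestep the artifact mentioned above, in which sparse high-adhesiveness points at large widths bias bracket averages.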

      • Based on these regression lines alone, the authors conclude that all 4 conditions are similar enough to pool the data for further analysis. If these contacts have different properties, which the data in Figures 1-6 suggest they do, it seems inappropriate to pool them together.

      We no longer pooled the data, except in supplementary Fig.S4 where we consider angle distortion. Instead, we show in Fig.8 relative-adhesiveness frequency distributions for different treatments and width brackets. This emphasizes differences between the different adhesion factor depletions and shows that adhesiveness is not simply normally or log-normally distributed, in agreement with different contact types contributing differently though similarly to overall adhesion. It also allows the main peaks to be followed as they shift position with width, roughly in proportion to the lower surface boundary.

      Based on this pooling, the authors then conclude that relative adhesiveness increases linearly with contact width over the entire width range, regardless of adhesion factor depletion. This again assumes that all contacts (morphant and WT) are functionally equivalent, and that what is observed in morphant embryos in very wide contacts would also hold true in WT contacts. But because WT contacts occupy only a small portion of the width range, we cannot know how they would behave if scaled to be wider, and I am not convinced that very wide morphant contacts are representative of or functionally equivalent to WT. In other words, we cannot know that contact width is the only factor increasing their relative adhesion, given the experimental manipulations that structurally alter these contacts.

      Although differences between contact types are apparent, we think that the contacts function very similarly. We still hold that relative adhesiveness increases with contact width, as seen in each of the separate plots for wt and adhesion factor depletions. But re-evaluating the alpha-width scatter plots now we show that in the narrow width range of normal chordamesoderm, C-cad, FN and Has depletions show similar, significantly decreased relative adhesiveness (Fig.7A-E). With alpha proportional to width, and width strongly increased in morphants, this initial decrease is compensated in total adhesiveness averages. The relative independence of adhesiveness from contact type could hint at non-specific PCM-PCM adhesion (Winklbauer, 2019). We think that although adhesion factor depletion leads to the loss of some contact types or renders others non-adhesive (thus lowering contact abundances), it modifies some contact types (e.g. by widening them) while only moderately lowering their adhesiveness per unit interaction surface.

      Minor comments<br /> - In their descriptions of PCM in different experimental conditions, the authors overstate some conclusions drawn from EM data. For example, that type I glycocalyces are absent in chordamesoderm (although this signal is only reduced),

      We qualified the statement.

      or that because the Has2 morphant phenotype is intermediate between C-cad and fibronectin morphants this indicates an adhesive role for hyaluronic acid.

      Overall, Has2MO increases the abundance of gaps, i.e. HA normally reduces gaps between cells, strongly suggesting an adhesive role of HA. HA is also required for the formation of 10-20 nm gaps, again suggesting a direct or at least indirect adhesion-promoting role.

      • The authors state of the data in figure 1 that "All treatments significantly increase the size of non-adhesive gaps", but they don't show a quantification of the gaps size (they show the abundance).

      Has been corrected.

      • The authors state that LSM contacts exist as 10-20 and 20-50 nm subtypes. It is not clear what about the data suggest this division.

      In the LSM width difference spectra, CadMO and SynMO both increase the abundances of ≤ 20 nm contacts and decrease those of 20-50 nm contacts (Fig.4). The different response suggests at least two differently reacting subtypes.

      • In the same paragraph, the authors state that "C-cad and Syn-4... favor LSM width between 20-50 nm." What is meant by "favor"? Given that the number of 20 nm contacts is increased and 50 nm contacts is decreased in both conditions, this statement is unclear.

      The whole paragraph has been reworded.

      • On page 7, the authors say that the size of LSM structures is "consistent with larger plaques being assembled from small units", but if that were the case, wouldn't the plaque sizes be multiples of the size of a single unit? I.e. 100, 200, and 300 nm peaks? Because this is not the case, the data seem more consistent with a continuous range of LSM plaque sizes than with discrete units.

      The size of the units has a peak at 100 nm but a long tail (Fig.6F-H). Moreover, we discuss lateral compression (piling up of PCM material) or active stretching of plaques (to separate units for interdigitation), all factors that would blur plaque length patterns, i.e. we did not expect plaque sizes to be multiples of 100 nm.

      • On page 8, the authors refer repeatedly to LSM volume. Given that these measurements are made from TEM sections, how is volume being measured?

      This is explained now (p.7).

      • The authors present a model in which PCM interdigitates within cell contacts, but this is based on measurements from static tissues alone. Could the measurements of contact width instead be explained by compression of the PCM or some other mechanism? The data as presented don't rule out such possibilities.

      The model is in agreement with the linear increase of relative adhesiveness with contact width, with LSM height at gap surfaces not adding up to adjacent contact width, with visible interdigitation of glycocalyx units (“bushes”) described previously for prechordal mesoderm (Barua et al. 2021), and with the good agreement of calculated unit size with the size of measured LSM units. In addition, it agrees with literature data on endothelial glycocalyx plaques being composed of 100 nm units and of complete interpenetration of glycocalyces during blood cell adhesion.

      Some terms used are not clear, for example: "partial LSM", "triple layer contact", "random removal [of LSM plaques]".

      We now point out the meaning of the terms more clearly. That “partial LSM” is identical with “triple layer contact” (but shorter, for use in the figure) is explained in the legend to Fig.6.

      • In figure 5, the graphs depict negative "abundance". Recommend "difference in abundance" instead.

      Done. For shortness, Δ Abundance.

      • Statistics: In figure 1I, it is not clear what the asterisk in this graph means or if statistical differences between these groups were determined. And in figure 6, some groups are marked as n.s., but P values for groups that are statistically different are not presented.

      The asterisk in Fig.1I was meant to indicate that this column is from Barua et al. 2021, but this is indicated by different shading and mentioned in the legend. The unused n.s. marks were removed.

      Reviewer #2 (Significance):

      This detailed electron microscopy study advances our understanding of pericellular matrix within vertebrate embryos and how loss of its constituent molecules affects cell interactions. It further addresses the relationship between structurally distinct pericellular matrices and their adhesive properties, although this analysis is less convincing. This study adds to a body of literature in which cell-cell and cell-matrix adhesion are known to regulate morphogenetic cell movements, but how such contacts are remodeled as cells rearrange is poorly understood. Previous work has also used measurements from live cells, embryos, and tissues to infer physical forces within embryos such as adhesive strength, cortical tension, and viscosity. This work follows up directly on a previous study from this group that characterized glycocalyces within various tissues within Xenopus gastrulae by electron microscopy. The hypothesis that pericellular matrix enables flexible/fluid adhesion within highly dynamic embryonic tissues is exciting, and is likely to be of interest to developmental biologists - particularly those who apply mechanical concepts to embryos. However, additional evidence, preferably from live tissues and embryos, is needed to support this hypothesis. This assessment is based on over 15 years' experience studying gastrulation morphogenesis in multiple vertebrate species.

    1. Reviewer #3 (Public Review):

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      2. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      3. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      4. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      5. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

  2. Jul 2023
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Please find our point-by-point response to the reviewers’ comments below, where we marked all changes implemented in the manuscript in italics.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 component candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisations, and colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, 2 processes mediated by K13 and some members of the K13 compartment.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD

      (note from the authors: we apologise if this has not properly transpired from the manuscript but the difference between the TGDs is substantial and relevant: one has less than 3% of the protein left and hence can be considered to fully inactivate MCA2 and has a growth defect whereas the other contains about two thirds of the protein (1344 amino acids/~66% are left), has no growth defect, although it lacks the MCA2 domain (hence that domain can not be critical for the growth defect)),

      that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      We thank the reviewer for this thorough and insightful review.

      The limitations mentioned above were addressed in the response to the main points, and a general detailed response in regards to the systems used for this research is added at the end of this rebuttal. Briefly summarised here: while we agree that there are limitations of the system used, we are convinced that

      • the advantages of using a large tag in most cases outweigh the drawbacks, as it permits tracking the inactivation of the target, if need be at the individual cell level

      • while not optimal for MyoF, the partial inactivation actually helps in its functional study as detailed in major point 23&28 or reviewer#3 major point 11: it shows a consistent correlation of the phenotype with different causes and degrees of inactivation (this is now better illustrated in Figure 1L1M). Further, regarding the concern of the large tag: the effect of the tag based on localisation was overestimated in the review by what seems to have been a mix up comparing numbers from MyoF with a number from MCA2 (there is a difference, but it is only small) (see reviewer#1 major point #23).

      • KS is the optimal method for most of the assays in this work (e.g. bloated food vacuole assays and RSAs); these assays would be impossible or difficult to use with other inactivation systems currently used in P. falciparum research (see details in the response to the specific points and after the rebuttal)

      In regards to the difficulty of interpreting KIC12 data: this is only true for measuring absolute essentiality; for everything else we believe we actually have the optimal method. If not KS, which method targets a specific pool of a protein with a dual localisation? Again, our assays targeting the K13 pool and revealing the specific function would have been difficult or impossible with any other system.

      Ultimately the question is whether any other system would have resulted in a different conclusion on the function of the proteins studied. At present we are confident this would not be the case and other systems probably would not have delivered the specific functional data shown in this work. Clearly, more in depth work will provide more nuanced and detailed insights into the proteins analysed in this work and this likely will also include the use of other systems for specific aspects they are most suitable for. However, this (e.g. different complementations in a diCre cKO) is complex and therefore beyond what fits into this work which had the goal to assess which proteins are true positives for the K13 compartment and to place them into functional groups in regards to endocytosis.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and a generally interesting question.

      The reviewer remarks that not every experiment was done for every target. Based on the rebuttal we tried to amend this but also note that there was some sentiment by the reviewers to better stick to the point and not make the manuscript more disjointed. We attempted to balance that as much as possible and hope we were able to honour both aspects (amendments were done as detailed in the point by point response below).

      In regards to endocytosis and choice of targets: We did do endocytosis assays for all proteins that showed a growth phenotype upon inactivation in this work. We therefore assume the reviewer here refers to major point #40 asking for endocytosis assays with KIC4 and KIC5 (which were not studied in this manuscript) as well as MCA2 (point 17). We fully agree with the reviewer that this would fill a gap in the work on K13 compartment proteins but such assays are difficult with TGDs (there are issues with non-comparable samples and compensatory effects) and proteins that are not essential (and hence likely have a smaller impact on endocytosis when truncated). We nevertheless now carried them out, but due to the limitations of doing this with these lines we would be hesitant to draw definite conclusions (see major point 17 and 40 for details and outcomes).

      But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      We would like to point out that the importance of the K13 compartment and endocytosis goes beyond ART resistance (see e.g. also newly published papers on the K13 compartment in Toxoplasma, (Wan et al., 2023; Koreny et al., 2023)). Endocytosis is an essential and prominent process in blood stages. However, in contrast to processes such as invasion, our understanding about endocytosis is only rudimentary. Hence, this manuscript provides important insights on an emerging topic that in our opinion deserves more attention:

      • it identifies novel proteins at the K13 compartment and provides 2 new proteins in endocytosis (MyoF and KIC12); obtaining a list of proteins involved in the process that is as complete as possible will be critical to study and understand it

      • it leads to the realisation that not all growth-relevant proteins detected at the K13 compartment are needed for endocytosis

      • it provides domains and stage specificity of function for several K13 compartment proteins, overall bolstering the model of endocytosis in ART resistance and providing a framework critical to direct future studies on endocytosis and their detailed mechanistic function at the cytostome

      • the identified vesicle trafficking domains (for instance now also found in UBP1) are expected to strengthen the support for the role of endocytosis of the K13 compartment; this and also the above points are important as (based on the current literature) there still seems to be prominent sentiment in the field that (in part due to the involvement of UBP1 and K13) the cause of ART resistance is due to various unclearly defined stress response pathways

      • with MyoF it also shows the first protein in connection with the K13 compartment that acts downstream of the generation of hemoglobin-filled containers in the parasite and provides the first protein that explains the suspected involvement of actin in endocytosis (so far this was only based on CytD studies)

      Overall we therefore believe this manuscript contains critical information and a framework for future studies on endocytosis and the K13 compartment. We hope the relevance of endocytosis as one of the most prominent and essential processes in the parasites and the connection to various aspects linked with many commercial drugs (in addition to the role of endocytosis in ART resistance), is adequately explained in the introduction. We also would like to mention that the main focus of the work is reflected in the title of the manuscript which does not mention ART susceptibility.

      Major Comments

      1) line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      We apologise, but we do not fully understand this comment. We did identify novel proteins not before known to be at the K13 compartment (MCA2 (admittedly this one was likely but had not previously been verified), MyoF, KIC11 and KIC12). In our view "further defining the composition of the K13 compartment" therefore is an accurate statement. Additionally, the identification of previously not-discovered domains, the stage-specificity and function of these proteins helped to further define the K13 compartment.

      If the reviewer is referring to the fact that the proteins analysed in this study were taken from a previously generated list of hits, we would like to stress that presence in such a list (obtained from a BioID, but also if from an IP etc) cannot be equated with being a true positive; the hits are merely candidates that still need to be experimentally validated. This is what we did in this work to find out which further proteins from the list can be classified as K13 compartment proteins (for hits with lower FDRs this is even more relevant, as illustrated by the fact that 6 of the hits analysed here were not at the K13 compartment). In an attempt to address this comment in the manuscript, we changed the wording of this sentence to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      We realized that the description of the groups wasn’t clear in the abstract. Please see response to major comment #41 for a detailed answer to this (endocytosis is an overarching criterion, ART resistance is a subgroup and applies only to those proteins with a function in endocytosis in ring stages). To clarify this (see also major point #8) we added an explanation on the influence of stage-specificity of endocytosis on ART susceptibility to the introduction (line 76): “In contrast to K13 which is only needed for endocytosis in ring stages (the stage relevant for in vitro ART resistance), some of these proteins (AP2µ and UBP1) are also needed for endocytosis in later stage parasites (Birnbaum et al., 2020). At least in the case of UBP1, this is associated with a higher fitness cost but lower resistance compared to K13 mutations (Behrens et al., 2021; Behrens et al., 2023). Hence, the stage-specificity of endocytosis functions is relevant for in vitro ART resistance: proteins influencing endocytosis in trophozoites are expected to have a high fitness cost whereas proteins not needed for endocytosis in rings would not be expected to influence resistance.” The abstract was changed in response to this and other comments and we hope it is now clearer in regards to the groups.

      3) Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth".

      We fully agree with the reviewer, we reworded the sentence as suggested.

      4) Line 40: please change 'second group' to 'this group'

      We reworded this part of the abstract and it now reads (line 38): “While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process.”

      5) line 41: state here that despite it being essential, it is unknown what it is involved in.

      With the newly added data we show that this protein either has a function in invasion or very early ring development although we did not see any evidence for the latter. We therefore changed the sentence to (line 43): “We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion.”

      6) Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Done as suggested.

      7) Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa) Reference 14 also seems to reference ART resistance in Amazonia - which is not covered by the statement made by the authors (in which case the authors should state ART is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Done as suggested.

      8) Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      As suggested by the reviewer, we included a sentence about non-K13 mutations linked with reduced ART susceptibility in the introduction (line 74): “Beside K13 mutations in other genes, such as Coronin (Demas et al., 2018), UBP1 (Borrmann et al., 2013; Henrici et al., 2020b; Birnbaum et al., 2020; Simwela et al., 2020) or AP2µ (Henriques et al., 2014; Henrici et al., 2020b) have also been linked with reduced ART susceptibility.”

      We here also added data on fitness cost that is related to this and is also relevant for the issue of proteins with a stage-specific function in endocytosis, making a transition for this statement which might help clarifying the grouping of K13 compartment proteins (see also major point #2).

      9) Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      We thank the reviewer for pointing this out, Ref 43 was removed from the manuscript.

      10) Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      We agree with the reviewer that we did not identify further candidates, we identified new K13 compartment proteins from the list of potential K13 compartment proteins. We therefore changed “identified further candidates” into “identified further K13 compartment proteins” (line 116). Please see also response to major comment #1.

      11) Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      This is a good point. One reason why we did not analyse more in our previous publication was that we had to stop somewhere and adding more would have been very difficult to fit into what was already a packed paper. However, as shown in this work, the list does contain further interesting candidates (e.g. K13 compartment proteins that are involved in endocytosis).

      We altered the relevant part of the introduction to highlight that we previously analysed the top hits, clarifying that the 'remaining' hits analysed in this work were further down in the list. This now reads (line 113): “We reasoned that due to the high number of proteins that turned out to belong to the K13 compartment when validating the top hits of the K13 BioID (Birnbaum et al., 2020), the remaining hits of these experiments might contain further proteins belonging to the K13 compartment.” We hope this clarifies that we simply moved further down in the candidate list.

      12) Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      We thank the reviewer for alerting us to this. The issue here is that the 3 non-analysed proteins belong to a 'lower stringency' group comprising hits significant with FDRThe information about ranking is now also included as “Table legend” in the revised manuscript and the Table heading has been changed to: List of putative K13 compartment proteins, proteins selected for further characterization in this manuscript are highlighted.”

      13) Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      This mutation was first spotted in the MalariaGEN database (https://www.malariagen.net) (MalariaGEN et al., 2021), which allows online access to the data through the “variant catalogue” tool, which presents frequencies in a table format rather than in a sequence context. Hence, it only became evident to us after further research, when looking at individual MCA2 sequences from patient samples (Wichers et al., 2021b), that this mutation does not occur alone. We hope this is accurately reflected in our results section.

      14) Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 be different than the full length? This cannot be assessed as expression and localisation of the full-length HA tag MCA2 is not shown in Schizonts.

      The key difference lies in transcription vs protein expression (usually protein levels peak after mRNA levels peak and, depending on turnover, protein levels can stay high even after mRNA levels have declined). Figure 4 of the Birnbaum et al paper presents transcriptomic data, but with a peak in trophozoites (the axis label in Fig. 4l of that publication is a bit confusing, as hour 0 is at the top, 48 h at the bottom; it is clearer in Fig. S13 of that paper), which would fit very well with the multiplication of the signal between trophozoites and schizonts mentioned by the reviewer. So, overall, the temporal peaks of transcript and protein for MCA2 fit well.

      For the signal in rings: Likely the protein has a turnover rate that is sufficiently low for some protein to be taken into the new cycle after re-invasion. Also, different transcriptomic datasets available on PlasmoDB (e.g. Otto et al., 2010; Wichers et al., 2019; Subudhi et al., 2020) show some mRNA present across the complete asexual development cycle, with each dataset showing its maximum peak at a slightly different stage.

      Even when located in foci and hence aiding detection of small amounts of protein (as is the case for MCA2-Y1344-GFP), the MCA2 signal in rings is not strong. For MCA2-TGD, the GFP signal is dispersed and therefore likely below our detection limit, while the same amount of protein concentrated at the K13 compartment is visible as foci in the MCA2-Y1344 cell line. Please note that MCA2-TGD has only 2.8% of the protein left whereas MCA2-Y1344 has 66.5% left and based on our manuscript is almost fully functional, hence fitting the different locations between the two versions.

      Overall we believe this shows that there are actually no significant discrepancies of the expression of the different MCA2 versions.

      15) Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggests that the MCA2Y1344STOP goes to the right location but does not really confirm it.

      We appreciate the reviewer's caution here. However, considering that MCA2Y1344STOP-GFPendo co-locates with mCherryK13 and endogenously HA-tagged full length MCA2 does the same to a similar extent, there is in our opinion little doubt that MCA2 is found at the K13 compartment and that this is similar with both constructs. If there are minor differences, these might as well occur if MCA2 is episomally (as suggested in the comment) instead of endogenously expressed. Given the limited insight, we therefore decided against the episomal overexpression (which due to its size of > 6000bp may also be somewhat less straightforward than it may sound).

      16) Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      We agree that this can’t be categorically excluded. However, a ~5 fold difference in ART sensitivity was observed between the parasites with MCA2 truncated at amino acid 57 compared to those with MCA2 truncated at amino acid 1344, even though neither contains the MCA domain. Hence, at least this difference is not dependent on the MCA domain. The larger construct missing the MCA domain shows only a very moderate reduction in RSA survival, again suggesting the MCA domain is not the main factor. We amended our statement in an attempt to more accurately reflect the data (line 487): “This considerable reduction in ART susceptibility in the parasites with the truncation at MCA2 position 57 compared to the parasites still expressing 1344 amino acids of MCA2, despite both versions of the protein lacking the MCA domain, indicates that the influence on ART resistance is not, or only partially due to the MCA domain.” We would be hesitant to state the reviewer's conclusion that “resistance is dependent on the loss of the MCA domain”, as the larger construct missing the MCA domain has a milder RSA effect compared to MCA2-TGD, which suggests the reduction in ART susceptibility is independent of the MCA domain. These considerations also agree with the fact that the parasites with the longer MCA2 version (in contrast to the MCA2-TGD) do not have any detectable growth defect, which indicates that the protein can fulfil its function without the MCA domain.

      17) Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MCA2-TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      So far we were very hesitant to do bloated FV assays with TGDs (even though TGDs were available for the genes encoding MCA2 and KIC4 and KIC5). The reasons for this were:

      1. the fact that these proteins could be disrupted indicated either redundancy or only a partial effect on endocytosis which might lead to only small effects that likely are difficult to pick up in an assay scoring for the rather absolute phenotype of bloated vs non-bloated. Using the refined assay measuring FV size could partly amend this but we note that FVs without hemoglobin also have a certain size, reducing the relative effect if there are smaller differences.
      2. a TGD line does not permit tightly controlled inactivation of the target which makes comparing the outcome of bloated food vacuole assays difficult if there are smaller growth and stage differences to the 3D7 control.
      3. in contrast to conditional inactivation parasites, the TGD lines had ample time to adapt to loss of the target protein (compensatory mechanisms are well known for endocytosis, for instance in clathrin mediated endocytosis loss of individual components can be compensated (Chen and Schmid, 2020)).

      We nevertheless see the reviewer's point that this should at least be attempted and now conducted these assays (see also major point 40). For MCA2 (as requested in this point), the data is shown in Figure S5C-E. This assay showed that in MCA2-TGD and MCA2Y1344STOP-GFPendo parasites (similar to the 3D7 control) >95% of parasites developed bloated food vacuoles. Additionally, we also measured the parasite and food vacuole size of individual cells in an attempt to solve some of the problems with TGDs in such assays. In order to specifically solve problem 2 mentioned above, we analysed the food vacuoles of similarly sized parasites; however, they were indistinguishable between the three lines. Of note, in agreement with the reduced parasite proliferation rate (Birnbaum et al., 2020), a general effect on parasite and food vacuole size was observed for MCA2-TGD parasites, indicating reduced development speed in these parasites. Hence, it is possible that a potential endocytosis reduction was accompanied by slowed growth, and the comparison of similarly sized parasites may have obscured the effect. It is therefore not certain that there indeed is no endocytosis phenotype, although we can exclude a strong effect in trophozoites.

      Based on the RSA results at least rings can be expected to have a reduced endocytosis in the MCA2-TGD. Apart from options 1-3 mentioned above, it is therefore possible there is an effect restricted to rings, although in that case the reduced growth in trophozoites would be due to other functions of MCA2. Overall, we can conclude that the MCA2-TGD parasites do not have a strongly reduced endocytosis, but given the fact that the parasites are viable, this is not surprising. Whether the MCA2-TGD has no effect at all on endocytosis we would be very hesitant to postulate based on these results.

      18) The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      We attempted to re-organise as suggested but because we now included additional fluorescence microscopy images of schizont and merozoites (in response to reviewer 2 major comment 3) the main figure would become even larger. To prevent this, we kept the 3xHA data in the supplement.

      19) Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      We thank the reviewer for pointing this out – we removed Ref 43.

      20) Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      This is a valid point. We originally did not focus on schizonts because most markers end up in some focal area in the forming merozoite but other proteins (such as e.g. K13) also have one or more additional foci at the FV, making interpretation unclear, particularly if the schizont is still organizing to become fully segmented. This is why we generally focused the K13 co-localisations on the trophozoite stage to obtain the clearest information on endocytosis. However, given the fact that this manuscript gives the first localization of MyoF in P. falciparum parasites, we now provide a comprehensive time course (Figure 1C, S1A) including schizonts, which show quite a complex pattern: while the MyoF-GFP localization in trophozoites appeared as multiple foci close to K13 and also the FV, the MyoF-GFP pattern changes in late schizonts (fully segmented) and merozoites, appearing as elongated foci no longer close to K13 or the FV. Of note, this pattern has been previously reported for MyoE in P. berghei (Wall et al., 2019).

      We therefore revised the statement about MyoF localization in schizonts to better reflect the observed localization (line 175): “In late schizonts and merozoites the MyoF-GFP signal was not associated with K13, but showed elongated GFP foci (Figure 1C, S2A) reminiscent of the MyoE signal previously reported in P. berghei schizonts (Wall et al., 2019).”

      21) Line 204-206: what does this statement bring to the paper? Is it to show that it is the real localisation of MyoF because 2 tag cell line show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tag MyoF line is used and show similar localisation.

      We see the reviewer's point, but prefer to keep this data included in the supplement, particularly because potential differences in the location of tagged MyoF were a major concern.

      Related to the tag issue: in order to get a better understanding of the effect of C-terminal tagging with different sized tags we now performed a more detailed analysis of the MyoF-3xHA cell line (Figure S2F-G), showing that this cell line has a growth rate similar to the 3D7 wild type parasites, and has fewer vesicles than the 2x-FKBP-GFP-2xFKBP cell line, but still slightly yet significantly more than 3D7 parasites. Overall, this indicates that the smaller 3xHA tag has less effect on the parasite than the larger 2x-FKBP-GFP-2xFKBP tag (see also new Figure 1L, showing a correlation of level of inactivation and the endocytosis phenotype for MyoF).

      22) Line 212: The overlap of K13 with MyoF in Figure 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems nonexistent. I would advise the authors to replace with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      As suggested we exchanged the trophozoite image of Figure 2C (now Figure 1C) and expanded this panel with images covering the complete asexual development cycle including schizonts in response to this and the previous points. As indicated above (point 20), schizont stages are complex to interpret. While late schizonts likely are not very relevant for endocytosis, this is the first description of the location of the protein in this parasite and we therefore now provide a more thorough representation of the MyoF location across asexual stages in Figure 1C and S2A.

      23) Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      We agree with the reviewer that the location of this MyoF-GFP in the cell might differ due to the partial inactivation but in contrast to this comment, the data does not indicate any large differences. It seems the reviewer mixed something up (the 59% mentioned might come from the MCA2 figure?). The data with the two lines with differently tagged MyoF co-localised with K13 are actually quite comparable: GFP-tagged vs HA-tagged MyoF overlapping with K13 was 8% vs 16% full overlap, 12% vs 19% partially overlapping foci, 36% vs 63% foci that were touching but not overlapping (compare what now is Figure 1D and Figure S2C). Only in the 'no overlap' category is there a much smaller proportion in the HA-tagged line. However, given that these are IFAs which on the one hand are more sensitive to see small protein pools but on the other hand also have pitfalls due to fixing of the cells (e.g. a tiny increase in focus size due to fixing could increase the number of touching foci that in live cells might be close but not touching), some variation compared to live cells can be expected. We agree though that the partly reduced functionality of MyoF might be the reason for the consistent tendency of a lower overlap even though the difference is much less than indicated in the comment. We added "with a tendency for higher overlap with K13 which might be due to the partial inactivation of the GFP-tagged MyoF" to the sentence "IFA confirmed the focal localisation of MyoF and its spatial association with mCherry-K13 foci"

      While we expect the fact that the difference between these parasites is only small somewhat reduces the "pinch of salt" with the MyoF line, we do agree that the partial functional inactivation of the GFP-tagged MyoF line may have some impact. However, we do not think that this means the results with the MyoF-GFP line are untrustworthy. On the contrary, it provides insights into its function that in some ways is equivalent to a knock down or TGD. Overall all the MyoF lines show: few vesicles occur in the MyoF-HA-line, more in the MyoF-GFP line and even more after knock sideways of MyoF-GFP. Importantly the severity of this phenotype correlates with the growth rates in these lines. Hence, together with the bloated food vacuole assays, this provides consistent data indicating that MyoF has a role in the transport of HCC to the FV and its level of activity correlates with the number of vesicles and growth. To better highlight this, it is now summarised in Figure 1M.

      24) Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      We thank the reviewer for pointing this out. We now provide a detailed time course (see also previous points) which shows that there is no detectable MyoF-GFP signal during ring stage development until the stage where the parasite starts the transition to trophozoites (i.e. MyoF-GFP signal could only be observed in parasites already containing hemozoin). In addition to the extended time course in Figure 1C (previously 2C) we included a panel of example ring stage images below to further highlight this. We also changed the labelling of the parasite with MyoF-GFP signal the reviewer mentions in Figure 1C to “late ring stage” (it already contains hemozoin) to clarify this.

      The description of Figure 1A is now changed to: (line 153) *“The tagged MyoF was detectable as foci close to the food vacuole from the stage parasites turned from late rings to young trophozoite stage onwards, while in schizonts multiple MyoF foci were visible (Figure 1A, S2A).”*

      Please see our answer to major comment #45 where we provide an explanation for the difference between MyoF-3xHA and MyoF-GFP signal in ring stage parasites.

      [Figure MyoF]

      25) Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Please see response to major comment #64 for a detailed answer on why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      26) Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      We did do several controls for bloated assays (including +/- rapalog of an irrelevant knock sideways line as well as using a chemical insult for which the control was 3D7 without treatment) in previous work (Birnbaum et al., 2020), which indicated that there is no effect of rapalog to reduce bloating. Although these controls are more stringent, we nevertheless did a 3D7 +/- rapalog control and added this to the manuscript (Figure S2I). As it is not possible to do this side by side with the assays that are already in the manuscript and the +/- rapalog 3D7 cells consistently showed no or very low numbers of cells without bloating (and stringent controls in the past equally did not show an effect), we believe adding this control once suffices.

      27) Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12 so why not doing it for MyoF?

      This was now done and is provided as Figure 1J-K, S2J. The results confirm the assessment scoring bloated vs non-bloated food vacuoles.

      28) Line 259-261: these results would be difficult to interpret namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls of their knock sideways and tagging design here. Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Fortunately, this concern is unfounded, as the survival (measured by parasitemia after one cycle) of the same sample + and - DHA is assessed, isolating the DHA effect independent of potential growth defects which are cancelled out. Hence, if there were parasites dying in the MyoF line (please note that they might not actually die, but simply grow more slowly), this factor applies for both the + and - ART condition. As we are testing for a decreased susceptibility to ART which would manifest as an increased survival in RSA surfacing above 1%, antagonistic effects of reduced MyoF function and ART treatment would not result in detectable differences as without effect, the RSA survival is always close to zero.

      The same applies for the knock sideways where we assess the survival of +rapalog between +ART and -ART. If the reduced MyoF activity of the knock sideways leads to a decreased survival, this applies to both +ART and -ART. Please also note that rapalog was lifted after the DHA pulse (see e.g. Figure S2K).
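
      The cancellation argument above can be written out as a short worked relation. This is only an illustrative sketch: the symbols $P_{+\mathrm{DHA}}$ and $P_{-\mathrm{DHA}}$ (parasitemia after one cycle with and without the DHA pulse) and the factor $g$ are our notation, and it assumes that a growth defect scales the parasitemia of both arms of the same sample by the same factor $g$.

      ```latex
      % Assumption (illustrative only): a growth defect multiplies the parasitemia
      % of both arms of the same sample by the same factor g, with 0 < g <= 1.
      \[
        \text{RSA survival}
          = \frac{P_{+\mathrm{DHA}}}{P_{-\mathrm{DHA}}} \times 100\%,
        \qquad
        \frac{g \, P_{+\mathrm{DHA}}}{g \, P_{-\mathrm{DHA}}}
          = \frac{P_{+\mathrm{DHA}}}{P_{-\mathrm{DHA}}}
      \]
      ```

      Under this assumption the growth factor drops out of the ratio, so only the DHA-dependent component of survival remains in the readout.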

      That effects on growth are cancelled out is nicely illustrated for proteins where there is a stronger and more rapid effect on growth upon their conditional inactivation. For instance when KIC7 is knocked aside, there is a considerable increase in RSA survival, even though continued inactivation of KIC7 would cause a severe growth defect (Birnbaum et al., 2020). Vice versa, a growth defect alone does not result in reduced RSA susceptibility, as is evident from knock sideways of an unrelated protein or using a chemical insult (Figure 4H in (Birnbaum et al., 2020)) or simply slowing the ring stage by e.g. reducing EXP1 levels (Mesén-Ramírez et al., 2019). Hence, a growth reduction is not expected to alter the RSA outcome. And even if it did, it would only lead to an underestimation of the readout if growth is too severely affected (which would be obvious in the + rapalog without DHA sample, which was not the case).

      In that respect it is valuable to have the rapid kinetics of knock sideways which permit inactivation of a protein before severe growth defects occur (although the only partial responsiveness of MyoF clearly is not optimal). In contrast, the absolute loss of a gene (as is the case if diCre is used) prevents (or at least makes it extremely difficult as the timing would need to exactly hit sufficient protein reduction without killing the parasite until the end of the RSA) using this system in these experiments (again see (Mesén-Ramírez et al., 2021) where in an EXP1 diCre-based knock out, RSA was only possible because we complemented with a lowly, episomally expressed EXP1 copy to have parasites with only a partial phenotype to do this assay).

      29) Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified?

      The different steps in endocytosis are explained in the introduction and we now tried to further clarify this (line 98). So far VPS45 (Jonscher et al., 2019), Rbsn5 (Sabitzki et al., 2023), Rab5b (Sabitzki et al., 2023), the phosphoinositide-binding protein PX1 (Mukherjee et al., 2022), the host enzyme peroxiredoxin 6 (Wagner et al., 2022) and K13 and some of its compartment proteins (Eps15, AP2µ, KIC7, UBP1) (Birnbaum et al., 2020) have been reported to act at different steps in the endocytic uptake pathway of hemoglobin. While inactivation of VPS45, Rbsn5, Rab5b, PX1 or actin resulted in an accumulation of hemoglobin filled vesicles (Lazarus et al., 2008; Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023), indicative of a block during endosomal transport (late steps in endocytosis), no such vesicles were observed upon inactivation of K13 and its compartment proteins (Birnbaum et al., 2020), suggesting a role of these proteins during initiation of endocytosis (early steps in endocytosis).

      VPS45 has no apparent spatial connection to the K13 compartment but the fact that MyoF does - and its inactivation also results in vesicle accumulation - indicates that it is downstream of vesicle initiation, providing the first connection from the initiation phase to the transport phase. More evidence for these different steps of endocytosis has been published in a recent preprint from our lab, where we simultaneously inactivated a protein of both “endocytosis steps” (Sabitzki et al., 2023).

      To clarify this in the results as requested, we changed the statement to: (line 256) “Overall, our results indicate a close association of MyoF foci with the K13 compartment and a role of MyoF in endocytosis albeit not in rings and at a step in the endocytosis pathway when hemoglobin-filled vesicles had already formed and hence is subsequent to the function of the other so far known K13 compartment proteins.”

      30) Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      We split this point to address all issues raised here. Please see response to point 29 which clarifies that this was meant in a different way and our response to point 28 which explains why the dying parasite issue is not expected to affect the RSA (please also note that we do not have evidence of actually dying parasites in the MyoF-2xFKBP-GFP-2xFKBP line, most likely the growth is slowed).

      The mutation issue is interesting. In fact evidence exists that MyoF mutations may be associated with resistance (Cerqueira et al., 2017) (please note that there it is still called MyoC) but in a recent preprint from our lab we did not find any evidence for a significantly changed RSA survival in 12 tested mutations in the corresponding gene (Behrens et al., 2023).

      To clarify this we added the following statement to the discussion (line 709): "Of note, mutations in myoF have previously been found to be associated with reduced ART susceptibility (Cerqueira et al., 2017), but 12 mutations tested in the laboratory strain 3D7 did not result in increased RSA survival (Behrens et al., 2023)."

      31) Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      We think there is a misunderstanding here, as our figure legend was not detailed enough and we apologise if this had been misleading. The growth effect is restricted to invasion or possibly the first hours of ring stage development (see point 4&5, reviewer 2), which in asynchronous cultures more rapidly takes effect as the culture also contains schizonts that immediately generate cells that re-invade but can't due to inactivation of KIC11 (due to the rapid action of the knock sideways, KIC11 is already inactivated). In contrast, in highly synchronous cultures, this effect can only be evident once the parasites reached the schizont stage (starting with rings this takes close to 2 days). We now clarify that Figure 2E (previously Figure 3D) shows growth data obtained with an asynchronous parasite culture, while in Figure 2F the growth assay is performed with tightly synchronized (4h window) parasites as stated in the Figure legend.

      We now explicitly state in each Figure legend and for each growth experiment throughout the manuscript whether we used asynchronous or synchronized parasites for growth assays.

      Related to this, the incorrect y-axis label of what is now Figure 2E mentioned in major comment #58 is now corrected.

      32) Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      This is a valid point and this has now been addressed. We performed an invasion/egress assay revealing similar schizont rupture rates, but significantly reduced numbers of newly formed ring stage parasites (Figure 2H, S3G), indicating an effect of KIC11 inactivation either on invasion or possibly the first hours of ring stage development. A very similar point was raised by Reviewer 2, please see reviewer 2; major comment #4. This is now also reflected in line 302, which now reads: ”… indicating an invasion defect or an effect on parasite viability in merozoites or early rings but no effect on other parasite stages (Figure 2F-H, Figure S3F-G).”

      We further included an assessment of mislocalization 80 hours after the induction of knock-sideways by addition of rapalog in Figure S3E which showed mislocalization of KIC11 to the nucleus.

      33) Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Done as suggested.

      34) Line 323-325: Authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A it is not very obvious. Also faint cytoplasmic GFP signal could only be background as we can see that exposure is higher for schizont pictures

      We changed the sentence (line 339) to: “…nuclear signal and a faint uniform cytoplasmic GFP signal was detected in late trophozoites and early schizonts and these signals were absent in later schizonts and merozoites (Figure 3A, Figure S4A,B).” in order to emphasize that the nuclear signal disappears early during schizont development.

      35) Line 326-328: The authors say that kic12 transcriptional profile indicate mRNA levels peak (no s at peak) in merozoites. Should they show live cell imaging of merozoites then? Because from the Figure 4A schizont pictures where schizonts are almost fully segmented no signal can be observed.

      The observation that mRNA levels of early ring stage expressed proteins tend to increase already in mature schizonts and merozoites is well established (e.g. (Bozdech et al., 2003)). Very good examples of this are exported proteins, most of which show a transcription peak in schizonts although the proteins are only detected in rings (see e.g. (Marti et al., 2004)). Hence, our observation for KIC12 is quite typical.

      We originally did not include merozoites, as in the last row of Figure 3B fully developed merozoites within a schizont with already ruptured PVM are shown and no GFP signal can be detected in these parasites. We now provide images of free merozoites in Figure S4A-B showing again no detectable GFP signal.

      We thank the reviewer for pointing out the typo, "peak" has been corrected.

      36) Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      The Lyn mislocaliser is at the PPM which is continuous with the cytostomal neck where the K13 compartment likely is found. The effect of the Lyn mislocalizer on the KIC12 protein pool localizing at the K13 compartment is therefore somewhat unclear. For this reason we already had the following statement in the original submission (line 400): “Foci were still detected in the parasite periphery and it is unclear whether these remained with the K13 compartment or were also in some way affected by the Lyn-mislocaliser.” We would like to stress here that the same does not apply to the nuclear mislocaliser, which is only a trafficking signal delivering KIC12 to the nucleus and hence likely does not affect the nuclear pool of KIC12, only the K13 compartment pool (the main interest of this manuscript).

      We realised that the statement towards the end of this paragraph was unnecessarily ambiguous in regards to the K13 compartment pool of KIC12 which might have caused some confusion about the function of this pool of KIC12 and therefore modified it to (line 374): "Due to the possible influence on the K13 compartment located foci of KIC12 with the Lyn mislocaliser, a clear interpretation in regard to the functional importance of the nuclear pool of KIC12 other than that it confirms the importance of this protein for asexual blood stages is not possible. In contrast, the results with the nuclear mislocaliser indicate that the K13 located pool of KIC12 is important for efficient parasite growth." It is also important to note that this limitation does not apply to the NLS knock sideways in regard to the K13 compartment and that the endocytosis function of this pool of KIC12 seems solid, which this statement reinforces.

      37) Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      This was now done and is provided as Figure 1J-K, S2J, confirming our previous interpretation, see also point #27 which raises the same point.

      38) Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype

      This is based on our current definition as stated in the introduction. It assumes a directional vesicular transport of hemoglobin to the food vacuole where inhibition of early stages will prevent transport before HCC-filled autonomous vesicular containers have formed and entered the cell. In contrast later inhibition stops such containers from further transport, leading to their accumulation. Such an accumulation is visible after inactivation of VPS45 and other proteins (Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023) or treatment with cytochalasin D (Lazarus et al., 2008). While it is possible that there may be smaller intermediates formed at the K13 compartment that later on unite or fuse with the compartment evident after VPS45 inactivation and these might be missed due to small size (i.e. inhibition of a step between K13 compartment and an early endosome or equivalent), this would still be upstream of the VPS45 induced containers and hence would be earlier. We therefore believe that, based on the framework given in the introduction (see also (Spielmann et al., 2020)), it is reasonable to assume that a phenotype manifesting as reduced food vacuole bloating without formation of detectable vesicles likely signifies inhibition early in the process whereas reduced bloating with vesicles signifies inhibition later in the process.

      39) Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      This is an interesting point. The endocytosis proteins we studied so far indicate that efficient impairment of endocytosis manifests as a severe growth defect. Hence, lack of a growth defect can be assumed to be an indicator for absence of an important role for endocytosis (or any other growth relevant process). Clearly there is a gradual response, such as seen in the different MyoF versions resulting in proportional growth and vesicle appearance phenotypes. Hence, a protein with a minor role might have escaped our attention but then it probably is also not a very important protein in endocytosis.

      To further strengthen our assessment of PF3D7_1365800 importance for asexual blood stage development, we now also generated a cell line expressing the PPM Mislocalizer, enabling knock sideways to the PPM. This was done because this protein consistently has a focus at the nucleus that may be within the nucleus. Again this revealed no growth defect upon inactivation (Figure S7D).

      40) Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper. Identical as major comment #17.

      As stated in the manuscript and above, we were originally hesitant to do these assays due to the fact that we can't induce inactivation which is less ideal than comparing the identical parasite population split into plus and minus and is further complicated by the likely smaller effect as the TGDs still permitted growth. However, we see the point of the reviewer and now performed these assays using 3D7 as controls and taking extra care to account for stage differences between the TGD lines and 3D7. However, there was no significant difference in the bloated food vacuole assays with these cell lines. Due to the reasons mentioned in major point 17, we are not sure this indeed means these proteins have no role in endocytosis. One possible reason why we were able to obtain these TGDs may have been because the effect on endocytosis is less than in the essential proteins (or is ring stage specific) and in a TGD an endocytosis defect may therefore not be detectable with our assays (see details and further possible explanations in response to point 17).

      In an attempt to address the TGD issue, we generated knock sideways cell lines for KIC4 and KIC5. Unfortunately, the mislocalization of KIC5 to the nucleus was inefficient (see figure below). As this did not result in a growth defect (in contrast to the clear KIC5-TGD growth defect (Birnbaum et al., 2020)), this line is not suitable to study a potential role of this protein in endocytosis. Therefore, we performed the bloated food vacuole assay only with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites. However, this revealed no effect on HCC uptake, which is in line with the normal growth of KIC4-TGD parasites (Birnbaum et al., 2020) and suggests that this protein could only have a minor or redundant role in endocytosis (it is the line that shows the smallest effect in RSA). As the KIC4 and KIC5 knock sideways lines did not permit any conclusions, we did not include them in the revised manuscript but they can be found here:

      [Figure KIC4 knock sideways & KIC5 knocksideways]

      Figure legend: (A) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Relative growth of asynchronous KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser plus rapalog compared with control parasites over five days. Three independent experiments were performed. Growth of knock sideways (+ rapalog) compared to control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (blue) or KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (red) parasites over five days. Mean relative parasitemia ± SD is shown. (B) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Growth of asynchronous KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser plus rapalog compared with control parasites over five days. Four independent experiments were performed. (C) Bloated food vacuole assay with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 8 hours after inactivation of KIC4 (+rapalog). Cells were categorized as with ‘bloated FV’ or ‘non-bloated FV’ and percentage of cells with bloated FV is displayed; n = 3 independent experiments with each n=19-30 (mean 21.4) parasites analysed per condition. Representative DIC are displayed. Area of the FV, area of the parasite and area of FV divided by area of the corresponding parasites were determined. Mean of each independent experiment indicated by coloured symbols, individual datapoints by grey dots. Data presented according to SuperPlot guidelines (Lord et al., 2020); Error bars represent mean ± SD. P-value determined by paired t-test. Area of FV of individual cells plotted versus the area of the corresponding parasite. Line represents linear regression with error indicated by dashed line.

      41) Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in: • Some that confer ART resistance and are involved in HCCU (MCA2) • Some that are involved in HCCU but not ART resistance (MyoF & KIC12) • Some that are involved in neither (KIC11) The authors should therefore revise this statement.

      We agree that this was not well phrased. To account for the fact that not all endocytosis proteins confer increased RSA survival to the parasites when inactivated, we changed this statement (line 604): "This analysis suggests that proteins detected at the K13 compartment can be classified into at least two groups of which one comprises proteins involved in endocytosis or in vitro ART resistance whereas the other group might have different functions yet to be discovered."

      Generally, we believe that endocytosis is the overarching criterion and we therefore would like to keep the definitions of the main groups (endocytosis or not). As indicated by the title, the focus of the manuscript is on the K13 compartment for which so far endocytosis is the only experimentally associated function. That this group contains proteins that do not confer reduced ART susceptibility when conditionally inactivated (KIC12 and MyoF) is explained by their stage-specificity, making this a subgroup of the overarching endocytosis group.

      We realise that with the endocytosis data on the KIC4, KIC5 and MCA2 TGD there is now also a subgroup for which we were unable to demonstrate an endocytosis effect in trophozoites although its members show changes in RSA survival. However, as indicated above, we would be hesitant to fully exclude some role of these proteins in endocytosis in rings, particularly as a comparably small reduction in endocytosis protein activity or abundance is sufficient to increase RSA survival (Behrens et al., 2023). A principal classification of "endocytosis or ART resistance" or "neither endocytosis nor ART resistance" still accounts for this and therefore seems to us to be the most useful, particularly also in light of our domain identification that then can be linked with one or the other group.

      42) Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      We respectfully disagree with the reviewer on this point: we did expand the repertoire of known K13 compartment proteins. Only independently experimentally validated proteins from proximity biotinylation experiments can be considered part of the K13 compartment (or any other cellular site or complex). Without validation of the location, the identified proteins can only be considered candidates. This is highlighted in this manuscript by the finding that several proteins of the list did not localize at the K13 compartment.

      43) Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings, is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether other Myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      We now included this experiment. In agreement with the lack of a need for MyoF in rings and the absence of an effect on RSA survival, there was no increased survival of the parasites in RSA (neither on 3D7 nor on K13 C580Y parasites) after cytD treatment (new part in Figure 1M). We thank the reviewer for pointing out that this experiment might also inform on whether other myosins influence endocytosis in ring stages. We added (line 250): “Similarly, also incubation with the actin destabilising agent Cytochalasin D (Casella et al., 1981), had no effect on RSA survival in 3D7 or K13C580Y (Birnbaum et al., 2020) parasites, indicating an actin/myosin independent endocytosis pathway in ring stage parasites (Figure 1M) and speaking against other myosins taking over the MyoF endocytosis function in rings.”

      44) Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      The inhibitors used in the cited studies (Kumari et al., 2018) are validated metacaspase inhibitors, such as Z-FA-FMK (Lopez-Hernandez et al., 2003). Activity against the other parts of PfMCA2 - which apart from the MCA domain shows no homology to other proteins - is therefore unlikely.

      45) Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      This comment is related to major point #24. We also would like to stress that while the MyoF-GFP line already shows a phenotype, the impression of defectiveness based on its location is due to a mix up (see major point #23).

      We now provide a comprehensive time course of the MyoF-GFP signal (Figure 1C, S2A) showing that there is no detectable MyoF-GFP signal until the transition from ring to trophozoite stage. As this is all under the endogenous promoter, we do not think the partial functional inactivation of the tagging is the reason for the absence of the signal. If anything, we would have expected adding a stably folded structure such as GFP to increase the stability of the protein. The main reason for the discrepancy of MyoF signal in rings between the GFP-tagged line (of note there is also no detectable MyoF-GFP signal in MyoF-2xFKBP-GFP ring stage parasites (Figure S2B)) and the HA-tagged line likely is that IFA is much more sensitive than live GFP detection (similar to the high sensitivity the reviewer mentions in regards to WB). This discrepancy therefore is likely due to the fact that the lowly expressed MyoF only becomes apparent with the HA-tagged line due to the IFA. We therefore believe that 'lowly expressed in rings' is an appropriate description of our results obtained with three different cell lines (MyoF-2xFKBP-GFP-2xFKBP, MyoF-2xFKBP-GFP and MyoF-3xHA). We hope this is sufficiently well reflected in the manuscript where we write ‘a low level of expression of MyoF in ring stage parasites’, not that it is ‘not there in rings’ (line 174).

      46) Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      See response for major comment #41, we now consistently used "or" instead of "and". See line 490-493 how this was resolved for what previously was line 635.

      47) Line 785: Bloated food vacuole assay/E64 hemoglobin uptake assay method specifies that a concentration of 33 mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33 µM E64 was used. Please confirm if this is just a typo or if a 1000x E64 concentration was used, which would render the experiment invalid.

      We thank the reviewer for pointing this out, we corrected this typo and will look out for symbol font conversion errors for the resubmission.

      48) Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      The bloated FV assay is well established (Jonscher et al., 2019; Birnbaum et al., 2020; Sabitzki et al., 2023). Although the bloating of the FV is a human judgment call, it is actually quite obvious: bloating appears as an easily spotted bulging of the FV in DIC. As also minor bloating is scored as 'bloated', it is a very conservative assay. Using an add-on to measure this is not straightforward. It is unclear how this bulging effect of the FV in DIC could be spotted by a software and due to the obviousness to human operators, potentially lengthy and complicated efforts to design appropriate machine learning options were not undertaken. The situation faced by the scorer of the assay is evident from Figure S4F-G which contains close to 50 "on rapalog" cells and close to 50 control cells, giving representative cells from all replicas of bloated FV assays with KIC12. Please note that these images show the most complicated situation as far as bloated assays go, because the phenotype is not 100% (see Figure 3F) compared to e.g. KIC7 inactivation which leads to lack of bloating in almost all cells (see (Birnbaum et al., 2020) Figure 3E) but nevertheless the difference is still obvious. We are aware that in such situations (less than absolute inhibition) this assay scoring of "yes" or "no" is a surrogate for the actual level of inhibition and may be more subjective. This is why in this case we also did the FV size measurements (which are less dependent on human judgment) to further support this and give a better quantifiable measure. Of note, the bloated food vacuole judgments are done "blinded", i.e. the examiner does not know which sample they are looking at.

      In response to this reviewer's point we now also added the FV size refinement of the assay for MyoF inactivation which is one of the cases where inhibition of bloating is not in 100% of the cells (see major comment #27). Please also note here the advantage of the rapidly acting knock sideways technique for these assays which shows the sum of effect 8 h after initiating inactivation and for which we carefully control size of the cells which shows that there is no significant growth reduction over the assay time, excluding secondary effects due to a generally reduced viability. Compared to slower acting systems suggested to have been used instead (see introductory part and significance of this review), the rapid speed of knock sideways reduces the risk of potential pleiotropic or compensatory effects due to the time needed for proteins to be depleted if the gene or mRNA is targeted instead.

      The suggestion to include a ‘white circle’ (raised also as minor comment#27) is useful as an aid to see the food vacuole. However, in contrast to the Figures in (Birnbaum et al., 2020) (where we did add such a circle), we here included the DHE staining images in the figure, labelling the parasite cytosol which readily shows the FV (the FV corresponds to the region where there is no DHE staining). As this shows the position of the FV we would prefer to not obscure the DIC images with additional features to permit the reader to see the difference between bloated or non-bloated food vacuoles and keeping the image as natural as possible.
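
      The FV size refinement discussed in this response (FV area divided by parasite area, one mean per independent experiment, paired comparison of those means) can be sketched as follows. This is only an illustrative sketch: the measurements, the variable names and both helper functions are hypothetical and are not taken from the manuscript or its analysis scripts, and only the paired t statistic is computed, not the P-value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements (arbitrary units), NOT data from the manuscript:
# for each independent experiment and condition, one (FV area, parasite area)
# pair per scored cell.
replicates = [
    {"control": [(4.0, 20.0), (5.0, 20.0)], "rapalog": [(2.0, 20.0), (3.0, 20.0)]},
    {"control": [(4.4, 22.0), (6.6, 22.0)], "rapalog": [(2.2, 22.0), (4.4, 22.0)]},
    {"control": [(3.6, 18.0), (5.4, 18.0)], "rapalog": [(1.8, 18.0), (1.8, 18.0)]},
]


def replicate_mean_ratio(cells):
    """Mean of FV area / parasite area over all cells of one experiment."""
    return mean(fv / parasite for fv, parasite in cells)


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched experiment means."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# One value per independent experiment (the coloured symbols of a SuperPlot);
# the single-cell ratios would be the grey dots.
control_means = [replicate_mean_ratio(r["control"]) for r in replicates]
rapalog_means = [replicate_mean_ratio(r["rapalog"]) for r in replicates]

t_stat, dof = paired_t_statistic(control_means, rapalog_means)
print(f"control means: {control_means}")
print(f"rapalog means: {rapalog_means}")
print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

      The comparison is made on the per-experiment means rather than on the pooled single cells, so the number of observations entering the test equals the number of independent experiments.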

      49) Line 863-864: this sentence seems to be out of place.

      We thank the reviewer for pointing this out, the details of nucleus staining were moved to the correct part.

      50) Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      This has been corrected.

      51) Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      These are the individual replicates of the growth curves shown in Figure 1G of the same cell lines done on a different occasion. We always try to show as much of the primary data as possible and believe that showing individual data points from the different experiments is better than only the combined values which obscure the actual course of each experiment.
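
      The relation between individual replicate curves and a combined representation (mean relative parasitemia ± SD) can be illustrated with a short sketch. The parasitemia values, the data layout and the function name below are hypothetical and chosen only for illustration; they are not data or code from the manuscript.

```python
from statistics import mean, stdev

# Hypothetical parasitemia values (%), NOT data from the manuscript.
# One dictionary per independent experiment, one list entry per day.
experiments = [
    {"control": [1, 2, 4], "rapalog": [1, 1, 1]},
    {"control": [2, 4, 8], "rapalog": [2, 3, 4]},
    {"control": [1, 2, 5], "rapalog": [1, 1.25, 2.5]},
]


def relative_growth(experiment):
    """Parasitemia with rapalog as a percentage of the control, per day."""
    return [
        100.0 * rapalog / control
        for rapalog, control in zip(experiment["rapalog"], experiment["control"])
    ]


# Individual replicates: one relative growth curve per experiment.
per_experiment = [relative_growth(e) for e in experiments]

# Combined representation: mean and SD across experiments for each day.
summary = [(mean(day), stdev(day)) for day in zip(*per_experiment)]

for curve in per_experiment:
    print("replicate:", curve)
for day, (avg, sd) in enumerate(summary, start=1):
    print(f"day {day}: {avg:.1f} ± {sd:.1f} %")
```

      Printing both levels makes the trade-off visible: the combined values are more compact, whereas the per-experiment curves retain the course of each individual experiment.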

      52) Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      We thank the reviewer for pointing this out, this was due to a copy & paste error in the figure legend that was now amended. We also fixed the incorrect axis label. For the last part (growth defect) please see detailed answer to Major comment#31 raising the same concern for KIC11 (in synchronous parasites the defect only takes effect once the cells reached the relevant stage whereas in asynchronous cultures there are always cells in the relevant stage that due to the rapid effect of the knock sideways already have a growth phenotype).

      53) Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlayed is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MAC2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Done as requested.

      54) Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Both are trophozoites (early trophozoite in top panel and late trophozoite in bottom panel). This is now labelled in what now is figure 1B. As stated above, schizont stages are less relevant for the topic of this manuscript and in order to prevent the manuscript from getting more disjointed and keeping it more focussed on the main topic, we decided to not include a schizont in the manuscript. Nevertheless, we included an example image below.

      [Figure MyoF_p40px schizont]

      55) Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to switch the graph in 2F out of a combined graph of all the biological replicates growth curves for S3D - showing error bars.

      While we in principle fully agree with the reviewer in showing the course of the full experiment (which is available in Figure S2E), the key here is to show the overall difference. Hence, we would like to keep this comparison of the overall effect on growth in what now is Figure 1E and G. It is part of the argument to the doubts this reviewer raises to the function of MyoF (mainly in the overall assessment and the significance statement) to show that the phenotype is actually very consistent (partial inactivation through tagging or further inactivation using knock sideways increases endocytosis phenotypes, correlating with parasite viability).

      Please also note, that the growth curves upon knock sideways shown in Figure 1G, S2E are performed with asynchronous parasite cultures, which doesn’t allow us to draw direct conclusions about growth cycle effects.

      Nevertheless, we now also included the suggested combined data representation in Figure S2E.

      56) Figure 3: why were the calculation of FV area, parasite area and FV/parasite area only done for KIC12 and not done for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore FV smaller. Please present them Figure 2. Images should be already available and would not require further experiments to be done, only the analysis.

      This now has been done (confirming our results) and is included as Figure 1J-K, S2J. This point was also raised as major comment #37, please also see detailed answer there.

      57) Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      This is now included in Figure 2C.

      58) Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser which was used for KIC12 not KIC 11 (see above).

      We apologise for the inadequate legend and colour issues, this was amended. This point was also raised in major comment #31 and #52, please find detailed answer there.

      59) Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      We thank the reviewer for pointing this out, the missing label is now included and the colour has been adapted to make them better distinguishable.

      60) Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      This is in fact a ring, but we realize that we accidentally included an incorrect size bar in the ring image of Figure 4A (now Figure 3A) (size bar for 63x objective instead of the correct one for the 100x objective), we apologise for this oversight. We don’t think this parasite has multiple nuclei, instead the Hoechst signal shows the often elongated nucleus seen in rings that can appear as two foci in Giemsa stained smears which leads to the typical diagnostic feature of P. falciparum rings in diagnostics. In order to exclude any doubts about the nuclear localization of KIC12 in rings, we here attached a panel with more examples of KIC12-2xFKBP-GFP-2xFKBP ring stage parasites.

      [Figure KIC12]

      61) Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      This is now provided in Figure S4A. As suggested by the reviewer, we independently quantified the association for ring, early trophozoite and late trophozoite stages. As there is no KIC12 signal in schizonts, we did not include a quantification for this stage.

      62) Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Please see major comment #64 for a detailed answer why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      63) Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      We see the point the reviewer is raising here, Figure 4D (now Figure 3D) also contains the data with the Lyn mislocaliser while we first talk about the NLS mislocaliser. This permits a better comparison between the two mislocaliser lines. However, first explaining the Lyn-mislocaliser and then going back to the NLS would make it rather complicated for the reader to follow the storyline and therefore we would like to keep the order as it is. We realise that this means the reader has to go back one figure part for seeing the Lyn growth data, but believe this is worth the benefit that the data is there compared to the NLS result.

      64) It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so would simply involve adding the images to the panel.

      We did not include DNA staining (DAPI or Hoechst) for any of the images used to assess the efficacy of mislocalization, as we would prefer to keep the parasites as representative of viable parasites in culture as possible. Hence they were imaged without DNA stain (these stains are toxic). We would like to point out that a DNA stain is not necessary, as the mislocaliser already marks the nucleus (in the case of the NLS mislocaliser), actually even somewhat more accurately, as it fills the entire nuclear space rather than only the DNA which is marked by DAPI or Hoechst.

      For LYN this admittedly is not the case, there the mislocaliser marks the plasma membrane. However, we think the proper control for efficient mislocalisation is the comparison between the GFP-tagged protein of interest and the mCherry mislocaliser to show mislocalisation, as previously done in our lab (e.g. (Birnbaum et al., 2017; Jonscher et al., 2019; Birnbaum et al., 2020)).

      Due to their toxicity, we also avoided nuclear staining in some other parts of the manuscript when we were of the opinion that a nucleus signal was not necessary.

      65) Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      We did perform Western blot analysis for both MCA2 cell lines. MCA2 is the only gene-product for which we generated a disruption for this work, and together with the severe truncation from previous work, we provided a Western blot-based confirmation of the correct size.

      The MCA2 disruptions are at least partially dispensable for in vitro parasite growth, hence if degradation occurred, this might not have been noticed. In that case we considered it relevant to show that the truncations were of the expected size. The other proteins in the main figures are essential for growth. Hence, if the tagging approach would lead to unexpected changes in protein integrity (which we assume is what was intended by this concern to be assessed with a Western blot), the parasites expressing the tagged MyoF, KIC11 and KIC12 would - due to their importance for asexual blood stage development - not have been obtained. Hence, we can assume the integrity of the tagged protein is very unlikely to have been affected in a functionally relevant way.

      66) None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      We thank the reviewer for this comment. This has now been amended, individual channels of fluorescence microscopy images are now shown in greyscale, while the overlay was changed to green/magenta.

      Minor Comments

      1) line 29: remove 'are'.

      Done.

      2) Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: 'HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      Done.

      3) line 44: remove 'the'

      Done.

      4) Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Done.

      5) Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      We now cite the newest WHO report.

      6) Line 53: please insert the word 'have' between now and also.

      Done.

      7) Line 54: please change 'was linked' to is linked

      Done

      8) Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic.

      Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Done.

      9) Line 90: authors should either say "in previous works" or "in a previous work"

      The text has been altered to say: “ in a previous work”.

      10) Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Done.

      11) Line 95: please change 'rate' to number

      Done.

      12) Line 109: Please include a comma before (ii).

      Done.

      13) Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...).

      Done.

      14) Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Done.

      15) Line 131: please define SNP - this is the first use of the acronym.

      Done.

      16) Line 133-134: South-East Asia instead of "South Asia"

      Done.

      17) Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      We apologise for this oversight. We now explain what is meant with TGD at the suggested point of the manuscript.

      18) Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      To the best of our knowledge this issue has not been resolved, some Journals capitalize the “W” (e.g. Science), while others don’t (e.g. Nature). We would prefer to continue to capitalize the “W”, as this is consistent with the original publication from (Burnette, 1981), but if there are strong objections, we would be happy to change this.

      19) Line 152: add "the" between 'and spatial'

      Done.

      20) Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Done.

      21) Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ...

      Done.

      22) Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Good point, this was done.

      23) Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as for MCA2?

      Please see major comment #64 for a detailed answer.

      24) Line 204: I could not find any mention of MyoF (Pf3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also reference 66 looks at Plasmodium chabaudii transcriptomes so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudii orthologue"

      Reference 65 (Wichers et al., 2019) provides an RNAseq transcriptome dataset for asexual blood stage development of 3D7 (originating from the same source as the 3D7 used in this study). While Ref 66 (Subudhi et al., 2020) indeed contains transcriptomic data from P. chabaudi, the authors also provide a nice 2h window RNAseq transcriptome dataset for asexual blood stage development of Plasmodium falciparum. Both datasets are therefore suitable as reference for the statement about myoF transcription pattern. Both datasets are also easily accessible and show the pattern in a graph in PlasmoDB.

      25) Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Done.

      26) Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      The suggested wording introduces "mainly" for "frequently" and likely was in part motivated by the discrepancy in location between cell lines that we hope we now could clarify to be only minor (see major point #23). We therefore think the original wording appropriately summarises the findings (line 178): “Taken together these results show that MyoF is in foci that are frequently close or overlapping with K13, indicating that MyoF is found in a regular close spatial association with the K13 compartment and at times overlaps with that compartment.”

      27) Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      In contrast to the Figures in Birnbaum et al. 2020, we here included the DHE staining (parasite cytosol) in images of bloated FV assays which visualizes the FV. We therefore decided to avoid any further marking, to keep the image as unprocessed as possible (see also major point 48).

      28) Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you make the difference between K13 compartment protein and K13 compartment associated protein for MyoF for example which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      The interpretation of the reviewer is correct, we indeed graded this subconsciously based on level of overlap. Based on the newly added quantification shown in Figure 2C, we describe KIC11 now as K13 compartment protein.

      29) Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Done, we now included Birnbaum et al. 2020 as reference for this.

      30) Line 377: Figure 4I, please correct 1st panel Y axis legend

      Done.

      31) Line 404: replace "dispensability" with dispensable

      Done.

      32) Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      As some of these proteins were less well or less consistently enriched, they could be background of the experiment. Alternatively, some could be proteins that only transiently interact with the K13 compartment.

      33) Line 451: Where the "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit a" come from. Do you have a reference?

      The statement now includes references and reads (with small changes to original submission): "More than 97% of proteins containing these domains also contain an Adaptin_N (IPR002553) domain (Blum et al., 2021) and in this combination typically function in vesicle adaptor complexes as subunit α (Hirst and Robinson, 1998; Traub et al., 1999) (Figure 5D) but no such domain was detectable in KIC5."

      34) Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      The critical issue is the combination of domains and their position within the protein. While KIC4 also contains a VHS domain, the VHS domain in KIC4 is N-terminal, not in a central position, and it is also not the first structural domain to be identified in KIC4. The similarity to adaptin domains was already described (Birnbaum et al., 2020; also annotated in PlasmoDB) and these domains are also involved in vesicle formation and trafficking. These aspects of the statement can therefore not be extended to KIC4. With regard to VHS domains being involved in vesicle trafficking, this is already stated in line 538: «KIC4 contained an N-terminal VHS domain (IPR002014), followed by a GAT domain (IPR004152) and an Ig-like clathrin adaptor α/β/γ adaptin appendage domain (IPR008152) (Figure 5A-C, Figure S8). This is an arrangement typical for GGAs (Golgi-localised gamma ear-containing Arf-binding proteins) which are vesicle adaptors first found to function at the trans-Golgi (Dell’Angelica et al., 2000; Hirst et al., 2000).»

      35) Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis." Or something like that. Makes it clearer.

      We rephrased this sentence and it now reads (line 592): “However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 was observed, suggesting PF3D7_1365800 is not needed for endocytosis.”

      36) Line 535: Have AP-2a or AP-2b been shown to be at the K13 compartment?

      AP2µ is at the K13 compartment (Birnbaum et al., 2020). Adaptor complexes are heterotetramers whose subunits do not typically function on their own, and this is conserved across evolutionarily distant organisms. Consistent with this also being the case in P. falciparum, Henrici et al. (Henrici et al., 2020a) showed that both AP-2a and AP-2b were present in an AP2µ Co-IP, indicating that the AP2 complex consists of the ‘classical’ subunits in P. falciparum. Therefore, the presence of all subunits at the K13 compartment is very likely, although this has only been experimentally confirmed for AP2µ. Of note, for Toxoplasma gondii the presence of AP-2a and AP-2b at the micropore has been experimentally confirmed (Wan et al., 2023; Koreny et al., 2023) and an interaction is suggested by their presence in the same IP as DRPC (Heredero-Bermejo et al., 2019).

      37) Line 569: reference 43 is wrong

      We thank the reviewer for pointing this out; we removed Ref 43.

      38) Line 746: typo "ot" instead of or.

      Changed.

      39) Line 801: method for Domain Identification using AlphaFold specify that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Done. In addition, we have now applied a more stringent cut-off of 4Å over more than 60 amino acids to ensure a higher reliability of our hits. This decision was based on results from our preprint (Behrens and Spielmann, 2023). Because of this, the phosphatase domain in KIC12 is no longer included in this manuscript and accordingly the following sentence has been deleted: “In KIC12 we identified a potential purple acid phosphatase (PAP) domain. However, with the high RMSD of 4.9 Å, the domain might also be a divergent similar fold, such as a C2 domain, which targets proteins to membranes.”
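
The effect of tightening the cut-off can be illustrated with a minimal Python sketch. It assumes that a hit is kept only when its RMSD is strictly below the threshold and its alignment spans more than 60 residues; the `DomainHit` class, the `passes_cutoff` helper and the aligned length of 120 residues are hypothetical and are not taken from the manuscript. Only the RMSD of 4.9 Å for the KIC12 PAP domain comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One structural similarity hit (hypothetical container for illustration)."""
    protein: str
    domain: str
    rmsd_angstrom: float
    aligned_residues: int


def passes_cutoff(hit, max_rmsd=4.0, min_residues=60):
    """Return True if the RMSD is below max_rmsd over more than min_residues residues.

    Assumption: both comparisons are strict, following the wording
    'under X Å over more than 60 amino acids'.
    """
    return hit.rmsd_angstrom < max_rmsd and hit.aligned_residues > min_residues


# RMSD of 4.9 Å is quoted in the rebuttal; the aligned length is a made-up value.
kic12_pap = DomainHit("KIC12", "purple acid phosphatase (PAP)", 4.9, 120)

kept_with_original_cutoff = passes_cutoff(kic12_pap, max_rmsd=5.0)
kept_with_stricter_cutoff = passes_cutoff(kic12_pap)  # default 4.0 Å

print(kept_with_original_cutoff, kept_with_stricter_cutoff)
```

Under these assumptions the KIC12 PAP hit is retained with the original 5 Å threshold and dropped with the 4 Å threshold, which matches the removal described above.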

      40) Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Done.

      41) Figure S1: Some PCR gels check for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing.

      We agree that ideally the order of sample loading should be consistent and we apologise for this. The explanation for this is that these gels were run by different people at different times before we were able to better standardize the loading scheme. However, in the interest of not unnecessarily using resources for something that has a similar meaning, we would prefer not to repeat these PCRs and re-run them only for consistency reasons (as the conclusion is not affected by the different loading schemes).

      42) Figure S1: Why was the expression of only MCA2 verified by Western blot? What about the other proteins?

      See response to major comment 56.

      43) Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      We agree that this is the case; however, it is also the case for all other proteins that are not involved in endocytosis and/or in lowered susceptibility to ART. We therefore now added a summary statement addressing this in line 602: “In contrast, the K13 compartment proteins where no role in ART resistance (based on RSA) or endocytosis was detected, KIC1, KIC2, KIC6, KIC8, KIC9 and KIC11, do not contain such domains (Figure 5E).” We did not add this at the suggested part of the manuscript because at that point the domain search results are not yet introduced, and doing this each time for all the individual proteins would disrupt the flow of the manuscript.

      44) Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      The only protein for which mutations do not have a large fitness cost is K13 (see also our preprint on the fitness cost of ubp1 mutation (Behrens et al., 2023)), and even with K13 the level of resistance seems to be limited by amino acid deprivation when endocytosis is reduced (Mesén-Ramírez et al., 2021). We therefore do not think that this pathway is particularly prone to mutations. Further, the number of commercial drugs targeting the "endproduct" of endocytosis (hemoglobin digestion and detoxification of heme) highlights it as the most prominent vulnerability for drug-based intervention, if we go by the number of commercially available drugs acting on targets associated with a single process.

      45) Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Done.

      **Referees cross-commenting**

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript would be carefully edited for a better flow and linear description of results.

      We now rearranged the manuscript for better flow but would like to highlight that the many requests for smaller experimental issues (and "better description of results") worked somewhat in the opposite way of a more linear description. We hope the rearranged version acceptably balances these two issues. The issues raised in regards to target selection and potential partial mis-localisation are addressed in our responses mainly to this reviewer. Please also see comments on systems used at the end of the rebuttal.

      Reviewer #1 (Significance (Required)):

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscripts provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

      In the significance statement the reviewer indicates that other systems would have been more reliable for the work here. This is addressed in our response above and in detailed considerations on the properties of conditional inactivation systems at the end of the rebuttal. The systems used in this work were not only chosen because they permit rapid targeting of many different proteins, but also because they have merits that are beneficial for our assays. In fact, many of the functional assays in this manuscript are difficult or impossible to carry out with the suggested conditional inactivation systems (please note that we have extensive experience with the systems considered preferable:

      • DiCre (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021; Wichers et al., 2022; Kimmel et al., 2023)

      • glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023)).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In a previous publication the Spielmann lab identified the molecular mechanism of ART resistance in P. falciparum by connecting reduced levels of the protein K13 to decreased endocytosis (uptake of hemoglobin from the RBC cytosol), which results in reduced ART susceptibility. Using quantitative BioID the authors further identified proteins belonging to a K13 compartment, highlighting an unusual endocytosis mechanism.

      In the present manuscript the authors follow up on this work and closely examine ten more proteins of the K13/Eps15-related "proxiome". They successfully link MCA2 to ART resistance in vitro, while the proteins MyoF and KIC12 are involved in endocytosis but do not confer in vitro ART resistance when impaired. They further characterize one candidate (KIC11) that partially colocalizes with K13 in trophozoites but to a lesser degree in schizonts. Growth assays suggest an important function for KIC11 in late stages of the intraerythrocytic developmental cycle. Five analyzed proteins however do not colocalize with the K13 compartment, while a sixth was refractory to endogenous tagging.

      Using AlphaFold predictions of the KIC protein structures the authors identify domains in most constituents of the K13 compartment, highlighting vesicle trafficking-related features that were not identified on primary sequence level before.

      The combination of functional data together with structure predictions leads them to propose a refinement of the K13 compartment as being divided into proteins participating in endocytosis and proteins that have an unknown function.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      1) -Table 1 is missing

      We apologise for this mistake; Table 1 is now included.

      2) -Lines 117-123: Given that the total list of uncharacterized candidates encompasses 13 proteins, can the authors give the reason why only the top 10 and not all 13 were characterized in this study?

      A similar point has been raised by Reviewer 1 in major comment #12, please see our response there for an explanation why we chose which targets.

      3) -Line 174: 20% of observed MCA2 foci show no overlap with K13 and 21% only partially overlap. Can the authors confirm that the observed MCA2 foci in schizonts are the ones that co-localize with K13? (Addition of a schizont stage image in Fig 1C would be sufficient).

      We now extended Figure 4C with images of MCA2-Y1344STOP-GFP+mCherryK13 parasites covering the schizont and merozoite stage, showing that the majority of the MCA2 foci in schizonts are also mCherry-K13 positive.

      4) -The localization and observed phenotype of KIC11 is interesting but unfortunately the authors do not explore it further. Does KIC11 localize with markers of e.g. the secretory organelles (micronemes or rhoptries) in schizonts and could therefore be involved in RBC invasion?

      While we intended to focus mainly on the endocytosis aspect of these proteins, we see the reviewer's point and now generated new cell lines enabling assessment of spatial association of KIC11 with markers for rhoptry (ARO), micronemes (AMA1), and inner membrane complex (IMC1c). This revealed that the KIC11-GFP signal in schizonts does not overlap with apical organelle markers and the signal does not resemble a typical apical localization. In addition, we assessed all three organelle markers after inactivating KIC11 by knock sideways which showed that KIC11 inactivation has no apparent effect on the appearance of these markers, suggesting no major alterations in schizont morphology in respect to apical markers. These results are now presented as Figure S3A and in line 304 of the results.

      5) Can the author distinguish if KIC11 is involved in RBC invasion or in establishment of the ring-stage parasite?

      In order to look into this, we performed egress/invasion assays, quantifying schizont and ring stage parasites in tightly synchronized parasites at two different time points (pre-egress: 38-42 hpi & post-egress: 46-50 hpi). This revealed a significant decrease in newly formed ring stage parasites per ruptured schizont in parasites with inactivated KIC11, while the egress efficacy remained unaffected. This indicated an invasion or very early ring stage development defect (new Figure 2H, Figure S3G). To further determine at which point exactly the phenotype occurs (i.e. during invasion or early after invasion) would require extensive experimentation that goes beyond the scope of this study (e.g. invasion assays using video microscopy with a representative number of parasites or sophisticated flow based quantification assays). We hope that excluding an egress defect and gross changes of apical organelles, together with the reduced number of early rings (indicating an invasion or a very early ring-establishment phenotype), will sufficiently narrow down the phenotype for labs interested in invasion to answer this question more definitively.

      Minor comments:

      1) Table S1: Please add the criterion for the order of proteins (abundance in "proxiome"?) in the table as a separate column. I would also suggest adding a new column that highlights the 10 proteins investigated in this study as I found the color-coding slightly confusing.

      Done as suggested: we now include the “average log2 Ratio normalized Kelch13” values from the four DiQ-BioID experiments performed with K13 in (Birnbaum et al., 2020), as well as the suggested column to highlight the investigated proteins. Please also see reviewer 1 major point # 12 for additional information on the selection criteria and how this was added to the manuscript.

      2) -154-155: There is a discrepancy between the text and Fig1C regarding the % of partial overlapping and non-overlapping foci.

      We thank the reviewer for pointing this out, this was corrected.

      3) -The y-axis label is missing in Fig 3E

      Done.

      4) -Fig 4I left graph, the superscript 2 is missing in μm2

      We thank the reviewer for pointing this out, this is now changed.

      5) -Did the author colocalize KIC11 in schizonts with other proteins found in the K13 compartment group of proteins not involved in endocytosis/ART resistance? This may help to further subgroup these proteins.

      This is an interesting point but would actually be technically challenging to do. For this we would need to generate a KIC11endo parasite line for each of these KICs and then do co-localisation in schizonts. However, the outcome of this likely would not be very clear. The reason for this is as follows. There are foci of KIC11 that do overlap with K13 in schizonts. One can expect that these foci show KIC11 at the K13 compartment and that the other KICs would overlap with KIC11 in these K13 foci in schizonts. Hence, we would also need to see K13 to find the non-K13 compartment KIC11 foci and see if these contained the KIC of interest. This is technically challenging because it would mean we would need a third fluorescent protein which is not that trivial to do. Due to the difficulty to do this and the large amount of work involved and the already considerable amount of data in this manuscript, we believe this will be better suited for a different study.

      6) -As a general comment: to make the beautiful IFAs more accessible to a broader readership, I would encourage the authors to switch the color-coding to green/magenta/blue or an equivalent color system or add grayscale images.

      This was done as suggested, all fluorescence images are now provided as greyscale images and the overlays are shown in magenta/green.

      Reviewer #2 (Significance (Required)):

      Characterizing the molecular components involved in Plasmodium endocytosis will not only reveal interesting biology in these highly adapted parasites, but will more importantly lead to a better understanding and potentially open new avenues for intervention of ART resistance. The here presented manuscript is a carefully executed follow-up on previous work done in Dr. Spielmann's lab focusing on the K13 compartment. The authors use established assays to characterize novel components and reveal three new players in endocytosis with one mediating ART resistance in vitro. The proposition that parts of the K13 compartment have a function other than endocytosis is interesting, but will have to await more data from future studies. Taken together this manuscript adds significantly to our understanding of endocytosis in P. falciparum.

      This work is of interest for cell and molecular biologists working on Apicomplexa, but especially for the Plasmodium community.

      We thank the reviewer for this positive assessment.

      I am a cell and molecular biologist working on Toxoplasma gondii

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors characterized 4 proteins from P. falciparum via cellular (co-)localization, endocytosis, parasite growth, and artemisinin resistance assays. These proteins have been identified as candidates for the Kelch13 compartment and a possible role in endocytosis in their previous work with quantitative BioID for potential proximity to K13 and Eps15 (Birnbaum et al. 2020). In the current work, an additional 6 proteins were not confirmed as being associated with the K13 compartment. This experimental work was complemented by an in-silico analysis of protein domains based on the AlphaFold algorithm. For this protein structure evaluation, all proteins were chosen that were experimentally confirmed to be linked to the K13 compartment in the current publication and previous work. With this work, 3 novel proteins linked to artemisinin resistance or endocytosis could be functionally described (KIC12, MCA2, and MyoF) and a number of hypotheses were generated.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      The quality of the presented work is solid, the experimental design is adequate, and methods are presented clearly. The publication contains a lot of results both presented in text and in the figures and it is not always straightforward for the reader to follow the descriptions due to many details presented and a lack of context for some of these experiments.

      We thank the reviewer for this overall positive assessment.

      We now reordered the results section in an attempt to increase the flow of the manuscript. We also made changes to improve the context for the results. Given the further (very valid) requests for data on schizonts and invasion, there was an increased danger of a less linear manuscript, which we hope to have acceptably managed with the rearrangement.

      Specific suggestions for consideration by the authors to improve the manuscript. Abstract: 1) R 31: Mention how the 4 proteins were identified as candidates, you need to refer to previous work to clarify this

      To clarify this the sentence was changed to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) R38: "Second group of proteins" is confusing - different from the 4 mentioned above? Significance to endocytosis unclear. Please unify terminology in the manuscript, see also comment below on proxiome.

      We changed the wording to clarify the group issue in the abstract as follows (line 34): “Functional analyses, tests for ART susceptibility as well as comparisons of structural similarities using AlphaFold2 predictions of these and previously identified proteins showed that canonical vesicle trafficking and endocytosis domains were frequent in proteins involved in resistance or endocytosis (or both), comprising one group of K13 compartment proteins. While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process. Another group of K13 compartment proteins did not influence endocytosis or ART susceptibility and lacked detectable vesicle trafficking domains. We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion.”

      3) Abstract can only be understood after reading the full publication

      We attempted to amend this by expanding the abstract, particularly the changes highlighted in the previous two points.

      Results: 4) Table 1 is missing from the submitted materials

      We apologise for this mistake. Table 1 is now included.

      5) Consider to shorten and stratify the result section to focus on the significant data

      We rearranged the results in an attempt to streamline this section and are now starting with MyoF in the revised manuscript. However, as highlighted by the requests from reviewer 1, many details need to be available to support our conclusions. For instance the fact that GFP-tagging partially inactivated MyoF asked for further data to support our conclusion (HA-tagged version, showing that the location of the GFP-tagged version was consistent with the HA-tagged version, showing to what extent the different constructs affected growth and correlated with number of vesicles and bloating, see new figure 1M) or that KIC12 has two locations. Overall, we are therefore hesitant to remove data or description from the result part.

      6) Unclear how the localization and functionalization assays might be impaired by the fusion proteins. Significance of ART resistance assay is not clear, in presence of strong growth effects due to inactivation or truncation of genes/proteins

      As indicated also in the example given in the previous point (this reviewer #5), the use of different cell lines (GFP-tagged live cells and small epitope tag in IFA) for targets with an indication of an effect of the tagging confirms that the location we assigned is reasonable. In the case of MyoF, the HA-tagged line, the partial inactivation due to GFP and the further inactivation in the GFP-tagged line by knock sideways show a plausible increase of phenotypes (vesicle accumulation and bloated FV assays). Thereby the GFP-tagged line can be seen as a partial inactivation line that further supports our conclusions, and overall this paints a consistent picture of the function of this protein in endocytosis (see new Figure 1M better illustrating this). Please note that the difference in location shown by this line compared to the HA-tagged proteins is only small (see also reviewer 1 major point 23ff). See also general discussion on tags at the end of this rebuttal.

      Significance of ART resistance assay: The ‘ART resistance assay’ is done comparing +/- ART (DHA) in identical parasites (originating from the same culture and the same condition). Hence, any growth effects are cancelled out and effects in reducing ART susceptibility would - if at all - be underestimated (see more detailed response to point 28, reviewer 1 and controls in Birnbaum et al., 2020 where we tested an unrelated essential protein, unrelated chemical insult and rapalog on 3D7 and did not detect any effect on RSA survival).
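
The cancelling-out argument can be sketched in a few lines of Python. This assumes the common definition of RSA survival as the parasitemia of the DHA-treated sample divided by that of the matched untreated control, and it assumes, purely for illustration, that a growth defect scales both samples by the same factor. The function name and all numbers are hypothetical and are not data from the manuscript.

```python
import math


def rsa_survival_percent(parasitemia_dha, parasitemia_control):
    """Survival in percent: DHA-treated parasitemia relative to the matched control."""
    if parasitemia_control <= 0:
        raise ValueError("control parasitemia must be positive")
    return 100.0 * parasitemia_dha / parasitemia_control


# Hypothetical parasitemias (in percent) for a line without a growth defect.
survival_normal = rsa_survival_percent(parasitemia_dha=0.5, parasitemia_control=5.0)

# Same line with a growth defect that reduces both samples by the same
# hypothetical factor, because both originate from the same culture.
growth_factor = 0.3
survival_slow = rsa_survival_percent(0.5 * growth_factor, 5.0 * growth_factor)

print(survival_normal, survival_slow)
```

Because the same factor appears in the numerator and the denominator, the ratio is unchanged in this simplified model. The sketch does not capture any interaction between growth and drug action.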

      MCA 7) Stratify results, order by significance of findings, it appears to be described in chronological order, improve readability/flow, eg ART resistance is mentioned in r138, but only reported in r183ff

      We attempted to stratify, but then the reason for generating the partial MCA2 disruption parasite line becomes very arbitrary and would leave the reader wondering why we at all truncated the protein at two thirds of the protein. Hence, we do not see a way around this chronological reporting. However, this part is now not at the start of the experimental results section anymore, possibly making it overall a bit more palatable.

      MyoF 8) R195 to 197 - consider moving to discussion as it is distracting here

      This was shortened and additional information (asked for by reviewer 1, major point 22) to clarify that MyoF was previously called MyoC, was added (line 147): “The presence of MyosinF (MyoF; PF3D7_1329100 previously also MyoC), in the K13 proxiome could indicate an involvement of actin/myosin in endocytosis in malaria parasites. "

      9) Term proxiome is introduced above, but not used in result section - suggest to unify language, eg r195 uses "K13 compartment DiQ-BioIDs" instead, which is not very convenient for the reader

      We carefully reviewed this and made this more consistent.

      10) What is the enrichment factor? Please provide for this and the following proteins, eg in Table 1

      The enrichment factor is log2 enrichment over control and this is now provided in table S1 (see also detailed answer for Reviewer 1 major point 12).
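
As a reading aid, a log2 enrichment over control can be computed as in the minimal sketch below. The function names and the intensity values are hypothetical; the normalisation to Kelch13 used for the values in Table S1 is not reproduced here.

```python
import math


def log2_enrichment(sample_intensity, control_intensity):
    """log2 of the sample-to-control ratio; both intensities must be positive."""
    if sample_intensity <= 0 or control_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / control_intensity)


def fold_change(log2_value):
    """Convert a log2 enrichment back to a linear fold change."""
    return 2.0 ** log2_value


# Hypothetical intensities: a protein four times more abundant than in the control.
example = log2_enrichment(sample_intensity=8.0, control_intensity=2.0)
print(example, fold_change(example))
```

A log2 value of 1 therefore corresponds to a two-fold enrichment, and each further unit doubles it again.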

      11) R225 to 243 - overall significance of the growth experiments with mislocaliser is not clear, consider removing from manuscript or explain relevance more clearly

      See also point 28, reviewer 1: This experiment is actually quite important. It shows that if we conditionally inactivate the GFP-tagged MyoF, the growth is further reduced, as stated in line 208. It might have been confusing that the mislocalisation is only partial, but this is equivalent to a partial knock down and hence is useful. This becomes even more relevant with the specific assays following in the next paragraph: while the tagging of MyoF already resulted in vesicles, conditional inactivation with KS generated even more vesicles, showing that the same phenotype was rapidly increased when MyoF was further inactivated by a different means and this also correlated with growth. Hence, this is actually a very consistent phenotype that despite some shortcomings of the tools available to analyse this protein (due to the partial inactivation by the GFP tag) in our eyes looks very convincing. We now added a graph showing the correlation of growth and phenotypes to illustrate this (Figure 1L).

      We also tried to make this clearer by changing line 200 to: “Hence, conditional inactivation of MyoF further reduced growth despite the fact that the tag on MyoF already led to a substantial growth defect, indicating an important role for MyoF during asexual blood stage development.” And line 208 to: “This was even more pronounced upon conditional inactivation of MyoF by KS (Figure 1H), suggesting this is due to a reduced function of MyoF.”

      12) KIC11/KIC12 Enrichment factor?

      The enrichment (‘average log2 Ratio normalized Kelch13’ from Birnbaum et al. 2020) is 1.65 for KIC11 and 1.32 for KIC12, which is now also explicitly shown in column D of Table S1.

      **Referees cross-commenting**

      I would like to applaud reviewer #1 for a great, very thorough review and lots of detailed suggestions. I agree with the conclusions mentioned in the significance evaluation from reviewer #1 and #2: the work presented does not contain novel methods and the scope is rather narrow with the current results. (I am working on clinical studies with novel antimalarial agents)

      Reviewer #3 (Significance (Required)):

      On the one hand, the authors have wrapped up some of the remaining protein candidates of the K13 compartment and could verify 4 of 10 proteins. The work is of interest for the scientific community working on endocytosis and malaria drug resistance mechanisms. Overall, the conclusions and findings from the previous work, Birnbaum et al. 2020, could be confirmed and extended mainly using the methods previously described. On the other hand, the authors made use of progress in protein structure predictions and identified domains linking the K13 compartment proteins to putative functions. The overlaid protein folds of the newly identified domains in figure 5 look convincing, but I can't comment on the technical details or cut-off used for this in-silico analysis.

      Extended general remarks on the systems used for this work:

      Mainly reviewer 1 suggests (in the general comments and the significance statement) that other systems, namely glmS and diCre, would have been better suited for this work, and also has concerns about the large tag, which is seconded by a comment of reviewer 3. In light of this we here provide some extended considerations on the properties of conditional systems and tagging with regard to the goals of this work.

      We would like to point out that we do have experience with the systems considered better suited by the reviewer (one of the first authors has extensively used glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023) and our lab was one of the first to adopt the diCre system in P. falciparum parasites and we regularly use it (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Kimmel et al., 2023)). Clearly, these methods have a lot of strengths but there are a number of issues to be considered for the assays we use in this work (see the next section on conditional inactivation systems). In a nutshell, we believe diCre would give a more reliable readout of the absolute level of "essentiality" (i.e. importance for growth) but is unsuitable or at least difficult to use for the assays that reveal the functions of interest in this work. GlmS basically combines the drawbacks of diCre and knock sideways and hence for most targets is not expected to give a better readout of the level of "essentiality" but is similarly difficult to use for our specific assays. The fact that both of these systems can be used without adding a tag to the target may be an advantage, but without a tag one loses some very important features that can be critical to understand the outcome with a given system (see considerations on the tag further below).

      Conditional inactivation systems:

      1. __Speed of inactivation:__ glmS acts on mRNA and diCre on the gene level, which makes them slower than techniques acting directly on the protein such as DD or KS. With diCre, mRNA and protein are still left, even if the gene is very rapidly excised. For instance, for Kelch13 it takes 3-4 days after excising the gene until protein levels have waned enough that this manifests in reduced growth (Birnbaum et al., 2017). While in some instances diCre permits same-cycle analyses if the protein has a very rapid turnover (e.g. Rab5a, (Birnbaum et al., 2017)), control within a few hours is still difficult. For vesicle accumulation and bloated food vacuole assays, which are done over comparably short time frames and with specific stages, it is rather challenging to hit the correct time of induction so that all the cells are at the correct stage with sufficiently (and uniformly, i.e. in all cells) reduced target protein levels during the assay time. Slow-acting systems are also more prone to secondary effects. The more immediate the inactivation, the closer it is to the core of the affected function. With vesicle trafficking processes this is particularly relevant as all vesicle trafficking in a cell is interconnected and there are always recycling pathways that maintain the membrane and protein homeostasis of individual compartments. Particularly for endocytosis there seem to be compensatory capacities at least in other organisms (see e.g. (Chen and Schmid, 2020)). One reason why knock sideways was developed is that it avoids compensatory changes when vesicle adaptors are inactivated (Robinson et al., 2010).

      The comparably short time frame for malaria parasites to go through different stages during blood stage development is also an issue relevant for inactivation speed. The advantage of speed and the danger of obscured phenotypes are highlighted by our work on VPS45, which showed that in trophozoites this protein is involved in the transport of hemoglobin to the FV whereas in late stages it also has a role in secretory processes. We were able to specifically assess both of these functions in the same growth cycle using KS to rapidly inactivate the protein (Bisio et al., 2020), which would have been more complicated to dissect with a slower system.

      Speed of effect with glmS: unless the KS does not work well, glmS is slower acting than KS (it does not target the already synthesised protein, which can remain in the cell) and also often suffers from only partial inactivation, hence the benefit of using it here is unclear. The option to have an untagged protein is a plus, but it is also a minus, as assessing efficiency is critical to ensure the phenotype manifests at the stage of interest (particularly in live cells, e.g. for bloated food vacuole assays, where a fluorescent tag is the only direct option to assess inactivation of the target).

      Lethality/absolute phenotypic effects are detrimental to some of the assays used to study the functions we are interested in for this work: no RSA can be conducted if the gene is lost and the parasites die. Again, with diCre, one could attempt to hit the point when the parasites have lost sufficient amounts of the target protein when they are placed under ART, but then the parasites need to continue growing for ~3 days, which is not possible if the cKO is lethal except for very slowly turning over proteins. However, in that latter case, the parasites likely still had full functionality of the target protein at the beginning of the RSA, when the drug pulse happens, and there would be no effect. Knock sideways solves these problems by permitting inactivation only under ART (or with a few hours pre-incubation depending on the inactivation speed), which does not yet affect growth in a severe manner but inhibits the process the protein is involved in. It may be possible to use glmS for RSAs, but the slow speed would complicate it (it would not permit control of target protein levels in a matter of a few hours to inactivate the target protein and then re-install it).

      Non-absolute inactivation is also a strength for some functional assays. While we really like using diCre, in the case of EXP1 it made it necessary to complement the exp1 cKO parasites with low levels of EXP1 to be able to do functional assays without killing the parasites (Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021). While the lethality issue does not apply to glmS (like knock sideways, it can also be tuned), it is unclear what would be gained over knock sideways. Knockdown levels with glmS vary from gene to gene and cannot be predicted, it is in most cases considerably slower than KS, it requires glucosamine, which becomes toxic at higher concentrations and might introduce off-target effects, and tracking protein levels during the assay would equally need GFP tagging.

      Integration of properties of conditional systems

      Given the properties discussed above, several factors have to be considered to be able to use a system for a given assay. Stage-specific transcription is one example. For diCre, a protein not expressed in e.g. rings permits removal of the gene so that the protein is never made in that parasite development cycle. We exploited this for instance for two proteins only expressed from the trophozoite stage onwards (Kimmel et al., 2023). However, if lethal (absolute effect problem), this also means one can only see the phenotype at the onset of expression of the target (e.g. if in mitosis, the first nuclear division in case the protein is absolutely essential for the process). This is just one example of such issues. Expression timing, turnover of the protein and homogeneity of stage-specific loss of protein will all influence how clearly the phenotype can be determined. All this will decide the exact time of loss/inactivation of the target protein to levels generating a phenotype, which ideally can therefore be monitored during an assay (see considerations on tagging).

      For these reasons vesicle accumulation or bloated food vacuole assays are difficult with slow systems as ideally the target should rapidly be inactivated at the trophozoite stage and the result monitored before the cells have moved to the schizont stage. For this a well responding knock sideways is ideal as the protein can be rapidly taken away (sometimes within seconds) to visualise the immediate, direct effect in the cell.

      As shown for KIC11, there is also no disadvantage of using KS for proteins with other assays or proteins that result in different phenotypes. It permits stage-specific same cycle inactivation without having to worry about the turnover of mRNA and protein (Fig. 2F,G). Thus, besides the advantages of knock sideways for endocytosis related assays and RSAs, we also see no disadvantage of using knock sideways for the functional study of KIC11 which has a role other than endocytosis. KS also permits to specifically target the K13 pool of KIC12, something impossible or very difficult to do with other systems. Hence, we are of the opinion that the system for inactivation was adequate for most of the proteins analysed in this manuscript.

      Large tag: we agree that GFP-tagging can be a disadvantage but in our opinion its benefits often outweigh the drawbacks because it permits easy and immediate (on individual cell level, if need be) monitoring of the presence/location of the target protein (e.g. after KS, but given the discrepancy of the timing between gene excision and protein loss, it might be even more important for techniques such as diCre). No fixing/permeabilisation (prone to artifacts, prevents immediate view of cells) to detect a target with specific antibodies or via a small tag is needed with GFP. Similarly, the use of Western blots to do this is time consuming and impractical if monitoring of left-over protein in the course of an assay such as a bloated food vacuole assay is needed.

      In many cases, adding GFP has no negative effect. In addition, if the bulky folded structure of GFP is tolerated, the protein usually also tolerates the 2 to 4 12 kDa FKBP domains in our standard tag. We also typically add a linker. This approach has worked for a large number of different proteins, including many essential ones for which we would not otherwise have obtained the integration cell lines (Birnbaum et al., 2017; Jonscher et al., 2019; Hoeijmakers et al., 2019; Birnbaum et al., 2020; Kimmel et al., 2023; Sabitzki et al., 2023). Hence, whenever a cell line is obtained with it, this tag in most cases is not a disadvantage. Admittedly an exception to this is MyoF and to some extent maybe MCA2 (we would like to stress that in the case of MCA2 the reason for not being able to obtain the full length tagged cell line is unclear: the protein can be severely truncated to less than 3% of its amino acid sequence and a GFP-tag is tolerated on the version with 2/3s of the protein left, which gives no good reason why the full length was not obtained; a potential reason could be a dominant negative effect). However, we obtained the full length with a small tag detected by IFA for both MyoF and MCA2, and the location of these agreed well with the GFP-tagged versions, indicating that the GFP-tagged versions are useful to show the location of these proteins in live cells.

      There are also tricks to attempt monitoring the effect of e.g. diCre without tagging the target. For instance, a fluorescent protein can be connected to excision without actually being fused to the target (i.e. excision of the gene leads to expression of e.g. GFP), which would avoid adding a tag to the target itself. However, the problem with this is that expression of GFP only shows excision, but mRNA producing the target protein and left-over target protein may still be there in the cell. All in all, the GFP-tag on the target, while it has some drawbacks, is still our preferred method to monitor the target protein in the cell (in principle permitting quantification of ablation efficiency on the individual cell level).

      Conclusion on these considerations for this manuscript

      Based on these considerations we do not see the immediate benefit of changing the system for the conclusions drawn from this study and are unsure whether the alternative systems are indeed better suited for this work as suggested. While a more exact readout of "essentiality" might be possible with the diCre system, we are of the opinion this is less important than learning the function of a protein which - as outlined above - we believe to be considerably more difficult with diCre and even more so with glmS considering our target functions. The same applies to targeting specific cellular pools of a protein as done here for KIC12. Clearly MyoF is one example where the employed system shows limitations, but with the new Figure part showing consistency in phenotype with degree of inactivation (importantly with two different forms of inactivation) and the clarification that the GFP-tagged and HA-tagged versions are actually quite similar in location, we do not think employing an extra system is warranted for the conclusions of this work. Admittedly, the apparent lack of need in ring stages might give an opening to attack MyoF using diCre (by excision before its major expression peak), but depending on lethality this might preclude extended analyses (possibly vesicle assays, for sure not RSAs).

      In the end the question is whether our approach provides the function of the targets analysed in this work, and based on the data in our manuscript and the arguments in the rebuttal, we are reasonably confident that this is the case. It is not very likely that the other mentioned techniques would result in a different conclusion on the function of the proteins studied here. In fact, we expect other commonly used techniques to be less suitable for the key assays in this work.

      References used in our responses to the reviewers’ comments:

      Behrens, H.M., Schmidt, S., Peigney, D., Sabitzki, R., Henshall, I., May, J., et al. (2023) Impact of different mutations on Kelch13 protein levels, ART resistance and fitness cost in Plasmodium falciparum parasites. bioRxiv 2022.05.13.491767.

      Behrens, H.M., Schmidt, S., and Spielmann, T. (2021) The newly discovered role of endocytosis in artemisinin resistance. Med Res Rev med.21848.

      Behrens, H.M., and Spielmann, T. (2023) Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710.

      Birnbaum, J., Flemming, S., Reichard, N., Soares, A.B., Mesén-Ramírez, P., Jonscher, E., et al. (2017) A genetic system to study Plasmodium falciparum protein function. Nat Methods 14: 450–456.

      Birnbaum, J., Scharf, S., Schmidt, S., Jonscher, E., Hoeijmakers, W.A.M., Flemming, S., et al. (2020) A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science 367: 51–59.

      Bisio, H., Chaabene, R. Ben, Sabitzki, R., Maco, B., Baptiste Marq, J., Gilberger, T.W., et al. (2020) The zip code of vesicle trafficking in apicomplexa: Sec1/munc18 and snare proteins. MBio 11: 1–21.

      Blum, M., Chang, H.Y., Chuguransky, S., Grego, T., Kandasaamy, S., Mitchell, A., et al. (2021) The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49: D344–D354.

      Borrmann, S., Straimer, J., Mwai, L., Abdi, A., Rippert, A., Okombo, J., et al. (2013) Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep 3.

      Bozdech, Z., Llinás, M., Pulliam, B.L., Wong, E.D., Zhu, J., and DeRisi, J.L. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1: e5.

      Burnette, W.N. (1981) “Western Blotting”: Electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 112: 195–203.

      Casella, J.F., Flanagan, M.D., and Lin, S. (1981) Cytochalasin D inhibits actin polymerization and induces depolymerization of actin filaments formed during platelet shape change. Nature 293: 302–305.

      Cerqueira, G.C., Cheeseman, I.H., Schaffner, S.F., Nair, S., McDew-White, M., Phyo, A.P., et al. (2017) Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biol 18: 78.

      Chen, Z., and Schmid, S.L. (2020) Evolving models for assembling and shaping clathrin-coated pits. J Cell Biol 219.

      Dell’Angelica, E.C., Puertollano, R., Mullins, C., Aguilar, R.C., Vargas, J.D., Hartnell, L.M., and Bonifacino, J.S. (2000) GGAs: A family of ADP ribosylation factor-binding proteins related to adaptors and associated with the Golgi complex. J Cell Biol 149: 81–93.

      Demas, A.R., Sharma, A.I., Wong, W., Early, A.M., Redmond, S., Bopp, S., et al. (2018) Mutations in Plasmodium falciparum actin-binding protein coronin confer reduced artemisinin susceptibility. Proc Natl Acad Sci 201812317.

      Henrici, R.C., Edwards, R.L., Zoltner, M., Schalkwyk, D.A. van, Hart, M.N., Mohring, F., et al. (2020a) The plasmodium falciparum artemisinin susceptibility-associated ap-2 adaptin μ subunit is clathrin independent and essential for schizont maturation. MBio 11.

      Henrici, R.C., Schalkwyk, D.A. van, and Sutherland, C.J. (2020b) Modification of pfap2μ and pfubp1 Markedly Reduces Ring-Stage Susceptibility of Plasmodium falciparum to Artemisinin in Vitro. Antimicrob Agents Chemother 64.

      Henriques, G., Hallett, R.L., Beshir, K.B., Gadalla, N.B., Johnson, R.E., Burrow, R., et al. (2014) Directional selection at the pfmdr1, pfcrt, pfubp1, and pfap2mu loci of Plasmodium falciparum in Kenyan children treated with ACT. J Infect Dis 210: 2001–2008.

      Heredero-Bermejo, I., Varberg, J.M., Charvat, R., Jacobs, K., Garbuz, T., Sullivan, W.J., and Arrizabalaga, G. (2019) TgDrpC, an atypical dynamin-related protein in Toxoplasma gondii, is associated with vesicular transport factors and parasite division. Mol Microbiol 111: 46–64.

      Hirst, J., Lui, W.W.Y., Bright, N.A., Totty, N., Seaman, M.N.J., and Robinson, M.S. (2000) A family of proteins with γ-adaptin and VHS domains that facilitate trafficking between the trans-golgi network and the vacuole/lysosome. J Cell Biol 149: 67–79.

      Hirst, J., and Robinson, M.S. (1998) Clathrin and adaptors. Biochim Biophys Acta - Mol Cell Res 1404: 173–193.

      Hoeijmakers, W.A.M., Miao, J., Schmidt, S., Toenhake, C.G., Shrestha, S., Venhuizen, J., et al. (2019) Epigenetic reader complexes of the human malaria parasite, Plasmodium falciparum. Nucleic Acids Res 47: 11574–11588.

      Jonscher, E., Flemming, S., Schmitt, M., Sabitzki, R., Reichard, N., Birnbaum, J., et al. (2019) PfVPS45 Is Required for Host Cell Cytosol Uptake by Malaria Blood Stage Parasites. Cell Host Microbe 25: 166-173.e5.

      Kimmel, J., Schmitt, M., Sinner, A., Jansen, P.W.T.C., Mainye, S., Ramón-Zamorano, G., et al. (2023) Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell Syst 14: 9-23.e7.

      Koreny, L., Mercado-Saavedra, B.N., Klinger, C.M., Barylyuk, K., Butterworth, S., Hirst, J., et al. (2023) Stable endocytic structures navigate the complex pellicle of apicomplexan parasites. Nat Commun 14: 2167.

      Kumari, V., Singh, A.P., Singh, J., Sharma, R., Akhter, M., Mishra, P.K., et al. (2018) Biochemical characterization of unusual cysteine protease of P. falciparum, metacaspase-2 (MCA-2). Mol Biochem Parasitol 220: 28–41.

      Lazarus, M.D., Schneider, T.G., and Taraschi, T.F. (2008) A new model for hemoglobin ingestion and transport by the human malaria parasite Plasmodium falciparum. J Cell Sci 121: 1937–1949.

      Lopez-Hernandez, F.J., Ortiz, M.A., Bayon, Y., and Piedrafita, F.J. (2003) Z-FA-fmk inhibits effector caspases but not initiator caspases 8 and 10, and demonstrates that novel anticancer retinoid-related molecules induce apoptosis via the intrinsic pathway. Mol Cancer Ther 2: 255–263.

      Lord, S.J., Velle, K.B., Mullins, R.D., and Fritz-Laylin, L.K. (2020) SuperPlots: Communicating reproducibility and variability in cell biology. J Cell Biol 219.

      MalariaGEN, Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., et al. (2021) An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome open Res 6: 42.

      Marti, M., Good, R.T., Rug, M., Knuepfer, E., and Cowman, A.F. (2004) Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science 306: 1930–3.

      Mesén-Ramírez, P., Bergmann, B., Elhabiri, M., Zhu, L., Thien, H. von, Castro-Peña, C., et al. (2021) The parasitophorous vacuole nutrient pore is critical for drug access in malaria parasites and modulates the fitness cost of artemisinin resistance. Cell Host Microbe 0: 283.

      Mesén-Ramírez, P., Bergmann, B., Tran, T.T., Garten, M., Stäcker, J., Naranjo-Prado, I., et al. (2019) EXP1 is critical for nutrient uptake across the parasitophorous vacuole membrane of malaria parasites. PLoS Biol 17: e3000473.

      Mukherjee, A., Crochetière, M.-È., Sergerie, A., Amiar, S., Thompson, L.A., Ebrahimzadeh, Z., et al. (2022) A Phosphoinositide-Binding Protein Acts in the Trafficking Pathway of Hemoglobin in the Malaria Parasite Plasmodium falciparum. MBio 13.

      Otto, T.D., Wilinski, D., Assefa, S., Keane, T.M., Sarry, L.R., Böhme, U., et al. (2010) New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol Microbiol 76: 12–24.

      Robinson, M.S., Sahlender, D.A., and Foster, S.D. (2010) Rapid Inactivation of Proteins by Rapamycin-Induced Rerouting to Mitochondria. Dev Cell 18: 324–331.

      Sabitzki, R., Schmitt, M., Flemming, S., Jonscher, E., Hoehn, K., Froehlke, U., and Spielmann, T. (2023) Identification of a Rabenosyn-5 like protein and Rab5b in host cell cytosol uptake reveals conservation of endosomal transport in malaria parasites. bioRxiv 2023.04.05.535711.

      Simwela, N. V., Hughes, K.R., Roberts, A.B., Rennie, M.T., Barrett, M.P., and Waters, A.P. (2020) Experimentally engineered mutations in a ubiquitin hydrolase, UBP-1, modulate in vivo susceptibility to artemisinin and chloroquine in plasmodium berghei. Antimicrob Agents Chemother 64.

      Spielmann, T., Gras, S., Sabitzki, R., and Meissner, M. (2020) Endocytosis in Plasmodium and Toxoplasma Parasites. Trends Parasitol 36: 520–532.

      Subudhi, A.K., O’Donnell, A.J., Ramaprasad, A., Abkallo, H.M., Kaushik, A., Ansari, H.R., et al. (2020) Malaria parasites regulate intra-erythrocytic development duration via serpentine receptor 10 to coordinate with host rhythms. Nat Commun 11.

      Traub, L.M., Downs, M.A., Westrich, J.L., and Fremont, D.H. (1999) Crystal structure of the α appendage of AP-2 reveals a recruitment platform for clathrin-coat assembly. Proc Natl Acad Sci U S A 96: 8907–8912.

      Wagner, M.P., Formaglio, P., Gorgette, O., Dziekan, J.M., Huon, C., Berneburg, I., et al. (2022) Human peroxiredoxin 6 is essential for malaria parasites and provides a host-based drug target. Cell Rep 39: 110923.

      Wall, R.J., Zeeshan, M., Katris, N.J., Limenitakis, R., Rea, E., Stock, J., et al. (2019) Systematic analysis of Plasmodium myosins reveals differential expression, localisation, and function in invasive and proliferative parasite stages. Cell Microbiol 21.

      Wan, W., Dong, H., Lai, D.-H., Yang, J., He, K., Tang, X., et al. (2023) The Toxoplasma micropore mediates endocytosis for selective nutrient salvage from host cell compartments. Nat Commun 14: 977.

      Wichers-Misterek, J.S., Binder, A.M., Mesén-Ramírez, P., Dorner, L.P., Safavi, S., Fuchs, G., et al. (2023) A Microtubule-Associated Protein Is Essential for Malaria Parasite Transmission. MBio.

      Wichers, J.S., Gelder, C. van, Fuchs, G., Ruge, J.M., Pietsch, E., Ferreira, J.L., et al. (2021a) Characterization of Apicomplexan Amino Acid Transporters (ApiATs) in the Malaria Parasite Plasmodium falciparum. mSphere 6.

      Wichers, J.S., Mesén-Ramírez, P., Fuchs, G., Yu-Strzelczyk, J., Stäcker, J., Thien, H. von, et al. (2022) PMRT1, a Plasmodium -Specific Parasite Plasma Membrane Transporter, Is Essential for Asexual and Sexual Blood Stage Development. MBio 13.

      Wichers, J.S., Scholz, J.A.M., Strauss, J., Witt, S., Lill, A., Ehnold, L.-I., et al. (2019) Dissecting the Gene Expression, Localization, Membrane Topology, and Function of the Plasmodium falciparum STEVOR Protein Family. MBio 10: e01500-19.

      Wichers, J.S., Tonkin-Hill, G., Thye, T., Krumkamp, R., Kreuels, B., Strauss, J., et al. (2021b) Common virulence gene expression in adult first-time infected malaria patients and severe cases. Elife 10.

      Wichers, J.S., Wunderlich, J., Heincke, D., Pazicky, S., Strauss, J., Schmitt, M., et al. (2021c) Identification of novel inner membrane complex and apical annuli proteins of the malaria parasite Plasmodium falciparum. Cell Microbiol 23: e13341.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 compartment candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisations, their colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, two processes mediated by K13 and some members of the K13 compartment.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD, that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and a generally interesting question. But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      Major Comments

      line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth"

      Line 40: please change 'second group' to 'this group'

      line 41: state here that despite it being essential, it is unknown what it is involved in.

      Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa). Reference 14 also seems to reference ART resistance in Amazonia, which is not covered by the statement made by the authors (in which case the authors should state ART resistance is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 be different than the full length? This cannot be assessed as expression and localisation of the full-length HA tag MCA2 is not shown in Schizonts. MCA2 expression seems also different for the MCA2TGD-GFP with no expression in rings.

      Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggest that the MCA2Y1344STOP goes to the right location but do not really confirm it.

      Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MCA2TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      Line 204-206: what does this statement bring to the paper? Is it to show that this is the real localisation of MyoF because two differently tagged cell lines show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tagged MyoF line is used and shows similar localisation.

      Line 212: The overlap of K13 with MyoF in Fig 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems nonexistent. I would advise the authors to replace it with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors' statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12 so why not do it for MyoF?

      Line 259-261: these results would be difficult to interpret namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls of their knock sideways and tagging design here.

      Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified? Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Line 323-325: Authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A it is not very obvious. Also faint cytoplasmic GFP signal could only be background as we can see that exposure is higher for schizont pictures

      Line 326-328: The authors say that kic12 transcriptional profile indicate mRNA levels peak (no s at peak) in merozoites. Should they show live cell imaging of merozoites then? Because from the Figure 4A schizont pictures where schizonts are almost fully segmented no signal can be observed.

      Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype.

      Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper.

      Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in:

      • Some that confer ART resistance and are involved in HCCU (MCA2)
      • Some that are involved in HCCU but not ART resistance (MyoF & KIC12)
      • Some that are involved in neither (KIC11)

      The authors should therefore revise this statement.

      Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings, is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether other Myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      Line 785: The bloated food vacuole assay/E64 hemoglobin uptake assay method specifies that a concentration of 33 mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33 µM E64 was used. Please confirm whether this is just a typo or whether a 1000x E64 concentration was used, which would render the experiment invalid.

      Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      Line 863-864: this sentence seems to be out of place.

      Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlayed is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MAC2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to switch the graph in 2F out of a combined graph of all the biological replicates growth curves for S3D - showing error bars.

      Figure 3: why were the calculations of FV area, parasite area and FV/parasite area only done for KIC12 and not for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore the FV is smaller. Please present them in Figure 2. Images should already be available, so this would not require further experiments, only the analysis.

      Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser which was used for KIC12 not KIC 11 (see above).

      Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so this would simply involve adding the images to the panel.

      Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      Minor Comments

      line 29: remove 'are'.

      Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: 'HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      line 44: remove 'the'

      Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      Line 53: please insert the word 'have' between now and also.

      Line 54: please change 'was linked' to is linked

      Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic. Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Line 90: authors should either say "in previous works" or "in a previous work"

      Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Line 95: please change 'rate' to number

      Line 109: Please include a comma before (ii).

      Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...).

      Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Line 131: please define SNP - this is the first use of the acronym.

      Line 133-134: South-East Asia instead of "South Asia"

      Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      Line 152: add "the" between 'and spatial'

      Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ..."

      Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as was done for MCA2?

      Line 204: I could not find any mention of MyoF (PF3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also reference 66 looks at Plasmodium chabaudi transcriptomes so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudi orthologue"

      Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you distinguish between K13 compartment proteins and K13 compartment associated proteins, for example for MyoF, which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Line 377: Figure 4I, please correct 1st panel Y axis legend

      Line 404: replace "dispensability" with dispensable

      Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      Line 451: Where does the statement "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit" come from? Do you have a reference?

      Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis", or something like that. This makes it clearer.

      Line 535: Have AP-2 or AP-2 been shown to be at the K13 compartment?

      Line 569: reference 43 is wrong

      Line 746: typo "ot" instead of or.

      Line 801: method for Domain Identification using AlphaFold specify that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Figure S1: Some PCR gels checking for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing.

      Figure S1: Why was the expression of only MCA2 verified by western blot? What about the other proteins?

      Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Referees cross-commenting

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript would be carefully edited for a better flow and linear description of results.

      Significance

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscript provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank all four reviewers for their helpful and constructive comments. We have gone through each and every comment and proposed how we would address each point raised by the reviewers. We are confident our proposed revisions are feasible within a reasonable and expected time frame. Some of the comments regarding minor typo/aesthetics and extra references have already been addressed in the transferred manuscript. The changes are highlighted in yellow in the transferred manuscript.

      2. Description of the planned revisions

      Reviewer #1

      Major points:

      1. The presented work itself (Figures 1-4) does not need significant adjustments prior to publication, in my view, with only a few points to address. However, the work in Figure 5- doesn't really support the claims the authors make on its own, and would require some additional experiments or at the very least discussion of the caveats to its current form.

      We thank the reviewer for these comments and will follow the reviewer’s suggestion by discussing the caveats regarding the interpretation of Figure 5. We will also add to the discussion to suggest future research approaches beyond the scope of this manuscript that would address the functional importance of localised mRNA translation. We will briefly mention in the discussion methods such as the quantification of the mRNA foci and the disruption of the mRNA localisation signals to disrupt localised translation and the use of techniques such as Sun-Tag (Tanenbaum et al, 2014) and FLARIM (Richer et al, 2021) to visualise local translation directly.

      Tanenbaum et al, 2014 DOI: 10.1016/j.cell.2014.09.039

      Richer et al, 2021 DOI: 10.1101/2021.08.13.456301

      1. Localized glia transcripts, are they "glial/CNS/PNS" significant or are they similar to other known datasets of protrusion transcriptomes? The authors compared their 4801 "total" localized to a local transcriptome dataset from the Chekulaeva lab finding that a significant fraction are localized in both. As the authors note, this is in good agreement with a recent paper from the Taliaferro lab showing conservation of localization of mRNAs across different cell types. What the authors haven't done here, is further test this by looking at other non-neuronal projection transcriptomic datasets (for example Mardakheh Developmental Cell 2015, among others). If the predicted glia-localized processes are similar to non-neuronal processes transcriptomes, this would further strengthen this claim and rule out some level of CNS/PNS derived lineage driving the similarities between glia and neuronal localized transcripts.

      This is a good point and we thank the reviewer for pointing out this interesting cancer data set. We will do as the reviewer suggests and intersect our data with Mardakheh Dev Cell 2015 to test the further generality of localisation in neurons and glia, in other cell types. Specifically, we plan to intersect both the glial (this study) and neuronal (von Kuegelgen & Chekulaeva, 2020) datasets with protrusive breast cancer cells (Mardakheh et al, 2015).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      Mardakheh et al, 2015 DOI: 10.1016/j.devcel.2015.10.005

      1. The presentation/discussion around Figure 3 is a bit weaker than other parts of the manuscript, and it doesn't really contribute to the story in its current form. Notably there is no discussion about the significance of glia in neurological disorders until the very end of the manuscript (page 21), meaning when it's first brought up, it just sits there as a one off side point. The authors might consider strengthening/tightening up the discussion here, if they really want to keep it as a solo main figure rather than integrating it somewhere else/putting it into supplemental. In my view, Figures 2 & 3 should be merged into something a bit more streamlined.

      This is a good point. We plan to strengthen the presentation of Figure 3 and discussion of the significance of glia in neurological disorders by adding a description of the Figure in the Results section and highlighting the significance of glia in nervous system disorders in the Discussion section.

      1. Why aren't there more examples of different mRNAs in Figure 4? Seems a waste to kick them all to supplemental.

      We agree that it could be helpful to show different expression patterns in the main figure. To address this point we will add Pdi (Fig. S4D), which shows mRNA expression in both the glia and the surrounding muscle cell. This pattern is in contrast to Gs2, which is highly specific to glial cells. We will also note that although pdi mRNA is present in both the glia and muscle, Pdi protein is only abundant in the glia, suggesting that translation of pdi mRNA to protein is regulated in a cell-specific manner.

      1. The plasticity experiments, while creative, I think need to be approached far more cautiously in their interpretation. Given that the siRNAs will completely deplete these mRNAs, it really needs to be stressed that any/all of the effects seen could just be the result of "defective" or "altered" states in this glial population, which has spillover effects on plasticity at the NMJ. Without directly visualizing if these mRNAs are locally translated in these processes and assessing if their translation is modulated by their plasticity paradigm, all these experiments can say is that these RNAs are needed in glia to modulate ghost bouton formation in axons. This represents the weakest part of this manuscript, and the part that I feel does not actually back up the claims currently being made. Without any experiments to A. quantify how much of these transcripts are localized vs in the cell body of these glia, B. visualize/quantify the translation of these mRNAs during baseline and during plasticity; the authors cannot use these data to claim that localized mRNAs are required for synaptic plasticity.

      We are grateful to the reviewer for pointing out that we were not precise enough in defining our interpretation of the structural plasticity assay. We did not intend to claim that our results show that local translation of these transcripts is necessary for plasticity, only that these transcripts are localized and are required in the glia for plasticity in the adjacent neuron (in which the transcript levels are not disrupted in the experiment). Definitively proving that these transcripts are required locally and translated in response to synaptic activity would require genetic/chemical perturbations and imaging assays that would require a year or more to complete, so are beyond the scope of this manuscript. To address this point, we will clarify that the results do not show that localized transcripts are required, only that the transcripts are required somewhere specifically in the glial cell (without affecting the neuron level), and we can indeed show in an independent experiment that there are localized transcripts.

      Reviewer #2

      Major points:

      1. The authors analyse the 1700 shortlisted genes for Gene Ontology and associations with autism spectrum disorder, leading to interesting results. However, it is not clear to what extent the enrichments they observe are driven by their presumptive localization or if the associations are driven to a significant extent by the presence of these genes in the selected cell types in the Fly Cell Atlas. One way to address this would be to perform the GO and SFARI analysis on genes that are expressed in the same cells in the Fly Cell Atlas but were not shortlisted from the mammalian cell datasets - the results could then be compared to those obtained with the 1700 localized transcripts.

      This is a fair point raised by the reviewer as genes involved in neurological disease such as Autism Spectrum Disorder may be enriched in CNS/PNS cell types. We will follow the reviewer’s suggestion to perform GO and SFARI gene enrichment analysis in genes that were not shortlisted for presumptive glial localisation.

      1. Although the authors attempt to justify its inclusion, I'm not convinced why it was important to use the whole cell transcriptome of perisynaptic Schwann cells as part of the selection process for localizing transcripts. Including this dataset may reduce the power of the pipeline by including mRNAs that are not localized to protrusions. How many of the shortlisted 1700 genes, and how many of the 11 glial localized mRNAs in Table 5, would be lost if the whole cell transcriptome were excluded. More generally, what is the distribution of the 11 validated localizing transcripts in each dataset in Table 4? This information might be valuable for determining which dataset(s), if any, has the best predictive power in this context.

      We thank the reviewer for raising this point, which we will address with further analysis and additions to the discussion. We propose to address the criticism by running our analysis pipeline without the Perisynaptic Schwann Cell (PSC) dataset and then intersecting the result with the PSC-expressed genes, since their functional similarity with polarised Drosophila glial cells is highly relevant. We also agree with the reviewer that it would be a useful control for us to assess the ‘predictive power’ of each glial dataset by calculating its contribution to the shortlisted 1,700 glial localised transcripts and to the 11 transcripts experimentally validated via in situ hybridisation. To address this point, we plan to add this information in the revised manuscript.
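
      The per-dataset contribution described above could be tabulated along the following lines. This is only a sketch: the shortlisting rule used here (a gene is kept if it is detected in at least `min_datasets` datasets) is an assumption made for illustration and is not the pipeline's documented rule, and all dataset and gene names are hypothetical.

```python
def dataset_support(datasets: dict, validated: set) -> dict:
    """Map each validated gene to the sorted names of datasets that contain it."""
    return {
        gene: sorted(name for name, genes in datasets.items() if gene in genes)
        for gene in sorted(validated)
    }


def lost_without(datasets: dict, validated: set, excluded: str, min_datasets: int) -> set:
    """Validated genes that pass the ASSUMED rule with all datasets
    but fail it once the `excluded` dataset is removed."""

    def passing(data: dict) -> set:
        support = dataset_support(data, validated)
        return {gene for gene, names in support.items() if len(names) >= min_datasets}

    kept = {name: genes for name, genes in datasets.items() if name != excluded}
    return passing(datasets) - passing(kept)


# Hypothetical toy input: three datasets and three validated genes.
toy_datasets = {
    "PSC": {"a", "b"},
    "astro": {"a", "c"},
    "oligo": {"a", "b", "c"},
}
toy_validated = {"a", "b", "c"}
toy_lost = lost_without(toy_datasets, toy_validated, excluded="PSC", min_datasets=2)
```

      Running `lost_without` once per dataset gives a simple leave-one-out view of which dataset the shortlist depends on most.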

      1. Did the authors check if any of the RNAi constructs are reducing levels of the target mRNA or protein? Doing so would strengthen the confidence in these important results significantly. In any case, the authors should also mention the caveat of potential off-target effects of RNAi.

      We thank the reviewer for their useful comment and agree that the extent to which the RNAi expression reduces the levels of mRNA is not specifically known. We will add a FISH experiment on lac, pdi and gs2 RNAi showing very strong reduction in mRNA levels. We will also add an explanation of the caveats of the use of the RNAi system to the discussion.

      1. Methods: what is the justification for assuming that if the RNAi cross caused embryonic or larval lethality then the 'next most suitable' RNAi line is reporting on a phenotype specific to the gene. If the authors want to claim the effect is associated with different degrees of knockdown they should show this experimentally. An alternative explanation is that the line used for phenotypic analysis in glia is associated with an off-target effect.

      We thank the reviewer for this comment. We agree that off target effects cannot in principle be completely ruled out without considerable additional experimental analysis beyond the scope of this manuscript. To address the criticism we will remove the expression data of the lines that cause lethality and revise the discussion to explain that the level of knockdown in each line is unknown, and would require further experimental exploration.

      Minor points:

      1. It would be helpful to have in the Introduction (rather than the Results, as is currently the case) an operational definition of mRNA localization in the context of the study. And is it known whether or not localization in protrusions is the norm in mammalian glia or the Drosophila larval glia? I ask because it may be that almost all mRNAs diffuse into the protrusion, so this is not a selective process. One interesting approach to test this idea might be to test if the 1700 shortlisted transcripts have a significant underrepresentation of 'housekeeping' functions.

      We thank the reviewer for this excellent suggestion. To address the comment, we will move our explanation of the operational definition of mRNA localization to the Introduction. We will also perform enrichment analysis of housekeeping genes within 1,700 shortlisted transcripts compared to the transcriptome background, as the reviewer suggested.

      Reviewer #3

      Major points:

      1. The authors have pooled data from different studies across different type of glial cells performed from in vitro to in vivo. While pooling datasets may reveal common transcripts enriched in processes, this may not be the best approach considering these are completely different types of glial cells with distinct function in neuronal physiology.

      We thank the reviewer for highlighting the need for us to further justify why we pooled datasets. We will revise the manuscript to better emphasise that the overarching goal of our study was to try to discern a common set of localised transcripts shared between the cells. The problem with analysing and comparing individual data sets is that much of the variation may be due to differences in the methods used and amount of material, rather than differences in the type of cells used. We will revise the discussion to make this point and plan to explain that our approach corresponds well with a previous publication pooling localised mRNA datasets in neurons (von Kuegelgen & Chekulaeva, 2020).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      1. It is important to note the limitations of the study. For example, DeSeq2 is biased for highly expressed transcripts. How robust was the prediction for low abundance transcripts?

      The presented 1,700 transcripts were shortlisted based on their presence and expression level (TPM) in glial protrusions rather than their relative enrichment. Nevertheless, the reviewer makes a valid criticism of our use of DESeq2, where we compared enriched transcripts in glial and neuronal protrusions in Figure 1D. To address this point we will discuss this caveat in the relevant section.

      The issue raised regarding low abundance transcript prediction raises an important question: does the likelihood of localisation to cell extremities correlate with mRNA abundance? We have already partially addressed this point, since our analysis of the fraction of localised transcripts per expression level quantiles shows only limited correlation. To address this comment, we will add these results in the revised manuscript as a supplementary figure.
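
      The quantile analysis mentioned above can be illustrated as follows. This is a minimal sketch assuming equal-sized expression bins ranked by TPM; the function name, the number of bins and the toy values are hypothetical, not the authors' data.

```python
def localized_fraction_by_quantile(tpm: dict, localized: set, n_bins: int = 4) -> list:
    """Rank genes by expression, split them into n_bins equal-sized bins
    (lowest expression first) and return the localised fraction per bin."""
    ranked = sorted(tpm, key=tpm.get)
    n = len(ranked)
    if n < n_bins:
        raise ValueError("need at least one gene per bin")
    fractions = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one gene of each other.
        bin_genes = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        fractions.append(sum(g in localized for g in bin_genes) / len(bin_genes))
    return fractions


# Hypothetical toy data: eight genes with increasing TPM.
toy_tpm = {f"g{i}": float(i) for i in range(1, 9)}
toy_localized = {"g1", "g5", "g6", "g7", "g8"}
toy_fractions = localized_fraction_by_quantile(toy_tpm, toy_localized, n_bins=4)
```

      A roughly flat profile across bins would indicate limited dependence of localisation on abundance, whereas a steady rise would point to an abundance bias.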

      1. The authors identify 1,700 transcripts that they classify as "predicted to be present" in the projections of the Drosophila PNS glia. This was based on the comparison to all the mammalian glial transcripts. Since the authors have access to a transcriptomic study from Perisynaptic Schwann cells (PSCs), the nonmyelinating glia associated with the NMJ isolated from mice; it would be more convincing to then validate the extent of overlap between Drosophila peripheral glial with the mammalian PSCs. This may reveal conserved features of localized transcripts in the PNS, particularly associated with the NMJ function.

      Thank you for the valuable suggestion. A similar point was also raised by [Reviewer #2 - Major point 2] to re-run our pipeline excluding the PSCs dataset and intersect with the PSC transcriptome post-hoc. Please see the above section for our detailed response.

      1. Fig 2: What is the extent of overlap between the translating fractions versus the localized fraction? It will be informative to perform the functional annotation of the translating glial transcripts as identified from Fig 1D.

      This is an interesting question. To address this point, we plan to: (i) compare transcripts that are translated vs. localised in glial protrusions, and (ii) perform functional annotation enrichment analysis on the translated fraction of genes.

      1. "We conclude predicted group of 1,700 are highly likely to be peripherally localized in Drosophila cytoplasmic glial projections". To validate their predictions, the authors test some of these candidates in only one glial cell type. It might be worthy to extend this for other differentially expressed genes localized in another glial type as well.

      The presented in vivo analyses made use of the repo-GAL4 driver, which is active in all glial subtypes, including subperineurial, perineurial and wrapping glia that make distal projections to the larval neuromuscular junction. We agree that subtype-specific analysis would be highly informative, but we believe this is outside the scope of the current work, where we aimed to identify conserved localised transcriptomes across all glial subtypes. Nevertheless, to address the comment, we plan to further clarify our use of the pan-glial repo-GAL4 driver in the Results and Methods sections of the revised manuscript.

      1. Figure 5: The authors perform KD of candidate transcripts to test the effect on synapse formation. However, these are KD with RNAi that spans across the entire cell. To make the claim about the importance of "target" RNA localization in glia stronger, ideally, they should disrupt the enrichment specifically in the glial protrusions and test the impact on bouton formation. Do these three RNAs have any putative localization elements?

      We agree with the reviewer that we would ideally test the effect of disruption of mRNA localization (and therefore localised translation). However, we feel these experiments are beyond the scope of this current study, as they will require a long road of defining localisation signals that are small enough to disrupt without affecting other functions. To address this comment we will revise the Discussion section to mention those difficulties explicitly, and clarify the limitations of the approach used in our study for greater transparency.

      Reviewer #4

      Major points:

      1. The authors use FISH to validate the glial expression of their target genes, though these experiments are not quantified, and no controls are shown. The authors should provide a supplemental figure with "no probe" controls, and/or validate the specificity of the probe via glial knockdown of the target gene (see point 2). Furthermore, these data should be quantified (e.g. number of puncta colocalized with NMJ glia membranes).

      Thank you for requesting further information regarding the YFP smFISH probes. We have validated the specificity and sensitivity of the YFP probe in our recent publication (Titlow et al, 2023, Figure 1 and S1). Specifically, we demonstrated the lack of YFP probe signal from wild-type untagged biosamples and showed colocalization of YFP spots with additional probes targeting the endogenous exon of the transcript. Nevertheless, we will address this comment by adding control image panels of smFISH in wild-type (OrR) neuromuscular junction preparations.

      Titlow et al, 2023 DOI: 10.1083/jcb.202205129

      1. For the most part, the authors only use one RNAi line for their functional studies, and they only show data for one line, even if multiple were used. To rule out potential false negatives, the authors should leverage their FISH probes to show the efficacy of their knockdowns in glia. This would serve the dual purpose of validating the new probes (see point 1).

      Thank you for the suggestion. This point was also raised by [Reviewer #2 - Major point 3]. Please see above for our detailed response.

      1. In Figure 5 E, given the severe reduction in size in the stimulated Pdi KD animals, the authors should show images of the unstimulated nerve as well. Do the nerve terminals actually shrink in size in these animals following stimulation, rather than expand? The NMJ looks substantially smaller than a normal L3 NMJ, though their quantification of neurite size in F suggests they're normal until stimulation.

      We share the same interpretation of the data with the reviewer that the neurite area is reduced post-potassium stimulation in pdi knockdown animals. We will follow the reviewer’s suggestion and add an image showing unstimulated neuromuscular junctions.

      Minor points:

      1. The authors claim that there is an enrichment of ASD-related genes in their final list of ~1400 genes that are enriched in glial processes. It is well-appreciated that synaptically-localized mRNAs are generally linked to ASDs. Can the authors comment on whether the transcripts localized to glial processes are even more linked to ASDs and neurological disorders than transcripts known to be localized to neuronal processes?

      This is an interesting point. To address the comment, we will add a comparison of the degree of enrichment of ASD-related genes in neurite vs. glial protrusions in the revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      1. The use of blue/green or blue/green/magenta is difficult to resolve in some places. Swapping blue for cyan would greatly aid in visualizing their data.

      This comment is much appreciated. We have swapped blue for cyan in Figures 4 and S4. We have also changed Figure S1 to increase contrast and visibility as per reviewer’s comment.

      1. Make the colouring/formatting of the tables more consistent; it's distracting when it's constantly changing (also there is no need for a blue background, just use a basic white table).

      This comment is much appreciated. We have applied a consistent colour palette to the Tables without background colourings and made the formatting uniform.

      Reviewer #2

      1. Introduction: 'Asymmetric mRNA localization is likely to be as important in glia, as it is in neurons,...'. Remove commas

      Thank you for pointing this mistake out. We have made the corresponding edits.

      Reviewer #3

      1. RNA localization in oligodendrocytes has been well studied and characterized. The authors should cite and discuss those papers (PMID: 18442491; PMID: 9281585).

      We thank the reviewer for this useful suggestion. We have added these references to the paper.

      Reviewer #4

      1. In Figure 5D, the authors should include a label to indicate that these images are from an unstimulated condition.

      We thank the reviewer for pointing this out. We have added the label as requested.

      1. The authors are missing a number of key citations for studies that have explored the functional significance of mRNA trafficking in glia, and those that have validated activity-dependent translation:

      - https://pubmed.ncbi.nlm.nih.gov/18490510/

      - https://pubmed.ncbi.nlm.nih.gov/7691830/

      - https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001053

      - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7450274/

      - https://pubmed.ncbi.nlm.nih.gov/36261025/

      We thank the reviewer for the comment. We have added these references to the text.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #4

      Evidence, reproducibility and clarity

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

      Major issues:

      • page 7, bottom: The authors state that measuring nuclear height gives an indication of confinement and force balance. But, if the nuclear mechanical properties have changed, then the nuclear height could change without any change in contractility. So, the authors would need to also verify that the level of contractility hasn't changed and that the mechanical properties haven't changed to really confirm that the cell height is a good measure of confinement. The level of contractility can be assessed by staining for pMLC. The nuclear mechanical properties may have been measured by others.
      • In general, are the changes in contractility resulting from drug treatments sufficiently large to deform the nucleus? Can the authors show a time course of nuclear height in response to a treatment for WT for example? This would allow to link contractility to nuclear height.
      • Page 9: The authors do not find any change in nuclear shape. Can they measure shape pre/post treatment on the same cells? It could be that the effect is lost in variability unless you do paired measurements?
      • Page 11: the authors find nuclear ruptures unchanged in LMNA-/- even when there is no contractility. They then state: "We hypothesized that LMNA-/- nuclei do not show bleb-based behaviors because this perturbation cannot, due to reported disrupted nuclear-actin connections". I do not understand this sentence.
      • To characterise actin contractility better, it would be good to present images of the actin cables in each condition and pre/post treatments. This would allow to visually assess whether the morphology of the F-actin cytoskeleton has changed. This is one of the main topics of the study and as such it should be examined.
      • On all bar charts, the authors should indicate: the number of independent experiments, the number of cells examined.
      • I find the diagrams on Fig 1A, 2A etc do not help to illustrate what the authors think is happening. Can they redraw them in a more informative way?
      • The abstract, introduction, and discussion are overly long and lack focus. These should be rewritten succinctly.
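
      The point about paired measurements (Page 9) can be illustrated with a small numerical sketch. The values below are hypothetical nuclear shape readings, not data from the manuscript, and the two helper functions are written here only for illustration: when cell-to-cell variability is large, a consistent per-cell change is visible in a paired statistic but hidden in an unpaired one.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before: list, after: list) -> float:
    """t statistic on per-cell differences (same cells measured twice)."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(x: list, y: list) -> float:
    """Unpaired (Welch) t statistic, ignoring which cell is which."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se


# Hypothetical measurements on five cells, pre- and post-treatment.
pre = [4.0, 6.0, 8.0, 10.0, 12.0]
post = [4.5, 6.4, 8.6, 10.5, 12.5]
t_paired = paired_t(pre, post)      # large: every cell shifts by about 0.5
t_unpaired = welch_t(pre, post)     # small: the shift is buried in spread
```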

      Minor issues:

      • page 4: inhibitors of Rho-kinase will also modulate actin polymerisation indirectly through the action on Lim-kinase and cofilin.
      • page 5, second paragraph: the authors should state that they are measuring the frequency of ruptures. At first, I thought this might be a mechanical strain.
      • Page 7: In general, it may be useful to discuss the temporal evolution of the c/n and the circularity side by side. The change in circularity over time could be an indicator of mechanical strain, while the c/n would report on any transient loss of integrity of the nuclear membrane.
      • Fig 1B: it would be nice to present the time course of the c/n as well.
      • Fig S1: it might be interesting to characterise the dynamics/amplitude of the c/n for the different conditions. There doesn't appear to be any difference between the nuclear blebbing rupture and the non blebbing rupture. This suggests that the two phenomena (nuclear blebbing and nuclear rupture) are independent: i.e. rupture is not causally linked to blebbing.

      Significance

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      (1) According to Figure 1A (and to Meyer et al., 2012, Prüfer et al., 2017, and Prüfer et al., 2013), the phylogenetic distance from modern humans to Denisovan is shorter than the distance to Altai Neanderthal. However, also according to these studies, the branch of Denisovan is more remote to modern humans than Altai Neanderthal. Thus, it is not unreasonable to find that 2514 and 1256 DBSs have distances > 0.034 in genes in Denisovans and Altai Neanderthals, respectively. Probably, both the phylogenetic distances and DBS distances depend considerably on the sampled genomes of Altai Neanderthals and Denisovans, who lived on earth for a long time. When new samples are obtained, these distances may change somewhat.

      (2) Regarding “they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3”, section 3 discusses the second type of distance; the distances computed in the first way were not analyzed further because “This defect may be caused by that the human ancestor was built using six primates without archaic humans”.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      Indeed, the TFs-TFBSs and lncRNAs-DBSs relationships are comparable, and which one contains more QTLs is an interesting question. In this sense, it is reasonable to use TFBSs as the control. However, for three reasons, we did not perform the comparison with TFBSs as the control. First, most TFBSs are predicted by varied methods, which makes us concerned about the reliability of comparing two sets of predictions. Second, most QTLs in DBSs are mQTLs but most QTLs in TFBSs are eQTLs. Third, probably a greater portion of TFBSs than DBSs are not in promoters, and the time consumption of LongTarget made us unable to predict DBSs truly genome-wide. Nevertheless, this is an interesting question that deserves further exploration.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it's unlikely that enrichment for something that arose in the past 6000-7000 years would be seen in an annotation that is designed to detect human-chimp or human-neanderthal level divergence.

      Multiple sugar metabolism-related pathways, including “glucose homeostasis” and “glucose metabolic process”, are found to be enriched only in Altai Neanderthal but not in chimpanzees (Figure 2). Indeed, HS lncRNAs span a much longer time frame than the transition to agriculture. However, given that apes and monkeys know how to pick ripe, sugar-rich fruits at the right time and place, we conjecture that archaic humans, as hunter-gatherers, could effectively exploit natural sugars.

      Reviewer #2 (Public Review):

      Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lnc RNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      1) Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      Length is an important metric of DBS, but it has a defect – a triplex of 100 bp may have 50% or 70% of nucleotides bound; in the two situations, the binding affinity of DBD and DBS is very different.
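The length dependence discussed in point 1 can be illustrated with a minimal sketch. It assumes only what the review states, namely that the score is region length multiplied by mean identity; the function name and the two example sites are hypothetical, and the numbers are invented for illustration rather than taken from the manuscript.

```python
def affinity_product(length_bp, mean_identity):
    """Score as described in the review: region length times mean TTS identity."""
    return length_bp * mean_identity


# Hypothetical sites; the values are invented for illustration only.
long_weak = {"length_bp": 200, "mean_identity": 0.5}
short_strong = {"length_bp": 100, "mean_identity": 0.7}

score_long = affinity_product(**long_weak)
score_short = affinity_product(**short_strong)

# The longer site outranks the shorter one despite its lower mean identity.
print(score_long, score_short)
```

Under this definition the ranking is driven by length as much as by identity, which is the bias the reviewer describes; reporting length and mean identity separately would keep the two effects distinguishable.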

      2) There is also a strong assumption that identified sites will always be bound (line 100), which I do not think is well supported by the additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

      Further details are given in the cited work, Wen et al. 2022. We will add the sites to the Supplementary Tables in the revised version.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

      (1) If, say, three transcripts of a gene share the same promoter region (i.e., they have the same TSS) and differ only in the 3’ UTR, the promoter region was used to predict DBSs only once. If instead the three transcripts have different TSSs, all three promoter regions were used to predict DBSs.

      (2) A gene may have many DBSs if it has many transcripts, or few if it has just a few transcripts. We did not correct for this uneven distribution of transcripts, because our GTEx analysis was at the transcript level; it is well recognized that transcripts of the same gene can be expressed in different tissues.

      (3) We randomly sampled a pair consisting of a non-HS lncRNA and a transcript 10,000 times (i.e., 10,000 pairs). It is a fair point that multiple draws of 40 non-HS lncRNAs should be made to make the statistics more robust.
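The resampling scheme the reviewer proposes, repeated draws of 40 non-HS lncRNAs to build an empirical null, could look roughly like the sketch below. This is not the authors' pipeline: the function name, the choice of the set mean as the test statistic, and the toy pool of simulated values are all assumptions made for illustration.

```python
import random


def empirical_p_value(observed, pool_values, set_size=40, n_draws=10_000, seed=0):
    """One-sided empirical p-value for the mean statistic of a lncRNA set.

    pool_values: one statistic per non-HS lncRNA (hypothetical input).
    Each draw samples `set_size` lncRNAs without replacement, matching the
    size of the HS set, and compares the draw's mean with `observed`.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        draw = rng.sample(pool_values, set_size)
        if sum(draw) / set_size >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_draws + 1)


# Toy pool standing in for ~13,500 non-HS lncRNAs (simulated values only).
toy_rng = random.Random(1)
toy_pool = [toy_rng.gauss(0.0, 1.0) for _ in range(13_500)]

p_high = empirical_p_value(5.0, toy_pool, n_draws=1000)
p_low = empirical_p_value(-5.0, toy_pool, n_draws=1000)
print(p_high, p_low)
```

Controlling for the number of transcripts per gene, as the reviewer also suggests, would require stratifying the draws, which this sketch does not attempt.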

      3) Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test. Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then, at line 190, the threshold for downstream testing is the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      The over-representation analysis using g:Profiler was performed taking the whole genome as the background. Analyzing more DBSs (especially weak DBSs) would generate more results, but the results could be less reliable. Thus, there is a trade-off between analyzing fewer DBSs with relatively high reliability and analyzing more DBSs with relatively low reliability. Inevitably, the handling of this trade-off is somewhat subjective, and carefully comparing the two classes of DBSs per se can be an independent question. Although weak DBSs were not systematically analyzed, the results from the strong DBSs undoubtedly suggest that HS lncRNAs have contributed greatly to human evolution.

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We examined Tajima’s D in DBSs (Supplementary Figure 9) and in HS lncRNA genes (Supplementary Figure 18). In both cases, we compared the Tajima’s D values with the genome-wide background.
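For readers unfamiliar with the statistic under discussion, the following is a minimal sketch of Tajima's D computed from per-site allele counts, using the standard constants from Tajima (1989). It is an illustration only and makes no claim about the software or settings the authors used; the function name and the toy inputs are assumptions.

```python
from math import sqrt


def tajimas_d(derived_counts, n):
    """Tajima's D from allele counts at sites sampled in n sequences.

    derived_counts: one count per site; sites with count 0 or n are ignored
    because they are not segregating in the sample. Minor-allele counts give
    the same result, since k * (n - k) is symmetric.
    """
    counts = [k for k in derived_counts if 0 < k < n]
    s = len(counts)
    if s == 0 or n < 4:  # D is undefined (zero variance term) for n < 4
        return float("nan")

    # Mean number of pairwise differences (pi) and the Watterson terms.
    n_pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in counts) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy inputs: an excess of rare variants versus intermediate-frequency variants.
d_rare = tajimas_d([1, 1, 1, 1, 1], n=10)
d_common = tajimas_d([5, 5, 5, 5, 5], n=10)
print(d_rare, d_common)
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D, which is why the choice of comparison regions matters: constrained regions such as promoters are expected to sit lower than the genome-wide background regardless of any lncRNA binding.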

      4) There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or similar is at play here. For example, line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds. It would be worth checking.

      We used the same workflow (and the same cutoff of 0.034) to analyze the Vindija and Altai Neanderthals and the Denisovan. With a smaller cutoff, one would see more Vindija genes. The issue, again, is that there is a trade-off. Analyzing the epigenome and epigenetic regulation in archaic genomes is an interesting direction, and many more studies are needed before related parameters and cutoffs can be set more reasonably.

      5) Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors interpret high Fst between groups (line 515) as evidence of selection, which is only one possible cause of high Fst; population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

    2. Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results, such as certain HSlncRNAs having DBSs in all genes, are rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust: they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it is unlikely that something that arose in the past 6000-7000 years would show enrichment in an annotation designed to detect human-chimp or human-Neanderthal level divergence.

    1. Author Response

      The primary concern of Reviewer 1 is that Ne might affect gBGC and hence GC, and this might act as a confounding effect. The reviewer suggests that we should investigate how gBGC (with GC presumably as its proxy) might affect CAIS, and to what extent any relationship here could explain the relationship between CAIS and body mass. We believe that we have already dealt with this both in Supplementary Figure S5A (where we regret having inserted the wrong figure panel, a mistake we will correct), and its PIC-corrected counterpart in S5B. These two panels show (or will show) that CAIS is not correlated with GC. Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that mutation biases, including but not limited to the strength of gBGC, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, our CAIS thus corrects for mutation bias, in order to isolate the effects of selection.

      Reviewer 1 also suggests that we examine the relationship between gene expression and GC corrected RSCU, as we would expect codon adaptation to be stronger in more highly expressed genes, as was previously shown in the non-GC corrected CAI metric (Sharp et al 1987). Correlations with gene expression are outside the scope of the current work, which is focused on producing a single value of codon adaptation per species. It is indeed possible that our general approach could be useful in future work investigating differences among genes.

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences per species. Our approach thus remains appropriate even for scenarios e.g. where different cell types, different environmental conditions, and/or different genes have different codon preferences (Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne. Through use of a more sensitive methodology, we believe we have expanded our ability to detect codon adaptation into animals of somewhat higher Ne than in previous work.

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. In our revisions, we will more clearly acknowledge that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2. I.e., where Ne makes a difference is by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene.

      Simulated datasets would be great, but we think them a nice addition rather than a must-have, in particular because we are skeptical that our understanding of all relevant processes is good enough for simulations to add much to our more heuristic argument along the lines of Figure 2. E.g. we believe the complications documented by Gingold et al. 2014 cited above are pertinent, but incorporating them into simulations would require a complex set of assumptions.

      In response to the final comment of reviewer 2, the reason that we hard-coded genome-wide %GC values is that we took them from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan conducted in that work, of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The code used in the current work to calculate the intergenic %GC, as well as that used to calculate amino acid frequencies, is located at https://github.com/MaselLab/Codon-Adaptation-Index-of-Species. We agree that more user-friendly tools would be useful, but producing robust tools falls outside the scope of the current manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      “Liu et al present a very interesting manuscript investigating whether there are distinct mechanisms of learning in children with ASD. What they found was that children with ASD showed comparable learning to typically developing children, but that there was a difference in learning strategy, with less plasticity and more stable learning representations in children with ASD. In other words, children with ASD showed similar learning performance to typically developing children but were more likely to use different learning rules to get there. Interestingly greater fMRI-measured brain plasticity was associated with learning gains in typically developing children, whereas more stable (less plasticity) neural patterns were associated with learning gains in autistic children. This was mediated by insistence on sameness (from the RRIB) in the ASD group. This is a good paper, well reasoned and with strong methods.”

      We appreciate the positive comments from the reviewer.

      1.1) “The biggest issue is related to subject numbers...With n=35 it is only possible to make a generalized statement about autism.”

      Thank you for this comment. Although the sample size in the current study was modest, we would like to note that acquiring high-quality behavioral and brain imaging data at multiple time points is a challenge in children with ASD. The current training study with unique longitudinal behavioral and brain imaging data provides an unprecedented opportunity to investigate the potentially atypical training-induced learning and brain plasticity in children with ASD relative to TD peers. To our knowledge, the present longitudinal sample is the largest of its kind in studies of neurocognitive function in children with ASD. We have acknowledged these points in the revised Discussion section (Page 15), including the following statement:

      “First, larger sample sizes are required to further characterize heterogeneous patterns of atypical learning and whether the findings can be generalized to a broader ASD population.” (Page 15)

      1.2) “[Another] issue is related to [heterogeneity of autism-related findings]. For example, take the following statement from the results: "while most TD children used the memory-based strategy most frequently following training, nearly half of the children with ASD used rule-based strategies most frequently for trained problems." Is this the heterogeneity of autism at play, or the noisiness of the task and measures?”

      We hypothesize that group differences in changes in strategy use following training are due to an atypical learning style or a high level of inter-individual differences, i.e., greater heterogeneity, in autism, rather than noisiness of the measures. This hypothesis is based on the fact that we used the same tasks before and after training and a standardized training protocol across the two groups, which (i) allowed us to systematically examine atypical learning of these tasks in children with ASD compared to TD children and (ii) provided ecologically valid measures. This design minimized potential differences in measurement error between the two groups. We have clarified these points in the revised Introduction section (Page 4), including the following statement: “Crucially, we employed identical tasks before and after training and a standardized training protocol across the two groups. This approach enabled systemic analysis of learning in children with ASD relative to TD children.” (Page 4)

      1.3) “Conceptually, is it realistic to expect a unitary learning strategy in all of autism?”

      We agree with the sentiment expressed by the reviewer, and indeed this notion led to the hypothesis that our study set out to test. We hypothesized that children with ASD would not show a unitary learning strategy at the stage of development examined. Our results reveal that a disproportionate number of children with ASD use a rule-based strategy, reflecting atypical learning styles.

      1.4) “Lastly, the task itself can only be solved in a subset of autistic children and therefore presents a limited view of the condition.”

      We thank the reviewer for this important point and agree that additional studies tailored to more severely affected children with ASD are required for a more comprehensive characterization of learning in children with autism.

      Reviewer #2 (Public Review):

      “Overall, the authors sought to determine whether children with autism spectrum disorder (ASD) or typical development (TD) would both benefit from a 5-day intervention designed to improve numerical problem-solving. They were particularly interested in how learning across training would be associated with pre-post intervention changes in brain activity, measured with functional magnetic resonance imaging (fMRI). They also examined whether brain-behavior associations driven by learning might be moderated by a classic cognitive inflexibility symptom in ASD ("insistence on sameness"). The study is reasonably well-powered, uses a 5-day evidence-based intervention, and uses a multivariate correlation-based metric for examining neuroplastic changes that may be less susceptible to random variation over time than conventional mass univariate fMRI analyses. The study did have some weaknesses that draw into question the specific claims made based on the present set of analyses, as well as limit the generalizability of the findings to the significant proportion of individuals with ASD that are outside of the normative range of general cognitive functioning. The study also found minimal evidence for transfer between trained and untrained mathematical problems, limiting enthusiasm for the intervention itself. The majority of the authors' claims were rooted in the data and the team was generally able to accomplish their aims. I am sensitive to the fact that one of the main limitations I noted would have significant ethical implications-i.e. NOT offering potentially beneficial numerical training to children randomized to a sham or control group. I think the authors' work will represent a welcome addition to a growing corpus of studies showing similar neuropsychological test performance across several cognitive domains (e.g. learning, memory, proactive cognitive control, etc.) in ASD and TD. 
      However, these relatively preserved cognitive functions still appear to be implemented by unique neural systems and demonstrate unique correlations to clinical symptoms in youth with ASD relative to TD, which may have implications for both educational and clinical contexts.”

      We thank the reviewer for the positive feedback and helpful suggestions.

      Reviewer #3 (Public Review):

      “Liu and colleagues examined learning and brain plasticity in neurotypical children and children with autism. The main findings include autistic children relying more on rule-based versus memory-based learning strategies, altered associations between learning gains and brain plasticity in children with autism, and insistence on sameness as a moderator between brain plasticity and learning in autism. Although the sample size is limited in this study, the findings provide a significant contribution to the field. The major strengths of this paper include an extensive pre and post training protocol, a detailed methods section, rationale behind the study, investigation of a potential moderator of learning gains and neural plasticity, and investigation of "neural plasticity" in association to learning in autism. Weaknesses of the study include a small sample size, and some missing information/analyses from the study. The authors laid out four clear aims of the study. They investigated these aims and the analytic approaches were appropriate. The paper included significant findings toward better understanding the mechanisms underlying differences in learning strategies and behavior in children diagnosed with autism spectrum disorder. This holds significant value in educational and classroom settings. Further, the investigation of a potential moderator of learning gains and neural plasticity provides a potential mechanism to improve the relationship. Overall, this is a significant contribution to the field. The autism literature is limited in understanding differences in learning styles and the underlying neural mechanisms of these differences.”

      We thank the reviewer for the positive comments and detailed suggestions.

    1. While it may be obvious that there are specific technologies for those with different abilities that help them engage with their learning, never forget that how we choose existing learning technologies is probably the first step in ensuring access to our learners, and potentially presenting barriers to their learning. Learning Management Systems (LMSs) like Moodle, Canvas, Blackboard Learn, D2L Brightspace, Google Classroom and other technologies should have accessibility features built in as well – if they don’t, these foundational systems will present barriers for our learners. If we’re choosing to use ad-hoc or additional technologies that sit outside what our institutions have set up for us (e.g., Kahoot, Canva, etc.) it’s up to us to assess what technologies we use for accessibility.

      The key takeaway I think

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We were naturally pleased to read the enthusiasm coming from both reviewers. Both mentioned that an extension to experimentation in cells would increase the impact of the study, even though both recognize that the biophysical and biochemical experiments constitute a study that is significant and interesting to a broad readership.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript by Bryan et al., describes the use of Hydrogen/Deuterium-exchange Mass Spectrometry (HXMS) as a powerful tool to identify key amino acid residues and associated interactions driving liquid-liquid demixing. They have particularly focused on the Chromosomal Passenger Complex (CPC), an important regulator of chromosome segregation, which has recently been shown to undergo liquid-liquid demixing in vitro. Their work presented here allowed them to identify a few key electrostatic interactions as molecular determinants driving the liquid-liquid demixing of the CPC. Their work also shows that crystal packing information of protein molecules, where available, can provide valuable insight into likely factors driving liquid-liquid demixing.

      Major comments:

      [#1] A previous study by Trivedi et al., NCB 2019 identified an unstructured region in Borealin (aa residues 139-160) as the main region driving the phase separation of CPC. Interestingly, this region only shows a moderate reduction in HX upon liquid-liquid demixing. But no experiments or discussions related to this observation are presented in the current version of the manuscript.

      In the Trivedi et al. paper, the authors were careful to state that the region of borealin between 139-160 contributed to phase separation, but there was clearly a remaining propensity to phase separate in vitro in the mutant. Thus, it is fully expected that there should be other regions in the complex that contribute to phase separation. It was satisfying that this region was independently identified in the hydrogen-deuterium exchange experiments, and we suggest that a “moderate” reduction is consistent with a protein condensate having liquid properties. Since this region was already characterized, we have focused our work in this paper on the new region identified by the hydrogen-deuterium exchange experiments.

      [#2] In the absence of cellular data on if and how these mutations (within the triple-helical bundle region) affect CPC's ability to phase separate in cells, the implication of this work is very limited - One can't say for sure these are interactions driving phase separation of CPC in a cellular environment. In the absence of any cellular data with the mutants described here, much of the discussion on the possible roles of CPC phase separation in cells does not appear relevant to this manuscript. I would suggest that the authors focus mainly on highlighting the power of using HXMS as a tool to characterise the molecular determinants of liquid-liquid demixing at a relatively high resolution.

      We have now added cellular data in the form of one of the key experiments used to explore CPC liquid-liquid demixing, utilizing the Cry2 optogenetic system for inducible dimerization. The results of testing WT Borealin versus the mutant we identified as defective in droplet formation are shown in the all-new Fig. 6. Some relation of our overall findings, encompassing observations made with purified components and now in cells, to the cellular function of the CPC is pertinent. In light of the reviewer comments, we have also reduced this aspect in the discussion (see the substantial edits on pg. 12).

      Minor comments:

      [#3] The authors should ensure that the introduction cites relevant literature thoroughly. For example, where the potential role of Borealin residues 139-160 in conferring phase separation properties to the CPC is mentioned, the authors failed to cite Abad et al., 2019, which showed the contribution of the same Borealin region in conferring nucleosome binding ability to the CPC.

      We have made this particular change on pg. 4 and also have gone through to ensure we are appropriately citing relevant literature.

      Reviewer #1 (Significance (Required)):

      This is a highly relevant and significant work, particularly considering the rapidly growing list of examples for Phase separation of proteins/protein assemblies and their potential biological roles (in spite of ongoing debates in the field about the cellular relevance of several phase separation claims). The data presented in this manuscript are solid and convincingly establish HXMS as a useful tool to characterise molecular interactions driving liquid-liquid demixing. Considering its applicability to characterise wide-ranging protein assemblies implicated in phase separation, this work will be of interest to a broad readership.

      We thank the reviewer for the strong praise of the significance of our study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Comments:

      1) "The three non-catalytic subunits of the CPC (INCENP1-58, Borealin, and Survivin) form soluble homotrimers that have a propensity to undergo liquid-liquid phase separation.8" Do the authors mean the hetero-trimeric CPC?

      Yes, we meant heterotrimers. It is now corrected.

      2) For better clarity, the authors can indicate the residue numbers of each of the components INCENP, Survivin, and Borealin in the CPC trimeric helix-bundle crystallographic structure in Fig 1.

      These are included on the revised Figure 1A.

      3) "In the condition we identified, 90% +/- 5% of the ISB protein was found within the rapidly sedimenting droplet population (Fig. 1C)." The authors should include the time-point corresponding to the gel shown in Fig 1C.

      This information is now directly labeled in Fig. 1C.

      4) Prior to the HXMS experiments on the phase-separated ISB protein complex, were the samples subjected to sedimentation to separate the dispersed from the condensed droplet phase? Since several time points after formation of phase-separated ISB complex have been characterized to compare and contrast between the dispersed and the droplet phase, the authors can consider performing a time-dependent sedimentation assay to ascertain the fraction of the ISB complex in the droplet phase.

      The HXMS experiments were not performed on sedimented samples, so this complication in our HX workflow is not necessary. We note that the sedimentation that we include in our study (Figs. 1C, 5E, and S6), involves centrifugation for 10 minutes, and that length of time presents a substantial design challenge to our HX experimentation. We considered it at the outset of our study, but, in the end, our study was facilitated by our finding early on that this separation step was unnecessary. Further, we note that we report statistically significant differences at the earliest HX timepoints in the areas prominently protected from HX upon droplet formation (10 and 100 s; see Fig. 1C for an example). Indeed, we do not observe broadening of our HXMS spectra (examples shown for all timepoints, Fig. 2B,F) that would be expected if there were a large degree of mixed states (i.e. a large population of molecules in the free protein state and a large population of molecules in the droplet state) each having different HXMS rates. One can imagine that this sort of envelope broadening behavior (“EX1-like”) could be observed in other samples where there are multiple substantially populated states of a protein present at a particular timepoint, but this is not what we observe in the experiments we performed in this study.
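      The expectation about spectral broadening can be illustrated with a toy calculation. The sketch below is not the authors' analysis. It assumes a hypothetical peptide with 10 exchangeable amides, one uniform exchange rate per state, and illustrative rates (K_FREE, K_DROPLET) chosen only for demonstration; it ignores natural isotope distributions and back-exchange. Under these assumptions, a 50/50 mixture of two states gives a wider deuterium-uptake distribution than a single population with the same mean uptake.

```python
import math

def uptake_probability(rate_per_s, time_s):
    """Probability that one amide has exchanged after time_s (single exponential)."""
    return 1.0 - math.exp(-rate_per_s * time_s)

def binomial_pmf(n_sites, p):
    """Distribution of the number of deuterons over n_sites independent amides."""
    return [math.comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k)
            for k in range(n_sites + 1)]

def mixture_pmf(n_sites, p_a, p_b, frac_a):
    """Weighted sum of two binomial envelopes (two coexisting states)."""
    a = binomial_pmf(n_sites, p_a)
    b = binomial_pmf(n_sites, p_b)
    return [frac_a * x + (1 - frac_a) * y for x, y in zip(a, b)]

def mean_and_variance(pmf):
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

# All values below are hypothetical, chosen for illustration only.
N_SITES = 10        # exchangeable amides in the peptide
TIME_S = 100.0      # labeling time
K_FREE = 0.03       # exchange rate in the free state (1/s)
K_DROPLET = 0.01    # exchange rate in the droplet state (1/s)

p_free = uptake_probability(K_FREE, TIME_S)
p_drop = uptake_probability(K_DROPLET, TIME_S)

# Two substantially populated states present at the same time.
mixed = mixture_pmf(N_SITES, p_free, p_drop, 0.5)
mixed_mean, mixed_var = mean_and_variance(mixed)

# One homogeneous population with the same mean uptake.
single = binomial_pmf(N_SITES, mixed_mean / N_SITES)
single_mean, single_var = mean_and_variance(single)

print(f"mean uptake: mixed {mixed_mean:.2f}, single {single_mean:.2f}")
print(f"variance:    mixed {mixed_var:.2f}, single {single_var:.2f}")
```

      In this toy model the extra width of the mixed envelope grows with the difference in per-site uptake between the two states, which is why a narrow envelope argues against two large coexisting populations.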

      5) "At the 100 s timepoint, the most prominent differences between the soluble and droplet state were located within the three-helix bundle of the ISB, with long stretches in two subunits (INCENP and Borealin) and a small region at the N-terminal portion of the impacted α-helix in Survivin (Fig. 1F)" According to Fig 1F, at the 100 s time-point, there is also another small region in Survivin (approximately residues 12-20) that exhibits slower exchange rates in the droplet state. Can the authors comment on whether this region undergoes any conformational change or if it exhibits homotypic interactions retarding the hydrogen/deuterium exchange rates in the droplet phase?

      Our general approach in the Black lab over the past decade-plus of HXMS has been to restrict our conclusions, whenever practical, to the consensus behavior. This permits multiple partially overlapping peptides to be used to generate confidence in the changes that drive our conclusions. The reviewer carefully recognizes the behavior of a single peptide (in 2 different charge states) that might have actual changes relative to some of the longer peptides that it partially overlaps with, and smaller changes can yield larger percentage changes on small peptides. We have chosen not to include this single peptide in the text describing our main conclusions from the work, to be consistent with our longstanding strategy for rigorous interpretation of HXMS data. Our conclusion is that this region is not substantially changed upon droplet formation.

      6) The authors mention that: "By the latest timepoint, 3000 s, there was some diminution in the number of droplets which may indicate the start of a transition of the droplets to a more solid state (i.e., gel-like)." As a result of this, time points beyond 3000 s have not been used for comparing hydrogen/deuterium exchange rates in the condensed droplet phase with the soluble state. Can the authors comment on what happens to the nature of these specific interactions between the components of the CPC in the 'gel-like state'? A combination of both non-specific weak interactions as well as strong site-specific interactions between macromolecular components has been widely known to contribute towards the formation of several phase-separated compartments. It will be interesting to know the perspective of the authors on what sort of interactions get populated within these compartments to give rise to a more solid gel-like state. At these later time points, do the droplets exhibit reversibility under higher ionic strength conditions? Do the authors have some data to show how the material properties of these droplets evolve as a function of time?

      We offered the idea of a transition to a more solid state to the reader because it was a reasonable conclusion, although challenging to prove (something the Stukenberg lab is actively working on, though, see our response to point #9, below). The vast majority of our conclusions in the paper, and essentially all of what we emphasize are the important ones, are based on earlier timepoints where this is not an issue. Thus, we find an extended study of the late-developing features in our droplets something more appropriate for separate studies outside the scope of the current one.

      7) "Examination of the entire time course shows that during intermediate levels of HX (i.e., between 100-1000 s), this region takes about three times as long to undergo the same amount of exchange when the ISB is in the droplet state relative to when it's in the free protein state (Figs. 2B, C and Supplemental Fig. 2). Upon droplet formation, HX protection within Borealin is primarily located in the interacting α-helix and is less pronounced at any given peptide when compared to INCENP peptides (Fig. 2E). Nonetheless, similar to INCENP peptides, it still takes about twice as long to achieve the same level of deuteration for this region of Borealin in the droplet state as compared to the free state." How do the hydrogen/deuterium exchange rates and extent of deuteration in the N-terminal part (residues 98-142) of the Survivin polypeptide chain, constituting the three-helix bundle core, evolve as a function of time? Also, how do the exchange rates for peptides in this region compare with those of the other protein subunits Borealin and INCENP and what inference can be drawn from these differences?

      The peptides from a.a. 98-142 of Survivin exhibit HX protection throughout the timecourse (both before and after droplet formation) consistent with a folded α-helix (and comparable to the overall HX behavior of the other helices in the 3-helix bundle of the ISB) (Fig. S2). HX is subtly slower in the droplet state for this portion of Survivin at later timepoints (Fig. S4), and this is explicitly highlighted in the Results section on pg. 6.
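      The "three times as long" comparison quoted in point 7 can be related to exchange rates under a simple assumption. A minimal sketch, assuming single-exponential uptake with one hypothetical rate per state (real peptides average over many amides with different rates, so this is only an approximation): the time to reach a given deuteration level scales inversely with the rate, so a threefold longer time corresponds to a threefold slower rate regardless of the level chosen.

```python
import math

def time_to_fraction(rate_per_s, fraction):
    """Time (s) for single-exponential uptake to reach the given deuterated fraction."""
    return -math.log(1.0 - fraction) / rate_per_s

# Hypothetical rates, chosen only so that the droplet state is 3x slower.
K_FREE = 0.003      # free-state exchange rate (1/s)
K_DROPLET = 0.001   # droplet-state exchange rate (1/s)

def slowdown(fraction):
    """How many times longer the droplet state needs to reach the same fraction."""
    return time_to_fraction(K_DROPLET, fraction) / time_to_fraction(K_FREE, fraction)

for level in (0.25, 0.5, 0.75):
    print(f"{level:.0%} deuterated: droplet takes {slowdown(level):.2f}x as long")
```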

      8) The authors mention that mutating either all the glutamate residues or combinations of these residues on the acidic patch on the INCENP subunit, to positively charged residues, causes a decrease in the propensity of phase separation, as formation of salt bridges with Borealin subunit from adjacent hetero-trimeric complexes appears to be the major driving force for phase separation. Can the authors elaborate on how the reduction in the phase separation propensity of these salt-bridge inhibiting mutants might be directly affecting the subsequent localization of the CPC to the inner centromeres? Can the authors supplement their existing in vitro data with further in vivo characterization of CPC recruitment or localization to the centromeres, for each of the constructs exhibiting reduced propensity of phase separation?

      As we state in the introduction, the recruitment to centromeres requires established ‘conventional’ targeting via the specific histone marks to which we refer. We also cite the correlations demonstrated between prior mutations in Borealin (impacting aa 139-160) that both disrupt phase separation in vitro and reduce CPC levels at the centromere. In our revision, we have added what we feel are the most critical cell-based experiments to relate to our HX studies in the new Fig. 6. We are preparing for future studies to study mutants arising from our HX studies, and our plans are to pursue gene replacement approaches that will rigorously test the impact on the mitotic function of the CPC. In the process of these future studies, the impact on localization will be measured, too. As others in the field are investigating the correlations between observations made with purified components and those made in the cell, and where there are nuances at play in how the actual experiments are conducted, we are certain our cell-based studies will extend far beyond the timeframe appropriate for our HX-focused study. Rigorous cell-based studies of mitotic functions are what is needed, however, and we have made our plans with that in mind.

      9) It might be really interesting for the authors to look at the recent preprint from Hedtfeld et al. 2023 Molecular Cell, (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4472737). In this preprint they have recombinantly purified a stoichiometric trimer (referred to as CPC-TARGWT) comprising full length survivin, borealin, and a 1-350 residue fragment of INCENP (instead of 1-58 used in this study) and have tried to assess if any correlation exists between the in-vitro phase behaviour of CPC-TARGWT mutants and their corresponding recruitment to the inner centromere, to form a phase separated compartment. Targeting residues in the BIR domain of Survivin involved in interactions with the N-terminus of the Histone H3, Shugoshin 1 or in the recognition of H3T3phos, and substituting them with Alanine or completely deleting the C-terminal domain of Borealin (a region implicated in CPC dimerization and centromere recruitment), was found to result in poor centromere localization, although the in vitro phase separation properties of these constructs were found to be indistinguishable, suggesting no evident correlation between the two phenomena. Thus it might be a useful piece of data to correlate the phase separation propensities of the ISB complex variants used in this current study with the extents of their in vivo recruitment to the inner centromere. This may be beyond the scope of the paper, but it would be good to comment on this.

      For the correlation studies, please refer to our response to point #8, above. From our reading of the June 2023 preprint that the reviewer mentions, the main concern raised by the authors is whether the region in Borealin first identified in the Trivedi et al. paper (aa 139-160) has a role in phase separation. As the reviewer noted, Hedtfeld et al. report using a complex that includes more of the INCENP protein than used in the Trivedi et al. study, complicating the direct comparison between studies. Using the data in figure 5E of the Hedtfeld et al. preprint, the authors suggest that the in vitro complex containing their version of the Borealin mutant Δ139-160 has similar phase separation properties to the wild type. However, we note that in our inspection of these data we see numerous differences. The mutant forms rounder and larger condensates than WT, and these have a reduced concentration of protein (lower intensity). Finally, only the WT protein has a “grape bunch” morphology. We note that unpublished data in the Stukenberg lab show these same differences can represent a defect in liquid demixing properties of a version of the purified CPC. While it is intuitive that larger condensates represent more phase separation, the unpublished data mentioned above suggest the opposite is true for the CPC. In particular, the data from the Stukenberg lab suggest the size of a droplet is mostly governed by the amount of droplet fusion in the first minutes after dilution and thus is limited by relatively rapid hardening of the complex. We note that in the course of discussions with the corresponding author of the preprint mentioned by the reviewer we did apprise them of the unpublished observations mentioned above, in case they saw fit to include in their ongoing studies what would seem to be critical measurements (e.g. measuring circularity, droplet size, droplet intensity, and FRAP) to assess our suspicion that their construct contains a portion of INCENP that can accelerate condensate formation. If true, the Hedtfeld et al. data are fully consistent with the Borealin mutant Δ139-160 having a significantly lower condensate formation potential than the WT protein.

      10[A]) "Our data also provide an important clue about the previously identified region on Borealin that is required for liquid-demixing in vitro and proper CPC assembly in cells 8. Specifically, our data (Fig. 1F, Supplementary Figs. 2, 4A) suggest this region of Borealin adopts secondary structure that undergoes additional HX protection in the liquid-liquid demixed state" This data fits perfectly with previous studies from Trivedi et al. (2019), which state that deletion of the Borealin 139-160 fragment obliterates its phase separation in vitro and also reduces the accumulation of CPC at the centromere. On the contrary, in the recent preprint from Hedtfeld et al. 2023 Molecular Cell, they have shown that the phase separation behaviour of their reconstituted CPC-TARGWT harboring the Borealin 139-160 deletion mutant was found to be indistinguishable from the WT. Can the authors comment on what might be the reason for this difference? Is it possible that this central Borealin region is involved in interactions with the additional fragment of INCENP subunit used in the helical bundle reconstitution, or with other centromere component proteins, whereby the deletion of this region is causing inefficient recruitment to the inner centromere? This can be elaborated in the discussion section of the manuscript.

      This is discussed in the response to #9, above. Through this format (the Review Commons procedure for public posting of author responses before submission of the study to a journal), our comments herein will be made public for those with the most interest in comparing our data to what has been posted on preprint servers. We think that is the most appropriate for now, with more to surely come when the aforementioned results from the Stukenberg lab are posted/published and, hopefully, when there is more information about the nature of the droplets reported in the Hedtfeld et al. study.

      10 [B]) It is also well known that in addition to these electrostatic interactions, the core of the ISB helical bundle is formed by an extensive network of hydrophobic interactions. Have the authors ever looked into how perturbing any of these intra-trimeric complex hydrophobic interactions affects their ability to phase separate and perform their subsequent function?

      We think there is some confusion, here. The electrostatics we focus on are between heterotrimers rather than within them. We certainly would predict that disrupting the hydrophobic surface that generates a stable heterotrimer would, in turn, disrupt individual heterotrimers. Our study assumes a stable heterotrimer as a starting point, so we view this type of perturbation as unrelated to our conclusions.

      11) The phase separated CPC compartment is known to enrich several other inner centromere proteins such as the Histone H3, Sgo1, the histone H3T3phos, among others. Have the authors tried to increase the complexity of the reconstituted CPC scaffold by incorporating more components to look into whether that changes any of the interaction interfaces between the ISB trimeric complexes within the condensed phase? Can this CPC compartment be reconstituted using a bottom-up approach?

      We are glad that our studies with a reductionist biochemical reconstitution approach have inspired the questions that require increased complexity. They are now warranted based on the advance we have made in the present study, and hopefully will form the basis for future, separate studies.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation.

      Reviewer #2 (Significance (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation

      We thank the reviewer for the positive comments on the significance of our study.

    1. Image: Residents crossing between islands during a rising tide on Majuro, Marshall Islands, in 2015. Majuro is home to former residents of Bikini Atoll who were relocated in the 1940s. Credit: Josh Haner/The New York Times

      By Pete McKenzie, May 3, 2023

      The golden sand of Bikini Atoll is laced with plutonium. The freshwater is poisoned with strontium. The coconut crabs contain hazardous levels of cesium.

      In the 1940s and ’50s, the U.S. government used this coral reef, in the Pacific nation of the Marshall Islands, for testing nuclear weapons. Radioactive residue has left Bikini uninhabitable to this day, forcing those whose families once lived on the atoll into exile on a handful of other Marshallese islands and in the United States.

      Recognizing the damage its testing caused, the U.S. government established two trust funds in the 1980s to help pay for Bikinians’ health care, build housing and cover living costs. In 2017, after a campaign by Bikini leaders for greater autonomy, the Trump administration announced that the government would lift withdrawal limits and stop auditing the main fund, then worth $59 million.

      Six years later, only about $100,000 remains, and the Bikini community is in crisis.

      Anderson Jibas, the mayor of the council that oversees the displaced Bikini community, made a series of questionable purchases on Bikini’s behalf, including of a large plot of land in Hawaii and a fleet of new vehicles. He has defended some of the purchases as investments against climate change, as necessary to support isolated Bikinians and as attempts at revenue-generating projects.

      Mr. Jibas has also acknowledged using trust fund money for personal expenses and has been accused by a top Marshall Islands official of receiving kickbacks from an investment manager, a charge Mr. Jibas denies.

      Image: A U.S. nuclear bomb test at Bikini Atoll in 1946. Credit: Universal Images Group, via Getty Images

      With the fund virtually depleted, the council’s roughly 350 employees are no longer being paid. Monthly payments of about $150 each to the community’s 6,800 members, a vital lifeline that helped cover food and rent among a population with high rates of poverty, have ceased.

      The emergency highlights the lasting consequences of decades of U.S. nuclear testing in the Pacific, including lingering questions about the American commitment to address that legacy, an undertaking made more difficult by pervasive fraud and mismanagement in the region.

      “It’s a disaster,” said Tommy Jibok, a former member of the Bikini council who challenged Mr. Jibas in an election in 2019. “They told us we would be sitting and sleeping on money. Look what is happening now. We’re sleeping on nothing.”

      In 1946, the United States relocated the 167 inhabitants of Bikini to clear the way for nuclear tests that it said would “end all world wars.” It then left them virtually alone on a small, desolate island, where many nearly starved. In 1948, the islanders were moved again.

      Over 12 years, the United States tested 23 nuclear bombs in Bikini. In 1968, President Lyndon B. Johnson announced that the Bikinians would return home. But after scientists found that radiation levels remained dangerously high, the United States in 1978 evacuated the almost 150 people who had chosen to go back. The Marshall Islands gained independence from the United States the next year.

      In 1982, the American government established a $25 million resettlement fund to clean up Bikini and support its people. In 1987, it created a second fund to provide annual payments directly to Bikinians. A year later, it contributed an additional $90 million to the resettlement fund. American officials administered the money and could veto withdrawals.

      Bikini representatives argued that the resettlement fund contained too little money to remedy the atoll’s radioactivity. They used the funds instead to support the exiled Bikinians.

      Image: Mike Pompeo, then the secretary of state, visiting in the Marshall Islands in 2019. With him is Hilda Heine, the Marshallese president from 2016 to 2020. Credit: Jonathan Ernst/Agence France-Presse, Getty Images

      But the Bikini leaders were frustrated by American officials’ refusal to release more than a few million dollars each year. The struggle culminated in 2016 with the election of Mr. Jibas, who promised to take control of the resettlement fund. (The other fund is overseen by independent trustees.)

      During a 2017 congressional hearing, Mr. Jibas explained that Bikinians “know far better than the intermediaries or distant agencies of the United States what is needed to make the lives of the displaced population more bearable.”

      Douglas Domenech, at the time an assistant interior secretary, announced that the Interior Department would relinquish control of the resettlement fund to “restore trust and ensure that sovereignty means something.”

      Mr. Jibok, the former Bikini council member, had a different interpretation: that U.S. officials wanted to “wash their hands clean” of responsibility for Bikinians.

      Whatever the motivation, the result was a rapid increase in council spending under Mr. Jibas, from $7.6 million in 2016 to $25.7 million in 2018, according to audits from the time. Bank statements provided by Gordon Benjamin, a lawyer for the council, show that the fund, worth $59 million in 2017, was down to just $100,041 in March of this year.

      Many of the council’s purchases were popular, including of a small aircraft and two cargo ships to help supply isolated Bikinians, as well as construction equipment to build protections against rising seas that threaten low-lying Pacific islands because of climate change.

      But there were also more dubious purchases: $4.8 million for 283 acres of land in Hawaii; $1.3 million for an apartment complex in the Marshall Islands’ capital, Majuro; and multiple new vehicles for the personal use of Bikini council members, according to Mr. Benjamin. Mr. Jibas also introduced an annual $100,000 “representation package” to fund his regular trips to the United States.

      Image: Isles that form part of Majuro, the Marshall Islands’ capital. One of the purchases made with the resettlement fund was an apartment complex in Majuro. Credit: Josh Haner/The New York Times

      Mr. Jibas has said he wants to develop housing in Hawaii for rent or sale, but no development has taken place yet. The Majuro apartment complex was purchased as an investment property, but it appears to be losing money so far.

      Lani Kramer, a Bikinian who previously worked as the council’s city manager and is now challenging Mr. Jibas for the mayoralty, said Mr. Jibas and council members had used public funds for personal spending. “They were bringing receipts for diapers, chewing gum,” Ms. Kramer said. “It was obviously not for the people, it was for their own grocery shopping.”

      The Marshall Islands’ banking commissioner has also accused Mr. Jibas of accepting $50,000 from a local bank manager who is being prosecuted on suspicion of unlawfully investing Bikini funds and laundering money. The Marshallese auditor general did not respond to requests for comment about the allegations.

      Starting in 2018, Mr. Jibas refused to disclose council finances to the Marshall Islands’ auditor general, prompting the police to seize council documents in 2021. Late last month, a spokesman for the Interior Department said it had written to bank officials seeking information about the fund and to Mr. Jibas requesting the council’s recent budgets.

      That request came after Jack Niedenthal, an American expatriate who served as the Marshallese health secretary, wrote to the Interior Department warning about the depleted trust fund and asking the department to intervene. He was subsequently fired for breaching diplomatic protocol by circumventing the Marshallese foreign ministry and the American Embassy.

      Mr. Jibas acknowledged in an interview that he occasionally used his representation package to buy food and other items for his family, which he said council staff members were aware of and had approved, but he denied taking money from the bank manager.

      Image: Collecting laundry on Ejit, an isle in Majuro. The money from the resettlement fund is nearly gone, and the Bikini community is in crisis. Credit: Josh Haner/The New York Times

      Mr. Jibas said in the interview that he was trying to access the independently controlled second fund, which now holds $28 million, to sustain council spending.

      According to Mr. Benjamin, starting in October 2021 the trustees of that fund permitted the council to withdraw roughly $13 million to fund its spending, but reversed their stance earlier this year and halted all payments out of the fund, including the regular living payments to Bikinians, to avoid further depletion. In the interview, Mr. Jibas said he also hoped to tap into new American funding to replenish the main fund.

      Earlier this year, the Biden administration promised to provide the Marshall Islands $700 million in one-time aid and to continue underwriting much of the government’s budget. Under a treaty, the United States controls the country’s defense policy, which the American government considers crucial to countering China in the region. The aid has not yet been approved, meaning Bikinians’ future remains uncertain.

      In a statement on behalf of Mr. Jibas, Mr. Benjamin said that the mayor’s critics were not pushing the United States hard enough for more funding.

      Mr. Jibok, who as a council member opposed Mr. Jibas’s efforts to gain control of the fund, said that the United States had done little to facilitate self-sufficiency in the Bikini community, leaving few financial safeguards in place.

      “I didn’t think we were ready,” Mr. Jibok said, “because I knew that we didn’t have anything in place to control” mismanagement or fraud.

      A version of this article appears in print on May 4, 2023, Section A, Page 4 of the New York edition with the headline: Bikini Atoll Leaders Blew Through Millions From U.S.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Centrioles are small cylindrical structures with roles in cell division, motility, and signaling. Typically, centrioles are highly stable structures which can persist for many cell generations. However, in some cells, such as the female germ line of many species, centrioles are programmed for elimination. This process is essential for maintaining centriole number from one generation to the next in sexually reproducing organisms, yet in nearly all species the molecular mechanisms underlying how centrioles are eliminated are unknown. The current study utilizes the nematode C. elegans to explore how centriole architecture changes during the elimination program in the female germ line. Using a suite of light microscopy techniques, the authors provide a stunning visual perspective of how centrioles are disassembled during oogenesis and show that removal of the central tube component SAS-1, a key regulator of centriole stability, is an early event in elimination. I have no major objections to the work and enthusiastically endorse its publication with the following minor revisions.

      Page 9 line 200: In the pcmd-1 mutant, the authors state that centriolar foci devoid of nuclei are present in rachis, but they do not mention in the text that there are also nuclei that lack centriole foci in early pachytene. This is mentioned in the figure legend, but I felt it was important enough to mention in the text.

      As per the reviewer’s suggestion, we will provide this information in the main text as well.

      Page 9 line 211. The authors found that in the absence of dynein heavy or light chain, centrioles remain associated with the nuclear envelope (rather than moving to the periphery). To me this was striking, as dynein depletion in the embryo results in the opposite phenotype, with centrioles losing attachment to the nuclear envelope and moving to the cell periphery (Gonczy et al. 1999 JCB 147:135). It might be worth pointing this out somewhere in the manuscript and speculating about the reasons for this difference.

      We will expand the Discussion section to better explain the difference of dynein’s involvement in the oocyte versus the embryo.

      Page 11 line 277: The authors state that elimination timing is not affected by the loss of SPD-5. This is a small but important point. It really is the absence of PCMD-1 and not SPD-5, as SPD-5 is still present in the cell. An alternative would be to say "in the absence of PCM" or "in absence of a pericentriolar accumulation of SAS-5".

      Fully agreed, we will modify the text accordingly.

      Figure 4D: Why does loss of PCMD-1 result in a delay in oocyte maturation as judged by RME-2 accumulation? This is not mentioned in the paper. Is this a general response to a loss of PCM or is this specific to a loss of PCMD-1?

      We realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point further and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution. See also related minor comment 2 of reviewer 2, and major comment 1 of reviewer 3.

      Figure 7 E and F. The authors measure the tubulin and SAS-4 intensity in wild-type and sas-1(t1521) embryos and conclude that microtubules and SAS-4 signals decay faster in the sas-1 mutant than in the control. To me, this is convincingly the case with microtubules in panel E but I am not so sure this is the case with SAS-4 as shown in panel F. The differences in SAS-4 levels are much smaller between mutant and control. Could the authors provide statistical analysis to show how significant the differences are?

      We will provide the requested statistical analysis (which indeed shows significance).

      Page 15 line 363. I think this sentence should be reworded to: "Finally, we demonstrate that the central tube protein SAS-1 is the first of the factors analyzed here to leave centrioles..."

      In response to this suggestion and to the related comment of reviewer 2 (see below), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      Reviewer #1 (Significance (Required)):

      The work contained in this manuscript represents a fundamental step forward in understanding the process of centriole elimination. The authors have carefully described the stepwise disassembly of the centriole including changes in the architecture during oogenesis. They have identified loss of the centriole stability factor SAS-1, as an early event in the elimination program and have found that in a sas-1 mutant, the centriole disassembles prematurely. They have also shown that loss of SAS-1 is followed by expansion of the centriole and ultimately loss of structural integrity. This work should be of interest to a broad range of scientists including those interested in centrosome dynamics, germ line development, and more generally cell biologists.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary In this manuscript Pierron et al. explore the mechanisms of centriole elimination during oogenesis in C. elegans. Centriole elimination is a common feature of oogenesis in many species, but it is relatively poorly understood and understudied. Here, the authors characterise the kinetics with which several key centriole and centrosome proteins are lost during this process in living worms, and they correlate this with EM and expansion microscopy (U-Ex-STED) analyses of fixed tissues. They conclude that centriole elimination begins with the loss of SAS-1 from the central region of the centrioles, which correlates with the widening of the structure and the loss of the centriole MTs. A remnant structure containing several core centriole proteins remains, however, and this often ultimately detaches from the nuclear envelope and moves towards the plasma membrane in a MT-motor-dependent fashion before it dissipates (although detachment from the nucleus does not seem to be required for the eventual elimination of this residual structure). Intriguingly, centriole loss in this system does not appear to require the down-regulation of PLK activity, which is in contrast to the situation in Drosophila oogenesis.

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Major comment:

      In the EM shown in Figure 5F the authors claim that the central tube of the centriole is disrupted, but the other elements (inner tube, MTs and paddlewheel) are not. I don't think this is as clear cut as the authors claim-at least from comparing the images of the one normal centriole (5E) and one centriole that is starting to be eliminated (5F). It seems much harder to distinguish the MTs and the inner tube in the image in 5F. Perhaps this is obvious to the authors as they have compared many more images, but I think they need to find some way of showing this more convincingly (a montage of multiple centrioles)?

      We understand that Figure 5F alone may have left the reviewer wondering whether the central tube is truly the first element to be disrupted during centriole elimination. We plan on strengthening this point by providing additional EM images as a Supplemental Figure.

      This same issue is compounded in Figure 6D where, using a different technique (U-Ex-STED), the authors claim that the centriolar distribution of SAS-1 is gradually disrupted as centriole elimination proceeds. It does look like the amount of SAS-1 has decreased from early prophase to late pachytene, but the central tube it stains doesn't look particularly disrupted and, if anything, the MTs look more disrupted (and also possibly of lower intensity, perhaps explaining why the ratio of SAS-1/tubulin doesn't change very much over these stages, as shown in Figure 6G).

      As the reviewer correctly noticed, there is some variability in central tube removal during oogenesis. In some cases, such as in the centriole on the right of the late pachytene panel in Fig. 6D, SAS-1 signal intensity diminishes uniformly, without apparent holes in the central tube. By contrast, in other cases, such as in the centriole on the left of the late pachytene panel, SAS-1 signal intensity diminution is accompanied by a loss of central tube continuity. We will clarify the writing and qualify our findings on this important point in the revised manuscript.

      These points are important, as throughout the manuscript the authors assume it as a fact that SAS-1 leaves the centriole early (which is clear), and that this leads to the specific loss of the central tube (which, at least on the basis of this data, is not so clear).

      As mentioned above, we will make certain that the results linking SAS-1 departure and central tube loss are explained in a clear and balanced manner in the revised manuscript.

      Minor comments:

      1. The authors state that the kinetics of GFP-SAS-7 or SAS-4 loss were not altered in pcmd-1 mutants (Figure 4A-C; Figure S3E,F). This doesn't look correct to me, as both proteins seem to stay brighter for longer in the mutant embryos (and this is quite easy to see on the quantification graph for SAS-7 in Figure 4C). It looks similar for SAS-4 from the pictures shown in Figure S3E,F, although this data is not quantified (and is there any reason why this data is not quantified?).

      As mentioned in response to reviewers 1 and 3, we will mention in the revised manuscript that a mild developmental delay can impact the number of maturing oocytes present in the proximal gonad, thereby leading to this slight shift in GFP::SAS-7 and GFP::SAS-4 persistence.

      1. The authors state that they demonstrate that SAS-1 is the first component to leave the disassembling centrioles. I would rephrase as they can't know this for sure (i.e. there could be some untested component that leaves earlier).

      In response to this suggestion and to the related comment of reviewer 1 (see above), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      In the latter part of the Discussion the authors state that SAS-1 is critical for centriole elimination. I would rephrase, as this seems to suggest it is required for centriole elimination, which is not the case. It might also be worth discussing that the elimination machinery clearly seems to target SAS-1 early on, but we don't yet know what this machinery is or how it is regulated.

      We thank the reviewer for raising this important point, which we will implement in the Discussion accordingly.

      Reviewer #2 (Significance (Required)):

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Pierron et al. use C. elegans oocytes to tackle a fundamental, yet heavily under-studied question in developmental biology: how are centrioles eliminated during gamete formation/maturation? The paper's main conclusion is that SAS-1 (a key protein that makes up the central tube in C. elegans centrioles) plays a critical part in regulating the timing of centriole elimination. I congratulate the authors on all the experiments related to the SAS-1 part of their story, as they are done meticulously and in unprecedented detail (particularly all the fascinating EM and expansion microscopy data!).

      The paper also concludes that the Polo-like kinase family does not have a central role in this process, in stark contrast to a previous report demonstrating their importance for centriole elimination in Drosophila oogenesis (Pimenta-Marques et al. 2016 Science). Unfortunately, I am less convinced about this part of the paper, and half of my major comments below relate to the experiments/analyses in this regard. I was similarly not very enthusiastic about a part of the story that I didn't find very relevant to the main point of the paper: half of the centrioles detach from the nucleus and translocate to the plasma membrane prior to their elimination. I find the observations here quite epiphenomenal and lacking a direct/mechanistic relevance to either the PLK or SAS-1 part of the story. In my view, the authors should consider taking this part out.

      Regarding this last suggestion: we think that even if the movement of centriole remnants is not essential for final removal, an account of this process provides important information about cellular dynamics during oocyte maturation. We note also that the two other reviewers did not raise this point, but leave the final decision to the editor.

      Overall, the piece is well written and organized, however it suffers from several shortcomings that preclude it from publication in its current form. I list my criticisms and suggestions below.

      Major comments:

      1. The authors state firmly at several places in the text that PCM components do not contribute to the timing of centriole elimination (e.g., lines 420-421), particularly given their experiments with Polo kinase paralogs. In my view, the data speaks otherwise. The centriole elimination process appears strikingly premature when SPD-5 [1] (another PCM component) is overexpressed with the fluorescent transgene (Figure 1I). The opposite is also true - when another PCM component, PCMD-1, is knocked down by a temperature sensitive allele, the centriole elimination process is severely delayed [2] (Figure 4C). Even more extremely in the epistatic Polo mutant conditions (Fig. S3B), the centrioles do not appear to be eliminated at all [3] (though the authors prefer to interpret this result differently in line 260-263, which could be flawed per my second comment below). How do the authors explain all these intriguing results? (underlining and numbering added above to clarify our responses point by point hereafter)

      1 > We respectfully disagree, since our quantifications show clearly that the SAS-7 signal disappears with an analogous timing in the line expressing RFP::SPD-5 (Fig. 1J) when compared to the other lines (Fig. 1D, 1F and 1H). The image shown currently for RFP::SPD-5 (Fig. 1I) is somewhat of an outlier compared to the others (Fig. 1C, 1E and 1G), and we will therefore provide a more representative specimen in the revised manuscript to avoid confusion.

      2 > As mentioned also in response to reviewers 1 and 2, we realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution (as opposed to representing a delay in centriole elimination in pcmd-1(t3421ts) mutant animals).

      3 > We used plk-1(or683ts); plk-2(ok1936) double mutants to further test whether there might be premature elimination in this strong reduction-of-function condition compared to RNAi-mediated depletion. Although centriolar foci appear to remain for a longer time, these gonads are extremely disorganized, so that our conclusions regarding PLK-1 and PLK-2 are based primarily on the combined data shown in Fig. 3 and Fig. S3, which do not exhibit premature centriole elimination. We will rectify the writing to clarify these points.

      Also, I believe these claims (on the PCM components and their role in centriole elimination) will benefit from more nuanced statements. For instance, although Plk paralogs may not be necessary for the centriole elimination process, some other centrosome components clearly are. Paradoxically, the effects observed here (when disrupting or promoting PCM formation) are entirely opposite to those observed in Pimenta-Marques et al. 2016 Science. The 2016 piece claimed that the loss of PCM renders centrioles more vulnerable to losing their stability (which makes sense). How do the authors interpret their own results (i.e. that a disturbed PCM leads to slower centriole elimination, and vice versa)?

      As suggested by the reviewer, we will consider toning down claims regarding the role of PCM components in centriole elimination. Moreover, we will expand the section in the Discussion comparing our results with the published work of Pimenta-Marques et al. in Drosophila. This being written, as mentioned above, our findings do not suggest that removing the PCM (in pcmd-1(t3421ts) mutant animals) alters centriole elimination timing in C. elegans.

      I invite the authors to more carefully tread these nuances throughout their manuscript, which otherwise may cast major doubt on their claims.

      See point above.

      1. When investigating the role of Polo-like kinases, the authors assume that centriole elimination must follow (or correlate with) the dynamics of RME-2 (as a proxy for oocyte maturation). What guarantees that the centriole elimination process has to follow oocyte maturation? As far as I could tell, there is no direct evidence presented in the paper about this point. Do the authors have direct data (or reference to another work) that this trend must hold true at all times? I can readily see several places in the paper where this correlation doesn't appear to hold (e.g., in Fig. 4D the centriole elimination precedes the oocyte maturation under pcmd-1 condition).

        We will provide further data supporting the view that oocyte maturation and centriole elimination are correlated, whereby premature oocyte maturation mutants, such as let-60(ga89ts) and kin-18(ok395), exhibit precocious elimination.

      To correctly interpret their results on the epistatic Polo mutants, the authors could examine centriole elimination timing with mutants that can prematurely trigger or delay oocyte maturation (and do so without affecting the centriole biology itself).

      See above point.

      1. Lines 155-159 on the dimness of the SAS-6 signal make me worried about how successfully the transgenes were generated. Could the authors comment on, or perhaps extend in detail in the Methods section, through what assays the transgenes were validated? For example, did the authors try to rescue a SAS-6-/- with a SAS-6::GFP transgene? I would like to see further support for their validities.

      We will explicitly explain in the Material and Methods section that the SAS-6::GFP transgene indeed rescues the sas-6 null phenotype.

      If the authors can demonstrate the validity of their transgenes more reliably, could they possibly comment on the bunch of seemingly random SAS-6::GFP foci in Fig. 1G?

      We will comment on the presence of small SAS-6::GFP foci in the most mature oocytes, which correspond to potential precursors of centriolar elements later assembled in the embryo.

      1. Starting from line 204, the authors use the percentage of oocytes with detached centrioles (from the nucleus) as a proxy for movement to plasma membrane. This can be very confounding in my view (due to erroneous detachments etc.). As the authors explicitly state that the detachment is a process followed by a directed movement (with a defined velocity) towards the plasma membrane, this calls for a much better measurement in general. The authors should directly measure how far the centrioles are from the closest plasma membrane region in each condition they are examining (and should do this as a function of the "time progression" in different oocytes as they get closer to fertilization).

      As mentioned above, we think that an account of the movement of centriole remnants provides important information about cellular dynamics during oocyte maturation. However, given that this movement is not essential for the elimination of such remnants, it appears that providing additional complex 3D analysis as suggested by the reviewer will not benefit the present manuscript.

      Do the authors observe any propensity in sas-1(t1521ts) oocytes as to where the centrioles are being degraded more prominently in the cytoplasm (i.e., when attached to the nucleus vs. when near the plasma membrane)? They could perform analyses à la their assessments in Fig. S2 and see whether they can extract some more information about this. In other words, I am wondering whether SAS-1 regulates the centriole elimination process more prominently near the nucleus or near the plasma membrane.

      Centriole elimination occurs during pachytene in sas-1(t1521) mutant animals, when nuclei are packed in the gonad and surrounded by little cytoplasm. Therefore, even if foci were to detach from nuclei at this stage, we would not be able to quantify it with certainty. We will discuss these points in the revised manuscript.

      I ask this because the section about "centrioles moving to plasma membrane" appears epiphenomenal and rather random (i.e., the chances of a centriole moving to the plasma membrane appear 50-50 under some control conditions - see control RNAi in Fig. 2G for example). Could the authors explore their existing data more closely (as suggested above), to see whether they could find intriguing correlations that tell us a little more about whether centriole elimination at these two places is achieved differently? Otherwise, I frankly do not think this section contributes significantly to the essence of the story.

      We apologize for the confusion our writing seems to have generated. The chances of moving to the plasma membrane are not 50-50. The actual figure is 78.7% (reported as ~80% in the manuscript, line 187), and stems from the live imaging experiments where every travelling event can be monitored. By contrast, the analysis of fixed specimens is an underestimate as it provides only a snapshot of a dynamic process. We will expand the writing in the revised manuscript to clarify this point.

      Finally, the statements about a deterministic function for the plasma membrane re-localization should be toned down, because unlike what the authors claim in the paper (that ~80% of the centrioles move to plasma membrane), the control data (in Fig. 2B) clearly demonstrates that this number is more like ~60% (hence close to its chances being 50-50).

      Please see response just above.

      The paper carefully quantifies most of the data (for which I sincerely congratulate the authors!), however the experiments in Fig. S3 fall short of this. It would be nice if the authors could do the same here for completeness.

      We will provide quantifications for Fig. S3E and S3F. However, due to the high disorganization of plk-1(or683ts); plk-2(ok1936) gonads, the presence of centriolar foci relative to oocyte position cannot be quantified accurately in this case.

      Minor comments:

      1. Sentence in lines 110-113 is too long and perturbs the flow. This should be shortened or be broken into better clauses. Perhaps the following way? "Prior analysis of centriole elimination in C. elegans oogenesis uncovered that this process takes place during diplotene..."

      The text will be modified accordingly.

      What are the orange arrowheads in the figure panels? They are not stated explicitly in the figure legends. My prediction was that they point to regions where centrioles are in another plane (though the overview is depicted from a different slice in the stack). Is this right? Either way, it will be useful to over-guide the reader on these orange arrowheads.

      The meaning of the orange arrowheads is explained in lines 520-521.

      If I am not wrong, the data/graph in Figures S2G and 2E are essentially the same (i.e., the data are duplicated). I couldn't find any statement in the figure legends indicating this. This should be added.

      Apologies for this oversight; the reviewer is correct and we will mention this redundancy in the legend of Fig. S2.

      Some may consider the discussion on C2CD3 a little far-fetched, as this protein localizes to the distal end of centrioles (completely unlike SAS-1). Also, unlike the C. elegans centrioles, mammalian centrioles do not contain a discernible central tube, casting doubt on the speculations made in the Discussion section. I suggest removing this paragraph and instead stating explicitly that it is unclear whether the SAS-1 dependent mechanism could be applicable to other species.

      We will nuance these thoughts, further stressing their speculative nature, but intend to maintain them in some form as they provide a potential parallel that will be of interest to the human cell biology community.

      Could the authors add in their Discussion section some comment/thought on what the remaining GFP::SAS-7 pool (line 300-302) might possibly be? Curiously, there doesn't seem to be any structure associated with it in their EM tomograms, so it would be helpful to guide the reader further on this interesting finding.

      Although we would love to comment on this further, the remaining GFP::SAS-7 foci lack ultrastructural organization and do not exhibit recognizable electron densities. That this is the case will be stated explicitly in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      General Assessment: This paper's strength is in its rigorous cell biology approaches to tackle a fundamental developmental biology problem. However, some of their conclusions are too firm while not being well-supported by the data, so the paper requires major revision before its publication.

      Advance: Discovery of a new molecular player in the centriole elimination process in worm oocytes, which can pave the way for future discoveries of centriole elimination mechanisms in other species. It is not yet clear whether the results will be broadly applicable, as some of the findings presented are in stark contrast to previous studies published on centriole elimination processes in Drosophila oocytes (e.g., Pimenta-Marques et al. 2016 Science). However, as summarized in the above section, these conclusions require further experimental evidence/support.

      Audience: Centriole elimination mechanisms are not widely studied, so I am not entirely sure whether this piece will be of immediate interest to the broad cell biology community. It will certainly be of general interest to several groups studying centriole elimination mechanisms, as well as developmental biologists trying to understand the oocyte maturation process.

      My expertise: Molecular and cellular mechanisms of cytoplasmic organization in development

    1. There’s tremendous value in coming into yourself as a person. Why wouldn’t that be true online, too? Recognizing that my online self was lacking, I decided to learn how to be myself on the internet.

      It is impossible to present yourself truly on the internet, to come into yourself as a person, when everything is highly self conscious and selective, as well as limited and misleading. In person we struggle to understand each other. This may be because of the internet so I have no frame of reference, but how is the internet any better? Maybe because your inner dreams and thoughts can be shared alongside pictures of you - I am realizing what I know of internet representation of people is basically instagram and snapchat so I can't imagine a different reality. To accurately represent oneself you must be honest, a quality we are all incapable of to an extent, and I think the internet and its way of falsely representing things might create so much insecurity that this only pushes us further from honesty. You can't hide nearly as much when you are in front of people.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, the authors would like to thank the reviewers and editors for their thoughtful comments. The comments were used to guide our revision, which is substantially improved over our initial submission. We have addressed all comments in our responses below, through a combination of clarification, new analyses and new experimental data.

      Reviewer #1 (Public Review):

      In this manuscript, the authors identified and characterized the five C-terminus repeats and a 14aa acidic tail of the mouse Dux protein. They found that repeats 3 and 5, but not other repeats, contribute to transcriptional activation when combined with the 14aa tail. Importantly, they were able to narrow down to a 6 aa region that can distinguish "active" repeats from "inactive" repeats. Using proximal labeling proteomics, the authors identified candidate proteins that are implicated in Dux-mediated gene activation. They were able to showcase that the C-terminal repeat 3 binds to some proteins, including Smarcc1, a component of SWI/SNF (BAF) complex. In addition, by overexpressing different Dux variants, the authors characterized how repeats in different combinations, with or without the 14aa tail, contribute to Dux binding, H3K9ac, chromatin accessibility, and transcription. In general, the data is of high quality and convincing. The identification of the functionally important two C-terminal repeats and the 6 aa tail is enlightening. The work sheds light on the mechanism of DUX function.

      A few major comments that the authors may want to address to further improve the work:

      We thank the reviewer for their efforts and constructive comments, which have guided our revisions.

      1) The summary table for the Dux domain construct characteristics in Fig. 6a could be more accurate. For example, C3+14 clearly showed moderately weaker Dux binding and H3K9ac enrichment in Fig 3c and 3e. However, this is not illustrated in Fig. 6a. The authors may consider applying statistical tests to more precisely determine how the different Dux constructs contribute to DNA binding (Fig. 3c), H3K9ac enrichment (Fig. 3e), Smarcc1 binding (Fig. 5e), and ATAC-seq signal (Fig. 5f).

      We thank the reviewer for this comment, and agree that there were some modest differences in construct characteristics that were not captured in the Summary Table (6a). To better reflect the differences between constructs, we added additional dynamic range to our depiction/scoring, and believe that the new scoring system provides sufficient qualitative range to capture the difference without imposing a statistical approach.

      2) Another concern is that exogenous overexpressed Dux was used throughout the experiments. The authors may consider validating some of the protein-protein interactions using spontaneous or induced 2CLCs (where Dux is expressed).

      We agree that it would be helpful to determine endogenous DUX interaction with our BioID candidates. Here, we attempted co-IPs for endogenous DUX protein with the DUX antibody and were unsuccessful, which indicated that the DUX antibody is useful for detection but not efficient in the primary IP. This is why we utilized the mCherry tag for DUX IP experiments, which worked exceptionally well.

      3) It could be technically challenging, but the authors may consider validating the Dux and Smarcc1 interaction in a biologically more relevant context such as mouse 2-cell embryos where both proteins are expressed. Would Smarcc1 binding be dramatically reduced in 4-cell embryos due to loss of Dux expression?

      While we agree that it would be interesting to validate the in vivo interaction of DUX and SMARCC1 in the early embryo, it is not technically feasible for us to conduct the experiment, as the IP would require thousands of two-cell embryos, and we have the issue of poor co-IP quality with the DUX antibody.

      Reviewer #2 (Public Review):

      In this manuscript, Smith et al. delineated novel mechanistic insights into the structure-function relationships of the C-terminal repeat domains within the mouse DUX protein. Specifically, they identified and characterised the transcriptionally active repeat domains, and narrowed down to a critical 6aa region that is required for interacting with key transcription and chromatin regulators. The authors further showed how the DUX active repeats collaborate with the C-terminal acidic tail to facilitate chromatin opening and transcriptional activation at DUX genomic targets.

      Although this study attempts to provide mechanistic insights into how DUX4 works, the authors will need to perform a number of additional experiments and controls to bolster their claims, as well as provide detailed analyses and clarifications.

      We thank this reviewer for their constructive comments, and have conducted several new analyses, additional experiments and clarifications – which have strengthened the manuscript in several locations. Highlights include a statistical approach to the similarity of mouse repeats to themselves and to orthologs (Figure S1d) and clarified interpretations, a wider dynamic range to better reflect changes in DUX construct behaviors (Figure 6a), and additional data on construct behavior, including ‘inactive’ constructs (e.g., C1+14aa in Figure 1a,d, new ATAC-seq in Figure S1g), and active constructs such as C3+C5+14aa and C3+C514aa (in Figure S1b).

      Reviewer #3 (Public Review):

      Dux (or DUX4 in human) is a master transcription factor regulating early embryonic gene activation and has garnered much attention also for its involvement in reprogramming pluripotent embryonic stem cells to totipotent "2C-like" cells. The presented work starts with the recognition that DUX contains five conserved c. 100-amino acid carboxy-terminal repeats (called C1-C5) in the murine protein but not in that of other mammals (e.g. human DUX4). Using state-of-the-art techniques and cell models (BioID, Cut&Tag; rescue experiments and functional reporter assays in ESCs), the authors dissect the activity of each repeat, concluding that repeats C3 and C5 possess the strongest transactivation potential in synergy with a short C-terminal 14 AA acidic motif. In agreement with these findings, the authors find that full-length and active (C3) repeat containing Dux leads to increased chromatin accessibility and active histone mark (H3K9Ac) signals at genomic Dux binding sites. A further significant conclusion of this mutational analysis is the proposal that the weakly activating repeats C2 and C4 may function as attenuators of C3+C5-driven activity.

      By next pulling down and identifying proteins bound to Dux (or its repeat-deleted derivatives) using BioID-LC/MS/MS, the authors find a significant number of interactors, notably chromatin remodellers (SMARCC1), a histone chaperone (CHAF1A/p150) and transcription factors (ZSCAN4D) previously implicated in embryonic gene activation.

      The experiments are of high quality, with appropriate controls, thus providing a rich compendium of Dux interactors for future study. Indeed, a number of these (SMARCC1, SMCHD1, ZSCAN4) make biological sense, both for embryonic genome activation and for FSHD (SMCHD1).

      A critical question raised by this study, however, concerns the function of the Dux repeats, apparently unique to mice. While it is possible, as the authors propose, that the weakly activating C1, C2 and C4 repeats may exert an attenuating function on activation (and thus may have been selected for under an "adaptationist" paradigm), it is also possible that they are simply the result of Jacobian evolutionary bricolage (tinkering) that happens to work in mice. The finding that Dux itself is not essential, and in fact appears to be redundant with (or to cooperate with) the OBOX4 factor, in addition to the absence of these repeats in the DUX protein of all other mammals (as pointed out by the authors), might indeed argue for the second, perhaps less attractive possibility.

      In summary, while the present work provides a valuable resource for future study of Dux and its interactors, it fails, however, to tell a compelling story that could link the obtained data together.

      We appreciated the reviewer’s views regarding the high quality of the work and our generation of an important dataset of DUX interactors. We also appreciate the comments provided to improve the work, and have performed and included in the revised version a set of clarifications, additional analyses and additional experiments that have served to reinforce our main points and provide additional mechanistic links. We also agree that more remains to be done to understand the function and evolution of repeats C1, C2 and C4.

      Reviewer #1 (Recommendations For The Authors):

      1) For immuno-blots, authors may indicate the expected bands to help readers better understand the results.

      Agreed, and we have included the predicted molecular weight of proteins in the Figure Legends. We note that our work shows that the C-terminal domains confer anomalous migration in SDS-PAGE.

      2) Fig. 5b, a blot missing for the mCherry group?

      Figure 5b is a volcano plot, so we believe the reviewer is referring to Figure 5d, which is a coimmunoprecipitation experiment between SMARCC1 and mCherry-tagged DUX constructs. However, we are unsure of the comment, as an anti-mCherry sample is present in that panel.

      3) Line 99-100, Fig. S1d, it seems that repeat2, but not repeat3, is more similar to human DUX4 C-terminal region.

      This comment and one by another reviewer have prompted us to re-examine the similarities of the DUX repeats, and we have new analyses (Figure S1d) and an alternative framing in the manuscript as a result. We have expanded on this in our response to Reviewer #2, point #1 – and direct the reviewer there for our expanded treatment.

      4) A few references are misplaced. For example, line 48, the studies that reported the role of Dux in inducing 2CLCs should be from Hendrickson et al., 2017, De Iaco et al., 2017, and Whiddon et al., 2017. The authors may want to double check all references.

      Thanks for pointing these out. These issues have been corrected in the manuscript.

      5) In the materials & methods section, a few potential errors are noticed. For example, concentrations of PD0325901 and CHIR99021 in mESC medium appear ~1000-fold higher than standards.

      Thanks – corrected.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      1) Line 99 - The authors claimed that the "human DUX4 C-terminal region is most similar to the 3rd repeat of mouse DUX", but based on Supp. Fig. 1d, the human DUX4 C-term should be most similar to the 2nd repeat of mouse DUX. If this is indeed the case, it will undermine the rest of this study, since the authors claim that the 3rd repeat is transcriptionally active, whereas the 2nd repeat is transcriptionally inactive, and the bulk of this study largely focused on how the active repeats, not the inactive repeats, are critical in recruiting key transcriptional and chromatin regulators to induce the embryonic gene expression program.

      We thank the reviewer for their comments here. Since submission, and as mentioned above for Reviewer #1, we have revisited the issue of similarity of the DUX4 C-terminal region to the mouse C-terminal repeats, with a BLAST-based approach that is more rigorous and informed by statistics – which is in Author response table 1 and now in the manuscript as Figure S1d, and has affected our interpretation. Our prior work involved a simple % identity comparison table and we now appreciate that some of the similarity analyses did not meet statistical significance, and therefore we are unable to draw certain conclusions. We have made the appropriate modifications in the text. For example, we no longer state that the DUX4 C-terminus appears to be most similar to mouse repeats 3 and 5. This does not affect the main conclusions of the paper regarding interactions of the C-terminus with chromatin-related proteins, only our speculation on which repeat might have represented the original single repeat in the mouse – an issue we think is of some interest, but that did not rise to the level of mentioning in the original or current abstract.

      Author response table 1.

      Parameters: PAM250 matrix. Gap costs of existence: 15 and extension: 3. Numbers represent e-value of each pairwise comparison

      *No significant similarities found (>0.05).

      2) In Supp Fig 1d, it seems that the rat DUX4 C-terminal region is most similar to the 4th repeat of mouse DUX, which according to the authors is supposedly transcriptionally inactive. This weakens the authors' justification that the 3rd or 5th repeat is likely the "parental repeat for the other four", and further echoes my concern in point 1 where the human DUX4 C-term is most similar to the 2nd (inactive) repeat of mouse DUX.

      The reviewer’s point is well taken and is addressed in point #1 above.

      3) In Fig. 1d, the authors showed that DUX4-containing C3 and C5, but lacking acidic tail, can promote MERVL::GFP expression, albeit to a slightly lower extent compared to FL. However, in Fig. 2b, C3 or C5 alone (lacking acidic tail) completely failed to promote MERVL::GFP expression. However, in the presence of the acidic tail, both versions were able to promote MERVL::GFP expression, similar to that of FL. The latter would suggest that it is the acidic tail that is crucial for MERVL::GFP expression, and this does not quite agree with Fig 1b, where C12345 (lacking acidic tail) was able to promote MERVL::GFP expression. Although C12345 did not activate MERVL to a similar level as FL, it is clearly proficient, compared to C3 or C5 alone (lacking acidic tail) where there is no increase in MERVL at all. Additional constructs will be helpful to clarify these points. For example, 'C3+C5 minus acidic tail' and 'HD1+HD2+acidic tail only' constructs.

      We agree that constructs such as those mentioned would add to the work. First, we have done the additional construct HD1+HD2+14aa tail, which is presented as ΔC12345+14aa in Figure 2a and in S2a. Additionally, we performed experiments on the requested C3+C5+14aa and C3+C5Δ14aa (see samples 6 and 7 in Author response image 1, which are now included in Supplemental Figure 2b). The results reinforce our hypothesis of an additive effect toward DUX target gene activation by increasing C-terminal repeats and including the 14aa tail.

      Author response image 1.

      4) Related to the above, the flow cytometry data for the MERVL::GFP reporter as presented in Figures 1 and 2, as well as in Supp. Fig. 2, show a considerably large difference in the %GFP|mCherry for the FL construct, ranging from ~6-26%. This makes it difficult to convince the reader which of the different DUX domain constructs cannot or can partially induce GFP|mCherry signal when compared to FL, and hence it is tough to definitively ascertain the exact contribution of each of the 5 C-terminal repeats with high confidence, as it appears that there exists a significant amount of variability in this MERVL::GFP reporter system. The authors need to address this issue since this is their primary method to elucidate the transcriptional activity of each of the mouse DUX repeat domains.

      We note that, in the Dux-/- cell lines we used throughout the timeline of the study, the %GFP|mCherry value progressively and slowly decreased – possibly due to slow/modest epigenetic silencing of the reporter. However, we always used the full-length DUX construct to establish the dynamic range. We emphasize that the relative differences between constructs over multiple cell line replicates remained relatively consistent. We nevertheless elected to show absolute values in each experiment, rather than simply normalizing the full-length construct to 100% and showing relative values.

      5) Lines 140-142 - The authors claimed that the functional difference between the transcriptionally active and inactive repeats could be narrowed down to a "6aa region which is conserved between repeats C3 and C5, but not conserved in C1, C2 and C4". Assuming the 6aa sequence is DPLELF, why does C1C3a elicit almost twice the intensity of GFP|mCherry signal compared to C3C1c, despite both constructs having the exact same 6aa sequence?

      Indeed, C1C3a and C3C1c both contain the ‘active’ DPL sequence but show different relative levels of %GFP|mCherry. This is consistent with these sequences having a positive role in DUX target gene regulation – but likely in combination with other regions which potentiate its effect, possibly through interacting proteins or post-translational modifications.

      Why does DPLEPL (the intermediate C3C1b construct) induce a similar extent of GFP|mCherry signal as the FL construct, even though the former includes 3aa from a transcriptionally inactive repeat? In contrast, GSLELF (the other intermediate C1C3b construct) that also includes 3aa from a transcriptionally inactive repeat is almost completely deficient in inducing any GFP|mCherry signal. Why is that so? Is DPL the most crucial sequence? It will be important to mutate these 3 (or the above 6) residues on FL DUX4 to examine if its transcriptional activity is abolished.

      These are interesting points. DPL does appear to be the most important region in the mouse DUX repeats. However, DPL is not shared in the C-terminus of human DUX4. Notably, the DUX4 C-terminus is sufficient to activate the mouse MERVL::GFP reporter when cloned to mouse homeodomains (see Author response image 2, second sample) and other DUX target genes (initially published in Whiddon et al. 2017). One clear possibility is that the DPL region is helping to coordinate the additive effects of multiple DUX repeats, which only exist in the mouse protein.

      Author response image 2.

      6) Line 154 - The intermediate DUX domain construct C1C3b occupied a different position on the PCA plot from the C1C3c construct that does not contain any of the critical 6aa sequence, as shown in Fig. 2e. However, both these constructs appear to be similarly deficient in inducing any GFP|mCherry signal, as seen in Fig. 2c. Why is that so?

      The PCA plot assesses the impact on the whole transcriptome and not just the MERVL::GFP reporter, suggesting the 3aa region has transcriptional effects on the genome beyond what is detected in the MERVL::GFP reporter.

      7) To strengthen the claim that "Chromatin alterations at DUX bindings sites require a transcriptionally active DUX repeat", the authors should also perform CUT&Tag for constructs containing transcriptionally inactive DUX repeats (e.g. C1+14aa), and show that such constructs fail to occupy DUX binding sites, as well as are deficient in H3K9ac accumulation.

      This is a good comment. We elected to control this with constructs containing or lacking an active repeat. Although we have not pursued this by CUT&TAG, we have examined the impact of DUX constructs with inactive repeats (including the requested C1+14aa, new Figure S1g) by ATAC-seq (see #12, ATAC-seq section, below), and observe no chromatin opening, suggesting that the lack of transcriptional activity is rooted in the inability to open chromatin.

      8) It would be good if the authors could also include CUT&Tag data for some of the C1C3 chimeric constructs that were used in Fig. 2, since the authors argued that the minimal 6aa region is sufficient to activate many of the DUX target genes. This would also strengthen the authors’ case that the transcriptionally active, not inactive, repeats are critical for binding at DUX binding sites and ensuring H3K9ac occupancy.

      We agree that these would be helpful, and have examined the inactive repeats in transcription and ATAC-seq formats during revision (new data in Figures 1d and S1g), but not yet the CUT&TAG format.

      9) Line 213 - "SMARCA4" should have been "SMARCA5"? Based on Fig. 4d, SMARCA5 is picked up in the BirA*-DUX interactome, not SMARCA4.

      Thanks – corrected.

      10) Lines 250-252 - The authors compared the active BirA-C3 against the inactive BirA-C1 to elucidate the interactome of the transcriptionally active C3 repeat, as illustrated in Fig. 5c. They found 12 proteins more enriched in C1 and 154 proteins in C3. This information should be presented clearly as a separate tab in Supp Table 2. What are the proteins common to both constructs, i.e. enriched to a similar extent? Do they include chromatin remodellers too? Although the authors sought to identify differential interactors between the 2 constructs, it is also meaningful to perform 2 separate comparisons - active BirA-C3 against BirA alone control, and inactive BirA-C1 against BirA alone control - like in Fig. 4d, so as to more accurately define whether the active C3 repeat, and not the inactive C1 repeat, interacts with proteins involved in chromatin remodeling.

      We thank the reviewer for this comment, and we have modified the manuscript by adding a second sheet in Supplementary Table 2 including the results for enriched proteins in BirA-C1 vs. C3. Additionally, due to limitations of annotation between BirA alone and BirA*-C3 being sequenced in different mass spectrometry experiments, it is difficult to quantitatively compare the two datasets with pairwise comparisons.

      11) Fig 5d: The authors mentioned in the legend that endogenous IP was performed for SMARCC1. However, in line 266, they stated Flag-tagged SMARCC1. Is SMARCC1 overexpressed? The reciprocal IP should also be presented. More importantly, C1 constructs (e.g. C1+14aa and C1Δ14aa) should also be included.

      To clarify, Figure 4e used exogenously overexpressed FLAG-SMARCC1 in HEK-293T cells to confirm the results of the full-length DUX BioID experiment. Figure 5d was performed with overexpressed DUX construct, but involved endogenous SMARCC1 in mESCs. This has now been made clearer in the revised manuscript.

      12) For both the SMARCC1 CUT&Tag and ATAC-seq experiments shown in Figures 5e and 5f respectively, the authors need to include DUX derivatives that contain transcriptionally inactive repeats with and without the 14aa acidic tail, i.e. C1+14aa and C1Δ14aa, and show that these constructs prevent the binding/recruitment of SMARCC1 to DUX genomic targets, and correspondingly display a decrease in chromatin accessibility. Only then can they assert the requirement of the transcriptionally active repeat domains for proper DUX protein interaction, occupancy and target activation.

      We agree that examination of an inactive repeat in certain approaches would improve the manuscript. Importantly, we have now included C1+14 in our ATAC-seq experiments, and show two individual replicates in Author response image 3, which constitute a new Figure S1g. Compared to the transcriptionally active DUX constructs, which show opening at DUX binding sites, we do not see chromatin opening at DUX binding sites with the transcriptionally inactive C1+14.

      Author response image 3.

      13) To prove that DUX-interactors are important for embryonic gene expression, it will be important to perform loss of function studies. For instance, will the knockdown/knockout of SMARCC1 in cells expressing the active DUX repeat(s) lead to a loss of DUX target gene occupancy and activation?

      We agree that it would be interesting to better understand SMARCC1 cooperation with DUX function in the embryo, but we believe this is beyond the scope of this paper.

      Minor Points

      1) Lines 124-126 - What is the reason/rationale for why the authors used one linker (GGGGS2) for constructs with a single internal deletion, but 2 different linkers (GGGGS2 and GAGAS2) for constructs with 2 internal deletions?

      With Gibson cloning, there are homology overhang arms for each PCR amplicon that are required to be specific for each overlap. Additionally, the PCR amplicons need to be distinct enough from one another so that all inserts (up to 5 in this manuscript) are included and oriented in the right order. The linker sequences were included in the homology arm overlaps, so the nucleotide sequences for each linker needed to be specific enough to include all inserts. This is a general rule of Gibson cloning. Additionally, both GGGGS2 and GAGAS2 are common linker sequences used in molecular biology and their amino acid structures are similar to one another, suggesting there is no functional difference between the linkers.

      2) Line 704 - 705: In the figure legend, the authors stated that 'Constructs with a single black line have the linker GGGGS2 and constructs with two black lines have linkers with GGGGS2 and GAGAS2, respectively.'. This was not obvious in the figures.

      Constructs used for flow and genomics experiments that are depicted in Figure 2, Supplementary Figure 2, Figure 3, Figure 4, and Figure 5 have depicted black lines where deletions are present. Where these deletions are present, there are linkers in order to preserve spacing and mobility for the protein.

      3) Line 160 - Clusters #1 and #2 are likely written in the wrong order. It should have been "activating the majority of DUX targets in cluster #2, not cluster #1" and "failed to activate those in cluster #1, not cluster #2", based on the RNA-seq heatmap in Fig. 2f.

      We thank the reviewer for this comment, and the error has been corrected in the manuscript.

      4) Line 188 - Delete the word "of" in the following sentence fragment: "DUX binding sites correlating with the of transcriptional".

      Thanks – corrected.

      5) Line 191 - Delete the word "aids" in the following sentence fragment: "important for conferring H3K9ac aids at bound".

      Thanks – corrected.

      6) Line 711 - "C1-C3 a,b,d" should be "C1-C3 a,b,c".

      Thanks – corrected.

      7) Lines 711-712 - The colors "pink to blue" and "blue to pink" are likely written in the wrong order. Based on Fig. 2c, the blue to pink bar graphs should represent C1-C3 a,b,c in that order, and likewise the pink to blue bar graphs should represent C3-C1 a,b,c in that order.

      Thanks – corrected.

      8) There is an overload of data presented in Fig. 2c, such that it is difficult to follow which part of the figure represents each data segment as written in the figure legend. It is recommended that the data presented here is split into 2 sub-figures.

      Figure 2c has a supporting figure in Supplementary Figure 2b. While the main panel of Figure 2c contains both a graphical depiction of the constructs and the data, we have depicted it this way to make it as clear as possible for the reader to interpret the complexity and presence of amino acids in each of the constructs.

      9) Line 717 - "following" is misspelt.

      Thanks – corrected.

      10) Lines 720-721 - "(Top)" and "(Bottom)" should be replaced with "(Left)" and "(Right)", as the 2 bar graphs presented in Fig. 2d are placed side by side to each other, not on the top and bottom.

      Thanks – corrected.

      11) Lines 725 and 839 - "Principle" is misspelt. It should be "Principal".

      Thanks – corrected.

      12) In Figures 3d and 3e, the sample labeled "C3+14_1" should be re-labeled to "C3+14", in accordance with the other sub-figures. Additionally, for the sake of consistency, "aa" should be appended to the relevant constructs, e.g. "C3+14aa" and "C3Δ14aa".

      Thanks – corrected.

      13) Line 773 - Were the DUX domain constructs over-expressed for 12hr (as written in the figure legend) or 18hr (as labeled in Fig. 5d)?

      Thanks – corrected.

      14) Related to minor point 19 above, is there a reason/rationale for why some of the experiments used 12hr over-expression of DUX domain constructs (e.g. for CUT&TAG in Fig. 3), whereas in other experiments 18hr over-expression was chosen instead (e.g. flow cytometry for MERVL::GFP reporter in Figures 1 and 2, and co-IP validations of BirA*-DUX interactions in Fig. 4)?

      Thanks for the opportunity to explain. In this work, experiments that reported on proteins that are translated following DUX gene activation (e.g. MERVL::GFP via flow) were done at 18hr to allow enough time for transcription and translation of GFP (or other DUX target genes). For experiments that report on the impact of DUX on chromatin and transcription, such as RNA-seq, CUT&Tag, and ATAC-seq, we induced DUX domain constructs for 12 hours.

      15) Line 804 - "ΔHDs" is missing between "C2345+14aa" and "ΔHD1".

      Thanks – corrected.

      16) In Fig. 5c, "Chromatin remodelers" is misspelt.

      Thanks – corrected.

      17) There is no reference in the manuscript to the proposed model that is presented in Fig. 6b.

      Thanks – corrected.

      Reviewer #3 (Recommendations For The Authors):

      Given the uncertainty of the function of the Dux peptide repeats in mice, could the underlying repeated nature of the (coding) DNA not also play a role? That is, could these DNA repeats exert a regulatory function on Dux transcription itself (also given the dire consequences of misregulated DUX4 expression as seen in FSHD, for example)?

      Yes, it remains possible that the internal coding repeats within Dux are playing a role in locus regulation, and might be interesting to examine. However, we consider this question as being outside the scope of the current paper.

      Finally, it would be interesting to know whether these repeats are, in fact, present in all mouse species. Already no longer present in rat, do they exist, or not, in more "distant" mice, e.g. M. caroli?

      Determining whether all mouse strains contain C-terminal repeats in DUX is a question we also considered. However, Dux and its orthologs are present in long and very complex repeat arrays that are not present in the sequencing data or annotation in other mouse strains. Therefore, we are unable to answer this question from existing sequencing data. Answering would require a considerable genome sequencing and bioinformatics effort, or alternatively a considerable effort aimed at cloning ortholog cDNAs from 2-cell embryos.

      Minor points:

      line 169: here it seems, in fact, that the 'inactive' C2, C4 repeats are more similar to each other (my calculation: 91 and 96% identity at the protein and DNA level, respectively) than the active C3 and C5 repeats (82 and 89% identity, resp.), the outlier being C1.

      Thanks for this comment, which was mentioned by other reviewers as well and has been addressed through new statistical analyses and interpretation (see new Figure S1d).

      line 191: I'm not sure this sentence parses correctly ("...14AA tail is important for conferring H3K9Ac aids at bound sites...")

      We thank the reviewer for this comment, and we have corrected the sentence in the manuscript.

    1. Reviewer #1 (Public Review):

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to best-fitting cosine will trivially generate a FM-amplitude fluctuation with cosine shape as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than a sham, respectively, as shown in Fig 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the Null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.
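
      The shuffling check suggested above can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes one row of six phase-bin values per subject/session, uses the group-mean amplitude of the best-fitting cosine as the test statistic, and all function names (`cosine_amplitude`, `group_statistic`, `permutation_p_value`) are hypothetical.

```python
import math
import random

# Six equally spaced phase-lag bins covering one full cycle (assumption).
PHASES = [2 * math.pi * k / 6 for k in range(6)]


def cosine_amplitude(values):
    """Amplitude of the least-squares one-cycle cosine fit to six binned values.

    With equally spaced bins over a full cycle, the least-squares fit
    reduces to the first Fourier component of the mean-centred values.
    """
    n = len(values)
    mean = sum(values) / n
    a = 2 / n * sum((v - mean) * math.cos(p) for v, p in zip(values, PHASES))
    b = 2 / n * sum((v - mean) * math.sin(p) for v, p in zip(values, PHASES))
    return math.hypot(a, b)


def group_statistic(data):
    """Mean fitted amplitude across subjects/sessions (one row each)."""
    return sum(cosine_amplitude(row) for row in data) / len(data)


def permutation_p_value(data, n_perm=2000, seed=0):
    """Shuffle the phase bins within each row to build the null distribution."""
    rng = random.Random(seed)
    observed = group_statistic(data)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in data]
        if group_statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

      Because the fitted amplitude is phase-invariant, the realignment step itself drops out of this statistic; the same shuffling loop could instead wrap the full realign-and-summarize pipeline if the peak-versus-trough contrast of Fig. 4b is the quantity of interest.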

      Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons of these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?
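
      One standard way to address the multiple-comparisons concern raised here is a Holm step-down adjustment across all tested models. The sketch below is illustrative only: the function name `holm_adjust` is hypothetical and the p-values in the usage note are invented, not taken from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      For example, with 33 models, a single raw p-value would need to fall below 0.05/33 (about 0.0015) for the smallest one to survive at the 0.05 level; `holm_adjust([0.01, 0.04, 0.03])` illustrates the ordering and monotonicity steps on three made-up values.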

      "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"
      This seems to be based on Figure 4, which leaves one with significant concerns.

      "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."
      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:
      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."
      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made more clear.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We would like to thank reviewers for their insightful comments.

      Overall, there were two major concerns/suggestions:

      • Applicability to humans of the increase of BTC in non-alcoholic steatohepatitis (NASH) and mechanisms of downregulation of BTC by omega-3. We have now analyzed 3 additional human gene expression datasets and show that BTC is not only increased in human NASH (as we have already shown for the liver cancer meta-analysis), but is also decreased in livers of patients who received omega-3.

      • One of the reviewers suggested investigating a potential mechanism of how BTC is regulated by omega-3 fatty acids. Although a complete answer to this question would require entirely new studies, we still performed the additional investigation that was possible within a reasonable timeframe. We found that the transcription factor FOXO3 (a well-known inhibitor of carcinogenesis) is a highly probable mediator of the DHA inhibitory effect on BTC.

      See all details of items 1 and 2, as well as answers to other (less critical) concerns, below after each specific question.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This work by Padiadpu and colleagues investigates the mechanism by which PUFA of the n-3 series (mostly DHA) may influence NAFLD progression using systems biology analysis and multiple omics analysis. The work is interesting and may provide a novel view of the topic. However, there are a number of issues the authors may wish to consider in order to improve their manuscript.

      Major issues: Clarity: Since the authors refer to previously published experiments, they must refer to this work in the figure legends and improve the clarity of such legends. Here is a list of issues that must be fixed:

      Fig.1: First panel is not clear. What does the table tell the reader? What are the effects of the different diets on NAFLD?

      All the transcriptomic data are newly generated from the samples of previously published studies. The table shows the number of features changed by DHA and/or EPA in each of the -omics and phenotypic data used in the analysis.

      I understand that the results are published elsewhere, but the authors must provide information regarding the NAFLD/ NASH scores.

      We now added a supplementary table 1a showing the scores.

      Fig.4: Why is there sometimes a DHA diet, sometimes DHA and EPA. Legend is not clear. What does WD + Mean? I guess it is olive oil... But the legend must be improved.

      We added details in the legend for more clarity. Specifically, WD+O means WD + olive oil added as a control for WD+DHA, WD+EPA. As described in the 2nd paragraph of results, when both EPA and DHA had similar and significant effects in reversing the WD effect, it was defined as the “EPA&DHA category” of parameters. When only WD+DHA or WD+EPA was significantly changed vs WD+O, those were assigned as “DHA category” or “EPA category”, respectively.

      One issue the authors may consider trying to fix is the specificity of the effect of DHA on BTC.

      Is it really specific? It seems to me that EPA has more or less the same effect. If the effect is DHA-specific, then make this clearer throughout the text.

      Although BTC expression was reduced by both DHA and EPA compared to WD, DHA had a statistically significantly stronger effect than EPA (Fig. 3D).

      Another issue the authors may wish to investigate is the relationship between W3 consumption and BTC expression in studies performed by other labs (if available on Gene expression omnibus?).

      Thanks for the suggestion. We used publicly available data from human and mouse studies, which showed a significant increase in liver BTC gene expression in NASH in multiple datasets, while a human trial with omega-3 treatment for one year showed its significant reduction (Figures 3F - human data, S3G - mouse data).

      Finally, a key issue would be to identify the mechanism by which DHA inhibits BTC expression? How does this happen? could such inhibition be induced by other fatty acids of the W3 series? I understand that this is not easy to address but it would significantly strengthen the manuscript.

      Thanks to your question we investigated and found at least one potential mechanism contributing to how “DHA inhibits BTC expression”. See details in the answer to the next question. As for “other fatty acids”, while we agree this is an important question, it is outside the scope of the current study but will be investigated in future studies.

      Moreover, it might be possible to identify the set of genes highly co-regulated with BTC expression and to investigate the possible transcription factors at play in the control of such gene set.

      We really appreciate this question, as our efforts in this direction provided one potential mechanism. A direct screen of transcription factor (TF) motifs in genes co-regulated with BTC did not provide any clear results. Therefore, we combined network analysis and a screen for motifs in the BTC gene with the in vivo and in vitro treatment results and found FOXO3 as a candidate TF regulated by DHA upstream of BTC.

      See details of the analysis and results in a new Supplementary Figure S6 and corresponding text located at the end of the results.

      Minor: the authors use the term "beneficial" transcriptome alterations by DHA.

      I do not think it is correct to use "beneficial".

      We agree and removed the word "beneficial".

      Reviewer #1 (Significance (Required)):

      Strength: This paper uses new approaches to investigate the relationship between W3 consumption and liver gene expression and its relevance to chronic metabolic liver diseases.

      The experiments and data set used to perform systems biology are from an excellent lab (the authors' lab) that has published a lot of important and reproducible discoveries in the field of regulation of gene expression by dietary fatty acids.

      The work has high translational relevance in medicine / hepatology / metabolism.

      I am not a qualified reviewer to assess the systems biology that has been done.

      Limitation: The mechanistic link between DHA consumption and BTC expression is not very clear. The specificity of this effect could also be tested (DHA vs other W3 and/or W6).

      Although BTC expression was reduced by both DHA and EPA compared to WD, DHA had a significantly stronger effect than EPA (Fig. 3D). Other omega fatty acids were not tested, but this can be done in future studies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors file a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids in WD-fed mice.

      Major Comments: (i) No histological analysis was presented and indeed this is of clinical relevance for NASH since diagnosis is still based on biopsy.

      While histological evaluation was presented in the originally published papers (PMID: 28422962, 23303872), it is now provided in Supplementary Table S1a.

      (ii) Human comparative analysis: is done with HCC not with NASH patients.

      This cancer-related dataset is most likely obtained from different etiologies.

      I would suggest comparing these mouse datasets with GSE48452 (human NAFLD-NASH spectra).

      Thanks for this important question. We have now analyzed available human NASH data and show a significant increase of BTC expression in two datasets, while a human trial with omega-3 treatment for one year showed a significant reduction of BTC expression (Figure 3F), resembling our observations in mice.

      (iii) to compare the inflammation and fibrosis (also lipid metabolism), one can compare these mouse datasets with GSE222576 and cite this preprint (https://doi.org/10.21203/rs.3.rs-2009380/v1)

      Using the suggested dataset (of a chemically induced liver fibrosis), we first observed that Btc gene expression was significantly increased over 10 weeks of the model and now included this result in Fig. S3G.

      We also queried the 66 genes from the network modules described by the authors to check their changes in our NASH model. We observed that 28 genes were differentially expressed in NASH with 14 of them belonging to the module that authors named as “Pathways in Cancer”. Other genes were from the lipid metabolism (4 genes), immunity (2) and inflammation (2 genes). In addition, we observed that several genes we found regulated by omega-3 and changed in this fibrosis model contained other inflammatory genes such as classical macrophage genes (Mmp12, Lgals3, Cd68, Trem2), fibrosis (Col4a1, Col27a1, Itga2b, Itga8) and lipid metabolism (Scd2, Lpl, Soat1). Of note, the preprint has been published and we now cite the corresponding article.

      Minor comments:

      (i) The heatmap in Figure 1B and another heatmap should show all mice not the average to see the variability

      The supplementary figure with all the individual mouse data as another heatmap is added to show the variability and similarity (Figure S1D).

      Reviewer #2 (Significance (Required)): The authors file a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids.

      This is a well-designed experiment, and the results are of interest to hepatologists and should indeed be published after consideration of the following points.

      Strength is multiOMICs approach.

      Weakness is human applicability.

      We improved human applicability by investigating 3 additional human datasets of NASH (Fig. 3F) and finding consistent changes in BTC expression closely resembling our observations in mouse NASH model, including one trial with omega-3 treatment of patients for one year showing significant reduction in BTC gene expression.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that a hybrid measurement method increases the resolution of mouse USV localization threefold. This increased resolution makes it possible to revise previous occurrence frequency measures for female vocalizations and establishes the existence of vocal dominance in triadic interactions. The method is well described and its efficiency is carefully quantified. A limitation of the study is the absence of ground truth data, which could potentially have been generated with miniaturized loudspeakers in mouse puppets. However, a careful error estimation partially compensates for the absence of these likely challenging calibrations. In addition, the conclusions take into account this uncertainty. The gain in accuracy with respect to previous methods is clear and the impact of localisation accuracy on biological conclusions about vocalisation behavior is clearly exemplified. This study demonstrates the impact of the new method for understanding vocal interactions in the mouse model, which should be of tremendous interest for the growing community studying social interactions in mice.

      We have performed the requested additional ground truth estimate using a movable miniature speaker; for more details, see point 2 of Reviewer 2 and the new supplementary figure.

      Reviewer #2 (Public Review):

      Past systems for identifying and tracking rodent vocalizations have relied on triangulating positions using only a few high-quality ultrasonic microphones. There are also large arrays of less sensitive microphones, called acoustic cameras, that don't capture the detail of the sounds, but do more accurately locate the sound in 3D space. Therefore the key innovation here is that the authors combine these two technologies by primarily using the acoustic camera to accurately find the emitter of each vocalization, and matching it to the high-resolution audio and video recordings. They show that this strategy (HyVL) is more accurate than other methods for identifying vocalizing mice and also has greater spatial precision. They go on to use this setup to make some novel and interesting observations. The technology and the study are timely, important, and have the potential to be very useful. As machine learning approaches to behavior become more widespread in use, it is easy to imagine this being incorporated and lowering entry costs for more investigators to begin looking at rodent vocalizations. I have a few comments.

      1) What is the relationship of the current manuscript to this: https://www.biorxiv.org/content/10.1101/2021.10.22.464496v1 which has a number of very similar figures and presents a SLIM-only method that reportedly has lower precision than the current HyVL approach. Is this superseded by the submitted paper?

      The referred manuscript (now published in Scientific Reports) is indeed related to the current work: The currently presented system is based on the integration between SLIM (based on 4 high quality microphones) and Beamforming (based on the 64-channel microphone array). The accuracy of SLIM is generally lower than that of HyVL, but it makes essential contributions to the overall accuracy of HyVL through the integration of the complementary strengths of the two methods/microphone arrays (see Fig. 3A, L-shape of errors). To our knowledge, SLIM was the previously most accurate technique (based on 4 microphones, see comparison in the Discussion), but HyVL exceeds this by a substantial margin. Some figures appear similar mostly due to related code in the underlying analysis pipeline and visualization scripts (e.g. the half-disc densities). However, the set of dyadic and triadic recordings was collected specifically for the present study, and all top-level analyses were performed separately. The single mouse (C57Bl/6 WT) ground truth dataset is shared between the two studies, where in the SLIM paper only the USM4/SLIM part was evaluated (leading to a correspondingly lower, single animal accuracy).

      We felt that the level of detail above would probably impede the reading of the manuscript, and we have therefore added a subset of the above clarifications to the methods and the first time the other study is mentioned.

      2) Can the authors provide any data showing the accuracy of their system in localizing sounds emitted from speakers as a function of position and amplitude? I am imagining that it would be relatively easy to place multiple speakers around the arena as ground truth emitting devices to quantify the capabilities of the system.

      Ground truth data is critical for any meaningful comparison. First, we would like to highlight that we already provided ground truth data in the previous version of the manuscript: in Fig. 3C, we analyzed vocalization data from trials with (1) just a single mouse as well as (2) vocalizations at times when all mice were far apart in relation to the accuracy of HyVL (>100 mm, i.e. >25x the accuracy of HyVL), where the chances of erroneous assignment are negligible. We think that these tests are the most relevant, as they are conducted with the relevant sounds, at their actual intensity, spectral profile and emitter acoustics.

      In addition, we have now conducted a series of tests with sounds produced by a miniature speaker placed in 25 different locations to demonstrate the lower bound of accuracy achievable with the system. The tests indicate an accuracy of MAE < 1 mm under these ideal conditions, i.e. without the absorption of the mouse bodies, varying direction of emission of the mouse snout, varying intensity, varying spectral content, duration, etc. Exploring the dependence on all these parameters is interesting, but requires a detailed study of its own. The detailed experimental conditions and results are now provided in Supplementary Fig. 4, including a quantification of the dependence on amplitude.

      3) How is the system's performance affected by overlapping vocalizations? It might be useful to compare the accuracy of caller identification for periods where only one animal is calling at a time vs. periods where multiple animals are simultaneously calling.

      This is an excellent question. Our current code for detecting vocalizations cannot automatically determine if one or multiple vocalizations are concurrently present. We have therefore manually checked all vocalizations for overlapping instances, including those in triadic recordings with two males, where this would be expected to occur most frequently.

      We considered vocalizations to be overlapping if the overlapping constituent time-frequency traces did not form a harmonic stack. Overall, overlaps were surprisingly rare. We did find a couple of cases (<0.1%) where our detection algorithm produced a longer vocalization interval that contained multiple, differently shaped vocalization traces that, when re-analyzed in shortened time-frequency bins with beamforming, belonged to two different males. Note here that beamforming is separately performed from the onset to the end of each vocalization, so the cumulative heatmap can change depending on these onset and end times, which are normally determined by our detection algorithm.

      However, although the identity of the assigned vocalizer could shift in these very rare cases depending on which time bin was re-analyzed, the system’s localization performance remained in principle unaffected: as mentioned above, shorter time bins on non-overlapping parts correctly show the origin of the vocalizations in this case. A solution to this issue could therefore be a USV detection algorithm that is able to detect the overlap based on the spectral shapes and parse them apart. During the beamforming, each vocalization can then be separately localized by restricting the beamforming to the corresponding time and frequency range. Further, the analysis could be refined so that multiple salient peaks can be detected in the soundfield estimate. This would, however, substantially change the analysis approach, i.e. rather than a single estimate per USV, a sequence of soundfield estimates should be computed and later fused again. Such a procedure uses less data per single estimate and therefore increases the possibility of false positives, which, in the current situation with very few overlaps in time, would likely reduce the overall accuracy of the system. We therefore decided not to modify the algorithm in this direction, but we agree that ideally a joint approach, combining separation on the spectrogram and soundfield level, should be pursued. For the present data, if a time window was analyzed such that the intensity map of the sound field contains multiple hotspots of approximately equal magnitude, the USV would likely remain unassigned, because the within-soundfield uncertainty would be higher than for a single peak, and this would reduce the MPI. However, given the rarity of these cases in our dataset, we do not think that their exclusion would change the results appreciably. This information was added as a paragraph to the Discussion.

      It is worth noting that HyVL is very robust: there were a number of cases (<5%) where environmental dampening in combination with harmonic stacking produced interesting time-frequency traces in some of the USM4 microphones, but our system did not have any issue spatially localizing what seems like a smeared vocalization trace. We provide a few examples of this kind in a short video (see Rebuttal Video 2 and the legend at the bottom of this document), where the overlap is also reflected in the intensity map of the sound field, overlaid onto the platform.

      4) Can the authors comment on how sound shadows cast by animals standing between the caller and a USM4 affect either the accuracy of identification or the fidelity of the vocal recording?

      An important point to raise. Sound scattering and dampening caused by the conspecifics of the vocalizing animal can impede the accuracy of any sound localization system, but can unfortunately not be avoided in a social setting. To address this issue, we raised all USM4 microphones by ~12 cm above the interaction platform to minimize the instances of sound blocked by the mice. Further, the Cam64 device should largely be unaffected by sound shadows as it is centrally located above the platform. We have added a modified version of the above comment to the discussion under the heading "Current limitations and future improvements of the presented system".

      5) I'm a bit confused about how the algorithm uses the information from the video camera. Reading through the methods, it seems like they primarily calculate competing location estimates by the two types of microphone data and then make sure that a mouse is in close proximity to one location, discarding the call if there isn't. Why did the authors choose this procedure rather than use the tracked position of the snouts as constrained candidate locations and use the microphone data to arbitrate between them? Do they think that their tracking data are not reliable or accurate enough?

      Thanks for this important suggestion, which we have actually grappled with a lot during the analysis. First of all, the visual tracking data, in particular the manual data, is in our opinion (based on human visual identification) near perfect (within the limits of the video resolution, pixel resolution = 0.8 mm), i.e. on the order of 1-2 mm, and is therefore not the source of any unattributable vocalizations. If we understand the reviewer correctly, then we indeed perform the attribution as he indicates based on the tracked snouts of all mice, specifically by measuring the MPI's of both acoustic location estimates for all mice and then choosing the most reliable one. Specifically, the attributions can be grouped into 3 cases: (i) Estimated origin close to one snout, and snouts rather far apart, (ii) Estimated origin close to one snout and snouts close, and (iii) estimated origin not close to either snout. (i) is easy to address, (ii) is appropriately handled by the mouse probability index, but (iii) is tricky. Since the vocalization has to come from one of the mice, this already indicates that the localization is not working well in this case. Therefore we found it prudent (similar to Neunuebel et al. 2015) to not assign in these cases. Interestingly the MPI is not useful in these cases, as due to the exponential dependence of the normal density on distance, for example a case with a distance of 50 mm to one snout and 60 mm to another snout could lead to an MPI close to 1, which is likely not trustable. We have described this in the Methods as follows:

      "This distance threshold mainly serves to compensate for a deficiency of the 𝑀𝑃𝐼: if all mice are far from the estimate, all 𝑃𝑘 are extremely small, however, the 𝑀𝑃𝐼𝑘 will often exceed 0.95."
      Due to the inherent limit for localizing very quiet, short USVs by any system, we think this kind of selection (introduced originally by Neunuebel et al 2015) is a valuable and necessary step in the processing to avoid confusions (which are of course already substantially reduced through HyVL here).
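      The deficiency described above can be reproduced with a small numerical sketch. It assumes, for illustration only, that each 𝑃𝑘 is an isotropic Gaussian density of the distance between the acoustic estimate and snout k, and that 𝑀𝑃𝐼𝑘 is 𝑃𝑘 normalized over all mice. The exact definition in the manuscript may differ, and the values of `sigma_mm` and `max_distance_mm` are hypothetical. Only the 0.95 criterion and the 50 mm / 60 mm example come from the text.

```python
import math


def mouse_probability_index(distances_mm, sigma_mm):
    """Normalized Gaussian weights for each snout (assumed form of the MPI)."""
    # Work with log-densities; the shared normalization constant cancels out.
    log_p = [-(d * d) / (2.0 * sigma_mm ** 2) for d in distances_mm]
    peak = max(log_p)
    weights = [math.exp(lp - peak) for lp in log_p]  # numerically stable
    total = sum(weights)
    return [w / total for w in weights]


def assign_vocalization(distances_mm, sigma_mm=5.0,
                        mpi_threshold=0.95, max_distance_mm=40.0):
    """Return the index of the assigned mouse, or None if left unassigned.

    sigma_mm and max_distance_mm are hypothetical illustration values.
    """
    mpi = mouse_probability_index(distances_mm, sigma_mm)
    best = max(range(len(mpi)), key=lambda k: mpi[k])
    if distances_mm[best] > max_distance_mm:
        return None  # estimate is far from every snout: do not trust the MPI
    return best if mpi[best] >= mpi_threshold else None


# The example from the text: 50 mm to one snout, 60 mm to the other.
far_case = mouse_probability_index([50.0, 60.0], sigma_mm=5.0)
print(far_case[0] > 0.95)                 # MPI alone would accept this call
print(assign_vocalization([50.0, 60.0]))  # the distance threshold rejects it
print(assign_vocalization([3.0, 60.0]))   # a nearby snout is assigned normally
```

      Because the Gaussian falls off exponentially with squared distance, the ratio between two very small densities can still be extreme, which is why the sketch needs the separate distance check.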

      6) I guess the authors have code that we can run, but I couldn't access it. The manuscript describes the algorithms and equations that are used to calculate the location, but this doesn't really give me a feel for how it works. If you want to have the broadest impact possible, I think you would do well to make the code user-friendly (maybe it is, I don't know). In pursuit of that goal, I would suggest that the authors devote some of the paper to a guided example of how to use it.

      While the code was made available to the reviewers via the link at the beginning of the manuscript (p2, before abstract), we completely agree that this method of distribution is not very accessible. We have therefore created a publicly available GitHub repository (https://github.com/benglitz/HyVL) which hosts the code and details its use on the basis of a sample data set (which is available to the reviewers in the repository link, and later to the public under https://doi.org/10.34973/7kgc-ta72). While we do provide a sample video and analysis workflow there, our data analysis pipeline is quite integrated and other labs will likely use different pipelines. We have therefore tried to make the core functions independent of our pipeline and thus easy to integrate by others into their analysis pipelines.

      Reviewer #3 (Public Review):

      The present manuscript describes a new method to identify the emitter of ultrasonic vocalisations during social interactions between 2 or 3 mice. The method combines two technologies (an "acoustic camera" and a set of four microphones) and succeeds in increasing the spatial precision and the attribution of USV emission to one of the mice. The manuscript describes the characteristics and advantages of each method and the advantages of using both to optimize the identification of USV emitter. The authors used the method to confirm that females are also vocalising during male-female interactions and that females emit USV mostly during nose-nose contact while this was not the case for males. Interestingly, the authors identified that the vocal behaviour of two competing males was strongly asymmetric when facing a female. This was not the case for two females facing one male.

      The method is really promising since the identification of the emitter of USVs during mouse social interactions is a necessary step to speed up our understanding of this communication modality. The increase in spatial precision and in the proportion of attributed vocalisations is non-negligible and will be of great utility in the future.

      We would like to thank the reviewer for this positive perspective on the future utility of our system.

      Generally, the statistical analyses should be adjusted. Indeed, the statistical analyses do not consider the fact that the same individuals were recorded several times (if we understood well the methods). Each point was considered independent (in non-parametric Wilcoxon tests), while this is not the case given the repetitions with the same individuals (the number of repeated encounters per individual should be given in the methods section, by the way). We strongly recommend revising the statistical analyses of the results in Figures 4 and 5. In addition, it could be interesting to check whether the vocal behaviour is stable within each individual (i.e., a male that is vocalising frequently in one situation vocalises always frequently in other situations).

      We generally agree with this suggestion: In order to properly conduct the analysis for individuals as you suggest, a balanced dataset should be used. We had initially collected such a balanced dataset, which was previously not detailed in the manuscript, as the focus was on USV localization/attribution and hence only the recordings containing USVs were analyzed (detailed now in the beginning of Results and Methods). However, overall, the probability of a recording containing vocalizations at all is low: in our balanced set only 23/112 recordings contained vocalizations. We therefore had collected additional recordings with the best vocalizers which created the previously analyzed set of 83 recordings containing USVs recorded with all microphones. This dataset is therefore dominated by recordings from mice that are active vocalizers. While this does not raise any issue for the estimation of the accuracy of the method (Figure 3) or the female vocalizations (Figure 4, because recordings were always randomized across female mice), it precludes an encompassing analysis of individual differences in Figure 5, i.e. the dyadic-triadic comparison. In the new Figure 5, we address the reviewer's question for the dyadic recordings, finding that the current set of recordings does not provide sufficient evidence that individual male mice had significantly different vocalization rates. We would, however, like to point out that this is likely a consequence of the n=4 recordings that are compared here. For the female mice, we also did not find differences in vocalization rates, which is based on n=14 recordings and thus a more reliable result (p=0.16, 1-way ANOVA with factor individual).

      For the triadic recordings, however, due to a limitation in the experiment execution, we unfortunately do not have the complete information available on an experiment level, i.e. the video stream was accidentally started after all mice were placed on the platform, and since the same-sex animals are visually not separable (while the female mice are separable from the males, based on a slightly shaved region on their head), we cannot completely assess this question in triadic recordings based on the available data. When additionally including the triadic recordings and assuming a single vocalizer (combining all male USVs, see below for why the males could not be assigned in the triadic condition), the male individual comparison can be approximately performed with n=8 recordings, and then the dependence on individual becomes borderline significant (p=0.028, 2-way ANOVA with factors individual and condition).

      For the comparison of vocalization rates in the previous Figure 5 that the reviewer was referring to, we cannot perform a rigorous analysis on the individual level, due to the lack of balance. While we thus agree that differences between individual mice can contribute to the differences observed, we do not think that this would change the conclusion that one of the mice dominates the vocal emissions. If the reviewers agree, we would thus leave Figures 6 (old Fig. 5) and new Figure 7 (behavioral confirmation of dominant/subordinate division) as part of the manuscript, with a clear cautioning about the possible contribution of individual differences to the observed differences. If the reviewers find it inappropriate to leave the results based on the unbalanced dataset in, all results after figure 5 could also be excluded (although we would find this unfortunate, given the additional time and effort we have invested in these).

      It is not easy to understand the rationale behind testing animals in pairs and in triads from the beginning of the manuscript. The authors should better introduce this aspect in the manuscript, especially given the fact that biological results deal with this aspect in Figure 5. The authors might strengthen the parts of the biological results extracted from their new method.

      Thank you for pointing out the need for clarification regarding the rationale behind testing animals in pairs and in triads. Because courtship interactions are particularly vocal and social, they are of interest to many fields, e.g. neurodevelopmental disorders (refs. 3, 4). Due to the natural competitiveness between mice during courtship interactions, high accuracy is particularly beneficial in this regard because it allows disentangling USVs at close distances. We adapted the introduction to better reflect this reasoning and included an extra paragraph in the introduction and also where the biological results from old Fig. 5 / new Fig. 6 are summarized.

      More specifically, the fact that one male takes over the vocal behaviour within a triad is of high interest. Nevertheless, some behavioural data would be needed to strengthen these findings.

      We agree that this is an interesting finding and also agree that some additional behavioral analysis is useful to complement it. In order to arrive at this analysis, we performed all-frame, 3-animal tracking on the 14 triadic recordings with two males. This required switching to skeleton tracking with SLEAP (ref. 5) in addition to manual post-processing to ensure that no identity switches occur. In each recording the dominant male was then defined as the one that emitted more vocalizations, and then the vocalization-independent spatial interaction histogram was computed, similar to the ones in Fig. 4, but now separating between the dominant and the subordinate males (see new Figure 7). The results are consistent with the most typical location of vocalization of the male, in proximity to the female abdomen: the dominant male's spatial interaction histogram (Fig. 7A) was more clearly peaked in the location of the female abdomen very close to the male's snout, in comparison with the subordinate male's histogram (Fig. 7B), which shows up very clearly in the difference between the normalized histograms (Fig. 7C). Significance analysis was performed using 100x bootstrapping on the relative spatial positions to estimate p=0.99 confidence bounds around the histograms of the dominant and subordinate respectively. Significance at a level of p<0.01 highlights multiple relative spatial positions (Fig. 7D), including the one proximal to the snout which has the largest absolute difference (Fig. 7C). Note that these analyses were conducted on the basis of the non-balanced dataset, which contained enough vocalizations to assess the dominant male based on the vocalization rates, and thus individual traits of certain animals remain as a possible confound.
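      The bootstrap comparison described above can be sketched for a single histogram bin as follows. This is a minimal illustration, not the analysis code used for Figure 7: the positions are synthetic, the function names and bin limits are hypothetical, and a simple percentile bootstrap is assumed. Only the 100 resamples and the 0.99 level are taken from the text.

```python
import random


def fraction_in_bin(positions, x_range, y_range):
    """Fraction of (x, y) positions that fall inside one rectangular bin."""
    inside = sum(1 for x, y in positions
                 if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1])
    return inside / len(positions)


def bootstrap_bounds(positions, statistic, n_boot=100, level=0.99, seed=0):
    """Percentile bootstrap bounds for a statistic of the positions."""
    rng = random.Random(seed)
    n = len(positions)
    # Resample positions with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([positions[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


def near_snout(positions):
    """Occupancy of a hypothetical 10 mm x 10 mm bin centred on the snout."""
    return fraction_in_bin(positions, (-5.0, 5.0), (-5.0, 5.0))


# Synthetic relative positions in mm (made up purely for illustration).
data_rng = random.Random(1)
dominant = [(data_rng.gauss(0.0, 10.0), data_rng.gauss(0.0, 10.0)) for _ in range(500)]
subordinate = [(data_rng.gauss(0.0, 25.0), data_rng.gauss(0.0, 25.0)) for _ in range(500)]

dom_lo, dom_hi = bootstrap_bounds(dominant, near_snout)
sub_lo, sub_hi = bootstrap_bounds(subordinate, near_snout, seed=1)

# A bin is flagged when the two confidence intervals do not overlap.
bin_differs = dom_lo > sub_hi or sub_lo > dom_hi
print(bin_differs)
```

      With only 100 resamples, the 0.99 bounds in this sketch are close to the minimum and maximum of the bootstrap estimates, so a larger number of resamples would give smoother bounds.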

      A small proportion of USVs was not assigned. The authors did not discuss the potential reason for this failure (Were the USVs too soft? Did they include specific acoustic characteristics that render them difficult to localise?). These points could be of interest when testing other mouse strains or other species.

      Good point, we agree that it is interesting to know the reasons for failure. As so often, there is not a single property that makes localization hard, but multiple factors contribute. In the SLIM paper, we already identified duration and intensity as important contributors (Fig. 3E/F), and in the speaker test (see new Supplementary Fig. 4) we again demonstrated the influence of intensity. In addition, frequency bandwidth and acoustic occlusion are two other main contributors that each influence the availability of the information/signal-to-noise ratio at the microphones:

      • Frequency bandwidth: In signals that are very narrowband, there are more opportunities for phase ambiguity, in particular for very high-frequency signals. These are avoided/reduced for more wideband signals.

      • Acoustic occlusion: Ultrasonic sounds can be quite directional. If an animal vocalizes away from a microphone, which also puts its body between the sound source and the microphone, the intensity at that microphone can drop to a level where its signal no longer carries usable information. This mostly affects the 4 microphones surrounding the platform, while the Cam64 overhead will likely not be affected by acoustic occlusion in the plane.
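
      The phase ambiguity mentioned under frequency bandwidth can be made concrete with a small calculation. The sketch below is an illustration only and not part of the localization pipeline: it treats the USV as a pure tone and uses a hypothetical microphone spacing of 10 cm. For a pure tone, a measured phase difference fixes the inter-microphone delay only up to whole periods, so every such shifted delay that the spacing physically allows is an equally valid candidate.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def candidate_delays(freq_hz, mic_spacing_m, measured_delay_s=0.0):
    """Delays between two microphones consistent with one measured phase.

    For a pure tone the delay is only known modulo one period (1 / freq_hz).
    Candidates are limited to delays the microphone spacing allows,
    i.e. |delay| <= mic_spacing_m / SPEED_OF_SOUND.
    """
    max_delay = mic_spacing_m / SPEED_OF_SOUND
    k_min = math.ceil((-max_delay - measured_delay_s) * freq_hz)
    k_max = math.floor((max_delay - measured_delay_s) * freq_hz)
    return [measured_delay_s + k / freq_hz for k in range(k_min, k_max + 1)]

# Hypothetical comparison: an 80 kHz tone versus an 8 kHz tone, 10 cm spacing.
high = candidate_delays(80_000, 0.10)
low = candidate_delays(8_000, 0.10)
print(len(high), len(low))
```

      The higher the frequency, the more candidate delays fit within the physically possible range. A wideband signal reduces this ambiguity because the candidate sets of its different frequency components coincide only near the true delay.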

      We have added a brief version of this explanation to the discussion under the heading "Current limitations and future improvements of the presented system".

    1. Author Response

      Reviewer #1 (Public Review):

      The central claim that the R400Q mutation causes cardiomyopathy in humans require(s) additional support.

      We regret that the reviewer interpreted our conclusions as described. Because of the extreme rarity of the MFN2 R400Q mutation our clinical data are unavoidably limited and therefore insufficient to support a conclusion that it causes cardiomyopathy “in humans”. Importantly, this is a claim that we did not make and do not believe to be the case. Our data establish that the MFN2 R400Q mutation is sufficient to cause lethal cardiomyopathy in some mice (Q/Q400a; Figure 4) and predisposes to doxorubicin-induced cardiomyopathy in the survivors (Q/Q400n; new data, Figure 7). Based on the clinical association we propose that R400Q may act as a genetic risk modifier in human cardiomyopathy.

      To avoid further confusion we modified the manuscript title to “A human mitofusin 2 mutation can cause mitophagic cardiomyopathy” and provide a more detailed discussion of the implications and limitations of our study (page 11).

      First, the claim of an association between the R400Q variant (identified in three individuals) and cardiomyopathy has some limitations based on the data presented. The initial association is suggested by comparing the frequency of the mutation in three small cohorts to that in a large database gnomAD, which aggregates whole exome and whole genome data from many other studies including those from specific disease populations. Having a matched control population is critical in these association studies.

      We have added genotyping data from the matched non-affected control population (n=861) of the Cincinnati Heart study to our analyses (page 4). The conclusions did not change.

      For instance, according to gnomAD the MFN2 Q400P variant, while not observed in those of European ancestry, has a 10-fold higher frequency in the African/African American and South Asian populations (0.0004004 and 0.0003266, respectively). If the authors' data in Table 1 are compared to the gnomAD African/African American population, the p-value drops to 0.029262, which would not likely survive correction for multiple comparisons (e.g., Bonferroni).

      Thank you for raising the important issue of racial differences in mutant allele prevalence and its association with cardiomyopathy. Sample size for this type of sub-group analysis is limited, but we are able to provide African-derived population allele frequency comparisons for both the gnomAD population and our own non-affected control group.

      As now described on page 4, and just as with the gnomAD population, we did not observe MFN2 R400Q in any Caucasian individuals, either cardiomyopathy or control. Its (heterozygous only) prevalence in African American cardiomyopathy is 3/674. Thus, the R400Q minor allele frequency of 3/1,345 in AA cardiomyopathy compares to 10/24,962 in African gnomAD, reflecting a statistically significant increase in this specific population group (p=0.003308; Chi2 statistic 8.6293). Moreover, all African American non-affected controls in the case-control cohort were wild-type for MFN2 (0/452 minor alleles).
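
      As a minimal sketch of how the statistic quoted above can be reproduced, the snippet below computes a Pearson chi-square test on a 2×2 table. It rests on two assumptions that are not stated in the response: that the figures are read as minor versus remaining allele counts (3 vs 1,345 in African American cardiomyopathy, 10 vs 24,962 in African gnomAD), and that no continuity correction is applied. The function name is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Rows: AA cardiomyopathy, African gnomAD; columns: minor alleles, other alleles.
chi2, p = chi2_2x2(3, 1345, 10, 24962)
print(f"chi2 = {chi2:.4f}, p = {p:.6f}")
```

      Under these assumptions the chi-square value agrees with the 8.6293 reported above. With expected counts this small in one cell, an exact test would be a reasonable cross-check.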

      (The source and characteristics of the subjects used by the authors in Table 1 is not clear from the methods.)

      The details of our study cohorts were inadvertently omitted during manuscript preparation. As now reported on pages 3 and 4, the Cincinnati Heart Study is a case-control study consisting of 1,745 cardiomyopathy (1,117 Caucasian and 628 African American) subjects and 861 non-affected controls (625 Caucasian and 236 African American) (Liggett et al Nat Med 2008; Matkovich et al JCI 2010; Cappola et al PNAS 2011). The Houston hypertrophic cardiomyopathy cohort (which has been screened by linkage analysis, candidate gene sequencing or clinical genetic testing) included 286 subjects (240 Caucasians and 46 African Americans) (Osio A et al Circ Res 2007; Li L et al Circ Res 2017).

      Relatedly, evaluation in a knock-in mouse model is offered as a way of bolstering the claim for an association with cardiomyopathy. Some caution should be offered here. Certain mutations that cause a cardiomyopathy when knocked into mice have not been observed to do so in humans with the same mutation. A recent example is the p.S59L variant in the mitochondrial protein CHCHD10, which causes cardiomyopathy in mice but not in humans (PMID: 30874923). While phenocopy is suggestive, there are differences between humans and mice, which makes the correlation imperfect.

      We understand that a mouse is not a man, and as noted above we view the in vitro data in multiple cell systems and the in vivo data in knock-in mice as supportive for, not proof of, the concept that MFN2 R400Q can be a genetic cardiomyopathy risk modifier. As indicated in the following responses, we have further strengthened the case by including results from 2 additional, previously undescribed human MFN2 mutation knock-in mice.

      Additionally, the argument that the Mfn2 R400Q variant causes a dominant cardiomyopathy in humans would be better supported by observing a cardiomyopathy in the heterozygous Mfn2 R400Q mice and not just in the homozygous Mfn2 R400Q mice.

      We are intrigued that in the previous comment the reviewer warns that murine phenocopies are not 100% predictive of human disease, and in the next sentence he/she requests that we show that the gene dose-phenotype response is the same in mice and humans. And, we again wish to note that we never argued that MFN2 R400Q “causes a dominant cardiomyopathy in humans.” Nevertheless, we understand the underlying concerns and in the revised manuscript we present data from new doxorubicin challenge experiments comparing cardiomyopathy development and myocardial mitophagy in WT, heterozygous, and surviving (Q/Q400n) homozygous Mfn2 R400Q KI mice (new Figure 7, panels E-G). Homozygous, but not heterozygous, R400Q mice exhibited an amplified cardiomyopathic response (greater LV dilatation, reduced LV ejection performance, exaggerated LV hypertrophy) and an impaired myocardial mitophagic response to doxorubicin. These in vivo data recapitulate new in vitro results in H9c2 rat cardiomyoblasts expressing MFN2 R400Q, which exhibited enhanced cytotoxicity (cell death and TUNEL labelling) to doxorubicin associated with reduced reactive mitophagy (Parkin aggregation and mitolysosome formation) (new Figure 7, panels A-D). Thus, under the limited conditions we have explored to date we do not observe cardiomyopathy development in heterozygous Mfn2 R400Q KI mice. However, we have expanded the association between R400Q, mitophagy and cardiomyopathy thereby providing the desired additional support for our argument that it can be a cardiomyopathy risk modifier.

      Relatedly, it is not clear what the studies in the KI mouse prove over what was already known. Mfn2 function is known to be essential during the neonatal period and the authors have previously shown that the Mfn2 R400Q disrupts the ability of Mfn2 to mediate mitochondrial fusion, which is its core function. The results in the KI mouse seem consistent with those two observations, but it's not clear how they allow further conclusions to be drawn.

      We strenuously disagree with the underlying proposition of this comment, which is that “mitochondrial fusion (is the) core function” of mitofusins. We also believe that our previous work, alluded to but not specified, is mischaracterized.

      Our seminal study defining an essential role for Mfn2 for perinatal cardiac development (Gong et al Science 2015) reported that an engineered MFN2 mutation that was fully functional for mitochondrial fusion, but incapable of binding Parkin (MFN2 AA), caused perinatal cardiomyopathy when expressed as a transgene. By contrast, another engineered MFN2 mutant transgene that potently suppressed mitochondrial fusion, but constitutively bound Parkin (MFN2 EE) had no adverse effects on the heart.

      Our initial description of MFN2 R400Q and observation that it exhibited impaired fusogenicity (Eschenbacher et al PLoS One 2012) reported results of in vitro studies and transgene overexpression in Drosophila. Importantly, a role for MFN2 in mitophagy was unknown at that time and so was not explored.

      A major point both of this manuscript and our work over the last decade on mitofusin proteins has been that their biological importance extends far beyond mitochondrial fusion. As introduced/discussed throughout our manuscript, MFN2 plays important roles in mitophagy and mitochondrial motility. Because this central point seems to have been overlooked, we have gone to great lengths in the revised manuscript to unambiguously show that impaired mitochondrial fusion is not the critical functional aspect that determines disease phenotypes caused by Mfn2 mutations. To accomplish this we’ve re-structured the experiments so that R400Q is compared at every level to two other natural MFN2 mutations linked to a human disease, the peripheral neuropathy CMT2A. These comparators are MFN2 T105M in the GTPase domain and MFN2 M376A/V in the same HR1 domain as MFN2 R400Q. Each of these human MFN2 mutations is fusion-impaired, but the current studies reveal that their spectrum of dysfunction differs in other ways, as summarized in Author response table 1:

      Author response table 1.

      We understand that it sounds counterintuitive for a mutation in a “mitofusin” protein to evoke cardiac disease independent of its appellative function, mitochondrial fusion. But the KI mouse data clearly relate the occurrence of cardiomyopathy in R400Q mice to the unique mitophagy defect provoked in vitro and in vivo by this mutation. We hope the reviewer will agree that the KI models provide fresh scientific insight.

      Additionally, the authors conclude that the effect of R400Q on the transcriptome and metabolome in a subset of animals cannot be explained by its effect on OXPHOS (based on the findings in Figure 4H). However, an alternative explanation is that R400Q is a loss of function variant but does not act in a dominant negative fashion. According to this view, mice homozygous for R400Q (which have no wildtype copies of Mfn2) lack Mfn2 function and consequently have an OXPHOS defect giving rise to the observed transcriptomic and metabolomic changes. But in the rat heart cell line with endogenous rat Mfn2, exogenous expression of MFN2 R400Q has no effect, as it is loss of function and is not dominant negative.

      Our results in the original submission, which are retained in Figures 1D and 1E and Figure 1 Figure Supplement 1 of the revision, exclude the possibility that R400Q is a functional null mutant for, but not a dominant suppressor of, mitochondrial fusion. We have added additional data for M376A in the revision, but the original results are retained in the main figure panels and a new supplemental figure:

      Figure 1D reports results of mitochondrial elongation studies (the morphological surrogate for mitochondrial fusion) performed in Mfn1/Mfn2 double knock-out (DKO) MEFs. The baseline mitochondrial aspect ratio in DKO cells infected with control (b-gal containing) virus is ~2 (white bar), and increases to ~6 (i.e. ~normal) by forced expression of WT MFN2 (black bar). By contrast, aspect ratio in DKO MEFs expressing MFN2 mutants T105M (green bar), M376A and R400Q (red bars in main figure), R94Q and K109A (green bars in the supplemental figure) is only 3-4. For these results the reviewer’s and our interpretation agree: all of the MFN2 mutants studied are non-functional as mitochondrial fusion proteins.

      Importantly, Figure 1E (left panel) reports the results of parallel mitochondrial elongation studies performed in WT MEFs, i.e. in the presence of normal endogenous Mfn1 and Mfn2. Here, baseline mitochondrial aspect ratio is already normal (~6, white bar), and increases modestly to ~8 when WT MFN2 is expressed (black bar). By comparison, aspect ratio is reduced below baseline by expression of four of the five MFN2 mutants, including MFN2 R400Q (main figure and accompanying supplemental figure; green and red bars). Only MFN2 M376A failed to suppress mitochondrial fusion promoted by endogenous Mfns 1 and 2. Thus, MFN2 R400Q dominantly suppresses mitochondrial fusion. We have stressed this point in the text on page 5, first complete paragraph.

      Additionally, as the authors have shown MFN2 R400Q loses its ability to promote mitochondrial fusion, and this is the central function of MFN2, it is not clear why this can't be the explanation for the mouse phenotype rather than the mitophagy mechanism the authors propose.

      Please see our response #7 above beginning “We strenuously disagree...”

      Finally, it is asserted that the MFN2 R400Q variant disrupts Parkin activation by interfering with MFN2 acting as a receptor for Parkin. The support for this in cell culture, however, is limited. Additionally, there is no assessment of mitophagy in the hearts of the KI mouse model.

      The reviewer may have overlooked the studies reported in original Figure 5, in which Parkin localization to cultured cardiomyoblast mitochondria is linked both to mitochondrial autophagy (LC3-mitochondria overlay) and to formation of mito-lysosomes (MitoQC staining). These results have been retained and expanded to include MFN2 M376A in Figure 6 B-E and Figure 6 Figure Supplement 1 of the revised manuscript. Additionally, selective impairment of Parkin recruitment to mitochondria was shown in mitofusin null MEFs in current Figure 3C and Figure 3 Figure Supplement 1, panels B and C.

      The in vitro and in vivo doxorubicin studies performed for the revision further strengthen the mechanistic link between cardiomyocyte toxicity, reduced parkin recruitment and impaired mitophagy in MFN2 R400Q expressing cardiac cells: MFN2 R400Q-amplified doxorubicin-induced H9c2 cell death is associated with reduced Parkin aggregation and mitolysosome formation in vitro, and the exaggerated doxorubicin-induced cardiomyopathic response in MFN2 Q/Q400 mice was associated with reduced cardiomyocyte mitophagy in vivo, measured with adenoviral Mito-QC (new Figure 7).

      Reviewer #2 (Public Review):

      In this manuscript, Franco et al show that the mitofusin 2 mutation MFN2 Q400 impairs mitochondrial fusion with normal GTPase activity. MFN2 Q400 fails to recruit Parkin and further disrupts Parkin-mediated mitophagy in cultured cardiac cells. They also generated MFN2 Q400 knock-in mice to show the development of lethal perinatal cardiomyopathy, which had an impairment in multiple metabolic pathways.

      The major strength of this manuscript is the in vitro study that provides a thorough understanding in the characteristics of the MFN2 Q400 mutant in function of MFN2, and the effect on mitochondrial function. However, the in vivo MFN2 Q/Q400 knock-in mice are more troubling given the split phenotype of MFN2 Q/Q400a vs MFN2 Q/Q400n subtypes. Their main findings towards impaired metabolism in mutant hearts fail to distinguish between the two subtypes.

      Thanks for the comments. We do not fully understand the statement that “impaired metabolism in mutant hearts fails to distinguish between the two (in vivo) subtypes.” The data in current Figure 5 and its accompanying figure supplements show that impaired metabolism measured both as metabolomic and transcriptomic changes in the subtypes (orange Q400n vs red Q400a in Figure 5 panels A and D) are reflected in the histopathological analyses. Moreover, newly presented data on ROS-modifying pathways (Figure 5C) suggest that a central difference between Mfn2 Q/Q400 hearts that can compensate for the underlying impairment in mitophagic quality control (Q400n) vs those that cannot (Q400a) is the capacity to manage downstream ROS effects of metabolic derangements and mitochondrial uncoupling. Additional support for this idea is provided in the newly performed doxorubicin challenge experiments (Figure 7), demonstrating that mitochondrial ROS levels are in fact increased at baseline in adult Q400n mice.

      While the data support the conclusion that MFN2 Q400 causes cardiomyopathy, several experiments are needed to further understand mechanism.

      We thank the reviewer for agreeing with our conclusion that MFN2 Q400 can cause cardiomyopathy, which was the major issue raised by R1. As detailed below we have performed a great deal of additional experimentation, including on two completely novel MFN2 mutant knock-in mouse models, to validate the underlying mechanism.

      This manuscript will likely impact the field of MFN2 mutation-related diseases and show how MFN2 mutation leads to perinatal cardiomyopathy in support of previous literature.

      Thank you again. We think our findings have relevance beyond the field of MFN2 mutant-related disease as they provide the first evidence (to our knowledge) that a naturally occurring primary defect in mitophagy can manifest as myocardial disease.

    1. Author Response

      Reviewer #1 (Public Review):

      Hoang, Tsutsumi and colleagues use 2-photon calcium imaging to study the activity of Purkinje cells during a Go/No-go task and relate this activity to their location in Aldolase-C bands. Tensor component analysis revealed that a substantial part of the calcium responses can be linked to four functional components. The manuscript addresses an important question with an elegant technical approach and careful analysis. There are a few points that I think could be addressed to further improve the quality of the manuscript.

      1) The authors should be careful not to overstate the goal and results. For instance, in the abstract it is stated that dynamical functional organization is necessary for dimension reduction. However, the statement that the 4 TCs together account for about half of the variance (line 220) indicates that dimensionality may not be reduced that much. I would suggest revising the first and last sentence of the abstract accordingly.

      Dynamic functional organization of TC1 and TC2 by synchronization is the major finding of this study and we believe that it is one of the most efficient mechanisms of dimension reduction, given the unique anatomy of the cerebellum. In the revised manuscript, we added a supplemental result showing that the dimensionality of TC1 and TC2 neurons decreased and increased, respectively, in accordance with bi-directional changes in their synchronization (Figure 3 – figure supplement 1DE). Dimension reduction was further confirmed by conventional PCA (Figure 6 – figure supplement 1). However, we agree that the statement that the cerebellum reduces dimensions by self-organization of components is speculative, and we revised the abstract accordingly.

      At the end of the introduction, the authors refer to "the first evidence supporting the two major theories of cerebellar function", but which two theories are referred to and how this manuscript supports them is not very obvious. Similarly, they state that "This study unveiled the secret of cerebellar functional architecture", which I would consider to be an unnecessary overstatement of the impact of the work described.

      In the revised Introduction, we explicitly stated that TC1 and TC2 are related to timing control and cognitive error learning, respectively, with some indirect causal evidence. We also revised the last paragraph of the Introduction to emphasize that this study provides the first evidence to support the view that distinct cerebellar components may serve divergent cerebellar functions in a single task. The statement "This study unveiled the secret of cerebellar functional architecture" was removed.

      In the title, the authors use the word modular. In the consensus paper on cerebellar modules (Apps et al., 2018) an attempt is made to unify the terms used to describe cerebellar anatomical structures. Here "module" is used for the longitudinal zone of interconnected PCs, CN neurons and olivary neurons. As the authors only studied PC activity (and indirectly the IO), I would suggest using band, stripe or subpopulation instead.

      Because we used TCA to identify functional components underlying the Go/No-go data, we changed the word “module” to “component” in the title.

      Finally, the term "CF firing" or "CF activity" is used when referring to the recorded signals. However, the authors measure postsynaptic calcium responses that are indeed likely driven by CF inputs, but could also be influenced by PF inputs. At the very least, because Purkinje cells and not climbing fibers are being imaged, "complex spike" should be used instead. It would be more accurate still to use the more general "calcium response" and make less of an assumption about the origin of the calcium response.

      In this study, CF-dependent dendritic Ca2+ signals in adjacent AldC compartments were recorded by two-photon imaging. The HA_time algorithm (Hoang et al. 2020) was then applied to extract spike timings from the recorded signals. In the revised manuscript, we used the terms “calcium responses” and “complex spikes” when referring to the recorded Ca2+ signals and the estimated spikes, respectively.

      2) For some figure panels and statements in the manuscript error bars or confidence intervals and statistics are missing. This is the case for, for example, the changes in fraction correct, lick latency, fraction incorrect, etc. (Fig 1B, 2E-F, TC levels in 3, 4D-E and 5A-C). Including these is particularly relevant in Fig 4E as this is a key result, mentioned also in the abstract. Please indicate clearly if these plots are cumulative for all mice or per mouse and averaged. I advise the authors to statistically support the claim that the changes are significant and in opposite direction as this element of the study is referred to in the abstract and discussion (summary).

      We added the error bars / confidence intervals to the related figures. Most importantly, we added histograms of synchrony strength for TC1/TC2 neurons (Figure 4E) and conducted statistical tests to strengthen the claim of bi-directional changes in synchronization of TC1/TC2.

      3) Data presentation sometimes does not do the work justice. For example, the data in Figure 6 are very interesting, but hard to read because of the design of the figure. It is clear how the components are mostly confined to Aldolase-C domains, but within the domains the distribution is not clear. I would advise to also more clearly indicate what the locations of the colors within the bands refer to. The spatial distribution of the selected top 300 cells for each TC could be added.

      We added pie-chart plots for the fraction of TC1-4 neurons in each Ald-C zone and learning stage. We also indicated in the figure legend that the location of a single-color bar referred to the geographic distance of the corresponding neuron relative to Ald-C boundaries. We included spatial distribution of the selected neurons in Figure 4 – figure supplement 1D.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a study to investigate the reporting practices in three top cardiovascular research journals for articles published in 2019. The study was preregistered, which makes the intent and methodology transparent, and the authors also make their materials, data, and code open. While the preregistration and sampling strategy are a strength, the study suffers from a higher than expected number of non-empirical articles, which decreases the sample size and thus the inference that can be drawn. The authors' focus was mainly on transparency of reporting and not on the actual reproducibility or replicability of the articles; however, the accessibility of data, code, materials, and methods is a prerequisite. While the authors were still able to draw inferences regarding their main objectives, they could not perform some of their proposed analyses because of a small sample size (due partly to fewer than half of the articles in their sample being empirical, as well as the low number of papers with accessible information to code). One of the descriptive analyses they performed, the country level scores (Figure 6), in particular suffers from the small sample size, and while the authors indicate this in their manuscript, I do not think it would be reasonable to include it, as it has the potential to be misinterpreted since so many scores are based on an n=1. Overall, I found the authors' presentation and discussion clear and concise; however, the lack of a more in-depth discussion is an area to improve in the current manuscript. The manuscript outlines opportunities for researchers, journals, funders, and institutions to improve the way cardiovascular research is reported to enable discovery, reuse, and reproducibility.

      We appreciate the reviewer’s recognition of our pre-registration, methodology, and resource sharing and also their feedback regarding the small sample size of empirical research articles and need for a more in-depth discussion of the impacts of our study. We have now increased the number of empirical studies to a total of 393 out of 639 articles screened. We also agree that our study focuses more on transparency than reproducibility and replicability, and we have changed our title to reflect this. While the sample size of empirical papers has increased, a comparison of accessibility scores across countries continued to suffer from small sample size and we have removed it based on the recommendation of the reviewers. We have updated the Materials and Methods section to reflect our updated analyses, as well as included additional paragraphs on Limitations and Future Work in our Discussion to acknowledge future improvements that could be made to the accessibility score used in our study.

      Reviewer #2 (Public Review):

      This is a descriptive paper in the field of metascience, which documents levels of accessibility and reproducible research practices in the field of cardiovascular science. As such, it does not make a theoretical contribution, but it argues, first, that there is a problem for this field, and second, it provides a baseline against which the impact of future initiatives to improve reproducibility can be assessed. The study was pre-registered and the methods and data are clearly documented. This kind of study is extremely labour-intensive and represents a great deal of work.

      I have a major concern about the analysis. It is stated that to be fully reproducible, publications must include sufficient resources (materials, methods, data and analysis scripts). But how about cases where materials are not required to reproduce the work? In lines 128-129 it is noted that the materials criterion was omitted for meta-analyses, but what about other types of study where materials may be either described adequately in the text, readily available (e.g. published questionnaires), or impossible to share (e.g. experimental animals)?

      To see how valid these concerns might be, I looked at the first 4 papers in the deposited 'EmpricalResearchOnly.csv' file. Two had been coded as 'No Materials availability statement' and for two the value was blank.

      Study 1 used registry data and was coded as missing a Materials statement. The only materials that I could think might be useful to have might be 'standardized case report forms' that were referred to. But the authors did note that the Registry methods were fully documented elsewhere (I am not sure if that is the case).

      Study 2 was a short surgical case report - for this one the Materials field was left blank by the coder.

      Study 3 was a meta-analysis; the Materials field was left blank by the coder.

      Study 4 was again coded as lacking a Materials statement. It presented a model predicting outcome for cardiac arrhythmias. The definitions of the predictor variables were provided in supplementary materials. I am not clear what other materials might be needed.

      These four cases suggest to me that it is rather misleading to treat lack of a Materials statement as contributing to an index of irreproducibility. Certainly, there are many studies where this is the case, but it will vary from study to study depending on the nature of the research. Indeed, this may also be true for other components of the irreproducibility index: for instance, in a case study, there may be no analysis script because no statistical analysis was done. And in some papers, the raw data may all be present in the text already - that may be less common, but it is likely to be so for case studies, for instance.

      A related point concerns the criteria for selecting papers for screening: it was surprising that the requirement for studies to have empirical data was not imposed at the outset. It should be possible to screen these out early on by specifying 'publication type'; instead, they were included, and that means that the numbers used for the actual analysis are well below 400. The large number of non-empirical papers is not of particular relevance for the research questions considered here. In the Discussion, the authors expressed surprise at the large number of non-empirical papers they found; I felt it would have been reasonable for them to depart from their preregistered plan on discovering this, and to review further papers to bring the number up to 400, restricting consideration to empirical papers only, and also excluding case reports, which pose their own problems in this kind of analysis.

      A more minor point is that some of the analyses could be dropped. The analysis of authorship by country had too few cases for many countries to allow for sensible analysis.

      Overall, my concern is that the analysis presented here may create a backlash against metascientific analyses like this because it appears unfair on authors to use a metric based on criteria that may not apply to their study. I am strongly in favour of open, reproducible science, and agree it is important to document the state of the science for different disciplines. But what this study demonstrates to me is that if you are going to evaluate papers as to whether they include things like materials/data availability statements, then you need to have an N/A option. Unfortunately, I suspect it may not be possible to rely on authors' self-evaluation of N/A, and that means that metascientists doing an evaluation would need to read enough of the paper to judge whether such a statement should apply.

      We thank the reviewer for the time taken to review our paper, the appreciation of the work we conducted, and for the suggestions for improving our research methods. To address the initial concern about our analytical approach, the definition for fully reproducible publications that we used was only applicable to research that utilized empirical research methods. We recognize that publications such as editorials and reviews are not inherently reproducible experimental studies; thus, such papers were not provided with an accessibility score, were only screened for components such as funding and conflict of interest information, and were only compared amongst each other. Additionally, articles such as meta-analyses and systematic reviews that do not include materials had adjusted accessibility scores. We expanded our Methods and Discussion sections to further explain our screening process and our assumption that all empirical research articles contain methods, data, and analysis scripts, and to acknowledge the limitations of our approach. We also agree that screening more empirical research articles is more in line with the intent of our pre-registration, and we expanded the number of empirical research articles screened to 393. We also agree with the reviewer that the analysis by country should be excluded because of the small sample size for most countries, and we have adjusted the manuscript accordingly.

    1. Reviewer #1 (Public Review):

      The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories.

      This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

    1. Author Response

      Reviewer #3 (Public Review):

      Because of the position of pigeon embryos in eggs, light exposure will only stimulate the right eye, leading to lateralisation of brain responses and behaviour. Lorenzi and colleagues injected manganese chloride into pigeon eggs, to assess neuronal activation in the embryonic brain. While the eggs were placed in the light or dark, manganese ions accumulated in neurons that were activated (in cell bodies and axons), which was then visualized with MRI of the embryos before hatching. The authors report lateralisation of neuronal activity in three brain regions, which could potentially be important for our understanding of experience-dependent development of lateralised neural activation.

      The tectofugal pathway in pigeons projects from the retina to the optical tectum, then to the nucleus rotundus in the thalamus, and then to the entopallium. The thalamofugal pathway projects from the retina to the GLd in the thalamus, and then to the wulst in the hyperpallium. The two pathways involve different thalamic nuclei (e.g., Deng 2006). In the methods and throughout the manuscript it should be specified which thalamic region is used as ROI.

      Here we refer to the GLd in the thalamofugal visual pathway; we did not estimate activity in the n. rotundus. We have now clarified this point in the revised MS (ll. 54, 80, 86).

      This manuscript only describes neural activity, but the MEMRI technique should also be used to assess the effect of experimental manipulations on axonal connectivity. It is important to learn about the asymmetry of contralateral projections in the light vs dark groups for answering the research question.

      Here we used systemic administration of Mn through the CAM. The blood-brain barrier at this embryonic stage is not completely developed, and its permeability to ions and small molecules is much higher in the embryo than in later stages of development (Engelhardt, B. (2003). Development of the blood-brain barrier. Cell and tissue research, 314(1), 119-129.). Other studies involving direct, local injection in selected brain regions are more apt to investigate connectivity, but this is not the protocol used here. We appreciate the reviewer’s suggestion, and this will be the object of future experiments. However, we would like to disseminate the current protocol and the results it led to at an early stage to enable and encourage its use by other researchers in the field.

      There is an overinterpretation of post-hoc statistics that are reported without correction for multiple testing. The wulst light group lateralization is probably not actually different from zero (uncorrected p=0.04).

      We considered the reviewer's observation regarding the need for improvements in the statistical methods. In response, we have made amendments to the relevant section of the manuscript, explicitly stating that significant findings were obtained using a two-way ANOVA. For comparisons between conditions within specific brain regions, we conducted two-sample t-tests, and the results were corrected for Type I errors using the false discovery rate (FDR) method. Post-hoc one-sample t-tests were employed to assess lateralization across brain regions and conditions, and the corresponding p-values were reported without correction for multiple comparisons (as explicitly reported in the text, to avoid any confusion).
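
      To make the multiple-testing point concrete, here is a minimal sketch of one common FDR method, the Benjamini-Hochberg procedure. The response does not name which FDR procedure was used, so this choice is an assumption. The six p-values are hypothetical (imagining one test per region and condition); only the value 0.04 echoes the uncorrected p-value quoted by the reviewer.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where a test survives BH FDR control."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six post-hoc tests (illustrative only)
p_values = [0.04, 0.30, 0.002, 0.45, 0.60, 0.12]
print(benjamini_hochberg(p_values))
```

      In this made-up set, the test with p = 0.04 does not survive correction, while the one with p = 0.002 does. This is the kind of outcome the reviewer is warning about when uncorrected post-hoc p-values sit close to 0.05.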

      The first line in the discussion states that there is thalamofugal lateralization, but no lateralization in the tectofugal pathway. To my understanding, previous literature reported it the other way around: in altricial pigeons, light exposure in the egg mainly affected the tectofugal pathway (Deng & Rogers 2002), while the thalamofugal pathway in pigeons was not lateralized (Strockens et al., 2013). The manuscript should compare the current findings with the literature and discuss differences.

      We are aware of the substantial differences in brain lateralization of the two visual pathways between pigeons and chicks after embryonic light exposure. However, in the present work we employed chick embryos (Gallus gallus domesticus), and the space limitations of a Brief Communication do not allow for an in-depth discussion of these differences between avian species.

      Moreover, the tectum is the only region shown here from the tectofugal pathway. However, lateralization of contralateral connections is expected from tectum to the nucleus rotundus in the thalamus, and thus lateralization of activation may only arise in downstream brain regions from the optical tectum. Therefore, the conclusion that there is no lateralization in the tectofugal pathway is not supported by the data.

      In conclusion, I think it is interesting and worthwhile that the authors assessed neural activity in response to visual stimulation in the embryo prior to hatching, but multiple methodological weaknesses and unclarities should be addressed.

      The ROI that we here named Thalamus does not include the nucleus rotundus, but refers to the nucleus geniculatus lateralis (GLd). We have now clarified this point in the revised MS (ll. 54, 80, 86), and we now refer only to the tectum, without generalizing to the entire tectofugal pathway, which will be the subject of future investigations.

    1. Reviewer #3 (Public Review):

      There has been a long-standing link between the biology of sulfur-containing molecules (e.g., hydrogen sulfide gas, the amino acid cysteine, and its close relative cystine, et cetera) and the biology of hypoxia, yet we have a poor understanding of how and why these two biological processes are co-regulated. Here, the authors use C. elegans to explore the relationship between sulfur metabolism and hypoxia, examining the regulation of cysteine dioxygenase (CDO1 in humans, CDO-1 in C. elegans), which is critical to cysteine catabolism, by the hypoxia inducible factor (HIF1 alpha in humans, HIF-1 in C. elegans), which is the key terminal effector of the hypoxia response pathway that maintains oxygen homeostasis. The authors are trying to demonstrate that (1) the hypoxia response pathway is a key regulator of cysteine homeostasis, specifically through the regulation of cysteine dioxygenase, and (2) that the pathway responds to changes in cysteine homeostasis in a mechanistically distinct way from how it responds to hypoxic stress.

      Briefly summarized here, the authors initiated this study by generating transgenic animals expressing a CDO-1::GFP protein chimera from the cdo-1 promoter so that they could identify regulators of CDO-1 expression through a forward genetic screen. This screen identified mutants with elevated CDO-1::GFP expression in two genes, egl-9 and rhy-1, whose wild-type products are negative regulators of HIF-1, raising the possibility that cdo-1 is a HIF-1 transcriptional target. Indeed, the authors provide data showing that cdo-1 regulation by EGL-9 and RHY-1 is dependent on HIF-1 and that regulation by RHY-1 is dependent on CYSL-1, as expected from other published findings of this pathway. The authors show that exogenous cysteine activates cdo-1 expression, reflective of what is known to occur in other systems. Moreover, they find that exogenous cysteine is toxic to worms lacking CYSL-1 or HIF-1 activity, but not CDO-1 activity, suggesting that HIF-1 mediates a survival response to toxic levels of cysteine and that this response requires more than just the regulation of CDO-1. The authors validate their expression studies using a GFP knockin at the cdo-1 locus, and they demonstrate that a key site of action for CDO-1 is the hypodermis. They present genetic epistasis analysis supporting a role for RHY-1, both as a regulator of HIF-1 and as a transcriptional target of HIF-1, in offsetting toxicity from aberrant sulfur metabolism. The authors use CRISPR/Cas9 editing to mutate a key amino acid in the prolyl hydroxylase domain of EGL-9, arguing that EGL-9 inhibits CDO-1 expression through a mechanism that is largely independent of the prolyl hydroxylase activity.

      Overall, the data seem rigorous, and the conclusions drawn from the data seem appropriate. The experiments test the hypothesis using logical and clever molecular genetic tools and design. The sample size is a bit lower than is typical for C. elegans papers; however, the experiments are clearly not underpowered, so this is not an issue. The paper is likely to drive many in the field (including the authors themselves) into deeper experiments on (1) how the pathway senses hypoxia and sulfur/cysteine/H2S using these distinct mechanisms/modalities, (2) how oxygen and sulfur/cysteine/H2S homeostasis influence one another, and (3) how this single pathway evolved to sense and respond to both of these stress modalities.

      Major strengths of the paper include (1) the use of the powerful whole animal C. elegans model to reveal results that have meaning in vivo, (2) the careful demonstration through mutant rescue experiments that key transgenes have functional activity, (3) the use of CRISPR/Cas9 editing to mutate a critical residue in the catalytic domain of the EGL-9 prolyl hydroxylase, (4) transgenic rescue experiments that show that CDO-1 operates in the hypodermis with regard to the larval arrest phenotype, and (5) the thorough epistatic analysis of different pathway mutants.

      Major weaknesses of the paper include (1) the over-reliance on genetic approaches, (2) the lack of novelty regarding prolyl hydroxylase-independent activities of EGL-9, and (3) the lack of biochemical approaches to probe the underlying mechanism of the prolyl hydroxylase-independent activity of EGL-9.

      Major Issues We Feel the Authors Should Address:

      1. One particularly glaring concern is that the authors really do not know the extent to which the prolyl hydroxylase activity is (or is not) impacted by the H487A mutation in egl-9(rae276). If there is a fair amount of enzymatic activity left in this mutant, then it complicates interpretation. The paper would be strengthened if the authors could show that the egl-9(rae276) eliminates most if not all prolyl hydroxylase activity. In addition, the authors may want to consider doing RNAi for egl-9 in the egl-9(rae276) mutant as a control, as this would support the claim that whatever non-hydroxylase activity EGL-9 may have is indeed the causative agent for the elevation of CDO-1::GFP. Without such experiments, readers are left with the nagging concern that this allele is simply a hypomorph for the single biochemical activity of EGL-9 (i.e., the prolyl hydroxylase activity) rather than the more interesting, hypothesized scenario that EGL-9 has multiple biochemical activities, only one of which is the prolyl hydroxylase activity.

      2. The authors observed that EGL-9 can inhibit HIF-1 and the expression of the HIF-1 target cdo-1 through a combination of activities that are (1) dependent on its prolyl hydroxylase activity (and subsequent VHL-1 activity that acts on the resulting hydroxylated prolines on HIF-1), and (2) independent of that activity. This is not a novel finding, as the authors themselves carefully note in their Discussion section, as this odd phenomenon has been observed for many HIF-1 target genes in multiple publications. While this manuscript adds to the description of this phenomenon, it does not really probe the underlying mechanism or shed light on how EGL-9 has these dual activities. This limits the overall impact and novelty of the paper.

      3. Cysteine dioxygenases like CDO-1 operate in an oxygen-dependent manner to generate sulfites from cysteine. CDO-1 activity is dependent upon availability of molecular oxygen; this is an unexpected characteristic of a HIF-1 target, as its very activation is dependent on low molecular oxygen. The authors address this neither in the text nor experimentally, and it seems a glaring omission.

      4. The authors determined that the hypodermis is the site of the most prominent CDO-1::GFP expression, relevant to Figure 4. This claim would be strengthened if a negative control tissue, in the animal with the knockin allele, were shown. The hypodermal specific expression is a highlight of this paper, so it would make this article even stronger if they could further substantiate this claim.

      Minor issues to note:

      Mutants for hif-1 and cysl-1 are sensitive to exogenous cysteine levels, yet loss of CDO-1 expression is not sufficient to explain this phenomenon, suggesting other targets of HIF-1 are involved. Given the findings the authors (and others) have had showing a role for RHY-1 in sulfur amino acid metabolism, shouldn't the authors consider testing rhy-1 mutants for sensitivity to exogenous cysteine?

      The cysteine exposure assay was performed by incubating nematodes overnight in liquid M9 media containing OP50 culture. The liquid culture approach adds two complications: (1) the worms are arguably starving or at least undernourished compared to animals grown on NGM plates, and (2) the worms are probably mildly hypoxic in the liquid cultures, which complicates the interpretation.

      An easily addressable concern is the wording of one of the main conclusions: that cdo-1 transcription is independent of the canonical prolyl hydroxylase function of EGL-9 and is instead dependent on one of EGL-9's non-canonical, non-characterized functions. There are several points in which the wording suggests that CDO-1 toxicity is independent of EGL-9. In their defense, the authors try to avoid this by saying, "EGL-9 PHD," to indicate that it is the prolyl hydroxylase function of EGL-9 that is not required for CDO-1 toxicity. However, this becomes confusing because much of the field uses PHD and EGL-9/EGLN as interchangeable protein names. The authors need to be clear about when they are describing the prolyl hydroxylase activity of EGL-9 rather than other (hypothesized) activities of EGL-9 that are independent of the prolyl hydroxylase activity.

      The authors state in the text, "the egl-9; suox-1 double mutants are extremely sick and slow growing." We appreciate that their "health" assay, based on the exhaustion of food from the plate, is qualitative. We also appreciate that it is a functional measure of many factors that contribute to how fast a population of worms can grow, reproduce, and consume that lawn of food. However, unless they do a lifespan assay and/or measure developmental timing and specifically determine that the double mutant animals themselves are developing and/or growing more slowly, we do not think it is appropriate to use the words "slow growing" to describe the population. As they point out, the rate of consumption of food on the plate in their health assay is determined by a multitude and indeed a confluence of factors; the growth rate is one specific one that is commonly measured and has an established meaning.

    1. Neither Spread of U.S. Slavery nor Invasion of America uses language explicitly condemning slavery or imperialism, allowing the map’s usage by potentially racist and xenophobic visitors. The objective, socially-neoliberal portrayal of data without subjectivity perpetuates color-blind racism and allows bigotry to take root.

      I think this may be precisely because these maps are scholarly maps. Members of academia tend to avoid making a "subjective" or "biased" argument, especially regarding historical matters. On the other hand, non-scholarly maps created bottom-up through community engagement (such as the Anti-Eviction Mapping Project referenced in Data Feminism) can more explicitly call out injustices. I want to learn more about the ways in which we can complement the limitations of scholarly mapping projects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments. We have now addressed all the comments in a revised version of the manuscript, which we believe has strengthened our paper.

      1) Introduction LINE 60: the authors cite Funato et al 2016 as the paper first describing a role for Sik3 in sleep regulation. In fact, the role for this kinase was first identified nearly a decade earlier in C. elegans (Van der Linden et al, Genetics 2008 PMID 18832350).

      Thank you for pointing us to this reference. Van der Linden et al. demonstrated that the C. elegans homolog of Sik3 (KIN-29) regulates satiety quiescence, in which worms stop moving following feeding on high quality food. However, as pointed out in Trojanowski and Raizen “Call it Worm Sleep” (2016), not all of the behavioral criteria for sleep have been applied to C. elegans satiety quiescence, and we cannot find any references that unequivocally demonstrate satiety quiescence is a sleep state. As McClanahan et al. (2020) show, quiescent states following mild sensory arousal do not fulfill the sleep criteria of changes in arousal threshold and homeostatic regulation, so not all quiescent states in C. elegans are sleep. Then again, Grubbs et al. (2020) do demonstrate that KIN-29 regulates both developmentally timed and stress-induced sleep states in worms, suggesting that the observations in Van der Linden et al. were ahead of their time and these behavioral states are possibly inter-related. We believe, though, that our line “the roles of… SIK3 kinase in modulating sleep homeostasis in mice (Funato et al. 2016) were identified in genetic screens” remains accurate.

      2) Introduction LINE 71: remove the word "known" from "...while some known human sleep/wake regulators, such as the..."

      Good idea. Done.

      3) I was confused regarding Supplemental data 1 describing the genes they targeted with their forward genetic screen. Am I understanding correctly from the "Summary stats" tab that 702 fish lines with virus insertions were screened behaviorally? In Figure S1, it looks like about 60 are shown in the histograms but in the text (in the Discussion) they say 25 were screened. Were all the genes listed under the Excel tabs (GPCRs, channels, etc) tested? Or was just a subset tested? Where are the sleep data for these lines? Negative results may be relevant to their manuscript since they listed (tested??) a number of ion channel genes under tab "channels" which appear to NOT have a sleep phenotype.

      We apologize for the confusion on these points. As highlighted in the legend to Supplementary Figure S1, we had planned a screening strategy with the following pipeline: Candidate mammalian gene → Zebrafish ortholog → ID viral insertion from “Zenemark” library → grow viral insertion lines from frozen sperm→ phenotype F3 heterozygous and homozygous mutant generation. Unfortunately, the company, Znomics, which held the Zenemark library, could not reliably reconstitute the correct live fish from the sperm library, and of the 702 lines we planned to screen, we could only screen 26 (25 was a typo) lines. We treated heterozygous and homozygous animals for each line independently, for a total of 52 screened lines in the histograms.

      To make this clearer, we have edited the main text as follows (lines 104-105): “For screening, we identified zebrafish sperm samples from the Zenemark collection (Varshney et al., 2013) that harboured viral insertions in genes of interest and used these samples for in vitro fertilization and the establishment of F2 families, which we were able to obtain for 26 lines.” And lines 111-112: “While most screened heterozygous and homozygous lines had minimal effects on sleep-wake behavioural parameters (Figure S1B-S1C),”

      We believe it is important to include the full set of Supplementary Data 1, even though the vast majority of these candidate lines were not tested.

      4) Results LINE 117: remove the word "prominent", which is subjective, from the sentence "...showed a prominent decrease in sleep during the..."

      Good point. Done.

      5) LINES 185-186: did you see any circadian variation in your dmist:GFP protein abundance or localization? Protein trafficking has been described as a mechanism of circadian regulation of excitability.

      For practical reasons, we imaged the membrane localization of Dmist:GFP in plasmid-injected embryos at 90% epiboly, which is about 9 hours after fertilization and when the cells remain large and in a relatively flat epithelium. Thus, we could not follow circadian fluctuations in abundance or localization. For circadian studies, we believe the best method will be to raise an antibody that recognizes Dmist.

      6) LINE 203: does the GFP-tagged Dmist rescue the loss-of-function phenotype? This is relevant to Figure 2E. It is also relevant to the issue of structure-function. If it rescues, then the C-terminus may not be essential to protein function.

      As noted, for practical reasons, we observed Dmist-GFP only transiently at early stages of development, expressed using a strong, ubiquitous promoter. A rescue experiment is a good idea for future experiments, where we carefully control the expression of Dmist in neurons.

      7) LINE 220: explain what you mean by "...consistent with nonsense-mediated decay." and/or give a reference.

      In zebrafish and other species including humans, mutant transcripts that have premature stop codons often undergo “nonsense mediated decay”, whereby the expression levels are largely reduced (Wittkopp et al., 2009). In the zebrafish community, this is often used as secondary evidence of a loss of function mutation, as relatively few antibodies are available to directly observe zebrafish proteins. We have added a reference that describes this phenomenon (Wittkopp et al., 2009).

      8) LINE 225: define "LME model"

      Now reads: “Linear mixed effects (LME).”

      9) LINES 227-229: could the vir/vir phenotype be explained by specific effects on protein structure? could vir/vir be a gain-of-function allele?

      We can’t rule this out formally, and vir/+ animals do show some sleep phenotypes, albeit weaker than those of vir/vir animals (Figure 1G). However, it is not uncommon for heterozygous mutants to show significant phenotypes that are weaker than those of their homozygous mutant siblings, and the strong suppression of dmist expression by the viral insertion (which is located in the dmist intron) is more consistent with a hypomorphic loss-of-function phenotype for the vir allele.

      10) LINES 229-230: I don't quite follow the argument for pursuing further studies only of i8/i8. i8/i8 seems to also be a hypomorphic allele based on your qPCR data.

      First, the dmist viral line was generated by an insertional mutagenesis method followed by sequencing, and each line has multiple other inserts in a background that does not match the background of the other animals reported in this paper. Second, the dmist vir allele is an insertion in the intron, leading to reduced, but not complete loss of expression. In contrast, the i8 allele was generated on the same background strain as our other existing and newly reported lines. Moreover, our i8 line is likely a loss-of-function allele and not a hypomorph. Yes, dmist expression is reduced in the i8 allele; however, this is likely due to nonsense mediated decay of dmist mRNA. The mutation introduces a frameshift in the dmist coding sequence, and as a result the amino acid sequence of the protein is altered after the N-terminal signal sequence.

      11) LINES 241-243: grammar.

      Fixed

      12) LINE 245: define "JackHMMR iterative search"

      We’ve added the phrase: “and seeding a hidden Markov model iterative search (JackHMMR)”

      13) LINE 246 is missing the word "we" prior to "...found distant homology between..."

      Added

      14) LINE 301: show data demonstrating deviation from Mendelian ratios. Also, comment on meaning of such data (embryonic lethality??).

      We have added this data in the line (301):

      “atp1a3b mutant larvae were not obtained at Mendelian ratios (55 wild type [52.5 expected], 142 [105] atp1a3b+/-, 13 [52.5] atp1a3b-/-; p<0.0001, Chi-squared) suggesting some impact on early stages of development leading to lethality.”
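
      The reported deviation can be checked with a short goodness-of-fit calculation. The sketch below uses the genotype counts quoted above and assumes the 1:2:1 Mendelian ratio implied by the bracketed expected counts. The function name is illustrative, and the closed-form p-value applies only because three categories give 2 degrees of freedom.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-squared statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Genotype counts quoted in the response: wild type, +/-, -/-
observed = [55, 142, 13]
total = sum(observed)
# Expected counts under a 1:2:1 ratio (assumed from the bracketed values)
expected = [total * ratio for ratio in (0.25, 0.5, 0.25)]

stat = chi_square_gof(observed, expected)
# Three categories give 2 degrees of freedom; for df = 2 the chi-squared
# survival function has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"expected = {expected}")
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

      The statistic comes out at roughly 42.9, driven mostly by the shortfall of homozygous mutants, and the p-value is far below the 0.0001 threshold quoted in the response.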

      15) Discussion LINES 362-372: This paragraph seems to be of only tangential relevance to the paper. Consider removing.

      Our screening strategy was a large-scale reverse genetic screen, but the number of lines was limited by the technical issues described above. We think it is important to mention that the strategy, if employed today, could benefit from newer technologies.

      16) Discussion. Another model is that Dmist and NaK pump have a developmental effect. Arguing against this developmental model is the ouabain experiment.

      This is an important point. We’ve added the line (454:457): “We also cannot exclude a role for Dmist and the Na+/K+ pump in developmental events that impact sleep, although our observation that ouabain treatment, which inhibits the pump acutely after early development is complete, also impacts sleep, argues against a developmental role.”

      17) FIGURE 1G: Are these significance cut offs corrected for multiple comparisons?

      Yes, all the data is corrected for multiple comparisons.

      18) performing neuronal activity measures, either via neural activity imaging or phospho-ERK labeling in different mutants at day or night conditions, to determine whether baseline neuronal activity brain-wide or in specific brain regions are altered.

      These are excellent experiments that we plan to perform in the future.

      19) Please check all Figure numbers for accuracy.

      We have double checked these.

      20) The authors emphasize the role of increased cellular sodium, but equally plausibly, the phenotypes could be due to decreased cellular potassium. The potassium channel shaker has been previously identified as a critical sleep regulator in Drosophila.

      We completely agree. We would like to highlight that we did devote an entire paragraph to the possibility of changes in extracellular potassium in the discussion: “A third possibility is that Dmist and the Na+,K+-ATPase regulate sleep not by modulation of neuronal activity per se but rather via modulation of extracellular ion concentrations. Recent work has demonstrated that interstitial ions fluctuate across the sleep/wake cycle in mice. For example, extracellular K+ is high during wakefulness, and cerebrospinal fluid containing the ion concentrations found during wakefulness directly applied to the brain can locally shift neuronal activity into wake-like states (Ding et al., 2016). Given that the Na+,K+-ATPase actively exchanges Na+ ions for K+, the high intracellular Na+ levels we observe in atp1a3a and dmist mutants is likely accompanied by high extracellular K+. Although we can only speculate at this time, a model in which extracellular ions that accumulate during wakefulness and then directly signal onto sleep-regulatory neurons could provide a direct link between Na+,K+-ATPase activity, neuronal firing, and sleep homeostasis. Such a model could also explain why disruption of fxyd1 in non-neuronal cells also leads to a reduction in night-time sleep.”

      We also agree that Shaker may be an important component of this sleep regulatory mechanism. Indeed, we previously showed that another potassium channel in zebrafish regulates sleep (Rihel et al., 2010).

      We have emphasized sodium homeostasis in our title and paper only because we were able to directly observe intracellular sodium levels, so we are confident that these have been altered in our mutants. We can only presume that potassium levels have also been altered, but we could not directly observe this.

      21) The similar phenotype between dmist and Fxyd1 in sleep reduction yet very different expression patterns, with dmist being mostly neuronal while fxyd1 being mostly non-neuronal, raise many possible questions: 1) are the sleep phenotypes due to neuronal Na/K imbalance? Or 2) Are the sleep phenotypes due to extracellular Na/K imbalance? Or 3) both? Some feasible experiments may help achieve a better mechanistic understanding of the observed sleep defects.

      Yes, we think these are excellent studies for future work. As noted in the previous point (20), we did discuss the possibility that changes to extracellular potassium might be a parsimonious explanation for the similar phenotypes of fxyd1 and dmist mutants.

      Future experiment suggestions (not required)

      1) Perform a double mutant analysis of fxyd1 and atp1a3a, to determine whether an epistatic relationship similar to that of dmist and atp1a3a is observed in the case of fxyd1 and atp1a3a.

      This is a great experiment that we will do in the future. Unfortunately, the fxyd1 mutant had been sperm frozen during the COVID-19 pandemic, so we cannot do this experiment at this time.

      2) Given the differences in the sleep phenotypes between vir/vir and i8/i8 mutants, would be informative to see the phenotype of the vir/i8 trans-heterozygote.

      This is also a good experiment to perform in the future. Since obtaining the cleaner i8 allele, the dmistvir/vir lines were sperm frozen.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of Elg1 in the regulation of telomere length. The main role of the Elg1/RLC complex is to unload the processivity factor PCNA, mainly after completion of synthesis of the Okazaki fragment in the lagging strand. They found that Elg1 physically interacts with the CST complex (Cdc13-Stn1-Ten1) and propose that Elg1 negatively regulates telomere length by mediating the interaction between Cdc13 and Stn1 in a pathway involving SUMOylation of both PCNA and Cdc13. Accumulation of SUMOylated PCNA upon deletion of ELG1 or overexpression of RAD30 leads to elongated telomeres. On the other hand, the interaction of Elg1 with Stn1 is SIM-dependent and occurs concurrently with telomere replication in late S phase. In contrast, the Elg1-Cdc13 interaction is mediated by PCNA-SUMO, is independent of the SIM of Elg1 but still dependent on Cdc13 SUMOylation. The authors present a model containing two main messages: 1) PCNA-SUMO acts as a positive signal for telomerase activation; 2) Elg1 promotes Cdc13/Stn1 interaction at the expense of Cdc13/Est1 interaction, thus terminating telomerase action.

      The manuscript contains a large amount of data that make a major inroad on a new type of link between telomere replication and regulation of the telomerase. Nevertheless, the detailed choreography of the events as well as the role of PCNA-SUMO remain elusive and the data do not fully explain the role of the Stn1/Elg1 interaction. The data presented do not sufficiently support the claim that SUMO-PCNA is a positive signal for telomerase activation.

      We thank the reviewer for her/his review efforts and opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      Reviewer #2 (Public Review):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks the necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented by the referees, and added all the missing experimental details. In a point-by-point letter we respond to all the specific queries.

      Reviewer #3 (Public Review):

      This paper reveals interesting physical connections between Elg1 and CST proteins that suggest a model where Elg1-mediated PCNA unloading is linked to regulation of telomere length extension via Stn1, Cdc13, and presumably Ten1 proteins. Some of these interactions appear to be modulated by sumoylation and connected with Elg1's PCNA unloading activity. The strength of the paper is in the observations of new interactions between CST, Elg1, and PCNA. These interactions should be of interest to a broad audience interested in telomeres and DNA replication.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      What is not well demonstrated from the paper is the functional significance of the interactions described. The model presented by the authors is one interpretation of the data shown, and proposes that the role of sumoylation is to temporally regulate the Elg1, PCNA and CST interactions at telomeres. This model makes some assumptions that are not demonstrated by this work (such as Stn1 sumoylation, as noted) and are left for future testing. Alternative models that envision sumoylation as a key in promoting spatial localization could also be proposed based on the data here (as mentioned in the discussion), in addition to or instead of a role for sumoylation in enforcing a series of switches governing a tightly sequenced series of interactions and events at telomeres. Critically, the telomere length data from the paper indicate that the proposed model depicts interactions that are not necessary for telomerase activation or inhibition, as telomeres in pol30-RR strains are normal length and telomeres in elg1∆ strains are not nearly as elongated as in stn1 strains. One possibility mentioned in the paper is that the PCNAS and Elg1 interactions are contributing to the negative regulation of telomerase under certain conditions that are not defined in this work. Could it also be possible that the role of these interactions is not primarily directed toward modulating telomerase activity? It will be of interest to learn more about how these interactions and regulation by Sumo function intersect with regulation of telomere extension.

      We present compelling evidence for a role of SUMOylated PCNA in telomere length regulation. Figure 1 shows that this modification is both necessary and sufficient to elongate the telomeres, indicating that PCNA SUMOylation plays a positive role in telomere elongation. The model we present is consistent with all our results. There are, of course, possible alternative models, but they usually fail to explain some of the results. We agree that the fact that pol30-RR presents normal-sized telomeres implies that SUMO-PCNA is not required for telomerase to solve the "end replication problem", but rather is needed for "sustained" activity of telomerase. Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Similar results were seen in the past for Rnr1 (Maicher et al., 2017), and this mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice. Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity. We have added further explanations on this subject to the Discussion section.

      We suspect, but could not prove, a role for Stn1 SUMOylation in the interactions. SUMOylation is usually transient, and notoriously hard to detect, and despite the fact that many telomeric proteins are SUMOylated, Stn1 SUMOylation could not be shown directly by us and others (Hang et al, 2011).

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • My main concern is the claim that SUMOylated PCNA acts as a positive signal for telomerase activation. Yet the pol30-RR mutant has no impact on telomere length. The explanation of the authors is not entirely convincing.

      We are aware that the regulation of telomere length is complex, and we may not fully understand it yet. Just consider the fact that ~500 genes participate in determining the final telomere length of a yeast (Askree et al., 2004). Since mutation in EACH of these genes has a phenotype, the implication is that the joint action of 500 players determines the outcome (a dialogue of 500 participants). Having said this, we clearly show in figure 1 that mutations that prevent PCNA SUMOylation prevent telomere length elongation in cells lacking Elg1, and overexpressing SUMOylated PCNA is enough to elongate the telomeres. Thus, SUMOylation of PCNA does act as a positive signal for elongation.

      However, it appears that to fulfill the minimal requirement of dealing with the "end-replication problem", PCNA SUMOylation is not required, and only a "sustained activity" mode requires the S-PCNA signal (as we have also shown, surprisingly, for RNR1, Maicher et al. 2017). This sustained activity mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice (for example, unmodified PCNA may promote telomerase activity at a lower level than that of SUMO-PCNA). Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity.

      We have added further explanations on this subject to the Discussion section.

      • The model is entitled « Elg1 negatively regulates the telomere length by forming an interaction with the CST complex ». Nevertheless, expression of PCNA-RR completely reversed the long telomere phenotype of elg1∆ cells. Thus it appears that although the interaction between Stn1 and Cdc13 is reduced in the absence of Elg1, Elg1/Stn1 interaction is not instrumental in the formation of the CST complex and thus in the termination of telomerase activity. Does the elg1∆SIM mutant that does not interact with Stn1 impact telomere length?

      • In the model part (line 318), it is argued that the complex Elg1-Stn1 unloads SUMOylated PCNA. Elg1-Stn1 interaction depends on the SIM of Elg1. This SIM is however not required for Elg1's function in genome-wide SUMO-PCNA unloading. Is it required specifically at telomeres?

      The interactions between Elg1 and SUMOylated PCNA are carried out through both the SIM and the Threonines 386 and 387 (Shemesh et al, 2017). Consistently, the single elg1-SIM mutant has telomeres of normal length, and its effects on telomere length can only be seen when combined with mutations in the Threonines (elg1-TT386/7AA or elg1-TT386/7DD). Although the unloading of SUMOylated PCNA by Elg1 is important, the gene is not essential, and PCNA is either eventually unloaded by RFC, or spontaneously dis-assembles. This explains why the telomere length does not reach the same length in the absence of Elg1 as in the absence of, say, Stn1.

      • The model suggests that Elg1 promotes the interaction between Cdc13 and Stn1. This is based on the data presented in Figure 5 E and F. This is an important result. Because the experiment has been done on cells synchronized in S phase and the Elg1/Stn1 interaction occurs specifically at the end of S-phase, the FACS profile should be shown or a control provided to show that the two conditions are comparable.

      The FACS profile for this experiment is shown in Figure 5C.

      • Does the interaction between Cdc13 and Pol30 depend on the SUMOylation of Pol30?

      Yes. We have added this as new Figure S2, and presented the results together with Figure 3 (Figure 3 is already too crowded).

      Other points:

      • Fig 1: it should be mentioned in the Materials and Methods or in the figure legend how the average telomere lengths (horizontal bar) were calculated from the teloblot, as the position of the bar is not always intuitive.

      We estimate telomere length by using TelQuant (Rubinstein et al., 2014). We have added this to the Methods section.

      • Fig 2: Owing to the large span of telomere length in the stn1 mutants, the epistatic relationship between elg1∆ and stn1 mutants is poorly illustrated by the teloblot.

      We repeated this experiment several times, and stn1 mutants consistently gave a very spread telomere length. In ALL the blots, however, the double mutants elg1 stn1 showed a telomere length similar to that of the single stn1 mutant, and never longer.

      • It is mentioned that other mutants in the collection showed epistasis. Are any of these mutants related to telomere replication or the proposed model?

      Since we used the collection of non-essential mutants (so far), it was quite devoid of genes involved in DNA replication, which are mostly essential. An exception was siz1, which showed epistasis with elg1Δ.

      • The section entitled « Elg1's functional activity is essential for its interaction with Cdc13 » (line 205) is difficult to follow. The hierarchy between the different mutants of Elg1 on their capacity to unload PCNA is not totally in agreement with the data published in Itzkovich et al 2023 and Shemesh et al. 2017. In particular it appears to me from these papers that elg1-WalkerA 238 (KK343/4AA) mutant did not show a defect in contrast to elg1-WalkerA 238(KK343/4DD).

      We are sorry for the typo in the results. We used the elg1-WalkerA (KK343/4DD) allele, which has a normal SIM but no activity. In a nutshell, we used mutants that either did or did not show unloading activity and/or SIM. The results clearly show that you need to unload PCNA in order for the N-ter of Elg1 to interact with Cdc13.

      • Are the synchronizations done at 30°C?

      Yes. We have added the information to the Methods section.

      • ChIP experiments are not described in the Materials and Methods

      We apologize for this. They are now described.

      • In Figure 6, the PCNA rings are curiously placed at the beginning of the Okazaki fragments.

      We thank the referee for noticing; we have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      Specific comments:

      Insufficient technical detail: I could find no explanation of how overexpression was achieved. No description of how teloChIP is performed, either for the PCNA IP or how the sequence analysis is performed. Too limited details on growth like exact temperatures for the cell cycle time course.

      We have significantly expanded the Methods section to include all the technical information.

      Please do not bold and underline text for emphasis-EVER

      We have removed those from the text.

      Lines 130-132: they have not shown "accumulation of SUMOylated PCNA" anywhere; this is an inference.

      We have modified the text; it now says: “show that SUMOylated PCNA, and not unmodified or ubiquitinated PCNA, is both necessary and sufficient for telomere elongation in the presence or in the absence of Elg1.”

      Fig 2A Can authors show any other very long-telomere mutant like stn1 that does show enhancement in combination with elg1∆ to show feasibility of such phenotype?

      We don't think it is appropriate for the paper, but we have systematically created double mutants with elg1Δ and found many additive and even synergistic interactions. An example is shown in Author response image 1, taken from the PhD thesis of Taly Ben-Shitrit, a PhD student in the lab.

      Author response image 1.

      What about cdc13 or ten1? Epistatic?

      We did not test telomere length in combination with Ten1. Combining elg1 with cdc13-50 resulted in synergistic elongation. Given the complex genetic relationship between Stn1/Ten1 and Cdc13, it is hard to interpret this result.

      Seems tenuous to use Y2H to decipher protein-protein interactions occurring out of context (i.e., not at telomere but at reporter gene promoter)

      Y2H is a great method to detect interactions, even if they are transient. Whenever possible, we confirm our findings using co-IP or telo-ChIP.

      Lines 268-270: It would be more accurate to state "can be" instead of "becomes" or "is" as they have not shown that SUMOylation or PCNA unloading have occurred.

      We agree, and have changed the text.

      Cdc13snm protein level?

      Unfortunately our Western blot is not presentable, but the level of Cdc13snm was similar to that of the wt Cdc13, and this result has been already published by Hang et al., 2011.

      Fig S3A: If SUMOylated Cdc13 mediates the Stn1-Elg1 interaction, why is Stn1-Elg1 interaction maintained in cdc13snm strain? This result seems to directly contradict the premise and overall conclusion of this section that Cdc13-SUMO mediates the (Y2H) interaction of Elg1 and Stn1.

      According to our model, the interaction between Stn1 and Elg1 takes place upstream, and only then this complex interacts with SUMOylated Cdc13. Hence, if Cdc13 cannot be SUMOylated, the interaction Elg1-Stn1 is not lost, although Stn1 fails to interact with Cdc13, leading to a telomeric phenotype.

      Line 279: which data establishes Stn1-Elg1 interaction as direct? Fig 2B co-Ip indicates physical but not necessarily direct interaction, but later the authors suggest that the interaction requires a SUMOylated intermediary, and Y2H in Fig. S3B doesn't demonstrate direct interaction.

      We have changed the text, taking out the word "direct".

      Co-IP shows that interaction of Elg1 with Stn1 occurs mainly during later S phase and with an overall delay compared to initial Elg1-Pol3 interaction. Co-IP interaction between Cdc13 and Stn1 is reduced in the absence of Elg1.

      The subsection title: "The interaction of Elg1 with Stn1 takes place at telomeres only at late S-phase" is not well supported by the data. I agree the data are consistent with the idea of the interactions occurring at telomeres but there's no direct evidence of this.

      We have changed the subsection title. It now reads: "The interaction of Elg1 with Stn1 takes place only at late S-phase".

      Model: Is unloading happening at the fork? Doesn't PCNA unloading have to follow its loading which occurred behind the fork particularly on the lagging strand? Model now suggest that Stn1 itself is SUMOylated.

      Yes, according to the model Elg1 moves with the fork, unloading PCNA from the lagging strand. Once Elg1 reaches the telomeres, it interacts with Stn1 (Figure 5). This interaction requires SUMOylation of Stn1 or of some other protein, which is neither PCNA (Figure 3D) nor Cdc13 (Figure S3A) and could be Stn1 itself or another telomeric protein (Hang et al., 2011).

      Title is rather vague.

      We think it summarizes what we present in the paper.

      Abstract:

      "We report that SUMOylated PCNA acts as a signal that positively regulates telomerase activity."

      I don't think this is supported or a good description of what they find

      Figure 1B clearly shows that SUMO-PCNA is both necessary and sufficient for telomere elongation.

      "and dissected the mechanism by which Elg1 and Stn1 negatively regulates telomere elongation, coordinated by SUMO."

      Again, I don't think this is sufficiently supported and the model invokes SUMOylation events not demonstrated like Stn1, which might be a significant step forward.

      On the positive side, their model makes several predictions that they could test much more directly and rigorously: for example, examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      We have dissected the mechanism, and future work will be devoted to examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      Reviewer #3 (Recommendations For The Authors):

      Comments:

      1) The telomere length analysis data presented here is consistent with an interpretation that Stn1 and Elg1 play roles in a similar telomere maintenance pathway because the telomere restriction fragment pattern in the double mutants are not longer than the stn1 single mutants. No comment is made with respect to the yellow bars in Figure 2 that presumably measure telomere length appearing to be slightly shorter than in the stn1 single mutants. It may be interesting and informative if the double mutants do in fact have some phenotype distinct from the single stn1 mutants. Is there an impact on viability in the double mutant?

      Given the variable telomeric phenotype of the single stn1 mutants, slight variations in the measurement of the median telomere size are expected. The difference observed is not likely to be significant. What is important is that the double mutants with elg1 do not show longer telomeres. In terms of fitness, the stn1 mutants grow slightly slowly, but the elg1 mutation does not slow them down further.

      2) It is somewhat surprising that no additional telomere length analysis is included that actually tests the proposed model, including whether this path could be operational only under certain conditions. Maybe this is a topic of the next paper?

      Indeed, future work will explore the conditions under which PCNA SUMOylation is essential, and those under which it is only needed.

      3) Were the error bars in Figure 5F determined only from the experiment in E? Does this represent error in measuring the data from one biological replicate? The type of error should be made clear to avoid readers assuming the data represents measurements from more than one sample in more than one experiment. The data would be stronger if it represented measurements from multiple experiments.

      The graph was made with data from three biological replicates. We show the best blot in Figure 5E. We have now stressed this in the Figure Legend.

      4) Why was only one two hybrid reporter shown? Having the multiple reporters can give confidence in interactions. (Not a big deal here given the nice co-IP data.)

      We thought it sufficient to show one reporter, as the results with a different reporter (β-gal assay) led to the same conclusions. Since these did not add information and made the paper too lengthy (and boring), we took them out. In any case, all data were verified by co-IP.

      5) Line 414 - what are the 32P-radio labeled PCR fragments? Are these solely comprised of TG1-3 repeats of some length? A bit more detail in this aspect of the method could be helpful.

      We have added an explanation on the probe in the Methods section.

      6) Line 432-433 - which anti-HA or anti-Myc antibodies are these? (very minor detail)

      We have added the details.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary: This study by Magalhaes et al sheds light on the molecular underpinnings of the relative resistance of children to severe COVID-19. The authors found that priming of epithelial cells by resident immune cells to express tonic levels of the pattern recognition receptors (PRRs) MDA-5 and RIG-I predisposes the epithelial cells for a faster and more robust onset of IFN-beta production upon SARS-CoV-2 infection. The study uses a combination of in vitro and ex vivo models, as well as mining of scRNA-Seq datasets from clinical specimens.

      Major comments: The claims and conclusions are supported by the data and therefore no new experiments are needed.

      Optional

      1. The use of primary cells (i.e. human airway epithelial cultures cross talking to immune cells) would make this study more compelling, although I assume that the major findings would be recapitulated in such models.
      2. It is not clear how the use of Yersinia enterocolitica to trigger activation of PBMC is relevant to this story. Using different (commensal) pathogens to achieve PBMC activation may yield different and more physiologically relevant results.
      3. The manuscript would greatly benefit from improved structure and focus, particularly in the Abstract, Introduction and Results sections. The text is very dense, which makes it difficult for the reader to follow the flow and to distinguish important from less important information. In particular, the introduction starts very broadly by introducing COVID-19, which I think we are by now all familiar with. Starting directly with the burning question of why kids get less sick with SARS-CoV-2 would capture the readers' attention better. Figure 1a is beautiful for a review but much too dense to help the reader as a graphical abstract. In the results section, for each experiment, leading with a clear statement of the rationale of the specific question, the gap in knowledge and why the gap is there, followed by the results and then a summary of the impact of said results, would make this a much more enjoyable read and help the reader evaluate the novelty and impact better, particularly for Figures 1, 2, and 3 (but also all others). The interaction wheel graphs (Figure 4) are amazing, but are not properly explained in the text (do I read this right that in adults, all the crosstalk is basically performed by proliferating T-cells?). In all, these scientific writing issues sell an otherwise beautiful story short.

      Referees cross-commenting

      I agree with reviewers 1 and 2 that the use of primary cells would significantly elevate the story. However, I think this should be "optional", as I do not think it would change the findings.

      Significance

      General assessment:

      The main strengths of the study are its topic and clearly relevant question: why do kids rarely get severe COVID-19? The main novelty is the answer to this question, that immune cell-epithelial crosstalk in children elevates the tonic expression of MDA5 and RIG-I via the IRF1 axis, leading to faster onset of IFN production and signaling upon SARS-CoV-2 challenge, which ultimately mounts an antiviral response detrimental to robust SARS-CoV-2 replication. The study uses an innovative combination of in vivo and ex vivo experiments and analysis of clinical specimens.

      The significant advance of this study to the field is clear to this reviewer, although it could be much better stated in the manuscript, as described at length above. The study is of great interest to the field of immunology and virology, and also has clinical and translational impact with respect to risk assessment for severe COVID-19 per age group, as well as epidemiological considerations for infection control.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper, we will take the reviewers’ critiques to heart and add figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. Based on the reviewers’ comments, we are proposing to isolate this most critical evidence for burial into a separate section in the revised submission. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried; the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints, etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) and the robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022), noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications, the patterns of skeletal part representation, and a total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces such as those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (i.e., without naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 and 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”. We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origin. If excavations continue with this mindset, we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach are typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions, whose opinions and comments will have made the final iterations of these manuscripts better for their efforts. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process, in which preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation, and further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach: 3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96: 149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist. World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Reviewer #3 (Public Review):

      Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contribution to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc) that they do not think so either.

      First, a few referencing issues: the key reference quoted for distinguishing natural from artefactual markings (Fernandez-Jalvo et al. 2014), whilst mentioned in the text, is not included in the references. In the acknowledgements, the claim that "permits to conduct research in the Rising Star Cave system are provided by the South African National Research Foundation" should perhaps refer rather to SAHRA? In the primary description of their own markings from Rising Star and their presumed significance, there are, oddly, several unacknowledged quotes from the abstract of one of the most significant European references (Rodriguez-Vidal et al. 2014). These need attention.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands are no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half-convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment. But the latter is a painted fragment, not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and, more pertinently, whether the hominins made the markings. Despite this honest admission, they are prepared to hypothesise that the hominins made the markings, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion, not an observation, and the relationship between hominins and designs no less so. In fact, the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the admittedly very difficult, perhaps impossible, task of geochronological assessment has been made.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      • Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27(3): 745-770.

      • Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      • Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      • White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.

    2. Reviewer #4 (Public Review):

      This is potentially a landmark study with far-reaching consequences for archaeology, palaeoanthropology, and more widely. The antiquity of intentional human mark-making is a hot topic, but this study – understood as initial – has as yet incomplete sources of evidence and methods, and it will be interesting to follow how it develops in subsequent studies.

      Strengths and points to build on:

      * Heuristic potential: As knowledge advances it poses a risk to accepted knowledge – and we should accept that one such risk is moving on from long-held disciplinary tenets. In this case, there has been a growing quantum of evidence – all hotly debated – for the deep antiquity of mark-making and even symbolism by species other than ourselves. Most researchers now accept Neanderthal symbolic capacity actualised in burials, intentional mark-making and the like. The evidence here presented is not unequivocal but is very suggestive and an ideal test case for applying multi-disciplinary techniques of analysis and interpretation beyond the expertise of the listed authors (see comments in 'weaknesses'). This work by itself may be equivocal but, when taken together with other such work, points to a 'human' sensu lato past that is as complex as it is long. This work then helps all researchers to at least be alive to the possibility of things like anthropic marks and residues in a context not normally thought to have them.

      * Decentering speciesism: As per the above comment, I appreciate empirical studies that erode speciesism – in particular studies that open up our minds to the possibility that multiple members of the Genus Homo were capable of intentional mark-making and even 'symbolic' behaviour, though this latter term is not well understood or uniformly used. This is probably because of continuous unconscious bias on our part as currently the only exemplar of our genus living - in contrast to most of the past in which different species and genera co-existed - if not on the same landscape and/or at exactly the same time, then with enough overlap that people would have realised 'others' were about either by sight and/or by encountering their physical remains and artefacts.

      * Problematising 'firsts' and deep time: A strength – but one which needs to be developed in this manuscript – is our understanding of time and change. We have a plethora of dating techniques but relatively few substantive monographs, articles, and think tanks on time – and especially on how change comes about and what causes it. This leads us to privilege 'firsts' and the 'oldest' finds in 'deep' time above those that are more recent and in 'shallow' time. I would suggest that, in addition to the claims for the oldest of the reported marks, the authors develop their nascent remarks on the possibility that the suite of marks may have been made over time. This will help counter the criticism that these marks – if established to be anthropic – were just a singularity, and show instead that they were part of patterned behaviour, which would move them towards the realm of 'symbolic' cognitive behaviour. And indeed, it would be good to hear more about why these marks were made in this place, to establish a replicable model for identifying early anthropic marks.

      Ultimately, this manuscript presents evidence that those who are pro the deep antiquity of intentional mark-making by Homo (and possibly even other genera) will find enough evidence to support; while those sceptical of such claims will find enough methodological flaws and evidential limits to refute those claims. The next decade of work will likely be definitive and this article makes a key contribution to the debate.

      Weaknesses and points to attend to:

      * Definitions: The term 'rock engraving' is used rather uncritically and also the term 'etching' – and it would be useful to have a short definition of how the authors understand the term. Rock art scholars regularly debate these terms and whether they are or are not 'rock art' with its overwhelmingly visual bias; which this discovery may usefully help overthrow and advance.

      * Dating: There is no evidence provided for dating the marks found in the cave system. They could, for example, have been made more recently than the dates claimed – and by another species (if we accept their anthropogenic authorship). This is a perennial problem of much rock art research – especially when it comes to understanding the wider archaeological/palaeoanthropological context. More crucially, accurate dating allows a more reliable understanding of authorship and who/what was responsible for a particular artefact or feature. This has not been demonstrated in this case, though we do have fossil evidence of Homo naledi in the cave system. The article title is thus an incorrect and unsupported claim, as the marks, if they are anthropic, have not been dated and are of unknown age. The authors allow that there may have been multiple episodes, but not that the marks can belong to a time other than they posit – either earlier, later, or distributed over a long period as the authors allow for in their concluding remarks.

      * Authorship: The study does not include either a geoscientist or a rock art specialist in the authorial team. These are key oversights, as the former would help better contextualise the dating of the marks reported on, as well as explore alternative non-anthropogenic agents that may have created them. For example, the marks and 'pitting' etc may be the result of water bringing abrasive agents during times of flooding, hitting prominent rock features in the cave system. Some explanation is given in lines 114-124, but it is uncited. The overlying 'sediment' may be similar to the mondmilch found in cave systems, which is of natural origin. It may be that these non-anthropogenic causes are easy to discount, but the arguments do need to be made. Alternatively, the polishing may have been made by Homo naledi brushing against the surfaces as they moved in the cave system, independent of any mark-making. A table showing the pros and cons of intentional anthropic versus natural authorship would be very effective - as would showing some of the natural linear marks in the cave system, to avoid any confirmation or similar bias. FTIR analysis of panels A-C would be more than useful to determine whether an additional layer of material has been added. This is mentioned for future work, but this seems a rather post-hoc research programme.

      * Use-wear analysis: If the marks are anthropic in origin, they are likely to have been made by a stone tool, which would leave characteristic marks, directionality and sequencing, distinct from natural causes. It is vital that this work – such as was done on the Blombos engraved ochre – is done here, for example by linking to the chert and other tools described on lines 152-158. Note that Figure 19, of such a tool, is very hard to make out. The Blombos and Klasies River Mouth engraved ochres (curiously not referenced) have very similar geometric markings, and there is a real opportunity to compare these in securely dated contexts of 70-120 kya – which could support the argument made here for Homo naledi's cognitive capacity. For Figure 16, it would be good to know on what basis some marks were selected as anthropic and why others were not; this would help demonstrate the methodology and the ability to distinguish between the two kinds of marks.

      * Viewshed: A rock art specialist would have added essential expertise on how to study anthropic marks. For example, the images of the marks shown are all of individual or small collections of motifs rather than showing each panel as well as all panels together, to help understand the iconographic context as an ensemble – a 'feature' rather than isolated 'artefacts' or 'motifs'. Line 60 mentions being able to see these as a 'triptych', but the reader is not able to have this view in this manuscript. From the cave map, it is not clear whether all three 'panels' (an unfortunate art historical term that suggests a framed entity - better to use a term like 'cluster') can be viewed simultaneously or in sequence. The viewshed in relation to the area where the bodies were recovered is vaguely stated as 'only a few metres away' and is worth developing. I understand 3D scans have been made, so it would be useful to have a version showing the marks in relation to where the bodies were recovered and as a 3-cluster ensemble.

      * Image enhancements: Also, in addition to polarised images, have colour enhancement tools like DStretch been tried to see if, for example, attempts at colouring with different coloured sands were made? Similarly, a 3D scan of the motif and panel (Metashape is mentioned but not shown) might assist in understanding how the marks and the rock they are on might relate to each other, as research in European Upper Palaeolithic contexts has shown. Here, experimenting with different kinds of lighting - or, in the absence of lighting, with tactility - could show how these marks and their rock support may have been experienced by those who may have made and interacted with them. As a note, it would be useful to have a scale in each image of the 'engravings', and it is a pity the one in situ photograph with the scale does not use a standard colour-corrected scale as is commonly used in rock art research.

    3. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper, we will take the reviewers’ critiques to heart and add figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. Based on the reviewers’ comments, we propose to isolate this most critical evidence for burial in a separate section of the revised submission. The fact that the LORM layer is disrupted, that a fleshed body was placed in the hole created by this disruption, and that the body (and perhaps parts of other bodies) was then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation only covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions.

      By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial, removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation, as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present, due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried; the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within a sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints, etc.), displacement, and final disposition of elements within the burial space. As such, we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive, narrative-driven archaeothanatological approach (e.g. Duday, 2005 & 2009) and the robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022), noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications, the patterns of skeletal part representation, and a total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do, based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper, as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 – 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”.  We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach is typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available.  These would include examples such as the A.L. 333 Au. afarensis site (the so called First Family site in Hadar Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy) as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions, whose opinions and comments will have made the final iterations of these manuscripts better. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process, where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and that further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers. H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions, and the fourth recommending rejection based upon arguments similar to those made by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers’ suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by both a fleshed body (and perhaps parts of other bodies) which were then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper to be submitted, we will take the reviewers’ critiques to heart and add additional figures that illustrate better the disruption of the LORM and clarify the sedimentological data showing the material covering the skeletal remains in the hole are the disrupted sediments excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted the idea that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) and the robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022), noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications, the patterns of skeletal part representation, and the total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do, based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 and 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”. We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origin. If excavations continue with this mindset, we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach are typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions, whose opinions and comments will have made the final iterations of these manuscripts better. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process, in which preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and that further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach: 3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96: 149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist. World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.J. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Point-by-Point Response (author’s replies in plain text)


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Silao et al. make the intriguing observation that yeasts that are generally considered less pathogenic are less able to catabolize proline than Candida albicans. They then, in Candida albicans, construct mutants defective for the two key enzymes (Put1, Put2) required to convert proline to glutamate, which they show to be essential for proline utilization as an energy (carbon) and nitrogen source. The authors proceed to untangle the regulatory aspects of proline degradation, including the respective cellular localization of its key enzymes. They then make the important discovery that strains lacking either Put1 or Put2 suffer from a proline-dependent growth defect, which they attribute to resulting defects in mitochondrial metabolism.

      The manuscript then goes on to analyze a broad range of infection models including: reconstituted human epithelial skin model, Drosophila, mouse systemic infections, organ colonization in these mice (kidney, spleen, brain, liver and histochemistry of the kidneys) as well as survival when incubated with cultured human neutrophils. Finally, they use yeast cells constitutively expressing yEmRFP (so that yeasts can be distinguished from other host cells) and coated with FITC before incubation with the host cells (which coats the wall of the original cells, but does not spread to progeny) and they go on to perform an impressive set of analyses of C. albicans growth within mouse kidneys both in vivo and ex vivo, exploiting an implanted window together with intravital imaging with a two-photon microscope at different time points. The system is impressive and visualizes tissue invasion by hyphal cells beautifully. Finally, they compare the intravital images from WT and put2-/- cells and show that, as in vitro, put2-/- cells do not form filaments and do not show extensive invasion of the kidney tissue. While the in vivo aspect of the study includes many different models, it finds defects in virulence for different subsets of put mutants and the relative importance of filamentation vs proline utilization for virulence is not conclusively resolved.

      Overall, this is an important and timely manuscript, which significantly contributes to the understanding of how proline metabolism intersects with yeast fitness in the context of infections. However, there are several major concerns regarding some of the conclusions drawn from the study. In addition, some general recommendations that would improve the manuscript are provided.

      Specifically, the manuscript provides a very detailed description of experiments and observations. However, in several parts it is difficult to follow and the reader needs more guidance about the logic involved in reaching conclusions. Specifically, several aspects of the paper are written for experts in Candida (yeast) metabolism. Here, explaining the rationale for some of the experiments, and providing more background information that is not obvious to a non-expert, is required.

      In particular, writing a clear and measured summary sentence at the end of each paragraph and a conclusion paragraph that summarizes key findings in simple terms would help make the manuscript more digestible for readers.

      In addition, the impressive microscopy and broad range of in vivo experiments are comprehensive but add only incremental information relevant to proline metabolism: that filamentous growth in vivo and virulence are reduced in cells carrying some mutations in one or more put genes. However, this broad sweep of model systems and the development of the in vivo imaging system might have more impact in a separate paper focused on the real-time in vivo visualization of kidney invasion.

      We thank Reviewer 1 for the extensive list of comments and have endeavored to adjust the manuscript to address all of the major and minor concerns. It is evident that Reviewer 1 clearly understood the significance of the work and we appreciate that the comments are presented in a positive manner intended to improve our manuscript.

      Major comments:

      1. The main finding that impressed this reviewer is that "removing the ability to catabolize proline, in an organism that evolved to catabolize it, leads to (growth) defects". This point could be better highlighted throughout the manuscript.

      Thanks for the comment. We will adjust the text to reflect this suggestion.

      1. The authors show that deletion strains for proline metabolism have defects that are important for in vivo pathogenicity. This is an important finding. However, as the manuscript reads now, it suggests that the main findings are that the ability to use proline in the respective host niche is key. Mechanistically, the manuscript revolves primarily around defects that arise when deleting PUT1 and/or PUT2 (i.e., an "unknown" toxicity of proline in the case of put1-/- (or put1-/- put2-/-) and the additional P5C-dependent toxicity for put2-/- mutants; see below).

      Yes, the reviewer is correct in that we believe that proline catabolism is necessary to initiate and power hyphal growth, which is coupled to virulence. We have previously shown that upon phagocytosis by macrophages, the expression of Put1, Put2 and even Gdh2 are induced in phagocytized C. albicans cells, which is consistent with the analysis shown in Fig. 2D and Fig. S2B. Consequently, proline, or an amino acid that is metabolized via the proline catabolic pathway, must be present in the phagosomal compartment. However, as we now report, proline inhibits growth of cells lacking the capacity to catabolize it. Although we cannot differentiate the cause of reduced virulence in put mutants, i.e., the lack of energy due to the inability to catabolize proline vs proline toxicity, proline catabolism is clearly important and a robust indicator of virulence. As with point 1, we have adjusted the text to make this clearer.

      1. In order to claim that catabolizing proline promotes pathogenicity (as opposed to the alternative hypothesis that the inability to catabolize proline leads to the observed defects), additional experiments would be required. For example, the put mutants would need to be compared with mutants that significantly reduce/impair proline uptake, such as the referenced gnp2 mutant (Garbe et al 2022). While the finding that less pathogenic yeast species are unable to catabolize proline is both intriguing and important, it remains, as presented, a loose, non-quantitative correlation that only tangentially addresses the question of whether "proline catabolism is key for pathogenicity".

      We have in fact already shown that proline uptake is required to induce filamentation (Martínez and Ljungdahl 2003, Fig. 6). The main point of our current work, which we believe is important and of general interest, is that C. albicans is adapted to use proline as sole energy source, which reflects the environment (humans) in which it evolved. See the response to point 2. Interestingly, the differences in the expression levels of Put1 (off in the absence of proline, induced robustly by proline) and Put2 (low level of constitutive expression, induced robustly by proline) suggest that cells are primed to decrease the likelihood of becoming inhibited by P5C, i.e., the constitutive expression of Put2 is able to ameliorate the potential toxicity of P5C. Regardless, the finding that put1 and put2 mutants exhibit significantly reduced virulence in two host models provides clear support for proline catabolism being key for C. albicans pathogenicity.

      1. 238 onwards: The conclusion that "the primary growth inhibitory effect of proline is linked to catabolic intermediates formed by Put1 and that are metabolized further by Put2" does not appear to be fully supported by the evidence. Addition of proline to put1 mutants already reduced OD600 by ~50% (Figure 2); and is further reduced to ~10% when put2 is deleted. This implies that there are two inhibitory effects of proline, not one primary one. At the least, this option should be discussed, including why deletion of PUT1 leads to proline toxicity. The latter is not clear: is it that too much proline accumulates in the cell and this accumulation is toxic? If this is the case, the effect would be expected to be proline concentration dependent. Performing a relatively simple experiment as performed for the put2 mutant (Fig. 3 / S3F) may clarify this issue, particularly if the experiment were coupled with intracellular quantification of proline.

      Precisely! Proline toxicity is evident even in put1 mutants, clearly suggesting that proline, without being further catabolized, exerts a growth inhibitory effect (Fig. 3A). We traced this inhibitory effect to decreased mitochondrial respiration (Fig. 3E). There are two parameters to consider regarding the inhibitory effects of proline in put2 mutants. First, the presence of proline induces the expression of Put1 independent of Put2 (Fig. S2C); consequently, the levels of the toxic intermediate P5C increase (Fig. 3B). P5C has previously been postulated to inhibit mitochondrial respiration, which is well-aligned with our analysis (Fig. 3E; see response Point 5). We initially tested whether a proline-P5C cycle, suggested by work in mammalian cells, would play a role in proline-mediated toxicity; however, increasing cytoplasmic pools of proline by supplying high levels of glutamate (which according to work in mammalian cells should efficiently convert to cytoplasmic proline) did not occur; we did not see glutamate-enhanced Put1 expression (Fig. 2D, S2A, S2B). We agree with the reviewer with respect to the suggested experiment, and have monitored growth of put1 in media with different proline concentrations. The results are incorporated in the revised Fig. 3.

      1. The caption "P5C mediates a respiratory block" is misleading, as the evidence is not that compelling: Although P5C increases in put2, but not in put1 mutants, and given that both single mutants experience a proline-dependent respiratory defect (Fig. 3E), the results suggest a more complex relationship.

      Previous work using pure P5C (Ref. 36; Nishimura et al) showed that it targets respiration, hence the caption “respiratory block” in the header. In mammals, PRODH (Put1) physically interacts with mitochondrial respiratory complex II in the inner mitochondrial membrane (line 89-90), while P5CDH (Put2) is in the matrix. The put1 mutation might affect basal activity of the respiratory chain resulting in lowered respiration, which may compound when proline accumulates in the mitochondria. The inhibitory mechanism remains unknown, and in going forward we have begun characterizing various GFP-tagged respiratory complex components in put1 mutants and in strains co-expressing Put1-RFP (for interaction studies). The results are out of the scope of this current work.

      1. The virulence assays and in vivo experiments do not present a unifying view: in Drosophila put2∆∆ is less virulent than put1∆∆, which appears similar to put3∆∆. Given that put2 mutants grow slowly, likely because of P5C inhibition, this seems logical. However, in mice, put3∆∆ remains highly virulent while put1∆∆ and put2∆∆ results for survival are mixed. Furthermore, in 4 mouse organs, put1∆∆ and put2∆∆ are not significantly different from one another but are different from wt, while put3∆∆ has no significant reduction in CFU. Kidney histology shows very little invasion by put1 and put2 and more by put3, but visually put3 appears to invade much less than the WT, and the human neutrophil experiment shows effects of put2 or put3 but not put1. This leaves the reader rather confused. It may be worth discussing the reasons for different results in different models. Is the availability of proline in each of the organisms and organs similar?

      We thank the reviewer for these thoughtful observations, however, we note that all of the diverse assay systems employed provide a clear and consistent indication that the inability to completely catabolize proline significantly reduces virulence. This is well-aligned with our previous data regarding the need for proline catabolism to escape macrophages (Silao et al, 2019). The requirement for Put3 may not be very strict since the Put enzymes are still expressed in the absence of Put3 (Fig. 2D/S2A/S2B), indicating the activity of additional regulatory factors; hence, this may explain why the put3 strain behaves like wildtype in the murine model (Fig. 5B). The dispensability of Put3 in the murine model could be due to a lower neutrophil count and that murine neutrophils exhibit a lower affinity for fungal cells as compared to human blood (Machata et al., 2020, Front Immunol). The more pronounced requirement of Put3 to survive in whole human blood and when co-cultured with human neutrophils could indeed be linked to the need to rapidly derepress PUT1/PUT2 (and even other target genes) as suggested by the global RNASeq analysis that shows that proline catabolism is a core response of C. albicans during neutrophil interaction (Niemiec MJ et al., 2017, BMC Genomics). In Drosophila, a well-established model to study innate immunity, the presence of hemocytes that fulfill the equivalent functions of neutrophils and macrophages could explain the increased requirement for Put3. In summary, although it is impossible to know the precise mechanistic basis underlying the observed differences, we believe it unreasonable to expect that all mutations behave identically in each virulence model. In fact, differences considered trivial such as the use of mouse background can have profound effects on virulence. 
      Presumably the differences we report are due to the specific nutrient composition (proline and metabolites feeding into the proline catabolic network) and physical parameters intrinsic to each model. For instance, Lionakis et al. (2013) suggested that filamentation occurs faster in the kidney compared to other organs, such as the liver/spleen, indicating the presence of kidney-specific cues that drive infections of this organ.

      1. The ex vivo and in vivo analysis of the dynamics of C. albicans growth in the host is visually impressive, but it distracts from the focus of the paper and the metabolic findings. Showing that put mutant cells do not form filaments in vivo (as in vitro) does not add much conceptually to the paper. Furthermore, this lovely advance in in vivo visualization is lost at the end of this paper and the authors should consider whether it might fit better in a manuscript that could really highlight the in vivo visualization approach.

      We appreciate this comment. Indeed, our lab is at an advanced stage of completing a manuscript focused on the use of intravital and clearing microscopy to follow the onset of an upper urinary tract infection (UTI) in a murine candidemia model. However, our ability to visualize in 3D the onset of an infection in a living host is not a trivial achievement and we were impressed that it provided a clear answer as to whether a single C. albicans cell can initiate an infection and undergo morphogenesis leading to hyphal growth. Furthermore, we tested a put2 strain, the growth of which is highly sensitive to the presence of proline, and found that it did not exhibit filamentous growth. This clearly shows that cells colonizing the kidney are exposed to an environment that requires a functional proline catabolic network to exhibit filamentous growth, a characteristic of renal infections. Our results are consistent with the kidney being a metabolic hub for arginine/proline biosynthesis, which likely increases the levels of these amino acids in this organ.

      1. The discussion of cells stained with FITC and expressing yEmRFP does not clearly point out that the FITC is only an indicator for those cells that were used to inoculate the tissue and that finding cells without FITC indicates that they are mitotic progeny, indicating that they have been dividing. The authors clearly understand this, but a naive reader may miss this important point if it is not stated explicitly.

      We have adjusted the text to explicitly clarify this.

      Minor comments:

      1. Throughout: what is the distinction between utilization of proline for C or for energy? These terms seem to be used interchangeably.

      C. albicans is a heterotroph that can use proline to generate biomass (gluconeogenesis, etc) and its catabolism generates sufficient amounts of ATP to power growth. Thus, when proline is used as sole carbon source, it can also serve as the sole energy source. In the text, we have tried to be consistent using “carbon source” when discussing proline as a component of growth media, and “energy source” when discussing proline catabolism.

      1. Introducing the schematic in Fig. 2A at the beginning of Figure 1, would help explain proline catabolism before delving into the growth experiments that rely upon this framework. This should include an explanation, for readers less familiar with the metabolic issues, of the main limitations to catabolizing proline, and the key issues for being able to use proline for nitrogen, carbon, and energy (potentially indicated in the overview figure, e.g. pointing towards gluconeogenesis etc.).

      We have considered the reviewer's suggestion; however, we believe that the placement of the schematic in Fig. 2 is appropriate as is, where it will hopefully enable readers to more readily grasp the strain construction and experiments documented in Fig. 2.

      1. Saccharomyces can only grow on proline as a nitrogen source, but not as energy/carbon source. Could the authors briefly mention or discuss why this is the case? This is not clearly apparent after reading the manuscript and it leaves the reader confused and trying to understand if the fact that proline is required for carbon utilization is a new finding of this paper or was already known. Do the authors think this is tied to the presence of complex 1 components in C. albicans that are not found in S. cerevisiae? Is this consistent for the pathogenic, but not the non-pathogenic yeasts analyzed in figure 1?

      We have adjusted the text to clarify our thoughts regarding this. Indeed, we do believe that a major reason for the ability of C. albicans to efficiently grow using proline as a sole energy source is the presence of Complex I. However, C. glabrata appears to be able to grow well using proline as sole energy source despite apparently lacking Complex I. Consequently, alternative NADH dehydrogenases exist in C. glabrata, but how this is coupled to energy metabolism will require additional work that is out of the scope of the present work.

      1. 100: While Gdh2 is apparently an important enzyme for generating ammonium, why is it not necessary for macrophage escape and virulence as shown in reference 18? A recent paper from Garbe et al (ref 12) suggests that Gnp2 is the major proline permease in C. albicans and what is known, and not known, about proline uptake would be good to mention, given that PUT gene functions require that proline enters the cells.

      We have recently shown that ammonia generation by Gdh2 is dispensable for macrophage escape and documented that phagosome alkalinization is not a requisite for the induction of hyphal growth (Silao et al. 2020). We have referred to the work of Garbe et al., which is consistent with our previous work (Martínez and Ljungdahl, 2004) where we reported that proline-dependent filamentation is dependent on Csh3. Csh3 is an ER membrane-localized chaperone responsible for catalyzing the proper folding of amino acid permeases; in csh3 null mutant strains, amino acid permeases accumulate in the ER as non-functional unfolded aggregates. Consistently, we have tested and found that proline-induced Put2-GFP expression is dependent on Csh3 (unpublished), clearly establishing that the regulatory effects of proline are dependent on its uptake. We have not generated a gnp2-/- strain, but suspect that we could find growth conditions where such a mutant would be refractory to proline induction. We have adjusted the text to include this information.

      1. 116: Is the "low sugar environment of the host" referring to a specific niche, such as the GI tract, or human blood? Compared to most natural environments, glucose is abundant in the host, e.g., at ~5 mM, it is the most abundant metabolite in blood, and similarly, in the GI tract, levels can go beyond 50 mM glucose (see e.g. PMIDs 34371983, 21359215). Or is this comment indicating that the in vivo sugar concentration is lower than that in common lab growth media? Please spell out the niche/concentration for clarification - and compare that to other niches that are considered "high sugar environments".

      We have adjusted the text to clarify our statement. The natural environment of C. albicans is the human host. Virulent infections do not arise within the GI tract, with its high sugar content, but rather result when C. albicans cells successfully cross into the blood, which has a relatively low glucose level (5 mM) that, importantly, does not effectively repress mitochondrial function. A major point of our recent work is that laboratory experiments with C. albicans growing on YPD or SD with 2% glucose (111 mM) examine growth of cells with repressed mitochondrial functions.
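The concentrations quoted in this exchange mix percent and millimolar units, so a quick conversion helps when comparing them. The following is a minimal illustrative sketch, not part of the original analysis: it assumes that "2% glucose" means 2% weight per volume (2 g per 100 mL) and uses the molar mass of anhydrous glucose (180.16 g/mol). The function names are hypothetical.

```python
# Illustrative unit conversion for glucose concentrations.
# Assumptions: percentages are weight/volume (g per 100 mL) and the sugar
# is anhydrous D-glucose, C6H12O6, with molar mass 180.16 g/mol.
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.16


def percent_wv_to_millimolar(percent_wv, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Convert % (w/v) to mM: 1% w/v equals 10 g/L."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1000.0


def millimolar_to_percent_wv(conc_mm, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Convert mM to % (w/v)."""
    grams_per_litre = conc_mm / 1000.0 * molar_mass
    return grams_per_litre / 10.0


if __name__ == "__main__":
    # Standard laboratory medium (2% glucose) versus blood glucose (5 mM).
    print(f"2% glucose   = {percent_wv_to_millimolar(2.0):.0f} mM")
    print(f"0.2% glucose = {percent_wv_to_millimolar(0.2):.1f} mM")
    print(f"5 mM glucose = {millimolar_to_percent_wv(5.0):.2f}% (w/v)")
```

Under these assumptions, 2% glucose corresponds to roughly 111 mM, about twenty times the 5 mM level cited for blood.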

      1. 123: "proline as sole energy source" - suggest "is the source of carbon, nitrogen, and energy"

      The text is adjusted (see response to Minor Point 1).

      1. 142: it is worth noting to readers that C. neoformans is a basidiomycete and thus VERY distant from the other yeasts studied here-it is in a different major phylum of fungi.

      Again, thanks for this suggestion, the text is adjusted. We included C. neoformans since the role of proline catabolism has been characterized and linked to its pathogenicity (reviewed in Christgen and Becker, 2018, Antioxid Redox Signal, Ref. 1).

      1. 143: Here it is implied that put1 and put2 mutant strains do not grow on SPD, but this is not stated explicitly.

      The put1 and put2 mutants are unable to grow in/on all media containing proline as sole nitrogen source. The phenotype is so tight that we were able to exploit it as a selection phenotype for reconstitution (Fig. 1A). We have adjusted the text to make this clear.

      1. 151: The abbreviation SPG is not explained in main text. This was explained in the methods (1% glycerol as primary carbon source).

      As suggested, we have defined SPG in the main text.

      1. Paragraph 156 onwards: this section is particularly hard to read and very dense. Also, it is difficult to understand the significance of these experiments for the overall findings of the paper. Please at least provide a small conclusion / summary at the end of the paragraph that puts the findings into perspective.

      We have adjusted the text to make it more accessible.

      1. Figure 2 C: simplifying the scheme (e.g. lots of redundant information, P2 and Mito - just give it one name) would help. This figure may be better in the supplementary material.

      The schematic of our subcellular fractionation study uses standard designations routinely used by the cell biology community. We believe that its inclusion will help readers judge how we mapped the intracellular localization of the reporter proteins, which is essential to understand the proline catabolic network.

      1. Figure 2B: It is not directly apparent from the micrographs that Put1-RFP localisation is mitochondrial. Co-localisation of the RFP with a mitochondrial dye (e.g., mitotracker) or something similar is required to validate it.

      We have previously reported that Put2 is a bona fide mitochondrial protein (by confocal microscopy, subcellular fractionation, and co-localization with Mitotracker (Far Red); Silao et al., Ref 17). The fact that the Put1-RFP associated fluorescence exhibits a distinct mitochondrial signature, is spatially exclusive and exhibits no overlap with the cytosolic pattern of Gdh2-GFP, and co-fractionates with Put2-HA and the mitochondrial marker Atp1, should suffice to confirm that Put1-RFP is a mitochondrially localized protein.

      1. Throughout the manuscript (figure legends): Suggest using "mean" instead of "Ave."

      We have adjusted the legends.

      1. 175: According to the 'Yeasttract' and 'Pathoyeasttract' databases, Put1 regulates at least 36 and 22 genes, in S. cerev. and C. alb., respectively (based on DNA binding and/or regulatory changes). The only gene in common between these two lists of genes is PUT1. Thus, it is quite likely that Put3 regulates many other processes that explain its function and that its major function may not be only to regulate Put1.

      We assume that the reviewer is referring to Put3 (instead of Put1). Yes, Tebung et al. (2017) suggested that Put3 also regulates other genes. However, their data show that C. albicans put3 mutant was unable to grow in medium (YCB+Pro) compared to SPD (2% glucose as carbon source) where proline is used merely as a nitrogen source (Tebung et al., Fig. 3A). Our data in Fig. 1C shows that a put3 null strain exhibits residual growth on SPD, which aligns well with the expressed levels of PUT enzymes (Fig. 2D). Our conclusion is that despite being essential for rapid proline-dependent derepression of proline catabolic genes, Put3 is not the only transcription factor operating at the promoters of the PUT genes.

      1. 175: Is it clear whether the Put3-independent mechanisms are positive or negative with respect to Put1?

      We have accumulated evidence that an additional transcription factor positively regulates PUT1 expression and have a manuscript in preparation to describe this factor. The manuscript will focus on the Put3-independent regulation of PUT1, PUT2, and GDH2 expression.

      1. 218: Suggestion: "growth was indistinguishable". Unless growth curves or growth rates are provided, and if one time-point data are the basis for this point, then "rates" is not a relevant term.

      The reviewer is correct; we will adjust the text accordingly. We have performed growth assays in a multi-well microplate format (Bioscreen) and found that the growth rates are not statistically different between WT, put1, put2, and put1 put2 strains in the presence and absence of proline in SD with 2% glucose. This is consistent with glucose repression of mitochondrial function, i.e., proline toxicity depends on derepression of mitochondrial function.
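The distinction between a single time-point reading and a growth rate can be made concrete with a short calculation. The following is a minimal sketch, not the analysis actually performed on the Bioscreen data: it assumes that the chosen readings fall within exponential phase and estimates the specific growth rate as the least-squares slope of ln(OD) against time. The function names and the example readings are hypothetical, and the readings are synthetic.

```python
import math


def specific_growth_rate(times_h, od_values):
    """Least-squares slope of ln(OD) versus time, in 1/h.

    Assumes every reading lies within exponential phase and is above zero.
    """
    if len(times_h) != len(od_values) or len(times_h) < 2:
        raise ValueError("need at least two paired readings")
    log_od = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_od) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    return sxy / sxx


def doubling_time_h(growth_rate_per_h):
    """Doubling time implied by an exponential growth rate."""
    return math.log(2) / growth_rate_per_h


if __name__ == "__main__":
    # Synthetic readings generated with a 1.5 h doubling time (not real data).
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    ods = [0.05 * 2 ** (t / 1.5) for t in times]
    mu = specific_growth_rate(times, ods)
    print(f"growth rate   = {mu:.3f} per h")
    print(f"doubling time = {doubling_time_h(mu):.2f} h")
```

A rate estimated this way uses the whole curve, whereas a single end-point OD cannot separate differences in lag, rate, and final yield.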

      1. 256 onwards: did the authors test if the ROS scavenging effectively reduced ROS? i.e. does the luminol-HRP assay yield less ROS in +proline +scavenger treatment? This is necessary to effectively conclude that the growth inhibitory effect of proline is due to blocking respiration.

      Indeed, we used NAC as a control in the luminol-HRP system and we saw reduction in ROS formation. In fact, this is the underlying reason why we used high levels of NAC for growth rescue (in Fig. 3D). We include the control data as Fig S3F.

      1. The Figure captions are extremely lengthy and detailed, making it cumbersome to find the relevant information. Suggest moving some of the information, such as additional experimental details, into the methods section.

      We have streamlined the figure legends.

      1. 277-301: Phloxine is not exclusively a live/dead cell indicator; it is an indicator of metabolic activity. In S. cerev. and C. alb. it also indicates slower growth, opaque growth, and it has been used as an indicator of aneuploidy in C. glabrata (https://journals.asm.org/doi/10.1128/msphere.00260-22) and of diploids vs haploids in S. pombe. The colonies illustrated are made up of many live cells, and thus the section "Defective proline utilization is linked to cell death" needs to be presented more carefully. In addition, it appears that this section shifts from using defined medium to using rich medium and 37 °C instead of 30 °C. Why was this shift necessary?

      The reviewer is correct that phloxine (PXB) has been used to identify opaque growth (EFG1-dependent). However, the fact that the accumulation of PXB in the put mutants is evident in both SC5314 and cph1 efg1 backgrounds (Fig. 3G and Fig. S4C) suggests that we are not assaying opaque switching. We mention that we have observed an increase in the number of PI+ cells in put mutants under similar conditions, but as we pointed out, we were unable to reliably quantitate this by FACS due to the clumping of put mutants. Zheng et al 2022, the paper cited by the reviewer, used PXB to assess the ploidy of C. glabrata strains, but their assay was developed using 5 μg/ml PXB, half of the concentration we used. The homogenous accumulation of PXB as the macrocolonies grow (Fig. 3G), suggests that the accumulation is not a consequence of spontaneously occurring ploidy variations. Thus, we believe that the accumulation of PXB does indeed reflect enhanced cell death. The point here is to trace the consequences of proline toxicity and to test the dependency on mitochondrial function. We used complex media, which contains multiple nitrogen sources (amino acids, peptides), to specifically highlight the contribution of proline catabolism in the fitness of C. albicans. The put1, put2 and put1 put2 mutants grow normally on YPD+PXB (30 °C) without accumulating the dye; we only observed visible PXB uptake in put2 after 2-3 days in mature macrocolonies. We attribute the gradual increase in PXB accumulation to be a consequence of glucose becoming limiting, derepressing mitochondrial functions, a requisite for proline toxicity. Consistently, the accumulation is more evident in cells grown on non-fermentable C-sources (Fig. 3G and Fig S4C).

      1. 295-301: Related to the point above, these results are hard to interpret due to the switch from defined medium in all prior experiments to rich growth medium here. Also, it is not clear why a 48h old YPD culture was chosen to show that the degree of PI staining correlates with mitochondrial activity - is this due to the culture age? It would be more clear to image cells grown on glucose vs. glycerol/lactate, or under repressive / de-repressive glucose concentrations (e.g., as shown in Fig. S4C where a PI+ difference is apparent for 0.2% glucose vs. 2% glucose at 30 °C).

      See response to Point 19 for our rationale to switch to rich medium. We have adjusted the text to enhance its readability. In liquid YPD, all strains grow; however, we noticed that the put mutants tend to flocculate (a sign of stress in yeast) when cells enter stationary phase, giving rise to erratic OD readings, particularly evident in the put1 mutant. At 48h, the cultures become dense and cells experience glucose limitation, derepress mitochondrial functions and exhibit maximal flocculation (Fig. S4D). In put mutants, the derepression of mitochondrial function results in proline sensitivity. We tested the notion that this would also increase cell death, which it does; see Fig. S4E.

      1. 313-14: The statement 'the invasion process was dependent on the ability of cells to catabolize proline' doesn't take into account that put mutant cells are defective in filamentous growth irrespective of their utilization of proline...and like the efg1 cph1 double mutant.

      Proline-induced filamentous growth is dependent on the catabolism of proline, which activates Efg1 and consequently the hyphal growth program. In Fig. 4A we show that put mutants grown on Spider media initiate filamentation (as evidenced by wrinkled colonies) but do not grow invasively (no halo). In Fig. 4B we developed and used a novel invasion assay to assess growth through a collagen plug. Similar to the control cph1 efg1 mutant, the put mutants exhibit drastically reduced capacity to penetrate through the plug and reach the D10 media in the transwell (D10 = DMEM with 10% FBS). However, it is important to note that these results are linked to two distinct processes: the filamentation defect of cph1 efg1 is due to the inability to respond to multiple filamentation cues (e.g., CO2, 10% FBS, etc.), whereas the filamentation defect of the put mutants is linked to the inability to catabolize proline and to its toxicity. Clearly, the WT strain relies on proline catabolism, coming from one of three possible sources of proline (see response to Reviewer 3): 1) DMEM/F-12 medium used in the PureCol EZ Gel; 2) diffusion of nutrients up through the collagen from the recovery medium DMEM supplemented with 10% FBS; and 3) the proteolytic breakdown of collagen. Also, in contrast to the put mutants, WT cells are refractory to inhibition by proline.

      1. 316-327: The results of the experiment described can only be interpreted as an effect of proline catabolism if the three strains (efg1 cph1; put1; put2) have similar growth rates as yeast cells in vitro. Why weren't the cells competed directly (efg1 cph1 vs put cells)?

      We believe that the relevant comparisons are to WT. We recovered cells from the top of the collagen (see Fig. 4B inset) to monitor their ability to survive and grow on top of the collagen. We found that the ability to catabolize proline enables WT and cph1 efg1 cells to grow equally well (recovered at a ratio similar to the starting input). This was not the case with the put mutants; they did not grow as well, and almost 100% of the cells recovered were WT.

      23. Fig 6: The logical order of the experiments, and in the text, is: 1) 4 h window, 2) 26 h window and then 3) ex vivo. The cartoon in 6B should be in this order as well.

      Thanks for bringing this issue up. We have adjusted the figure and text placing the schematic time-lines in proper order.

      1. 337: it is not clear what the 'direct exposure...' is trying to tell us. Can this be made more explicit?

      The direct exposure means that the fungal cells are in contact with the culture media at the edges/border of the 3D skin model (see schematic diagram). Hence, fungal cells are in direct contact with 10% FBS, facilitating the observed filamentous growth. The inability of the put mutants to invade the skin model should be evaluated at the center of the artificial epithelium where there is likely a local increased concentration of proline stemming from the proteolytic activities associated with fibroblasts and keratinocytes.

      1. 340-346: Here proteins with high proline content were used to ask if they could induce transcription of PUT1 or PUT2 RNA and protein. This experiment is designed only to test the role of these proteins to induce utilization of nitrogen, as glucose is included in the medium. Given that these proline-rich proteins need to be lysed by proteases before they can be imported, and since no import pathways were tested, the results appear to tell us that mucin is more readily digested to peptides that contain proline, but why that is the case is not clear and how it relates to proline utilization is also not clear.

      We thank the reviewer for raising this important point. First, we monitored protein, not mRNA, levels. We will adjust the text to provide better context for this experiment. Briefly, these experiments were initiated as we were perplexed as to why the wildtype cells took such a long time (14 days) to fully invade the collagen matrix (Fig. 4B); we naïvely assumed that fungal cells would secrete proteases to degrade the collagen and assimilate the liberated proline. Going forward, our experimental strategy was to incubate various proteins with a dense culture of cells in HBSS medium (pH 7.4) supplemented with low glucose (3.8 mM) and lactate (0.83 mM). This condition mimics interstitial fluid, where most broad-range proteolytic enzymes are inactive or at least operating suboptimally. The results were clear; with the exception of mucin, the proteins did not stimulate Put1 or Put2 expression. We conclude that host-dependent processes play an important role in the release of amino acids/peptides from these high-proline content proteins (see lines 531-553 for discussion). The capacity of mucin to efficiently induce Put1 expression is interesting since mucin is abundant in the gut, where systemic infections are thought to originate. It is important to be cautious here: we used a commercial mucin preparation (Sigma, 2 batches) that may contain degradation products, e.g., proline-rich peptides, that can easily be assimilated by C. albicans. Put1 expression is an excellent readout for proline uptake since its expression responds tightly to the presence of proline derived from exogenous supply or from intracellular conversion (Fig. 2D, S2A, S2B).

      1. 363-369 An alternative is that Put3 induces different proteins important for growth.

      We included this possibility in the revised text.

      1. 379-380-the conclusion for this paragraph is somewhat of an overstatement as there is no analysis of the degree to which proline utilization is a predictor of virulence. It simply shows that put mutants affect the ability to survive in neutrophils.

      We have adjusted the text.

      1. Discussion: The statement that "S. cerevisiae" evolved in high sugar environments is debatable. The natural niche could well be forest soil and tree bark, or insect/wasp guts with arguably little glucose around.

      The reviewer is correct: S. cerevisiae can be isolated from diverse environments with variable sugar contents, but it is the capacity to deal with high sugar environments that makes this yeast stand out in comparison to Candida spp. The unique attributes of S. cerevisiae have been exploited and have truly benefited humankind in making alcohol and bread. We have amended the text to state this more accurately.

      1. 469-470-how strong is the 'correlation' between the ability to utilize proline and virulence? Given that different mutants had different effects in different models, this seems like a very loose 'correlation'; it would be good to have some quantitative measures to make this claim.

      We have used directed genetic approaches to determine whether a gene/protein is essential for virulence by testing them in currently available infection models. It is important to note that all virulence assays provided a consistent and clear read-out, namely that the inability to catabolize proline significantly reduced the expression of virulence characteristics. Presumably the differences we report are due to the specific nutrient composition (proline and metabolites feeding into the proline catabolic network) and physical parameters intrinsic to each model. In fact, the expression of virulence factors (i.e., hyphal growth) can significantly differ in different organs within the same mouse model (Lionakis et al., 2013), and virulence outcomes can change depending on mouse background. We fail to see how this can be viewed as loose. This has not been shown before. Please refer to our response to major point 6.

      1. 500: Was the experiment done in larvae, and not in adult Drosophila? Fig 5 legend says flies and shows a picture of a fly and larvae are only mentioned much later in the text.

      These experiments were performed using adult flies. We now include a reference regarding the levels of arginine in hemolymph in both larvae and adult Drosophila (Priyankage et al., 2012; Anal Chem).

      1. 512:Why is it presumed that proline accumulates in the mitochondria in put1 mutants? How strong is the presumption?

      Despite a great deal of effort in many labs, the mechanism of proline transport across the mitochondrial membrane is not known. What has been shown in mammalian and plant systems is that proline can readily enter and accumulate in mitochondria, where it is catabolized (https://link.springer.com/article/10.1007/s00425-005-0166-z; https://www.sciencedirect.com/science/article/pii/0003986177902089). Our presumption that proline accumulates in the mitochondria is based on our finding that proline inhibits mitochondrial respiration when Put1, catalyzing the first oxidation reaction, is absent.

      1. 539: why are MMPs important for digestion of collagen? This is not clear at this point of the Discussion.

      In mammalian cells, some secreted MMPs (e.g., MMP-1) have collagenase activity that degrades proteins comprising the extracellular matrix, which releases proline. We emphasize this since the 3D skin model is comprised of dermal fibroblasts and keratinocytes that are known to secrete MMPs (Ref. 69).

      1. 574: Concluding sentence of this paragraph seems unsubstantiated. There are at least two defects in put2 strains-hyphal growth and growth in general, presumably because of P5C accumulation.

      See response to point 21. Proline-induced filamentous growth is dependent on its catabolism, which activates Efg1 and consequently the hyphal growth program. However, there are many potential cues in hosts that could induce hyphal growth in situ. Our finding that strains unable to catabolize proline do not filament indicates that proline is a key modulator of virulence.

      1. Fewer abbreviations would make the manuscript easier for non-experts to read. For example, P5C is not defined in the abstract. Furthermore, if an abbreviation is not used more than 3 times, it is not necessary to provide it (e.g., mammalian proteins in the last paragraph).

      We have adjusted the text.

      typos:

      1. 82: should read 'is restricted to the mitoch...'

      2. 102-103: should read 'to evade macrophages'

      3. Fig. S4F is mislabelled as Fig. S4G.

      Thanks!

      **Referees cross-commenting**

      Overall, we stand by our initial assessment of the study. However, we were not aware of previous studies that investigated proline utilization in yeasts, as noted by Rev # 2 (https://onlinelibrary.wiley.com/doi/epdf/10.1002/yea.1845). The current study suggests that using proline as an energy/carbon source is more wide-spread, beyond pathogenic yeasts. Further, the C. albicans strain they used for this study (ATCC 10231) was apparently unable to grow on proline in the quoted paper. In light of this, we think the authors should reference this study, tone down the claims about the clear correlation of pathogenicity and proline utilization, and address this apparent discrepancy with the indicated Candida albicans isolate. We note that our review considered this a paper mostly of interest to specialists.

      Although other non-pathogenic fungi have been shown to use proline as pointed out by Reviewer 2, this metabolic attribute has not been previously tested in members of the pathogenic Candida spp. complex. We have included the reference and included a statement that many fungi, isolated from diverse environmental niches, can use proline as a carbon source.

      Reviewer #1 (Significance (Required)):

      1. The advance in this paper is conceptual for the proline utilization connection to virulence in a range of species and technical for the in vivo microscopy. Limitations are that the conceptual advance is based only on qualitative work in figure 1 and that the animal studies do not provide a conceptual advance, although the technical advance of in vivo visualization of kidney tissue is impressive and (to the knowledge of this reviewer) quite new as the only prior work was in mouse ears.

      In response to the reviewer’s comment regarding Fig. 1, although it is qualitative, it is very reproducible. We even tried several clinical isolates of S. cerevisiae and observed behavior consistent with the standard laboratory strains (i.e., they do not grow on SP medium where proline is used as sole carbon/nitrogen/energy source). We tried to quantify growth of all strains in liquid SP medium at 30 °C using a TECAN microplate reader, but the results showed very erratic readings among species (and replicates) as each behaves differently; C. tropicalis, C. krusei, and C. parapsilosis form pseudohyphae and clump readily, while C. albicans forms hyphae and pseudohyphae.

      2. The work fits well as an extension of the body of work from the corresponding author's lab with additions from the labs with expertise in models of infection.

      1. People interested in yeast metabolism and pathogenic yeast virulence will be the audience for this paper and as written it is for a specialized audience interested in pathogenic yeast metabolism and, perhaps, (although not mentioned at all in the text) for those who want to try PUT gene products as new drug targets.

      This was actually mentioned in the last paragraph of the discussion (line 581-582).

      1. Reviewer expertise is in pathogenic yeast biology and yeast metabolism. Little expertise in high tech microscopy.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study is part of the continuous work by the authors to dissect the mechanism of utilization of proline as a carbon source in Candida spp. In particular, this work shows that the inability to process proline leads to accumulation of the toxic intermediate P5C and subsequent inhibition of mitochondrial respiration and toxic effect on the cells. Furthermore, the study demonstrates that proline utilization is important for C. albicans kidney colonization. The experiments are meticulously designed and the study adds to the overall understanding of the metabolic utilization of proline as a carbon source and its potential relevance for infection.

      I find this work interesting, but the role of Put1 and Put2 in proline utilization is not particularly novel. The novelty here is the subcellular localization of the two proteins. Also, the importance of proline utilization for infection is unclear. The host-pathogen interaction assays are ambiguous as each assay gives a different result. Lastly, the authors try to generalize the importance of use of proline as an energy source by other Candida spp. This is not very surprising, given that it has been reported previously by others (example DOI: 10.1002/yea.1845) and that many pathogenic or closely related to C. albicans species use various amino acids, not only proline, as a carbon source.

      Yes, like reviewer 2, we are not surprised that many of the pathogenic members of the Candida spp. complex are able to use proline, but this needed to be checked. The fact that proline can be used as a sole carbon/nitrogen/energy source clearly sets them apart from the paradigm yeast S. cerevisiae. A major question is what amino acids are important in the context of the host? To assess this, we have used mutations that specifically block proline utilization. Our past studies demonstrating that proline catabolism is rapidly activated in C. albicans cells phagocytized by macrophages indicate that proline is present in the phagosomal compartment. Furthermore, put mutations clearly affect virulence in flies and murine systems. We are at a loss to understand why the reviewer believes that our data, which consistently show that proline catabolism is important, are ambiguous.

      The expectation that all three mutant strains, i.e., put1, put2 and put3, would behave identically in the different infection models reflects an unnuanced view of how infection works. In fact, differences considered trivial, such as the choice of mouse background, can have profound effects on virulence. Consequently, it is striking how the diverse infection models consistently and unequivocally demonstrate that proline catabolism affects virulence. Also, it should be appreciated that we are not testing mutations affecting proteins with many overlapping functions, where it may be appropriate to challenge claims as to their direct role in virulence. Here we tested mutants that lack the enzymes that catalyze proline utilization. A more reasonable expectation is that the virulence is commensurate to the specific nutrient composition of model systems (as asked by reviewer #1), which can fluctuate among models (see our response to the major comment 6 of reviewer 1). As it is not practical to precisely test the proline levels in the models, we have worked to identify and focus on critical phenotypes that can be analyzed in vitro. Our findings provide the basis for understanding the virulence and growth properties of the mutants in the context of the complex infection models.

      Moreover, the authors take C. albicans as an example to demonstrate the role of PUT in invasion and infection. Proline is a known stimulus for hyphal growth in this species, but many other Candida spp., including C. auris, do not filament. So how, aside from supporting growth, is proline linked to infection in these species? I think the authors oversell the importance of proline in Candida spp. pathogenesis and should tone this part down or remove it completely. A new story that validates the importance of PUT in non-albicans species can bring clarity to why and where proline is critical for survival and infection.

      The fact that proline supports growth in the host environment is one of the critical aspects of our work. The lack of appreciation for this finding represents a common misconception in infection biology. It is not just the ability to gain access to a host and initiate an infection that counts; it is equally important to sustain growth and to thrive within the host. Thus, the adaptation to the host environment is critical. Here we document that proline catabolism not only initiates but sustains an infection, acting as a critical carbon/energy source. The inability of the put1 and put2 mutants, which are sensitive to proline, to grow and infect multiple models clearly suggests that a substantial quantity of proline is accessible. Also, we have constructed C. glabrata (Fig. S1C) and C. auris (not shown) strains that lack the ability to catabolize proline, and are currently characterizing the virulence properties of these strains. This is out of the scope of the present study.

      Major comments: I am not convinced by the data that proline is important to initiate infection. Candida infections of the kidney occur only at late stages of sepsis. The authors need more compelling data to prove that proline is important for infection in the host.

      Again, we are not sure why there is such skepticism here. Regardless of whether kidney infections occur late, the fact that, in contrast to WT, we do not observe put mutants filamenting clearly suggests that the capacity to catabolize proline plays a role in the expression of virulence characteristics of C. albicans. Based on our findings using IVM, which provides 3D information, we can at least conclude that a single isolated C. albicans cell can initiate hyphal growth, initiating a point of infection. In addition, our newly added whole human blood data suggest that proline catabolism is required for survival in the blood; human blood contains high amounts of proline, arginine, and ornithine that are all catabolized via the proline catabolic network.

      Minor comments: I find the manuscript difficult to read and the discussion part is overly long. Some streamlining and adding a bit more explanation for the rationale of each experiment will make the work easier to follow. Some language/style needs refining as well.

      We have attempted to take this critique into account during the revision of the manuscript and have streamlined the text and added explanations regarding the rationale underlying our experimental approaches.

      **Referees cross-commenting**

      In this manuscript the authors clarify the cellular compartmentalization of steps in proline catabolism. However, it is not novel that proline is a valuable carbon source. The role of proline utilization for establishment or progression of infection remains ambiguous even after the authors provide different in vivo results. The overall significance of the study is limited.

      Please refer to our comments below. We do not understand why the reviewers apparently question the obvious role of proline utilization in facilitating virulence.

      Reviewer #2 (Significance (Required)):

      The strengths of this study are in the experimental design and variety. The data is well presented and visualized. The limitations are as pointed above - I find it especially difficult to figure out where, in a real infection scenario (e.g. breach of the gut barrier and entry into the bloodstream) proline will be the primary energy source. To me the significance of this work is minor.

      C. albicans is the primary human fungal pathogen placed under the “Critical Priority Group” by WHO, and yet our understanding of nutrient assimilation in this fungal pathogen is only a fraction of what is known in the model yeast S. cerevisiae, which has proven not to be the best paradigm for understanding the regulatory circuits operating in human fungal pathogens. This manuscript, as well as other recent publications, has revisited and corrected earlier assumptions regarding C. albicans growth, providing novel information that reflects important regulatory differences specifically relevant to the life of C. albicans in the host. For example, had it not been for the recent findings (Ref. 10, 18, 31) that show that proline utilization in C. albicans is not subject to nitrogen catabolite repression (NCR) and that glucose represses mitochondrial function, the perception in the field would remain that C. albicans cannot utilize proline as a carbon and/or nitrogen source in the presence of a “preferred” source of nitrogen, which is applicable in the blood that contains high concentrations of possible sources of carbon and nitrogen. Furthermore, the low but constitutive expression of Put2 and the tight, highly responsive Put1 expression in response to proline (Fig. 2D, S2A, S2B) suggest that C. albicans is well equipped to productively anticipate proline availability depending on the host status, entirely consistent with its “opportunistic” character. The many incorrect and previously held assumptions regarding C. albicans, uncritically propagated in several influential reviews, likely have hampered efforts to develop novel antifungal therapies. We neither understand nor accept the view that a more precise understanding of proline catabolism is incremental.

      The type of question raised by the reviewer is exactly what we hope to address in the future, but to get there we have to have correct assumptions in place, and this is only possible if we have a more thorough understanding of the regulatory mechanisms driving proline utilization in C. albicans. The idea that certain proteins are refractory to degradation by C. albicans suggests that other external factors are triggering the release of amino acids from these proteins. This work, however, suggests that proline is likely accessible in the gut due to the presence of proline-rich proteins like mucin (Fig. S5A/B).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Silao et al. describes an in-depth investigation of the role of Put1 and Put2 enzymes in proline catabolism and virulence in Candida albicans. This is an extension of previous work in this system. The basic biochemistry and genetics are solid and support the role of these enzymes in the proposed pathway and provide evidence that the build-up of a toxic intermediate in the absence of Put2 is likely involved in the poor growth of the strain when proline is the only carbon source.

      Note that we observe the toxic effects of proline even when it is not the sole carbon source; importantly, however, toxicity is dependent on mitochondrial function, which is repressed by high levels of glucose. Proline toxicity is observed when glycerol/lactate are present as carbon sources in addition to proline. Under these conditions, mitochondria are not repressed and exogenous proline impairs growth, particularly evident in put2 cells that accumulate the toxic intermediate P5C.

      The conclusions regarding its role in virulence are less convincing, particularly the data derived from the collagen invasion assay, the ex vivo skin model and the ex vivo/in vivo imaging. The survival and fungal burden assays support a modest role in virulence and a modest reduction in infectivity (although the presented data for survival does not have statistical significance data reported for the Kaplan analysis).

      See below for response regarding collagen assay. We have included the significance values derived from Kaplan analysis in the revised Fig. 5B.

      The manuscript is clearly written. The methods are well described.

      **Referees cross-commenting**

      I remain unconvinced of the broad significance of the advances and stand by my assessment that this is for the most part a reasonable study but does not move the field forward. The novel technical aspects are either extensions of previous in vivo imaging or are not well controlled (collagen invasion assay).

      See below for response.

      Reviewer #3 (Significance (Required)):

      This is a detailed study of an area that is fairly mature and thus will be of interest to those in the field but does not represent a large advance and is thus truly incremental.

      See below for response.

      Major limitations of the work are as follows. First, the collagen invasion assay may be flawed. The recovery media is made with DMEM which is a medium that lacks proline and is fairly stringent. Control experiments need to be done to be sure that the mutants grow in the recovery medium. Second, the data from the RHE model are hard to interpret since so few cells are present in the tissue. It is hard to see if there are few filaments or if there are just too few cells to assess in the tissue. Third, in vitro experiments assessing the filamentation of the mutants in the medium in which these assays are performed need to be done as controls. Candida albicans filaments in many conditions such as tissue culture medium. Spider medium is a strong inducer of filamentation but is very different than in vivo/ ex vivo conditions.

      Related to the collagen invasion assay, there is a misunderstanding. The reviewer appears to confuse the put mutations with proline auxotrophy. The put mutants are proline prototrophs and can synthesize proline as they possess a full repertoire of biosynthetic enzymes. What the put mutants cannot do is utilize proline to obtain nitrogen or energy. In fact, the presence of excess proline imposes toxicity on the put mutants. There are three possible sources of proline. 1) PureCol EZ Gel is a ready-to-use collagen solution that forms a firm gel when warmed to 37 °C. It contains purified Type I bovine collagen (5 mg/ml) dissolved in DMEM/F-12 medium, which has multiple amino acids, including a substantial amount of arginine. 2) The recovery medium DMEM supplemented with 10% FBS. The presence of FBS provides amino acids and induces filamentous growth. As the reviewer points out, C. albicans grows in this medium and exhibits filamentous growth. 3) The proteolytic breakdown of collagen is expected to liberate proline. Consequently, the poor growth of the mutants clearly demonstrates the importance of proline catabolism. Also, the fact that we recovered put mutants surviving on top of the collagen (Fig. 4B, inset) suggests that they remain viable but simply are unable to efficiently invade the collagen. Consistently, microscopic inspection of the wells of the put mutants showed extremely few or even a complete absence of invading cells in the recovery medium. We will adjust the text and provide a more detailed description of the experimental set-up. In summary, the main concern of the reviewer with respect to lack of proline is not relevant.

      Regarding the 3D-skin model, equal numbers of fungal cells were applied on top of the RHE. To avoid overgrowth, only low numbers (100 C. albicans cells) can be applied for the WT strain, and consequently for all other strains. In contrast to WT, which clearly proliferates, the apparent low level of put1 and put2 cells at the center of the 3D skin model is the consequence of poor growth. The upper layer of the RHE consists of stratified keratinocytes. To grow, WT fungal cells obtain proline either directly from the keratinocytes, from secreted proteases that liberate proline from keratin (proline is not as abundant in keratin as in collagen, the main component of the dermis), or from the medium that basolaterally feeds the RHE. At the border of the model, leakage from the medium can occur. Our results, showing poor growth of the mutants in the center of the 3D-skin model, are entirely consistent with the collagen plug experiments and indicate that proline catabolism plays a determinant role in enabling invasive growth.

      Lastly, the imaging experiments are highly problematic. First, reference must be made to previous ex vivo imaging reported by the Lionakis lab in 2013. Second, the number of cells imaged is so low that there is no power to make any conclusions. At 24 hr, the mutants may be delayed in filamentation or they may be delayed in establishing infection. There is no way to know what is causing the apparent lack of filaments. This technique as presented is not any higher resolution than traditional histology and in fact histology would provide a more convincing case for reduced filamentation.

      These considerations significantly reduce the overall significance of the work.

      I work on Candida albicans.

      We thank the reviewer for highlighting the beautiful study by Lionakis et al., which documents the host response, specifically the role of macrophages in mitigating C. albicans infection of the kidney. However, the reviewer apparently failed to recognize that their method differs completely from ours. Lionakis et al. performed ex vivo imaging of kidney slices using regular confocal imaging, and the authors express an awareness regarding the limitations of this approach. In fact, these authors even state in their discussion that intravital microscopy should be pursued in the future to further investigate Candida-macrophage interactions in the kidney. Also, they point out that kidney-specific factors seem to facilitate rapid filamentous growth of C. albicans. In our work, we have experimentally addressed both of these astute statements. To our knowledge, our work is the first report of imaging a Candida cell infecting a kidney in a living mouse, which on its own is a major development and achievement considering the complexity of the kidney microenvironment. The finding that the put2 mutant does not exhibit filamentous growth in the kidney of a living mouse (24 h) is striking and strongly suggests that a substantial quantity of proline, or amino acids (e.g., arginine) that are metabolized via the proline catabolic network, is present in the kidney. This is clear based on the finding that WT C. albicans cells respond accordingly to initiate hyphal growth. Consistent with this, it is well documented that the kidney is a major metabolic hub for arginine and proline metabolism. The work by Lionakis aligns remarkably well with our previous and current work in that put mutants exhibit greatly reduced survivability in co-culture with macrophages and do not evade these primary immune cells due to their inability to induce filamentous growth within the phagosome (Silao et al., 2019). We have adjusted the text to include a discussion that places our work in the context of the Lionakis work.

      We have added a Fig. 6C showing an example of the scanned area of the kidney. Further we added the following in the revised legend to indicate that large areas of kidneys were imaged in our assessment of fungal growth and filamentation:

      “Sites of colonization were localized using a spiral scan in the Las-X Navigator-module in the FITC channel. The entire area of the renal surface attached to the glass imaging window was scanned; circles highlight examples of regions of interest (ROI) exhibiting stronger and deviating fluorescence from the background. Each ROI was examined in detail using FITC, yEmRFP and autofluorescence. Scale bar, 500 µm.”

      CONCLUDING STATEMENT – SUMMARY RESPONSE:

      Our current work is based on our previous discovery that proline metabolism provides energy to induce and support filamentous growth (PLoS Genetics, 2019). This turned out to be important since we also discovered that C. albicans cells depend on mitochondrial proline metabolism to evade engulfing macrophages, implicating this process as being an important virulence determinant. Consistently, using time-lapse microscopy, we subsequently found that proline catabolic enzymes are rapidly induced in C. albicans cells upon phagocytosis by macrophages. These results demonstrated that proline is present within phagosomes. As exciting as these findings are, they focused on a single phenotype, i.e., filamentation, and were obtained using in vitro experimental approaches. These results demanded that we pursue additional avenues to further characterize and test the in vivo relevance, and they merely provide a solid background for the current work.

      In contrast to reviewers 2 and 3, we do not regard our finding that proline catabolism plays such a critical role in virulence as merely “incremental”. We also could not have foreseen that the ability to use proline as an energy source is a common feature of multiple fungal pathogens capable of causing human disease. This is conceptually very important in that human fungal pathogens, unlike the well-studied yeast Saccharomyces cerevisiae, are not readily found out in nature, and thus have evolved to use a similar spectrum of nutrients as host cells, including cancer cells. It is important for the fungal pathogen community to realize that regulatory switches operating in C. albicans are wired substantially differently to those in S. cerevisiae, and are likely optimized to reflect the actual conditions in the host environment. The growing appreciation that diverse cancers are able to shift metabolism to exploit proline as an energy source is strikingly and fascinatingly similar to our findings with pathogenic fungi. This represents a conceptual advance in that it points to the wealth of proline stored within extracellular matrix proteins as providing a potential and significant source of energy for virulent fungal and cancerous growth.

      Finally, we strongly believe it is improper to extrapolate virulence properties based on in vitro findings, and that it is essential to actually test host-microbial pathogen interactions using refined in vivo models. Our successful use of advanced intravital microscopy goes beyond traditional and accepted murine infection models and has provided us with a unique state-of-the-art vantage point. Our finding that a single C. albicans cell is able to initiate and establish a site of infection in a kidney within a living mouse is itself important; coupled with the novel finding that hyphal development at sites of infection depends on the ability of the fungal cells to catabolize proline, it must reflect the physiological conditions in the kidney. This is not an incremental finding, and we do not understand why reviewers 2 and 3 diminish the significance of these findings. Clearly, our manuscript provides a strong foundation for more detailed and advanced studies.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Silao et al. make the intriguing observation that yeasts that are generally considered less pathogenic are unable to catabolize proline, unlike Candida albicans. They then, in Candida albicans, construct mutants defective for the two key enzymes (Put1, Put2) required to convert proline to glutamate, which they show to be essential for proline utilization as an energy (carbon) and nitrogen source. The authors proceed to untangle the regulatory aspects of proline degradation, including the respective cellular localization of its key enzymes. They then make the important discovery that strains lacking either Put1 or Put2 suffer from a proline-dependent growth defect, which they attribute to resulting defects in mitochondrial metabolism.

      The manuscript then goes on to analyze a broad range of infection models, including a reconstituted human epithelial skin model, Drosophila, mouse systemic infections, organ colonization in these mice (kidney, spleen, brain, liver and histochemistry of the kidneys), as well as survival when incubated with cultured human neutrophils. Next, they use yeast cells constitutively expressing yEmRFP (so that yeasts can be distinguished from other host cells) and coated with FITC before incubation with the host cells (which coats the wall of the original cells, but does not spread to progeny), and they go on to perform an impressive set of analyses of C. albicans growth within mouse kidneys both in vivo and ex vivo, exploiting an implanted window together with intravital imaging with a two-photon microscope at different time points. The system is impressive and visualizes tissue invasion by hyphal cells beautifully. Finally, they compare the intravital images from WT and put2-/- cells and show that, as in vitro, put2-/- cells do not form filaments and do not show extensive invasion of the kidney tissue. While the in vivo aspect of the study includes many different models, it finds defects in virulence for different subsets of put mutants, and the relative importance of filamentation vs proline utilization for virulence is not conclusively resolved.

      Overall, this is an important and timely manuscript, which significantly contributes to the understanding of how proline metabolism intersects with yeast fitness in the context of infections. However, there are several major concerns regarding some of the conclusions drawn from the study. In addition, some general recommendations that would improve the manuscript are provided.

      Specifically, the manuscript provides a very detailed description of experiments and observations. However, in several parts it is difficult to follow and the reader needs more guidance about the logic involved in reaching conclusions. In particular, several aspects of the paper are written for experts in Candida (yeast) metabolism. Here, explaining the rationale for some of the experiments, and providing more background information that is not obvious to a non-expert, is required.

      In particular, writing a clear and measured summary sentence at the end of each paragraph and a conclusion paragraph that summarizes key findings in simple terms would help make the manuscript more digestible for readers.

      In addition, the impressive microscopy and broad range of in vivo experiments are comprehensive but only add incremental information relevant to proline metabolism: that filamentous growth in vivo and virulence are reduced in cells carrying some mutations in one or more put genes. However, this broad sweep of model systems and the development of the in vivo imaging system might have more impact in a separate paper focused on the real-time in vivo visualization of kidney invasion.

      Major comments:

      1. The main finding that impressed this reviewer is that "removing the ability to catabolize proline, in an organism that evolved to catabolize it, leads to (growth) defects". This point could be better highlighted throughout the manuscript.
      2. The authors show that deletion strains for proline metabolism have defects that are important for in vivo pathogenicity. This is an important finding. However, as the manuscript reads now, it suggests that the main findings are that the ability to use proline in the respective host niche is key. Mechanistically, the manuscript revolves primarily around defects that arise when deleting PUT1 and/or PUT2 (i.e., an "unknown" toxicity of proline in the case of put1-/- (or put1-/- put2-/-) and the additional P5C-dependent toxicity for put2-/- mutants; see below).
      3. In order to claim that catabolizing proline promotes pathogenicity (as opposed to the alternative hypothesis that the inability to catabolize proline leads to the observed defects), additional experiments would be required. For example, the put mutants would need to be compared with mutants that significantly reduce/impair proline uptake, such as the referenced gnp2 mutant (Garbe et al 2022). While the finding that less pathogenic yeast species are unable to catabolize proline is both intriguing and important, as presented it remains a loose, non-quantitative correlation that only tangentially addresses the question of whether "proline catabolism is key for pathogenicity".
      4. 238 onwards: The conclusion that "the primary growth inhibitory effect of proline is linked to catabolic intermediates formed by Put1 and that are metabolized further by Put2" does not appear to be fully supported by the evidence. Addition of proline to put1 mutants already reduced OD600 by ~50% (Figure 2), and it is further reduced to ~10% when put2 is deleted. This implies that there are two inhibitory effects of proline, not one primary one. At the least, this option should be discussed, including why deletion of PUT1 leads to proline toxicity. The latter is not clear: is it that too much proline accumulates in the cell and this accumulation is toxic? If this is the case, the effect would be expected to be proline concentration dependent. Performing a relatively simple experiment as performed for the put2 mutant (Fig. 3 / S3F) may clarify this issue, particularly if the experiment were coupled with intracellular quantification of proline.
      5. The caption "P5C mediates a respiratory block" is misleading, as the evidence is not that compelling: Although P5C increases in put2, but not in put1 mutants, and given that both single mutants experience a proline-dependent respiratory defect (Fig. 3E), the results suggest a more complex relationship.
      6. The virulence assays and in vivo experiments do not present a unifying view: in Drosophila put2∆∆ is less virulent than put1∆∆, which appears similar to put3∆∆. Given that put2 mutants grow slowly, likely because of P5C inhibition, this seems logical. However, in mice, put3∆∆ remains highly virulent while put1∆∆ and put2∆∆ results for survival are mixed. Furthermore, in 4 mouse organs, put1∆∆ and put2∆∆ are not significantly different from one another but are different from wt, while put3∆∆ has no significant reduction in CFU. Kidney histology shows very little invasion by put1 and put2 and more by put3, but visually put3 appears to invade much less than the WT, and the human neutrophil experiment shows effects of put2 or put3 but not put1. This leaves the reader rather confused. It may be worth discussing the reasons for different results in different models. Is the availability of proline in each of the organisms and organs similar?
      7. The ex vivo and in vivo analysis of the dynamics of C. albicans growth in the host is visually impressive, but it distracts from the focus of the paper and the metabolic findings. Showing that put mutant cells do not form filaments in vivo (as in vitro) does not add much conceptually to the paper. Furthermore, this lovely advance in in vivo visualization is lost at the end of this paper and the authors should consider whether it might fit better in a manuscript that could really highlight the in vivo visualization approach.
      8. The discussion of cells stained with FITC and expressing yEmRFP does not clearly point out that the FITC is only an indicator for those cells that were used to inoculate the tissue and that finding cells without FITC indicates that they are mitotic progeny, indicating that they have been dividing. The authors clearly understand this, but a naive reader may miss this important point if it is not stated explicitly.

      Minor comments:

      1. Throughout: what is the distinction between utilization of proline for C or for energy? These terms seem to be used interchangeably.
      2. Introducing the schematic in Fig. 2A at the beginning of Figure 1, would help explain proline catabolism before delving into the growth experiments that rely upon this framework. This should include an explanation, for readers less familiar with the metabolic issues, of the main limitations to catabolizing proline, and the key issues for being able to use proline for nitrogen, carbon, and energy (potentially indicated in the overview figure, e.g. pointing towards gluconeogenesis etc.).
      3. Saccharomyces can only grow on proline as a nitrogen source, but not as energy/carbon source. Could the authors briefly mention or discuss why this is the case? This is not clearly apparent after reading the manuscript and it leaves the reader confused and trying to understand if the fact that proline is required for carbon utilization is a new finding of this paper or was already known. Do the authors think this is tied to the presence of complex 1 components in C. albicans that are not found in S. cerevisiae? Is this consistent for the pathogenic, but not the non-pathogenic yeasts analyzed in figure 1?
      4. 100: While Gdh2 is apparently an important enzyme for generating ammonium, why is it not necessary for macrophage escape and virulence as shown in reference 18? A recent paper from Garbe et al (ref 12) suggests that Gnp2 is the major proline permease in C. albicans and what is known, and not known, about proline uptake would be good to mention, given that PUT gene functions require that proline enters the cells.
      5. 116: Is the "low sugar environment of the host" referring to a specific niche, such as the GI tract, or human blood? Compared to most natural environments, glucose is abundant in the host, e.g., at ~5 mM, it is the most abundant metabolite in blood, and similarly, in the GI tract, levels can go beyond 50 mM glucose (see e.g. PMIDs 34371983, 21359215). Or is this comment indicating that the in vivo sugar concentration is lower than that in common lab growth media? Please spell out the niche/concentration for clarification - and compare that to other niches that are considered "high sugar environments".
      6. 123: "proline as sole energy source" - suggest "is the source of carbon, nitrogen, and energy"
      7. 142: it is worth noting to readers that C. neoformans is a basidiomycete and thus VERY distant from the other yeasts studied here-it is in a different major phylum of fungi.
      8. 143: Here it is implied that put1 and put2 mutant strains do not grow on SPD, but this is not stated explicitly.
      9. 151: The abbreviation SPG is not explained in main text.
      10. Paragraph 156 onwards: this section is particularly hard to read and very dense. Also, it is difficult to understand the significance of these experiments for the overall findings of the paper. Please at least provide a small conclusion / summary at the end of the paragraph that puts the findings into perspective.
      11. Figure 2 C: simplifying the scheme (e.g. lots of redundant information, P2 and Mito - just give it one name) would help. This figure may be better in the supplementary material.
      12. Figure 2B: It is not directly apparent from the micrographs that Put1-RFP localisation is mitochondrial. Co-localisation of the RFP with a mitochondrial dye (e.g., mitotracker) or something similar is required to validate it.
      13. Throughout the manuscript (figure legends): Suggest using "mean" instead of "Ave."
      14. 175: According to the 'Yeasttract' and 'Pathoyeasttract' databases, Put3 regulates at least 36 and 22 genes, in S. cerev. and C. alb., respectively (based on DNA binding and/or regulatory changes). The only gene in common between these two lists of genes is PUT1. Thus, it is quite likely that Put3 regulates many other processes that explain its function and that its major function may not be only to regulate Put1.
      15. 175: Is it clear whether the Put3-independent mechanisms are positive or negative with respect to Put1?
      16. 218: Suggestion: "growth was indistinguishable". Unless growth curves or growth rates are provided, and if one-time-point data are the basis for this point, then "rates" is not a relevant term.
      17. 256 onwards: did the authors test if the ROS scavenging effectively reduced ROS? i.e. does the luminol-HRP assay yield less ROS in +proline +scavenger treatment? This is necessary to effectively conclude that the growth inhibitory effect of proline is due to blocking respiration.
      18. The Figure captions are extremely lengthy and detailed, making it cumbersome to find the relevant information. Suggest moving some of the information, such as additional experimental details, into the methods section.
      19. 277-301: Phloxine is not exclusively a live/dead cell indicator; it is an indicator of metabolic activity. In S. cerev. and C. alb. it also indicates slower growth, opaque growth, and it has been used as an indicator of aneuploidy in C. glabrata (https://journals.asm.org/doi/10.1128/msphere.00260-22) and of diploids vs haploids in S. pombe. The colonies illustrated are made up of many live cells, and thus the section "Defective proline utilization is linked to cell death" needs to be presented more carefully. In addition, it appears that this section shifts from using defined medium to using rich medium and 37°C instead of 30°C. Why was this shift necessary?
      20. 295-301: Related to the point above, these results are hard to interpret due to the switch from defined medium in all prior experiments to rich growth medium here. Also, it is not clear why a 48h old YPD culture was chosen to show that the degree of PI staining correlates with mitochondrial activity - is this due to the culture age? It would be more clear to image cells grown on glucose vs. glycerol/lactate, or under repressive / de-repressive glucose concentrations (e.g., as shown in Fig. S4C where a PI+ difference is apparent for 0.2% glucose vs. 2% glucose at 30°C).
      21. 313-14: The statement 'the invasion process was dependent on the ability of cells to catabolize proline' doesn't take into account that put mutant cells are defective in filamentous growth irrespective of their utilization of proline...and like the efg1 cph1 double mutant.
      22. 316-327: The results of the experiment described can only be interpreted as an effect of proline catabolism if the three strains (efg1 cph1; put1; put2) have similar growth rates as yeast cells in vitro. Why weren't the cells competed directly (efg1 cph1 vs put cells)?
      23. Fig 6: The logical order of the experiments, and in the text, is: 1) 4 h window, 2) 26 h window and then 3) ex vivo. The cartoon in 6B should be in this order as well.
      24. 337: it is not clear what the 'direct exposure...' is trying to tell us. Can this be made more explicit?
      25. 340-346: Here proteins with high proline content were used to ask if they could induce expression of PUT1 or PUT2 RNA and protein. This experiment is designed only to test the role of these proteins to induce utilization of nitrogen, as glucose is included in the medium. Given that these proline-rich proteins need to be lysed by proteases before they can be imported, and since no import pathways were tested, the results appear to tell us that mucin is more readily digested to peptides that contain proline, but why that is the case is not clear and how it relates to proline utilization is also not clear.
      26. 363-369 An alternative is that Put3 induces different proteins important for growth.
      27. 379-380: The conclusion for this paragraph is somewhat of an overstatement as there is no analysis of the degree to which proline utilization is a predictor of virulence. It simply shows that put mutants affect the ability to survive in neutrophils.
      28. Discussion: The statement that "S. cerevisiae" evolved in high sugar environments is debatable. The natural niche could well be forest soil and tree bark, or insect/wasp guts with arguably little glucose around.
      29. 469-470: How strong is the 'correlation' between the ability to utilize proline and virulence? Given that different mutants had different effects in different models, this seems like a very loose 'correlation'; it would be good to have some quantitative measures to make this claim.
      30. 500: Was the experiment done in larvae, and not in adult Drosophila? Fig 5 legend says flies and shows a picture of a fly, and larvae are only mentioned much later in the text.
      31. 512: Why is it presumed that proline accumulates in the mitochondria in put1 mutants? How strong is the presumption?
      32. 539: why are MMPs important for digestion of collagen? This is not clear at this point of the Discussion.
      33. 574: Concluding sentence of this paragraph seems unsubstantiated. There are at least two defects in put2 strains: hyphal growth and growth in general, presumably because of P5C accumulation.
      34. Fewer abbreviations would make the manuscript easier for non-experts to read. For example, P5C is not defined in the abstract. Furthermore, if an abbreviation is not used more than 3 times, it is not necessary to provide it (e.g., mammalian proteins in the last paragraph).

      Typos: 1. 82: should read 'is restricted to the mitoch...' 2. 102-103: should read 'to evade macrophages' 3. Fig. S4F is mislabelled as Fig. S4G.

      Referees cross-commenting

      Overall, we stand by our initial assessment of the study. However, we were not aware of previous studies that investigated proline utilization in yeasts, as noted by Rev # 2 (https://onlinelibrary.wiley.com/doi/epdf/10.1002/yea.1845). The current study suggests that using proline as an energy/carbon source is more wide-spread, beyond pathogenic yeasts. Further, the C. albicans strain they used for this study (ATCC 10231) was apparently unable to grow on proline in the quoted paper. In light of this, we think the authors should reference this study, tone down the claims about the clear correlation of pathogenicity and proline utilization, and address this apparent discrepancy with the indicated Candida albicans isolate. We note that our review considered this a paper mostly of interest to specialists.

      Significance

      1. The advance in this paper is conceptual for the proline utilization connection to virulence in a range of species and technical for the in vivo microscopy. Limitations are that the conceptual advance is based only on qualitative work in figure 1 and that the animal studies do not provide a conceptual advance, although the technical advance of in vivo visualization of kidney tissue is impressive and (to the knowledge of this reviewer) quite new as the only prior work was in mouse ears.
      2. The work fits well as an extension of the body of work from the corresponding author's lab with additions from the labs with expertise in models of infection.
      3. People interested in yeast metabolism and pathogenic yeast virulence will be the audience for this paper and as written it is for a specialized audience interested in pathogenic yeast metabolism and, perhaps, (although not mentioned at all in the text) for those who want to try PUT gene products as new drug targets.
      4. Reviewer expertise is in pathogenic yeast biology and yeast metabolism. Little expertise in high tech microscopy.
    1. Author Response

      Reviewer #1 (Public Review):

      Various parts of the premotor cortex have been implicated in choices underlying decision-making tasks. Further, norepinephrine has been implicated in modulating behavior during various decision-making tasks. Less work has been done on how noradrenergic modulation would affect M2 activity to alter decision-making, nor is it clear whether noradrenergic modulation effects on activity would differ between the male and female sexes.

      This manuscript addresses some of these questions.

      • In particular, clear sex differences in task engagement are seen.

      • May also show some interesting differences and distributions of β2 adrenergic receptors in M2 between males and females.

      We thank the reviewer for their summary of our findings and thoughtful critique of our manuscript. In our revised manuscript we have taken measures to address the reviewer’s comments in line (blue edits in text and revised figures) with direct responses outlined below. We believe these revisions improve the scientific rigor of our findings and provide relevant context for our studies. We hope that they have sufficiently addressed the reviewer’s concerns.

      Less clear is the specificity of systemic antagonism of β adrenergic receptors on the changes in M2 activity reported. As propranolol was given systemically, changes in M2 firing rates could also be due to broader circuit (indirect) activity changes. As it was not given locally, nor were local receptor populations manipulated, one is unable to make the conclusion that changes in neural activity are due to the direct effects of adrenergic receptors within M2 populations.

      We agree that propranolol-driven changes in anterior M2 activity may arise via multiple mechanisms, including direct action on the adrenoreceptors within M2, and indirect action via other regions that project to M2. Although locally activating inhibitory interneurons within M2 is sufficient to disrupt cue-guided action plans and behavior in a 2AFC task (Inagaki et al., 2018), our noradrenergic manipulation was not restricted to M2. We have clarified our conclusions and provided additional discussion to highlight that propranolol actions were multifaceted and that direct actions in M2 are likely working in concert with propranolol-mediated actions in other regions.

      Also not clear, is the contribution of M2 to this task, and whether the changes in M2 activity patterns observed are directly responsible for the behavioral disruptions measured.

      We have revised our introduction and discussion to more clearly outline the critical role of cue-guided action plans in M2 for successful behavior in 2AFC tasks. Suppression of cue-guided activity in M2 results in behavioral performance at near chance levels, similar to what we saw in females after propranolol (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016). Furthermore, targeted photostimulation of action plan encoding neurons in M2 is sufficient to drive behavioral responses (Daie et al., 2021). In our investigations it is plausible to expect propranolol related disruptions in other cognitive, sensory or motor regions. Based on the strong foundational evidence for M2 activity in 2AFC, the propranolol driven changes in anterior M2 in females, whether direct or indirectly mediated, are likely sufficient to drive behavioral disruptions in accuracy and/or trial completion.

      Reviewer #2 (Public Review):

      This paper by Rodbarg et al describes an interesting study on the role of beta noradrenergic receptors in action-related activity in the premotor cortex of behaving rats. This work is precious because even if the action of neuromodulatory systems in the cortex is thought to be critical for cognition, there is very little data to actually substantiate the theories. The study is well conducted and the paper is well written. I think, however, that the paper could benefit from several modifications since I can see 3 major issues:

      We thank the reviewer for their generous comments on the potential impact of our manuscript as well as their suggestions to improve this work. Below we outline responses to specific comments raised by the reviewer in addition to addressing them in the revised manuscript. We hope these responses sufficiently address the reviewer’s concerns.

      Both from a theoretical and from a practical point of view, the emphasis on 'cue-related' activity and the potential influence of NA on sensory processing is problematic. First, recent studies in rodents and primates have clearly demonstrated that LC activation is more closely related to actions than to stimulus processing (see Poe et al, 2020 for review).

      Indeed, during optimal performance the peaks of LC activity are larger when PETH are aligned to action initiation rather than the cue itself (Clayton et al., 2004). This alignment resolves variability in decision processing times and omitted cues. Although LC responses align with action, they are evoked by, and occur after, cue presentation, with LC responses to visual cues occurring ~60 ms after presentation (Aston-Jones & Bloom, 1981). The same behavioral action without preceding task-relevant cues does not evoke an LC response (Rajkowski et al., 2004).

      In our current study cues initiate activity in anterior M2, this is our primary interest and where our electrodes are placed. The window between cue delivery and action completion hones in on our goal of investigating the role for β noradrenergic signaling in target cortical processing, rather than LC explicitly. In both NHP and rodents NE signaling (and evoked LC) promotes sustained cortical representations between cue onset and actions across cortical regions (dlPFC, S1) (Ramos & Arnsten, 2007; Vazey et al., 2018; Wang et al., 2007). In the current study we aligned neural data to either cue presentation (Figure 3) or action (lever press; Figure 4). Both presentations support a critical role for β adrenoreceptor signaling in suppressing irrelevant information, resolving and maintaining action plans. A unique feature of aligning the data to cue onset is that it allows us to see how the neural activity changes not only on completed trials (that end with a lever press) but also on omitted trials (which strongly increase after propranolol). We propose the reason we are seeing large increases in omitted trials is because β adrenoreceptor blockade either directly or indirectly prevents anterior M2 from resolving an action plan.

      Second, the analysis of neural activity around cue onset should be examined with spikes aligned on the action, since M2 is a motor region and raster plots suggest that activity is strongly related to action (I'll be more specific below).

      We agree that M2 shows important action plan activity which we highlight throughout the manuscript. In cued tasks, M2 neurons have been shown to represent action plans starting at cue onset that continues up to behavioral execution. Neural data was examined and results presented aligned to cue onset (illustrated in Figure 3) and aligned to action - lever press (illustrated in Figure 4). The impact of propranolol in diminishing action plan selection was similar in both action, and cue-aligned analyses.

      The distinction between neural activity and behavior or cognition is not always clear. I understand that spike count can be related to motor preparation or decision, but it should not be taken for granted that neuronal activity is action planning. The analysis should be clarified and the relation between neural activity, behavior, and potential hidden cognitive operations should be explicated more clearly.

      We have worked to clarify in our revised introduction, results and discussion the specifics of the known roles of neural activity in M2 in both action planning and decision making. We further expand that the neuronal activity in our study may reflect potential changes in cognitive processing and thus alter resultant behavioral outcomes.

      The sex difference is interesting, but at the moment it seems anecdotal. From a theoretical point of view, is there any ecological/ biological reason for a sex dependency of noradrenergic modulation of the cortex? Is there any background literature on sex differences in motor functions in rats, or in terms of NA action? If not, why does it matter (how does it change the way we should interpret the data?) From a practical point of view, is there a functional sex difference in absence of treatment, or is it that the drug has a distinct effect on males vs females? This has very distinct consequences, I think.

      We did not find overt differences in behavior in the absence of treatment. Only when noradrenergic function was challenged using propranolol did we identify functional sex differences. We agree that this has very distinct consequences – specifically it supports sex differences that can be revealed by perturbations of normal function. These functional sex differences may be a result of differences in the anatomy of central noradrenergic systems, a hypothesis further supported by our mRNA expression findings and existing literature on LC anatomy across species (Bangasser et al., 2011, 2016; Luque et al., 1992; Mulvey et al., 2018; Ohm et al., 1997; Pinos et al., 2001). Collectively these results have potential ramifications for understanding sex differences in disease prevalence and targeted treatments.

      Background literature supports some innate sex differences in motor function and executive function in rodents and humans. Of particular relevance to our investigation is an established difference in behavioral strategy, with females being more risk-averse than males (Grissom & Reyes, 2019). Ethologically, risk-averse strategies may support parental care roles, and increased inhibitory mechanisms may be selected for in females. Although this strategy was not directly tested in our study, the large increase in omissions after propranolol seen in females is in line with avoiding risk (incorrect choices) during uncertainty (disrupted neural signaling). As with other executive functions, the utilization of norepinephrine within the cortex, along with other neuromodulators and local microcircuit interactions, would all contribute to promoting risk-averse behavior.

      These issues could be clarified both in the introduction and in the discussion, but the authors might have a different view on what is theoretically relevant here. In the result section, however, I think that both the lack of specificity in the description of behavior and cognitive operation and the confusion between 'sensory' and 'motor' functions make it very difficult to figure out what is going on in these experiments, both at a behavioral and at a neurophysiological level. First, the description of the behavior in the task is clearly not sufficient, which makes the interpretation of the measures very difficult.

      We have made an effort to better specify the task and relevant behavioral operations in both the methods and results and have included a clearer task schematic (Figure 1A). We agree that the confusion between ‘sensory’ and ‘motor’ functions may make it more difficult to understand the findings in this study. Anterior M2 plays a unique role in representing motor/action plans that can be informed by sensory information. This integrative function creates difficulty in parsing the neural activity of anterior M2 as strictly motor, sensory or cognitive. In attempts to improve clarity we have expanded and highlighted relevant information on the known roles of M2 in the introduction and discussion.

      One possible interpretation of the effects of the drug is a decrease in motivation, for instance, due to a decrease in reward sensitivity or an increase in sensitivity to effort. But there are others. More importantly, none of these measures can be used to tease apart action preparation from action execution, even though the study is supposed to be about the former.

      Neural activity during action planning, prior to action execution is known to be an essential function of M2 (Barthas & Kwan, 2017; Gremel & Costa, 2013; Guo et al., 2017; Inagaki et al., 2018, 2022; Li et al., 2016; Siniscalchi et al., 2016; Sul et al., 2011; Wei et al., 2019) for optimal performance in 2AFC tasks. In all, we found that the representation/separation of opposing action plans (a well validated function of M2) prior to responses (lever press) is degraded after propranolol, especially in females. We have provided additional emphasis on these foundational studies throughout our revised manuscript.

      To minimize the impact of motivational factors, effort and reward size remain consistent within our task, and all trials require a random initiation hold prior to cue delivery. As described in our general response to the editor above (Figure 1, above), we investigated whether motivational changes may be reflected in our M2 recordings. PETHs from the first and last 10 trials within saline sessions did not identify potential motivation-related differences in anterior M2 activity. Similarly, across propranolol sessions the neural activity was consistent between early and late trials. We used early and late trials as there was a mild decrease in trial rate during saline sessions in both males and females, potentially indicative of motivation/reward sensitivity changes during these sessions. M2 neural responses consistently separated action plans (saline sessions) or failed to separate action plans (propranolol sessions).

      Also, but this is less critical: In Figures 2C and D, it looks like there is a bimodal distribution for the effect of propranolol in females. Is there something similar in the neuronal effects of the drug? And in the distribution of receptors? Can it be accounted for by hormonal cycles/ anything else?

      Although there is some clustering in behavioral outcomes, all data met the normality assumption where appropriate. Propranolol treatments were not synchronized to hormonal cycles, and the data likely include animals at various hormonal stages. Similar clustering was not apparent in the neuronal effects of propranolol, although propranolol increased variability in many measures.

      In a pilot experiment we did not see any difference in baseline performance on our 2AFC task across the hormonal cycle (diestrus, proestrus, estrus or metestrus) of females in any measure, including accuracy (F(3,33)=0.59, p=0.63, one-way ANOVA) and omissions (F(3,33)=0.51, p=0.68).

      The description of neural activity is also very superficial. In general, it is not clear how spike count measures have been extracted. For example, legend and figure C are not clear, is the (long) period of cue presentation included in the 'decision time'?? "Cues were presented at a variable interval 200-700ms after initiation and until animals left the well, 'Well Exit'. The time from cue onset to well exit was identified as the decision time (yellow)." Yet on the figure only the period after cue presentation is in yellow. This is critical because, given the duration of the cue, the animals are probably capable of deciding (to exit the well) before the cue turns off. Indeed, as shown in fig 2D, the animals can decide within about 500 ms. So to what extent is the 'cue response' actually a 'decision response'?

      We have clarified the task and spike count measurements in methods and added a revised task schematic. It is correct that the cues are available throughout the decision time (for up to 5 seconds or until well exit), and an action plan is generated before well exit/cues turn off as reflected by the separation of neural action plans (Fig 3, saline). Anterior M2 neurons maintain action plan representation from cue onset until the lever press under normal conditions (Fig 4, saline). These action plans encapsulate “cue responses” and “decision responses”. We have aligned neural data to discrete timestamps at either end of the window in which M2 processing is known to be critical, specifically between cues and actions (lever press) and focus on neural activity relative to those points. We refer to this activity throughout the manuscript as an ‘action plan’ as action planning functions of M2 activity have been well established in prior studies.

      When looking at figure 3A, there is clearly a pattern on the raster, a line going from top left to bottom right. If the trials are sorted chronologically, something is happening over time. If, as I suspect, trials are sorted by ascending response time, this raster is showing that what authors are calling a 'response to cues' is actually a response around action. Basically, if propranolol slows down reaction time, the spikes will be delayed from cue onset only because they remain locked to the action. Then the whole analysis and interpretation need to be reconsidered. But it might be for the best: as I mentioned earlier, recent work on LC activity has clearly emphasized its influence on motor rather than sensory processing (Poe et al, 2020).

      Figure 3A is a single neuron example, and data analyses focus on population-wide activity. Neural data is presented both aligned to cues, for all trials in which a cue was received, and aligned to lever press (action), for all trials on which a lever press occurred. In both cases, aligned to cue or aligned to action, the impact of propranolol is the same: β adrenoreceptor blockade reduces the separation of action plans in M2, severely so in females. However, a major finding is that females receive a cue but omit a large number of trials after propranolol; for this outcome the action does not occur. We propose this is due to the lack of action plan separation in anterior M2 (either directly or indirectly). When no behavioral response occurs, these trials cannot be aligned to action, yet we are still interested in the neural activity during the critical window between cue delivery and actions. We are not assigning this neural activity to sensory processing but using this discrete sensory event within our trials (cue) to align the data, as there is substantial evidence that action plans in M2 arise after cue presentation in tasks such as ours where performance is guided by external cues.

      Fig 2D-F: it is hard to believe that the increase in firing rate induced by propranolol in females is not significant. Presumably, because the range of the median firing rate is so high in the first place, the distribution (2E) really indicates an increase in firing. Maybe some other test? E.g., a paired t-test, or standardized values (z-scores) to get rid of variability in firing across neurons?

      We agree that the session-wide firing rate appears rightward shifted in females after propranolol. As our recordings were taken on different days, several days apart, we cannot assume they are the same neurons for paired analyses. In our revised manuscript we evaluated these distributions using a Mann-Whitney test to increase power and decrease the impact of variability within the population. Previously we had used a Kolmogorov-Smirnov test. Using our new analysis, we can confirm that propranolol significantly increases session-wide firing rates in anterior M2 of females (p=0.027) but not males. This finding increases evidence for direct actions of propranolol within M2 and supports our hypothesis that propranolol leads to local disinhibition by reducing β noradrenergic signaling in interneurons, and that without this noradrenergic tone anterior M2 is less efficient at suppressing irrelevant action plans.
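
      As an illustration of the unpaired comparison described above, the following is a minimal sketch of a two-sided Mann-Whitney U test on two independent samples of firing rates. It is not the authors' analysis code: the function names and the example firing rates are hypothetical, and the p-value uses the plain normal approximation (no tie or continuity correction), which is only reasonable for moderately large samples.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u1, p


# Hypothetical session-wide firing rates (Hz) from two unpaired populations.
saline = [2.1, 3.4, 1.8, 2.9, 4.0, 2.5]
propranolol = [3.9, 5.2, 4.4, 3.1, 6.0, 4.8]
u, p = mann_whitney_u(saline, propranolol)
print(f"U = {u:.1f}, p = {p:.3f}")
```

      Because the test works on ranks of the pooled sample, it does not require the two sessions to contain the same neurons, which matches the constraint stated in the response.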

      Along those lines, would it be worth looking for effects on specific populations (interneurons) which are sometimes characterized by thinner spikes and higher mean firing rates? Given the distribution of beta receptors RNA on interneurons, one would actually expect an effect of propranolol on the firing rate irrespective of task events. Or what is it that prevents the influence of propranolol on interneurons from changing the firing rate? In any case, one of the strengths of this study is the localization of beta receptors on specific neuronal populations in the cortex, so I think that the authors should really try to build on it and find something related to the neurophysiological effects. Otherwise, one cannot exclude the possibility that the behavioral effects are not related to the influence of the drug on these receptors in that region.

      Data were collected using stainless steel electrode arrays and our sample population of task related neurons is likely biased to pyramidal neurons, with a small number of fast spiking interneurons. We used validated spike waveform parameters of interneurons in premotor cortex (peak-to-trough ratio and duration; Giordano et al., 2023) in an attempt to isolate putative interneurons and found only a very small number of these cells in our recordings (n=5-7 per group). This population is too small to make any inferences about specific impacts. We have focused on the collective population activity of M2 as this is most strongly related to optimal action planning.

      You are correct that from the given findings we cannot conclusively show that the results found here are a result of propranolol acting solely within anterior M2. We have made sure to clarify throughout our revised manuscript that the behavioral and physiological changes we identified are a result of collective direct and indirect actions of propranolol.

      The conclusion that neuronal discrimination decreases because the proportion of neurons showing no effect increases is confusing (negative results, basically). It would be clearer if they were reporting the number of neurons that do show an effect, and presumably that this number shows a significant decrease.

      The reviewer is correct that the number of neurons that do show an effect (task-related activity) does significantly decrease with propranolol (from n=70 to 27 in females and n=71 to 48 in males). These n are now given adjacent to the proportions rather than at the end of the paragraph. Proportions were used for statistical analysis due to an overall decrease in the total number of units after propranolol. All PETHs presented are from neurons that show some task-related activity; these PETHs confirm that neural activity no longer effectively discriminates/separates action plans in M2.

      Figs 3F-I: a good proportion of neurons (at least 20%) show a significant encoding before cue onset. How is it possible? This raises the issue of noise level/ null hypothesis for this kind of repeated analysis. How did the author correct for multiple comparison issues?

      In response to reviews, we have altered the manner in which we identify the significantly modulated neurons to increase rigor and no longer include these figures or analyses. The proportion of neurons showing action plan encoding prior to cue onset was likely an artifact of how the data was analyzed and an insufficient correction for multiple comparisons, allowing inclusion of internally generated action plans in some neurons.

      The description of the action-related activity is globally confusing. Again, how can the authors discriminate between activity related to planning vs action itself? What is significant and what is not, in males vs females? What is being measured here? For example, a very unclear statement on line 238: "Propranolol primarily disrupted active inhibition of irrelevant action selection in M2 activity, reducing the ability to maintain action plan representation in M2, delaying lever press responses (Figure 4L, 4M)." What is 'active inhibition? What is an irrelevant action plan? What is selection? All of that should be defined using objective behavioral criteria and tested formally.

      We have changed our wording to clarify what we are describing and why we have chosen the words we have, and to ensure consistency and objectivity throughout the manuscript. Much of the wording we have used, for example action planning or action plan selection, is the wording used in the literature to describe M2 neural activity. We call the activity in M2 action planning (either externally/cue guided or internally guided) because that is what has been previously demonstrated. In our task design and analysis we are tracking cue-guided actions, as opposed to internally guided ones.

      We also separate the electrophysiology data as preferred and nonpreferred because the literature has shown individual M2 neurons show specific directional tuning as noted in our results, using the term ‘preferred’ encapsulates that tuning regardless of left/right direction. An example M2 neuron that increases activity for left cues and responses (preferred direction), will show active inhibition (low/negative z scores) on trials with right cues and responses (nonpreferred), other neurons would show the inverse relationship with direction.

      A primary impact of propranolol was the loss of negative z-scores for nonpreferred trials, i.e., neurons with a left preference that are usually inhibited on right trials were still firing, and vice versa. After propranolol, neurons continue to fire for an irrelevant action plan (for the opposite direction), and the resulting population activity is not significantly different for opposing cues/responses. Behavioral responses normally occur after opposing action plans have significantly separated in M2; collapsing action plans by preventing relevant signaling (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016) or facilitating irrelevant signaling, as we see here with propranolol, leads to impairments in 2AFC performance.

      Also, the description of the classifier analysis should be more thorough. Referencing the toolbox is not sufficient to understand what has been done.

      We have added additional explanation in both the methods and description of the results to clarify the functions of the neural decoding toolbox and how we are using it to evaluate information encoding within M2. We have provided detail on how the algorithm was trained, how shuffled data were generated and how we determined the significance of decoding accuracy.

      Measuring Beta adrenoceptors is a great idea, and the results are interesting, especially the difference between neuron types. But again, how does that fit with neurophysiological results? Note, that since this is RNA measures, it should not be phrased as 'receptors' but 'receptors RNA' throughout. One possible interpretation of these anatomical results that cannot be reconciled with physiology is that protein expression at the membrane shows a distinct pattern.

      We have changed the references to β receptor expression to β receptor mRNA expression throughout the manuscript. Although mRNA provides a valuable proxy for adrenoreceptor production, as noted by the reviewer protein expression at the membrane may differ. Reliable antibodies that allow quantitative analysis of membrane-bound adrenoreceptors in situ with co-labeling of specific cell types are limited. The goal of assessing mRNA expression within M2 was to determine if the functional sex differences we identified in M2 neurophysiology when manipulating β adrenoreceptor function could be mediated by basal differences in adrenoreceptors. The causal impact of differential mRNA expression in anterior M2 was not directly tested, but our findings provide preliminary evidence that adrenoreceptor regulation may differ across sexes. Our results provide a plausible avenue for differential sensitivity to β adrenoreceptor manipulation across sexes that may also be found in other brain regions.

      In conclusion, I think that this is a very interesting study and that the results are potentially relevant for a wide audience. But the paper would clearly benefit from revisions. If the authors could clearly identify a significant relationship between the action of NA on beta receptors on specific cortical neurons, at a physiological and behavioral level, that would be a seminal study. At the moment, the evidence is not convincing enough but the data suggest that it is the case.

      We thank the reviewer for the kind remarks. We have undertaken a number of new analyses, refined existing analysis and clarified our claims in the manuscript to improve rigor. Collectively our data reflect that the behavioral and neural deficits after systemic propranolol are likely due to both direct and indirect actions on M2. We believe this work is compelling and that it will inform future work investigating potential sex differences in central noradrenergic anatomy and functional sex differences after perturbations of noradrenergic signaling.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) What's the rationale for trypsinizing the tissue prior to mitochondrial isolation? This is not standard for subsequent proteomics analysis. This step will inevitably cause protein loss, especially for the post mitochondrial fractions (PMF). Treating samples with 0.01 µg/µL trypsin at 37 °C for 30 min is sufficient to partially digest a substantial portion of the proteome. If samples from different subjects were not of the same weight, then this partial digestion step may introduce artificial variability as variable proportions of proteins from different subjects would be lost during this step. In addition, the mitochondrial protein enrichment in the mito fraction, despite being statistically significant, does not look striking (Figure 1E, ~30% mitochondrial proteins in the mito fraction). As a comparison, Williams et al., MCP 2018 seem to have obtained high mitochondrial protein content in the mito fraction without trypsinizing the frozen quadriceps using a similar SWATH-MS-based approach.

      Trypsinisation of the tissue prior to mitochondrial isolation is based on previous work and a Nature Protocol (1, 2) which isolated mitochondria from skeletal muscle. The rationale is that it aids mechanical homogenisation of highly fibrous tissues such as quadriceps muscle by digesting extracellular matrix proteins. The trypsin/protein ratio used to aid in this process is at least 400 times lower than the amount of trypsin used for formal proteomic tryptic digestion. Three pieces of evidence suggest this step has a negligible effect on downstream proteomic analysis. First, because the trypsinisation buffer is detergent free, trypsin will only affect extracellular or exposed membrane proteins. Filtering our PMF dataset for proteins with ‘extracellular matrix’ gene ontology identifies at least 90 unique extracellular matrix proteins, indicating good retention of proteins susceptible to partial digestion. Second, the trypsin dose used is 50 times lower than the concentration used for passaging cultured cells, which retain viability after trypsinisation. Third, and contrary to the point raised by the reviewer, we observe less missingness in PMF samples compared to mitochondrial samples. We thank the reviewer for bringing the Williams et al. 2018 MCP paper to our attention. We note that mitochondrial enrichment between the two papers is comparable (~2-fold). To improve clarity line 408 now reads: “Whole quadriceps muscle samples were prepared as previously described with modification (99, 100). First, tissue was snap frozen with liquid nitrogen…” and line 95 reads: “Mitochondrial proteins were defined based on their presence in MitoCarta 3.0 (24) and consistent with previous work (25) were approximately two-fold enriched in the mitochondrial fraction relative to the PMF (Fig 1E).”

      (2) The authors mentioned that the proteomics data were Log2 transformed and median-normalized. Would it be possible to provide a bit more details on this? Were the subjects randomized?

      Samples were randomised prior to sample processing and mass spectrometry analysis. Because of possible variation in total protein content, it is critical to normalise protein intensities between samples. Median normalisation adjusts the samples so that they have the same median, thereby accounting for technical variation. Log2 transformation helps to achieve normal distributions, which many downstream statistical tests assume. Line 471 now reads: “…to achieve normal distributions and account for technical variation in total protein.”
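
      The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the nested-dictionary layout and the choice to re-centre every sample on the median of the per-sample medians are all hypothetical, and non-positive intensities are simply dropped before taking logs.

```python
import math
from statistics import median


def log2_median_normalise(samples):
    """Log2-transform intensities, then shift each sample to a shared median.

    `samples` maps sample name -> {protein name -> raw intensity}.
    """
    # Step 1: log2 transform (assumed: zero or negative values are skipped).
    logged = {
        name: {prot: math.log2(val) for prot, val in prots.items() if val > 0}
        for name, prots in samples.items()
    }
    # Step 2: subtract each sample's own median and add back a common target,
    # so every sample ends up with the same median on the log2 scale.
    sample_medians = {name: median(vals.values()) for name, vals in logged.items()}
    target = median(sample_medians.values())
    return {
        name: {prot: val - sample_medians[name] + target for prot, val in vals.items()}
        for name, vals in logged.items()
    }


# Hypothetical example: mouse_b was loaded with twice as much total protein.
raw = {
    "mouse_a": {"P1": 8.0, "P2": 16.0, "P3": 32.0},
    "mouse_b": {"P1": 16.0, "P2": 32.0, "P3": 64.0},
}
normalised = log2_median_normalise(raw)
for name, values in normalised.items():
    print(name, values)
```

      In the toy input the second sample differs from the first only by a constant loading factor, so after normalisation the two profiles coincide, which is the kind of technical variation the median adjustment is meant to remove.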

      (3) In Figure 1D, what were the numbers of mice the authors used for the CV comparisons in each group? Were they of similar age and sex? Were the differences in CV values statistically significant?

      The mitochondrial and PMF proteomes originated from the same quadriceps sample from the same mouse, and thus the age and sex are the same across both proteomes. After quality control, we had mitochondrial proteomes for 194 mice and PMF proteomes for 215 mice. The overall CV in the mitochondrial fraction was significantly greater than in the PMF, however whether the source of this variation is biological, or the result of mitochondrial isolation is unclear and as such we have avoided making a statement within the body of the manuscript. We have now more clearly described the nature of the samples in the revised manuscript and added sample sizes to figure 1F.
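
      For readers unfamiliar with the metric, a per-protein coefficient of variation can be computed as in the sketch below. The helper names and the example intensities are hypothetical, and the sketch assumes the CV is taken on linear-scale (not log-transformed) intensities using the sample standard deviation; the response does not state which convention the authors used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def cv_per_protein(intensities):
    """Map protein -> CV across mice, skipping proteins seen in < 2 mice."""
    return {
        protein: coefficient_of_variation(values)
        for protein, values in intensities.items()
        if len(values) >= 2
    }


# Hypothetical linear-scale intensities for two proteins across mice.
example = {
    "stable_protein": [10.0, 10.0, 10.0],
    "variable_protein": [5.0, 15.0],
    "seen_once": [7.0],
}
print(cv_per_protein(example))
```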

      (4) The authors stated in lines 155-157 that proteins negatively associated with the Matsuda index were further filtered by presence of their cis-pQTLs. Perhaps more explanations would be needed to justify this filtering criterion? Having a cis-pQTL would mean the protein abundance variation is explained by the variation in its coding gene, this however conceptually would not be relevant to its association with the Matsuda index. With the data that the authors have in hand, would it not be natural to align the Matsuda index QTL with the pQTLs (cis and trans if available), and/or to perform mediation analysis to examine causal relationships with statistical significance?

      The rationale for filtering by cis-pQTL was not to study the genetics of either Matsuda or associated proteins but rather to identify proteins that were more likely to be causally associated with Matsuda Index as opposed to adaptively associated. To clarify this line 165 now reads: “Filtering based on cis-pQTL presence was based on the rationale that if genetic variation can explain protein abundance differences between mice, then we can be confident that phenotype (Matsuda Index) is not driving the observed differences and therefore the protein-phenotype associations are likely causal. Importantly, this assumption can only be made for cis-acting pQTLs.” Previous work by Matthew et al. (see https://qtlviewer.jax.org/) has demonstrated that cis-pQTL have markedly higher LOD scores than trans-pQTLs, and our own unpublished work suggests that trans-pQTLs do not reproduce well between datasets. The reviewer rightfully suggests aligning protein QTL with those for Matsuda. This is our long-term goal but to identify genome wide significant peaks associated with altered Matsuda will require many more mice than studied here.

      (5) It seems a bit odd that the first half of the paper focused extensively on the authors' discoveries in the mitochondrial proteome, and how proteins involved in mitochondrial processes (such as complex I) were associated with Matsuda Index, but the final fingerprint list of insulin resistance, which contained 76 proteins, only had 7 mitochondrial proteins. Was this because many mitochondrial proteins were filtered out due to no cis-pQTL presenting?

      There are three reasons our fingerprint is lacking mitochondrial proteins: 1) there are more non-mitochondrial than mitochondrial proteins in the muscle proteome; 2) we focussed on negatively associated proteins, and as demonstrated in figure 2c, the mitochondrial proteome is enriched for positively associated proteins; 3) as implied by the reviewer, we filtered for pQTL presence, further reducing the number of mitochondrial proteins in our fingerprint. To improve clarity, line 170 now reads: “Low mitochondrial representation in the fingerprint is the result of selecting negatively associating proteins, and as seen (Figure 2C) previously, the mitochondrial proteome is enriched for positive contributors to insulin resistance.”

      (6) The authors found that thiostrepton-induced insulin resistance reversal effects were not through insulin signalling. It activated glycolysis but the mechanism of action was not clear. What are the proteins in the fingerprint list that led to identification of thiostrepton on CMAP?

      Is thiostrepton able to bind or change the expression of these proteins? Since thiostrepton was identified by searching the insulin resistance fingerprint protein list against CMAP, it would be rational to think that it exerts the biological effects by directly or indirectly acting on these protein targets.

      This is indeed the implication of our data. Because of the timescales involved it is unlikely that thiostrepton is changing fingerprint protein levels but could be binding to and inhibiting them. Searching the CMAP thiostrepton signature reveals ARHGDIB and NAGK as the fingerprint proteins with the most positive and negative fold-changes respectively perhaps suggesting they play a role in thiostrepton’s mechanism of action. Experiments are underway to test this hypothesis however these are beyond the scope of the current paper.

      Reviewer #2 (Public Review):

      Line 105: The observation that variance in respiratory proteins is stable while lipid pathways is variable is quite interesting. Is this due to lower overall levels of lipid metabolism enzymes (ex. do these differ substantially from similar pathways ranked from high-low abundance?).

      The relationship between coefficient of variation (CV) and relative abundance of proteins is important to consider. To address this, we have now also performed GSEA on proteins ranked from high to low relative abundance. These comparisons have been added to supplementary figure 1 and line 110 now reads: “As a control experiment, we also performed enrichment analysis on proteins ranked by LFQ relative abundance. High CV pathways (enriched for high CV proteins) tended to be lower in relative abundance (enriched for low relative abundance proteins) (Supplementary Fig 1a, b). However, many high variability pathways, lipid metabolism for example, were not enriched in either direction based on relative abundance suggesting differences in relative abundance do not fully explain pathway variability differences.”

      Line 154: the 664 associations are impressive and potentially informative. It would be valuable to know which of these co-map to the same locus, either to distinguish linkage in a 2 Mb window or to identify any cis-proteins which directly exert effects in trans.

      To assess this, we have analysed pQTL position relative to gene position to generate a ‘hotspot’ plot. We have also generated a histogram of this pQTL density (in a 2 Mbp window) and added these figures to figure 3. We did not detect any obvious pQTL hotspots, and the distribution of pQTLs across the genome appears fairly uniform. Line 159 now reads: “These were distributed across the genome and were predominately cis acting (Figure 3A)...”
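
      A minimal sketch of this kind of position-based summary is shown below. It is an illustration only: the function names, the toy coordinates and, in particular, the rule that a pQTL counts as cis when it lies on the same chromosome within 2 Mbp of its gene are assumptions, since the response does not give the authors' cis/trans threshold.

```python
from collections import Counter

WINDOW_BP = 2_000_000  # 2 Mbp, matching the window size mentioned above


def classify_pqtl(qtl_chrom, qtl_pos, gene_chrom, gene_pos, cis_distance_bp=WINDOW_BP):
    """Label a pQTL 'cis' if it lies near its own gene, otherwise 'trans'."""
    if qtl_chrom == gene_chrom and abs(qtl_pos - gene_pos) <= cis_distance_bp:
        return "cis"
    return "trans"


def pqtl_density(pqtls, window_bp=WINDOW_BP):
    """Count pQTLs per (chromosome, window index) bin for a density histogram."""
    return Counter((chrom, pos // window_bp) for chrom, pos in pqtls)


# Hypothetical pQTL peak positions as (chromosome, base-pair position).
peaks = [("1", 100), ("1", 1_999_999), ("1", 2_000_000), ("2", 100)]
density = pqtl_density(peaks)
print(density)
print(classify_pqtl("1", 10_500_000, "1", 10_000_000))
```

      A hotspot would show up as a bin whose count is far above the rest; a roughly flat set of counts corresponds to the uniform distribution the response reports.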

      Line 194: Cross-platform validation of the CMAP fingerprint results is an admirable set of validations. It might be good to know general parameters like how many compounds were shared/unique for each platform. Also the concordance between ranking scores for significant and shared compounds.

      The Connectivity Map (CMap) query included 5163 compounds, the Prestwick library included 1120, and the overlap was 420. We have added these comparisons to supplementary figure 2. Supplementary figure 2 now also contains a comparison of CMap scores between overlapping compounds (found in CMap and the Prestwick library) against all significant compounds identified by CMap (supplementary figure 2b). Interestingly, compounds present in both platforms scored higher on average, suggesting the Prestwick library captures a significant proportion of highly scoring CMap candidates. Line 206 now reads: “In total, 420 compounds were found across both platforms, and these consensus compounds captured a significant proportion of highly scoring CMap compounds (Supplementary Figure 2A, B).”

      Line 319: Another consideration in the molecular fingerprint is how unique these are for muscle. While studies evaluating gene expression have shown that many cis-eQTLs are shared across tissues, to my knowledge, this hasn't been performed systematically for pQTLs. Therefore, consider adding a point to the discussion pointing out that some of the proteins might be conserved pQTLs whereas others which would be more relevant here present unique druggable targets in muscle.

      To examine tissue specificity, we determined whether our skeletal muscle fingerprint proteins were detected and contained a pQTL in two metabolically important tissues, liver and adipose. Despite detecting almost all the fingerprint proteins in both adipose and liver tissue, they were depleted for pQTL compared to skeletal muscle. These data have now been added to figure 3c. Line 172 now reads: “To assess the tissue specificity of our fingerprint we searched for the same proteins in metabolically important adipose and liver tissues. Despite detecting 94% and 82% of muscle fingerprint proteins across each tissue respectively, both adipose and liver were depleted for pQTL presence (Figure 3C) suggesting that regulation of our fingerprint protein abundance is specific to skeletal muscle.”

      Line 332: These are fascinating observations. 1, that in general insulin signaling and ampk were not themselves shown as top-ranked enrichments with matsuda and that this was sufficient to alter glucose metabolism without changes in these pathways. While further characterization of this signaling mechanism is beyond the scope of this study, it would be good to speculate as to additional signaling pathways that are relevant beyond ROS (ex. CNYP2 and others)

      We have now added further discussion to the manuscript to address this point. Line 347 now reads: “Aside from glycolysis, other pathways may be involved in enhancing insulin sensitivity. For example, the negatively associated protein ARHGDIA (Figure 2F) is a potent negative regulator of insulin sensitivity, and our fingerprint of insulin resistance contained its homologue ARHGDIB. Both ARHGDIA and ARHGDIB have been reported to inhibit the insulin action regulator RAC1 thus lowering GLUT4 translocation and glucose uptake. Further investigations may uncover a role for thiostrepton in modulating the RAC1 signalling pathway via ARHGDIB.”

      Line 314: Remove the statement: "While this approach is less powerful than QTL co-localisation for identifying causal drivers,", as I don't believe that this has been demonstrated. Clearly, the authors provide a sufficient framework to pinpoint causality and produce an actionable set of proteins.

      We have edited line 314, which now reads: “Moreover, our approach has the major advantage that it requires far fewer mice to obtain meaningful outcomes (222 mice in this study) compared to that required for genetic mapping of complex traits like Matsuda Index.”

      Line 346: I would highlight one more appeal of the approach adopted by the authors. Given that these compound libraries were prioritized from patterns of diverse genetics, these observations are inherently more likely to operate robustly across target backgrounds.

      This point is further supported by our thiostrepton results in both C57BL6/j and BXH9 mice. Line 317 now reads: “Furthermore, because we have used genetically diverse datasets (DOz mice and multiple cell lines in Connectivity Map) our findings are likely robust across diverse target backgrounds.”

      Line 434: I might have missed but can't seem to find where the muscle data are available to researchers. Given the importance and novelty of these studies, it will be important to provide some way to access the proteomic data.

      These data are now available via the ProteomeXchange Consortium. Line 465 now reads: “The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (104) partner repository with the dataset identifier PXD042277.”

      1. Frezza C, Cipolat S, Scorrano L. Organelle isolation: functional mitochondria from mouse liver, muscle and cultured fibroblasts. Nat Protoc. 2007;2(2):287-95.

      2. Acin-Perez R, Benador IY, Petcherski A, Veliova M, Benavides GA, Lagarrigue S, et al. A novel approach to measure mitochondrial respiration in frozen biological samples. The EMBO Journal. 2020;39(13):e104073.

      3. Chick JM, Munger SC, Simecek P, Huttlin EL, Choi K, Gatti DM, et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature. 2016;534(7608):500-5.

      4. Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W, Simecek P, et al. Quantitative Trait Locus Mapping Methods for Diversity Outbred Mice. G3 Genes|Genomes|Genetics. 2014;4(9):1623-33.

    1. Reviewer #2 (Public Review):

      Accumulating data suggests that the presence of immune cell infiltrates in the meninges of the multiple sclerosis brain contributes to the tissue damage in the underlying cortical grey matter by the release of inflammatory and cytotoxic factors that diffuse into the brain parenchyma. However, little is known about the identity and direct and indirect effects of these mediators at a molecular level. This study addresses the vital link between an adaptive immune response in the CSF space and the molecular mechanisms of tissue damage that drive clinical progression. In this short report the authors use a spatial transcriptomics approach using Visium Gene Expression technology from 10x Genomics, to identify gene expression signatures in the meninges and the underlying brain parenchyma, and their interrelationship, in the PLP-induced EAE model of MS in the SJL mouse. MRI imaging using a high field strength (11.7T) scanner was used to identify areas of meningeal infiltration for further study. They report, as might be expected, the upregulation of genes associated with the complement cascade, immune cell infiltration, antigen presentation, and astrocyte activation. Pathway analysis revealed the presence of TNF, JAK-STAT and NFkB signaling, amongst others, close to sites of meningeal inflammation in the EAE animals, although the spatial resolution is insufficient to indicate whether this is in the meninges, grey matter, or both.

      UMAP clustering illuminated a major distinct cluster of upregulated genes in the meninges and smaller clusters associated with the grey matter parenchyma underlying the infiltrates. The meningeal cluster contained genes associated with immune cell functions and interactions, cytokine production, and action. The parenchymal clusters included genes and pathways related to glial activation, but also adaptive/B-cell mediated immunity and antigen presentation. This again suggests a technical inability to resolve fully between the compartments as immune cells do not penetrate the pial surface in this model or in MS. Finally, a trajectory analysis based on distance from the meningeal gene cluster successfully demonstrated descending and ascending gradients of gene expression, in particular a decline in pathway enrichment for immune processes with distance from the meninges.

      Although these results confirm what we already know about processes involved in the meninges in MS and its models and gradients of pathology in sub-pial regions, this is the first to use spatial transcriptomics to demonstrate such gradients at a molecular level in an animal model that demonstrates lymphoid like tissue development in the meninges and associated grey matter pathology. The mouse EAE model being used here does reproduce many, although not all, of the pathological features of MS and the ability to look at longer time points has been exploited well. However, this particular spatial transcriptomics technique cannot resolve at a cellular level and therefore there is a lot of overlap between gene expression signatures in the meninges and the underlying grey matter parenchyma.

      The short nature of this report means that the results are presented and discussed in a vague way, without enough molecular detail to reveal much information about molecular pathogenetic mechanisms.

      The trajectory analysis is a good way to explore gradients within the tissues and the authors are to be applauded for using this approach. However, the trajectory analysis does not tell us much if you only choose 2 genes that you think might be involved in the pathogenetic processes going on in the grey matter. It might be more useful to choose some genes involved in pathogenetic processes that we already know are involved in the tissue damage in the underlying grey matter in MS, for which there is already a lot of literature, or genes that respond to molecules we know are increased in MS CSF, although the animal models may be very different. Why were C3 and B2m chosen here?

      Strengths:
      - The mouse model does exhibit many of the features of the compartmentalized immune response seen in MS, including the presence of meningeal immune cell infiltrates in the central sulcus and over the surface of the cortex, with the presence of FDCs, HEVs, PNAd+ vessels, and CXCL13 expression, indicating the formation of lymphoid-like cell aggregates. In addition, disruption of the glia limitans is seen, as in MS. Increased microglial reactivity is also present at the pial surface.
      - Spatial transcriptomics is the best approach to studying gradients in gene expression in both white matter and grey matter and their relationship between compartments.
      - It would be useful to have more discussion of how the upregulated pathways in the two compartments fit with what we know about the cellular changes occurring in both, for which presumably there is prior information from the group's previous publications.

      Limitations:
      - EAE in the mouse is not MS and may be far removed when one considers molecular mechanisms, especially as MS is not a simple anti-myelin protein autoimmune condition. Therefore, this study could be following gene trajectories that do not exist in MS. This needs a significant amount of discussion in the manuscript if the authors suggest that it is mimicking MS.
      - The model does not have the cortical subpial demyelination typical of MS and it is unknown whether neuronal loss occurs in this model, which is the main feature of cytokine-mediated neurodegeneration in MS. If it does not then a whole set of genes will be missing that are involved in the neuronal response to inflammatory stimuli that may be cytotoxic.
      - Visium technology does not get down to single cell level and does not appear to allow resolution of the border between the meninges and the underlying grey matter.
      - Neuronal loss in the MS cortex is independent of demyelination and therefore not related to remyelination failure. There does not appear to be any cortical grey matter demyelination in these animals, so it is difficult to relate any of the gene changes seen here to demyelination.
      - No mention of how the ascending and descending patterns of gene expression may be due to the gradient of microglial activation that underlies meningeal inflammation, which is a big omission.

    2. Author Response:

      We thank Reviewer #1 for their positive assessment of our work.

      Reviewer #2 (Public Review):

      […] Although these results confirm what we already know about processes involved in the meninges in MS and its models and gradients of pathology in sub-pial regions, this is the first to use spatial transcriptomics to demonstrate such gradients at a molecular level in an animal model that demonstrates lymphoid like tissue development in the meninges and associated grey matter pathology. The mouse EAE model being used here does reproduce many, although not all, of the pathological features of MS and the ability to look at longer time points has been exploited well. However, this particular spatial transcriptomics technique cannot resolve at a cellular level and therefore there is a lot of overlap between gene expression signatures in the meninges and the underlying grey matter parenchyma.

      We appreciate the reviewer’s concise summary and comments on our manuscript. We agree that the Visium spatial sequencing technology we applied is limited in its resolution and cannot precisely distinguish individual cells or anatomic regions. For that reason, there is undoubtedly some overlap between gene expression signatures in the meninges and underlying parenchyma, particularly in spots on the borders of the meningeal inflammation clusters. However, we believe that the majority of meningeal inflammation (“cluster 11”) spots are indeed in the meninges and represent the spatial transcriptome of that niche. To support this, in the revised manuscript we will provide H&E images with the UMAP clusters overlaid to demonstrate the anatomic borders that correlate with the clusters.

      The short nature of this report means that the results are presented and discussed in a vague way, without enough molecular detail to reveal much information about molecular pathogenetic mechanisms.

      We thank the reviewer for this comment. The goal of this work is to transcriptomically characterize the spatial relationship between areas of meningeal inflammation and the underlying parenchyma. While we agree that mechanistic studies are needed to further evaluate the role of presented signaling pathways, those experiments are beyond the scope of this brief report.

      The trajectory analysis is a good way to explore gradients within the tissues and the authors are to be applauded for using this approach. However, the trajectory analysis does not tell us much if you only choose 2 genes that you think might be involved in the pathogenetic processes going on in the grey matter. It might be more useful to choose some genes involved in pathogenetic processes that we already know are involved in the tissue damage in the underlying grey matter in MS, for which there is already a lot of literature, or genes that respond to molecules we know are increased in MS CSF, although the animal models may be very different. Why were C3 and B2m chosen here?

      We appreciate the reviewer’s points here. C3 and B2m were chosen as examples of genes that have differential fit to the gradient descending pattern to assist the reader in interpreting subsequent gene set trajectory analysis. However, we agree that there are many other genes of interest and will expand the number of genes displayed in our revised manuscript. 

      Strengths:
      - The mouse model does exhibit many of the features of the compartmentalized immune response seen in MS, including the presence of meningeal immune cell infiltrates in the central sulcus and over the surface of the cortex, with the presence of FDCs, HEVs, PNAd+ vessels, and CXCL13 expression, indicating the formation of lymphoid-like cell aggregates. In addition, disruption of the glia limitans is seen, as in MS. Increased microglial reactivity is also present at the pial surface.
      - Spatial transcriptomics is the best approach to studying gradients in gene expression in both white matter and grey matter and their relationship between compartments.
      - It would be useful to have more discussion of how the upregulated pathways in the two compartments fit with what we know about the cellular changes occurring in both, for which presumably there is prior information from the group's previous publications.

      Limitations:
      - EAE in the mouse is not MS and may be far removed when one considers molecular mechanisms, especially as MS is not a simple anti-myelin protein autoimmune condition. Therefore, this study could be following gene trajectories that do not exist in MS. This needs a significant amount of discussion in the manuscript if the authors suggest that it is mimicking MS.
      - The model does not have the cortical subpial demyelination typical of MS and it is unknown whether neuronal loss occurs in this model, which is the main feature of cytokine-mediated neurodegeneration in MS. If it does not then a whole set of genes will be missing that are involved in the neuronal response to inflammatory stimuli that may be cytotoxic.
      - Visium technology does not get down to single cell level and does not appear to allow resolution of the border between the meninges and the underlying grey matter.
      - Neuronal loss in the MS cortex is independent of demyelination and therefore not related to remyelination failure. There does not appear to be any cortical grey matter demyelination in these animals, so it is difficult to relate any of the gene changes seen here to demyelination.
      - No mention of how the ascending and descending patterns of gene expression may be due to the gradient of microglial activation that underlies meningeal inflammation, which is a big omission.

      We thank the reviewer for their insightful comments on the strengths and limitations of our study. The SJL EAE model we use in this paper is certainly not a perfect model of meningeal inflammation in MS. Indeed, we believe that no such animal model exists, but this model does recapitulate several key features of human disease as described by the reviewer. Spatial transcriptomics of cortical grey matter lesions and overlying meninges of samples derived from patients with MS would be ideal, though access to this tissue is highly limited. In the revised manuscript we will include more detailed discussion of the limitations in applying these findings to MS. However, in addition to potential implications for MS research, our data contribute more generally to understanding of meningeal inflammation and penetrance of inflammation into brain tissue.

      We acknowledge that sub-pial neuronal loss has not been assessed in SJL EAE, and if present it would increase the relevance of this model to neurodegeneration. We are currently working to assess this.

      We agree with the reviewer that Visium technology is limited in its ability to discriminate individual cells, as discussed above (2.2).

      We agree that gene expression by activated microglia is likely a major driver of the transcriptomic changes observed in the parenchyma, and thank the reviewer for highlighting this. We will add discussion of this to our revised manuscript, and intend to generate additional data regarding the contribution of subpial microglial activation to the measured transcriptomic changes.

      Finally, we thank Reviewer #3 for their assessment of our work.

    1. Author Response

      eLife assessment:

      Trypanosoma brucei evades mammalian humoral immunity through the expression of different variant surface glycoprotein genes. In this fundamental paper, the authors extend previous observations that TbRAP1 both interacts with PIP5pase and binds PI(3,4,5)P3, indicating a role for PI(3,4,5)P3 binding and suggesting that antigen switching is signal dependent. While much of the evidence is compelling, one reviewer suggested that the work would benefit from further controls.

      We appreciate the evaluation of the work and agree that the findings substantially advance our understanding of antigenic variation. A detailed response to the public review is included below, which addresses and clarifies the issues raised by the reviewers, including those concerning controls. We also want to highlight the comment by Reviewer #3 “The methods used in the study are rigorous and well-controlled…. their results support the conclusions made in the manuscript.”. We hope this and our comments will help address the issue of controls in this eLife statement.

      Reviewer #1 (Public Review):

      Trypanosoma brucei undergoes antigenic variation to evade the mammalian host’s immune response. To achieve this, T. brucei regularly expresses different VSGs as its major surface antigen. VSG expression sites are exclusively subtelomeric, and VSG transcription by RNA polymerase I is strictly monoallelic. It has been shown that T. brucei RAP1, a telomeric protein, and the phosphoinositol pathway are essential for VSG monoallelic expression. In previous studies, Cestari et al. (ref. 24) have shown that PIP5pase interacts with RAP1 and that RAP1 binds PI(3,4,5)P3. RNAseq and ChIPseq analyses have been performed previously in PIP5pase conditional knockout cells, too (ref. 24). In the current study, Touray et al. did similar analyses except that catalytic dead PIP5pase mutant was used and the DNA and PI(3,4,5)P3 binding activities of RAP1 fragments were examined. Specifically, the authors examined the transcriptome profile and did RAP1 ChIPseq in PIP5pase catalytic dead mutant. The authors also expressed several C-terminal His6-tagged RAP1 recombinant proteins (full-length, aa1-300, aa301-560, and aa 561-855). These fragments’ DNA binding activities were examined by EMSA analysis and their phosphoinositides binding activities were examined by affinity pulldown of biotin-conjugated phosphoinositides. As a result, the authors confirmed that VSG silencing (both BES-linked and MES-linked VSGs) depends on PIP5pase catalytic activity, but the overall knowledge improvement is incremental. The most convincing data come from the phosphoinositide binding assay as it clearly shows that N-terminus of RAP1 binds PI(3,4,5)P3 but not PI(4,5)P2, although this is only assayed in vitro, while the in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al (ref. 24) already. 
      Considering that many phosphoinositides exert their regulatory role by modulating the subcellular localization of their bound proteins, it is reasonable to hypothesize that binding to PI(3,4,5)P3 can remove RAP1 from the chromatin. However, no convincing data have been shown to support the authors’ hypothesis that this regulation is through an “allosteric switch”. Therefore, the title should be revised.

      We appreciate the reviewer’s detailed evaluation of our work. There are a few general comments that we would like to clarify. We will break them into three points. All data included here are new and were not previously published.

      i) “RNAseq and ChIPseq analyses have been performed previously …(ref. 24).” Reference 24 is Cestari et al. 2019, Mol Cell Biol. Neither we nor others have published ChIP-seq of RAP1 in T. brucei. Previous work showed ChIP-qPCR, which analyses specific loci. The ChIP-seq shows genome-wide binding sites of RAP1, and new findings are shown here, including binding sites in the BES, MESs, and other genome loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1-binding to these loci changes upon expression of catalytic inactive PIP5Pase. As for the RNA-seq, this is also the first time we show RNA-seq of T. brucei expressing catalytic inactive PIP5Pase, which establishes that the regulation of VSG silencing and switching is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. To improve clarity in the manuscript, we edited page 4, line 122, as follows: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      ii) “The in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al. (ref. 24) already.”. We published in reference 24 that RAP1-HA can bind agarose beads-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that the endogenous RAP1-HA and PI(3,4,5)P3 binding is about 100-fold higher when PIP5Pase is catalytic inactive than WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; this data is new and relevant to our conclusions.

      iii) “no convincing data have been shown to support the author’s hypothesis that this regulation is through an “allosteric switch””. We show here in vitro and in vivo data supporting the conclusion. We show that PI(3,4,5)P3 binds to the N-terminus of rRAP1-His with a calculated Kd of about 20 µM (Fig 4B-E, Table 1). In contrast, we show by EMSA and binding kinetics by microscale thermophoresis that rRAP1-His binds to 70 bp and telomeric repeats via protein regions encompassing the Myb (central) or Myb-L domains (C-terminal) but not the N-terminus containing the VHP domain (Fig 3C-G, and Fig S5). Using microscale thermophoresis, we also show that rRAP1-His binds to 70 bp and telomeric repeats with Kd of 10 and 24 nM, respectively (Fig 3 and Table 1). Notably, we show that 30 µM of PI(3,4,5)P3, but not PI(4,5)P2 (used as a control), disrupts rRAP1-His binding to 70 bp and telomeric repeats, changing Kds to about 188 and 155 nM, respectively (Fig 5A-C). We also show that PI(3,4,5)P3 does not disrupt the binding of rRAP1-His fragments (Myb or Myb-L) without the N-terminus domain (Fig S5), implying binding of PI(3,4,5)P3 to RAP1 N-terminus is required for displacement of RAP1 DNA binding domains (Myb and Myb-L) from telomeric and 70 bp repeats, and that PI(3,4,5)P3 is not competing for Myb or Myb-L binding to DNA. Moreover, we show that RAP1-HA binding to 70 bp and telomeric repeats in vivo is displaced in T. brucei cells expressing catalytic inactive PIP5Pase (Fig 5D-G), which we show results in RAP1-HA binding about 100-fold more endogenous PI(3,4,5)P3 than in T. brucei expressing WT PIP5Pase (Fig 4F). The in vivo data agree with the in vitro data. The data show a typical allosteric regulator system, in which binding of a ligand to one site of the protein, here PI(3,4,5)P3 binding to RAP1 N-terminus, affects other domains (RAP1 Myb and Myb-L domains) binding to DNA.
      To improve the clarity of the title, we will change it in the revised version to imply a direct role of PI(3,4,5)P3 regulation of RAP1 in the process. This will provide more specific information to the readers and address the concern of the reviewer related to the “allosteric switch”. The new title will be: PI(3,4,5)P3 allosteric regulation of RAP1 controls antigenic switching in trypanosomes
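
      To make the size of the reported affinity shift concrete, the dissociation constants quoted in point iii (10 and 24 nM without PI(3,4,5)P3, about 188 and 155 nM with 30 µM PI(3,4,5)P3) can be compared directly. The sketch below is a hedged illustration, not the authors' analysis: the helper names are hypothetical, and the fraction-bound calculation assumes a simple 1:1 binding isotherm with DNA in excess and an arbitrary DNA concentration of 50 nM chosen only for the example.

```python
# Kd values (nM) as quoted in the response; dictionary layout is illustrative.
kd_nm = {
    "70 bp repeats": {"without_pip3": 10.0, "with_pip3": 188.0},
    "telomeric repeats": {"without_pip3": 24.0, "with_pip3": 155.0},
}


def fold_change(kd_before, kd_after):
    """Fold increase in Kd (a higher Kd means weaker binding)."""
    return kd_after / kd_before


def fraction_bound(ligand_nm, kd):
    """Fraction of protein bound under a simple 1:1 isotherm, ligand in excess."""
    return ligand_nm / (kd + ligand_nm)


DNA_NM = 50.0  # hypothetical DNA concentration, chosen only for illustration

for target, kd in kd_nm.items():
    fc = fold_change(kd["without_pip3"], kd["with_pip3"])
    before = fraction_bound(DNA_NM, kd["without_pip3"])
    after = fraction_bound(DNA_NM, kd["with_pip3"])
    print(f"{target}: Kd rises {fc:.1f}-fold; "
          f"fraction bound at {DNA_NM:.0f} nM DNA: {before:.2f} -> {after:.2f}")
```

      With the quoted values, the Kd for 70 bp repeats rises about 19-fold and the Kd for telomeric repeats about 6.5-fold in the presence of PI(3,4,5)P3.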

      There are serious concerns about many conclusions made by Touray et al., according to their experimental approaches:

      1) The authors have been studying RAP1’s chromatin association pattern by ChIPseq in cells expressing a C-terminal HA tagged RAP1. According to data from tryptag.org, RAP1 with an N-terminal or a C-terminal tag does not seem to have identical subcellular localization patterns, suggesting that adding tags at different positions of RAP1 may affect its function. It is therefore essential to validate that the C-terminally HA-tagged RAP1 still has its essential functions. However, this data is not available in the current study. RAP1 is essential. If RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. In addition, these cells should have the WT VSG expression pattern, and RAP1-HA should still interact with TRF. Without these validations, it is impossible to judge whether the ChIPseq data obtained on RAP1-HA reflect the true chromatin association profile of RAP1.

      Tryptag data show both N- and C-terminus RAP1 with nuclear localization in procyclic forms, although there are differences in signal intensities in the images (http://tryptag.org/?id=Tb927.11.370). It is important to note that Tryptag data is from procyclic forms, and DNA constructs are not validated for their integration in the correct locus. As for the RAP1-HA localization in bloodstream forms, we demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and RAP1-HA co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019 Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 in bloodstream forms with HA without disrupting its nuclear localization (Yang et al. 2009, Cell; Afrin et al. 2020, Science Advances). As for the experiment suggested by the reviewer, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), which might be indirect via their binding to telomeric DNA repeats rather than direct protein-protein interactions.

      2) Touray et al. expressed and purified His6-tagged recombinant RAP1 fragments from E. coli and used these recombinant proteins for EMSA analysis: The His6 tag has been used for purifying various recombinant proteins. It is most likely that the His6 tag itself does not convey any DNA binding activities. However, using His6-tagged RAP1 fragments for EMSA analysis has a serious concern. It has been shown that His6-tagged human RAP1 protein can bind dsDNA, but hRAP1 without the His6 tag does not. It is possible that RAP1 proteins in combination with the His6 tag can exhibit certain unnatural DNA binding activities. To be rigorous, the authors need to remove the His6 tag from their recombinant proteins before the in vitro DNA binding analyses are performed. This is a standard procedure for many in vitro assays using recombinant proteins.

      We show in Fig 3C-G that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which indicates that His-tagged rRAP1 does not bind nonspecifically to DNA. Moreover, in Fig 3G, we show that His-tagged rRAP1(aa1-300) also does not bind to 70 bp or telomeric repeats. In contrast, full-length His-tagged rRAP1, rRAP1(aa301-560), or rRAP1(aa561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding.

      As for the statement that human rRAP1-His has nonspecific DNA binding properties, we could not find a reference for this statement; we cannot compare it without knowing the details of the experiment. Biochemical assays can result in nonspecific binding depending on binding/buffer conditions. Also, human and T. brucei RAP1 share only 15% amino acid identity; nonspecific binding to DNA could be specific to human RAP1.

      3) It is unclear why Nanopore sequencing was used for RNAseq and ChIPseq experiments. The greatest benefit of Nanopore sequencing is that it can sequence long reads, which usually helps with mapping, particularly at genome loci with repetitive sequences. This seems beneficial for RAP1 ChIPseq analysis as RAP1 is expected to bind telomere repeats. However, for ChIPseq, the chromatin needs to be fragmented. Larger DNA fragments from ChIPseq experiments will decrease the accuracy of the final calculated binding sites. Therefore, ChIPseq experiments are not supposed to have long reads to start with, so Nanopore sequencing does not seem to bring any advantage. In addition, compared to Illumina sequencing, Nanopore sequencing usually yields smaller numbers of reads, and the sequencing accuracy rate is lower. The Nanopore sequencing accuracy may be a serious concern in the current study. All telomeres have the perfect TTAGGG repeats, all VSG genes have a very similar 3’ UTR, and all 70 bp repeats have very similar sequences. In fact, the active and silent ESs have 90% sequence identity. Are sequence reads accurately mapped to different ESs? How is the sequencing and mapping quality controlled? Furthermore, it is unclear whether the read depth for RNAseq is deep enough.

      The mean sequence length for the ChIP-seq was about 500 bp (see Table S3), which helps to align reads to ESs and distinguish the different ESs, and it is a reasonable size range to define RAP1 binding sites. Although sequencing depths are usually higher in Illumina than in nanopore (all depending on the amount of sequencing), most Illumina short reads map to multiple genomic sequences, making it difficult to distinguish ESs. This is particularly important for RAP1 because it binds to repeats such as 70 bp and telomeric repeats. Mapping short reads to those regions would be virtually impossible; hence, our choice of nanopore sequencing. For RNA-seq, the ~500 bp read length helps sequence alignment to the subtelomeric regions containing many VSG genes. The nanopore reads obtained here had an average sequencing quality score of 12 (i.e., base call accuracy of 94%). Filtering reads with MAPQ ≥ 20 (99% probability of correct alignment) helped us to distinguish RAP1 binding to specific ESs, including silent vs active ES (ChIP-seq) or VSG sequences (RNA-seq). The details of the analysis and sequencing metrics (i.e., sequencing depth and read length) were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq” and Table S3, respectively.
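
      The two percentages quoted above follow from the standard Phred-scale relation Q = -10·log10(P_error), which applies to both base quality scores and mapping quality (MAPQ). The short sketch below reproduces that conversion; the function names are illustrative and are not taken from the authors' pipeline.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred-scaled score Q into the probability of being correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Convert a probability of being correct back into a Phred-scaled score."""
    return -10.0 * math.log10(1.0 - accuracy)


# Scores quoted in the response: mean read quality of 12 and a MAPQ cutoff of 20.
for label, q in [("mean base quality", 12), ("MAPQ threshold", 20)]:
    print(f"{label}: Q{q} -> {phred_to_accuracy(q):.1%} probability correct")
```

      A score of 12 corresponds to roughly 93.7% accuracy, which rounds to the 94% stated above, and a score of 20 corresponds to 99%.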

      4) Many statements in the discussion section are speculations without any solid evidence. For example, lines 218 - 219 “likely due to RAP1 conformational changes”, no data have been shown to support this at all. In lines 224-226, the authors acknowledged that more experiments are necessary to validate their observations, so it is important for the authors to first validate their findings before they draw any solid conclusions. Importantly, RAP1 has been shown to help compact telomeric and subtelomeric chromatin a long time ago by Pandya et al. (2013. NAR 41:7673), who actually examined the chromatin structure by MNase digestion and FAIRE. The authors should acknowledge previous findings. In addition, the authors need to revise the discussion to clearly indicate what they “speculate” rather than make statements as if it is a solid conclusion.

      The statement “likely due to RAP1 conformational changes” in lines 218-219 (page 6) is part of the Discussion. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative.

      For lines 224-226 (page 6), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. Hence, future studies are necessary for this finding, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers. However, for the RAP1 binding to telomeric ES sites, e.g., 70 bp repeats and telomeric repeats (the focus of this work), we validated the binding by EMSA and by performing binding kinetics using microscale thermophoresis.

      We did not include Pandya et al. 2013 NAR because the authors demonstrated RAP1 compaction of chromatin to occur in procyclic forms only. Pandya et al. stated in their abstract: “no significant chromatin structure changes were detected on depletion of TbRAP1 in BF cells”. Hence, the suggested reference is not relevant to the context of our conclusions in bloodstream forms. Nevertheless, we have reviewed the Discussion to avoid broad speculations in the revised version of the manuscript.

      There are also minor concerns:

      1) In the PIP5Pase conditional knockout system, the WT or mutant PIP5Pase with a V5 tag is constitutively expressed from the tubulin array. What’s the relative expression level of this allele and the endogenous PIP5Pase? Without a clear knowledge of the mutant expression level, it is hard to conclude whether the mutant has any dominant negative effects or whether the mutant phenotype is simply due to a lower than WT PIP5Pase expression level.

      The relative mRNA level of the exclusively expressed PIP5Pase Mut compared to the WT is available in Data S1, RNA-seq. The Mut allele’s relative expression level is 0.85-fold that of the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared RNA-seq read counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We include a statement in the Methods, page 7, lines 265-268: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq reads counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
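      A fold-change between two libraries based on counts per million (CPM) can be sketched as follows. This is a minimal illustration of the general calculation, not the authors' code; the function names and all read counts are made up for the example and do not come from Data S1.

```python
def counts_per_million(gene_reads: int, total_mapped_reads: int) -> float:
    """Scale a gene's read count to a library size of one million reads."""
    return gene_reads * 1_000_000 / total_mapped_reads


def fold_change(cpm_sample: float, cpm_reference: float) -> float:
    """Ratio of expression in the sample relative to the reference."""
    return cpm_sample / cpm_reference


# Hypothetical numbers, chosen only to show the arithmetic
cpm_tubulin_allele = counts_per_million(300, 20_000_000)   # 15 CPM
cpm_endogenous = counts_per_million(250, 25_000_000)       # 10 CPM
print(fold_change(cpm_tubulin_allele, cpm_endogenous))
```

      Scaling to CPM first is what makes the comparison meaningful when the two RNA-seq experiments differ in sequencing depth.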

      2) In EMSA analysis, what are the concentrations of the protein and the probe used in each reaction? The amount of protein used in the binding assay appears to be very high, and this can contribute to the observation that many complexes are stuck in the well. Better quality EMSA data need to be shown to support the authors’ claims.

      All concentrations were provided in the Methods section. See page 9, Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see page 9, Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both show similar results (Fig 3 and 5, and Fig S5; microscale thermophoresis shows the binding kinetics, data available in Table 1). The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.

      Reviewer #2 (Public Review):

      This manuscript by Touray, et al. provides a significant new twist to our understanding of how antigenic variation may be regulated in T. brucei. Key aspects of antigenic variation are the mutually exclusive expression of a single antigen per cell and the periodic switching from expression of one antigen isoform to another. In this manuscript, the authors show, as they have previously shown, that depletion of the nuclear phosphatidylinositol 5-phosphatase (PIP5Pase) results in a loss of mutually exclusive VSG expression. Furthermore, using ChIP-seq, the authors show that the repressor/activator protein 1 (RAP1) binds to regions upstream and downstream of VSG genes located in transcriptionally repressed expression sites and that this binding is lost in the absence of a functional PIP5Pase. Importantly, the authors decided to further investigate this link between PIP5Pase and RAP1, a protein that has previously been implicated in antigenic variation in T. brucei, and found that inactivation of PIP5Pase results in the accumulation of PI(3,4,5)P3 bound to the RAP1 N-terminus and that this binding impairs the ability of RAP1 to bind DNA. Based on these observations, the authors suggest that the levels of PI(3,4,5)P3 may determine the cellular function of RAP1, either by binding upstream of VSG genes and repressing their function, or by not binding DNA and allowing the simultaneous expression of multiple VSG genes in a single parasite.

      While I find most of the data presented in this manuscript compelling, there are aspects of Figure 1 that are not clear to me. Based on Figure 1F, the authors claim that transient inactivation of PIP5Pase results in a switch from the expression of one VSG isoform to another. However, I am not exactly sure what the authors are showing in this panel, nor do the data in Figure 1F seem to be consistent with those shown in Figure 1C. Based on Figure 1F, a transient inactivation of PIP5Pase appears to result in an almost exclusive switch to a VSG located in BES12. However, based on Figure 1E, the VSG transcripts most commonly found after a transient inactivation of PIP5Pase are those from the previously active VSG (BES1) and VSGs located on chr 1 and 6 (I believe). The small font and the low resolution make it impossible to infer the location of the expressed VSG genes, or to confirm that ALL VSG genes located in expression sites are activated, as the authors claim. Also, I was not able to access the raw ChIP-seq and RNA-seq reads. Thus, I could not evaluate the quality of the sequencing data.

      We appreciate the reviewer’s comments and evaluation of our work. Fig 1E shows VSG-seq of a population after transient (24h) exclusive expression of the PIP5Pase mutant, followed by re-expression of the WT PIP5Pase allele for 60 hours (multiple VSGs are detected). As a control, it also shows VSG-seq in cells continuously expressing WT PIP5Pase (mostly VSG2, BES1 is detected). Fig 1F and Fig S1 show the sequencing of VSGs expressed by clones isolated (5-6 days of growth) after a temporary knockdown (24h) of PIP5Pase (tet -), followed by its re-expression. For comparison, no knockdown (tet +) was included. Fig 1E shows potential switchers in the population, and Fig 1F confirms VSG switching in clones.

      To clarify the difference between Fig 1E and 1F, we edited the manuscript on page 3, lines 103-110: “To verify PIP5Pase role in VSG switching, we knocked down PIP5Pase for 24h (Tet -), then restored its expression (Tet +) and isolated clones by limiting dilution and growth for 5-6 days. Analysis of isolated clones after temporary PIP5Pase knockdown (Tet -/+) confirmed VSG switching in 93 out of 94 (99%) of the analyzed clones (Fig 1F, Fig S1). The cells switched to express VSGs from silent ESs or subtelomeric regions, indicating switching by transcription or recombination mechanisms. Moreover, no switching was detected in 118 isolated clones from cells continuously expressing WT PIP5Pase (Tet +, Fig 1F).”. We also edited Fig 1F to indicate temporary knockdown (Tet -/+) vs no knockdown (Tet +). The modifications will be available in the resubmitted version of the manuscript.

      We agree that the heat map is difficult to read due to the amount of information. We will include in the revised version of the manuscript a table with the data in the supplementary information; the reader will be able to evaluate the data in detail.

      A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12. Many potential switchers were detected in the VSG-seq (Fig 1E, the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis because we analyzed 212 clones total, a fraction of the over 10^7 cells analyzed by VSG-seq (Fig 1E). Also, it is possible that not all potential switchers are viable. However, the point of the clonal analysis is to validate the VSG switching after genetic perturbation of PIP5Pase.

      Fig 1C shows examples of ES derepression by RNA-seq after 24h exclusive expression of the mutant compared to WT PIP5Pase. The RNA-seq shows that all ESs are derepressed (Fig 1B). This can be visualized in the volcano plot (Fig 1B, BES and MES VSGs are labelled) and on the spreadsheet Data S1. Although all ESs are derepressed after PIP5Pase mutant expression, not all ESs are selected during switching, as observed in Fig 1E-F. This agrees with our previous observations in switching assays with proteins that control VSG switching (Cestari and Stuart, 2015, PNAS).

      As for sequencing metrics and raw sequencing data, see the Methods section, page 13, lines 483-485: “Sequencing information is available in Table S3 and fastq data is available in the Sequence Read Archive (SRA) with the BioProject identification PRJNA934938.” Table S3 has a summary of sequencing data. Metrics information such as sequencing quality and analysis can be found in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. The latter includes information about nanopore reads, i.e., mean Q-score of 12.

      Reviewer #3 (Public Review):

      In this manuscript, Touray et al investigate the mechanisms by which PIP5Pase and RAP1 control VSG expression in T. brucei and demonstrate an important role for this enzyme in a signalling pathway that likely plays a role in antigenic variation in T. brucei.

      The methods used in the study are rigorous and well-controlled. The authors convincingly demonstrate that RAP1 binds to PI(3,4,5)P3 through its N-terminus and that this binding regulates RAP1 binding to VSG expression sites, which in turn regulates VSG silencing. Overall their results support the conclusions made in the manuscript.

      There are a few small caveats that are worth noting. First, the analysis of VSG derepression and switching in Figure 1 relies on a genome that does not contain minichromosomal (MC) VSG sequences. This means that MC VSGs could theoretically be misassigned as coming from another genomic location in the absence of an MC reference. As the origin of the VSGs in these clones isn’t a major point in the paper, I do not think this is a major concern, but I would not over-interpret the particular details of switching outcomes in these experiments.

      The authors state that “our data imply that antigenic variation is not exclusively stochastic.” I am not sure this is true. While I also favor the idea that switching is not exclusively stochastic, evidence for a signaling pathway does not necessarily imply that antigenic variation is not stochastic. This pathway could be important solely for lifecycle-related control of VSG expression, rather than antigenic variation during infection. Nevertheless, these data are critical for establishing a potential pathway that could control antigenic variation and thus represent a fundamental discovery.

      Another aspect of this work that is perhaps important, but not discussed much by the authors, is the fact that signalling is extremely poorly understood in T. brucei. In Figure 1B, the RNA-seq data show many genes upregulated after expression of the Mut PIP5Pase (not just VSGs). The authors rightly avoid claiming that this pathway is exclusive to VSGs, but I wonder if these data could provide insight into the other biological processes that might be controlled by this signaling pathway in T. brucei.

      Overall, this is an excellent study that represents an important step forward in understanding how antigenic variation is controlled in T. brucei. The possibility that this process could be controlled via a signalling pathway has been speculated for a long time, and this study provides the first mechanistic evidence for that possibility.

      We thank the reviewer for the evaluation of our work. We agree that it is difficult to be certain of the origin of all VSG genes when the reference lacks minichromosome sequences; hence we did not emphasize this point in the manuscript. We used the 427-2018 reference genome assembled by PacBio and Hi-C (Muller et al. 2018, Nature), which we believe is the best assembly for the 427 strain, especially with respect to the VSG genes.

      We also agree that signaling controlling switching in vitro does not mean that switching necessarily occurs by signaling in vivo. Nevertheless, stochastic switching is an accepted model, but it has not been proven, whereas we provide molecular evidence that signaling can cause switching. To address this reviewer’s suggestion, we edited the Discussion, page 7, line 250: from “our data imply that antigenic variation is not exclusively stochastic” to “our data suggest that antigenic variation is not exclusively stochastic”.

      Most of the upregulated genes in the RNA-seq data were VSG genes/pseudogenes. Other genes upregulated included retrotransposons and DNA/RNA processing enzymes such as endonucleases and polymerases. We included in the Results, page 3, line 100: “Other genes upregulated include primarily retrotransposons, endonucleases, and polymerase proteins.”.

    1. Reviewer #3 (Public Review):

      It is well known that as seasonal day length increases, molecular cascades in the brain are triggered to ready an individual for reproduction. Some of these changes, however, can begin to occur before the day length threshold is reached, suggesting that short days similarly have the capacity to alter aspects of phenotype. This study seeks to understand the mechanisms by which short days can accomplish this task, which is an interesting and important question in the field of organismal biology and endocrinology.

      The set of studies that this manuscript presents is comprehensive and well-controlled. Many of the effects are also strong and thus offer tantalizing hints about the endo-molecular basis by which short days might stimulate major changes in body condition. Another strength is that the authors put together a compelling model for how different facets of an animal's reproductive state come "on line" as day length increases and spring approaches. In this way, I think the authors broadly fulfill their aims.

      I do, however, also think that there are a few weaknesses that the authors should consider, or that readers should consider when evaluating this manuscript. First, some of the molecular genetic analyses should be interpreted with greater caution. By bioinformatically showing that certain DNA motifs exist within a gene promoter (e.g., FSHbeta), one is not generating robust evidence that corresponding transcription factors actually regulate the expression of the gene in question. In fact, some may argue that this line of evidence only offers weak support for such a conclusion. I appreciate that actually running the laboratory experiments necessary to generate strong support for these types of conclusions is not trivial, and doing so may even be impossible. I would therefore suggest a clear admission of these limitations in the paper.

      Second, I have another issue with the interpretation of data presented in Figure 3. The data show that FSHbeta increases in expression in the 8Lext group, suggesting that endogenous drivers likely act to increase the expression of this gene despite no change in day length. However, more robust effects are reported for FSHbeta expression in the 10v and 12v groups, even compared to the 8Lext group. Doesn't this suggest that both endogenous mechanisms and changes in day length work together to ramp up FSHbeta? The rest of the paper seemed to emphasize endogenous mechanisms and gloss over the fact that such mechanisms likely work additively with other factors. I felt like there was more nuance to these findings than the authors were getting into.

      Third, studies 1 - 3 are well controlled; however, I'm left wondering how much of an effect the transitions in day length might have on the underlying molecular processes that mediate changes in body condition. While the changes in day length are themselves ecologically relevant, the transitions between day length states are not. How do we know, for example, that more gradual changes in day length that occur over long timespans do not produce different effects at the levels of the brain and body? This seemed especially relevant for study 3, where animals experience a rather sudden change in day length. I recognize that these experimental methods are well described in the literature, and they have been used by endocrinologists for a long time; nonetheless, I think questions remain.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their insights and comments on this manuscript. Specific responses to reviewer concerns are detailed below. We made two significant changes based on the feedback. First, we performed more experiments to increase biological replicates and then quantified image data for multiple figures. The new quantitative information added to Figure 3 fully supports our original conclusions about changes to the ONH in Hes-TKO mutants. The quantification of Atoh7, Otx2, Rbpms and Crx expressing cells among the different genotypes revealed interesting differences in Notch intracellular gene requirements for both RGC and cone development. The most startling outcome is that changes in both cell types correlate with significant changes in Otx2, but not Atoh7. This singular finding suggests that future work is needed on the molecular mechanisms underlying these cell fates, which is well beyond the scope of this paper. Second, our data presentation was reorganized with new information added to Fig 1 that clarifies the relationships between Hes1, Hes5, Foxg1 and Pax2; old Figs 6 & 7 about neurogenesis were merged; and some data moved to new Suppl Figs 2 and 5. The numbering for multiple figures changed and a new summary model (now Fig 8) is provided. In addition, the manuscript was completely rewritten to improve clarity. We hope this revised manuscript is acceptable for publication.

      Reviewer #1 Summary:

      In this study, the authors employed an impressive set of mouse mutant or Cre lines to investigate the complexity of Notch signaling across different stages of retinal development. These comprehensive analyses led to two main findings: 1. Sustained Hes1 in the ONH/OS is Notch-independent; 2. Rbpj and Hes1 exhibited opposing roles in cone photoreceptor development. Although the study is potentially interesting, the current manuscript lacks the essential research background and quantification, which significantly reduces the clarity of the manuscript and the credibility of the major conclusions. Also, how the authors organized the results is quite confusing, making the manuscript very difficult to follow.

      Response: We agree with all reviewers concerning incomplete quantification of the data. We directly addressed this shortcoming in revised Figs 3 and 6 (the latter combines old Figs 6 +7). To do this, we repeated some IHC experiments to add more replicates and reorganized all of the neurogenesis phenotypic data figures. Our quantifications uncovered several surprising outcomes that clarify our model. For these reasons, the manuscript was exhaustively rewritten. We merged E13 neurogenesis data into revised Figure 6 and moved the most relevant E16 analyses to new supplemental data Fig 5. All changes made should make the paper easier to understand for retinal development, neurogenesis, and Notch pathway aficionados, in addition to readers lacking such expertise.

      Major comments: 1. The authors need to quantify many analyses to strengthen the conclusions, such as Fig. 1F, 1G, etc.

      Response: We quantified optic nerve head (ONH) immunohistochemistry data in the revised Fig 3. We also quantified neurogenesis markers Atoh7, Otx2, Rbpms (RGCs), and Crx at E13 in revised Fig 6 (former Figs 6 and 7). Older stages were moved to a new Suppl Fig 5.

      Respectfully, Hes5 mRNA expression in old Fig 1F and 1G shows that Hes5, like other retinal progenitor cell (RPC) markers, expanded in Rax-Cre deletion but not Chx10-Cre deletion conditions. This is analogous to Pax6 and Rax expansion in Rax-Cre;Hes1 CKO eyes and Pax2 mutants (doi: 10.1523/JNEUROSCI.2327-19.2020) (1). In revised Fig 1, we now show analogous expansion of Hes5 mRNA in Pax2 mutant retinas (compare Figs 1F-1I). Because Hes5 RNA in situ hybridization experiments are nonquantitative, we do not discuss the possibility of Hes5 mRNA level changes in labeled cells.

      The authors reported many exciting results. However, further mechanistic insights are largely missing. They may focus on one of these exciting findings and give some mechanistic insights. For example, hes1 suppresses hes5 expression as the ONH boundary forms; hes1 expression in the ONH is Notch independent; differential influences of Rbpj and Hes1 on cone development. It is better for the authors to select one of these exciting findings and provide a deeper mechanistic study.

      Response: This revision brings fresh focus to Notch regulation of RGC and photoreceptor development, particularly differential influences for Rbpj versus Hes1. We also better support our interpretation of image data in Fig 1. We include new data about the spatial relationships between Hes5-GFP/Pax2 and Hes5-GFP/Foxg1. In summary, we find that as Pax2 becomes restricted to the nasal optic cup prior to the onset of RGC genesis, it becomes mutually exclusive with Hes5-GFP, at the same time that Hes5-GFP+ cells coexpress Hes1. This is consistent with Hes1 indirectly regulating Hes5-GFP as a marker of neurogenic RPCs at the forming ONH. Furthermore, it emphasizes the importance of genetically teasing apart the separate and potentially compensatory roles for Hes1 versus Hes5 undertaken here. These relationships remain poorly resolved during vertebrate CNS development.

      Some analyses lack an explanation of the rationale. For example, "To understand if the loss of multiple Hes genes is more catastrophic than Hes1 alone..."(PAGE 7). Please explain its significance.

      Response: We assume the reviewer is referring to the first sentence of the last paragraph on this page. We analyzed Hes triple mutant mice (TKO) to understand if removing multiple Hes genes reveals redundant functions. This is an open question, given that Hes1 is expressed in the ONH/OS, which is normally devoid of Hes5 by the time retinal neurogenesis begins. These questions have only been explored in a handful of tissues throughout the body. Also see response to point 2 above. In general, we have expanded the rationale for all of the experiments throughout the revised manuscript.

      Significance: In general, many results are quite interesting. However, the significance of these findings is largely hampered in the following aspects: 1. The authors did not provide sufficient research context, which is essential for understanding many results. 2. Many conclusions were based solely on descriptive images and lacked statistical quantification, which significantly weakened them. 3. Many interesting findings are quite descriptive, and some mechanistic understanding of one of these exciting findings would improve the focus and significance of the study. The current format of the manuscript fits a more specialized audience.

      Response: During in vivo development, we wished to understand which particular Notch pathway genes can interact in a Notch-dependent versus a Notch-independent manner. Genetic (phenotypic) studies produce extremely rigorous datasets, in our opinion. This revision now extensively quantifies key findings. Here we dissected the "receipt" of a Notch signal by identically testing the functional requirements of particular pathway members. For Mastermind (Maml), there are 3 paralogues, double mutants for Maml1 and Maml3 are early lethal, and no floxed alleles exist, so it was logical to employ the ROSA-dnMaml mouse strain, particularly since it has been discussed throughout the Notch literature as "analogous" to removing either a Notch receptor or Rbpj. Our finding that the dnMAML allele does not function like a Rbpj null in the retina is important for researchers in the broad Notch field to consider when designing and interpreting experiments.

      Reviewer #2: Hes genes are effectors of the Notch signaling pathway but can also act down-stream of other signaling cascades. In this manuscript the authors attempt to address the complexity of Hes effectors during optic cup development and retinal neurogenesis. To do so, they compared optic cup patterning and retinal neurogenesis in seven germline or conditional mutant mouse embryos generated with two spatio-temporally distinct Cre drivers. These lines allowed for the analysis of the consequences of perturbing the Notch ternary complex and multiple Hes genes alone or in combination. The authors show that the optic disc/nerve head is regulated by Notch independent Hes1 function. They also confirm that perturbation of Notch signaling interferes with cell proliferation enhancing the production of differentiated ganglion cells, whereas photoreceptor genesis requires both Rbpj and Hes1 with Notch dependent and independent mechanisms. This is a rather complex study that dissects further the role of the Notch pathway and Hes proteins during eye development, a topic that has been addressed in many previous studies but perhaps not with the details that the authors have used here. In this respect, this study adds to current literature but will likely be of interest to retina aficionados. The manuscript reads well and the figures are of very good quality. However, many of the statements are based on qualitative rather than on quantitative analysis. This should be, at least in some cases, remediated, despite the effort that this may require given the number of mouse lines used in the study.

      Response: As described in the response to Reviewer 1, we agree and present considerably more quantification data. We extensively reorganized and rewrote this manuscript to emphasize that Hes1 in the ONH/OS is fully Notch-independent and highlight branchpoints in Notch-dependent signaling, for Rbpj versus Hes1, during early retinal neurogenesis. It is too simplistic that the ternary complex (Rbpj-NICD-Maml) simply activates Hes1 (and/or multiple Hes genes) to regulate downstream signaling targets. This paradigm has been portrayed in the literature numerous times for many processes throughout vertebrate development, homeostasis or relative to particular diseases. By focusing on one tissue and a narrow window of development, our phenotypic studies delved more deeply to show the greater complexity and molecular cross-talk that we think underlie the modulation of signaling levels with in vivo context. Thus, our results are of broad interest and impact to the greater Notch field.

      1. The title is somewhat misleading. The authors have explored mostly the role of Hes1, 3 and 5. Although these are Notch effectors, there is already evidence that they participate in other pathways. This is confirmed by the data presented here. I would suggest eliminating Notch from the title and using "Hes" instead to better reflect the findings. Furthermore, it is unclear why there is a reference to "mutations" or what the Notch branchpoints are to which the authors refer at the beginning of the discussion.

      Response: We appreciate the reviewer’s viewpoint but disagree that this paper is mostly about Hes genes, as there is a critical direct, comparable evaluation with Rbpj and dn-Maml. Direct comparison of 7 genotypes highlights where each pathway member exhibits idiosyncratic phenotypes. We are striving for a clear, simple title about a very complex topic, involving the in vivo genetic dissection of a signaling pathway. We modified the title to: "Notch pathway mutations do not equivalently perturb mouse embryonic retinal development"

      1. "Although the Pax6-Pax2 boundary is intact in Rax-Cre;RbpjCKO/CKO eyes, ONH shape was attenuated compared to controls (Fig 3I)". This statement is arguable as the difference seems subtle. Perhaps some kind of quantification would help.

      Response: We quantified Pax2+ cells (ONH domain) using the adjacent proximal terminus of the retinal pigmented epithelium (RPE) to indicate a transition from ONH to optic stalk (OS). We also quantified the number of Pax2+Pax6+ double positive cells where the 2 domains abut (boundary cells). Some higher magnification examples are now provided in Fig 3H';3K';3N'. Grossly, the imaging data support that the Pax2+ ONH is expanded in Chx10-Cre;TKO eyes, while boundary cells are most affected in Rax-Cre;HesTKO eyes, due to an expansion of retinal tissue. This is supported by our quantitative data (Fig 3O,3P). We observed even in controls that Pax2-expressing cells show some numerical variability. We attributed this to the position of the section through the ONH, which is a 3-dimensional ring (torus). Therefore, we quantified additional wild-type controls and mutant samples in the new Fig 3O,3P graphs, improving statistical power, and allowing us to detect quantitative differences.

      Page 12 first paragraph. "....but all other genotypes were unaffected". This statement is unclear. All lines in which the Rax-Cre has been used seem to have an increased number of apoptotic cells. This should be better explained

      Response: Respectfully, only one genotype, Rax-Cre;Rbpj mutants, contains a statistically significant increase in apoptotic cells (Fig 5P). This is demonstrated by one-way ANOVA analyses that included all pairwise comparisons. To ensure that the quantification was not misleading due to changes in tissue morphology, data in Figs 5, 6, and 7 were normalized to optic cup area. The area was traced in FIJI, creating a polygon whose area was determined in square microns. For every section image, the number of marker+ cells was divided by the square micron area of the retina (excluding the opening for the optic nerve). Such a method is critical for comparison across this allelic series, given the morphologic changes, differences in cell clustering where rosettes form, and reduced proliferation whenever Notch signaling is lost or reduced.
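      The area normalization described above amounts to dividing a cell count by the area of a traced polygon. The sketch below shows that arithmetic using the shoelace formula; it is not the FIJI measurement itself, and the function names, the outline coordinates and the cell count are all made up for illustration.

```python
def polygon_area(vertices):
    """Area of a simple polygon given (x, y) vertices in microns (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def cells_per_square_micron(marker_positive_cells, vertices):
    """Marker+ cell count normalized to the traced tissue area."""
    return marker_positive_cells / polygon_area(vertices)


# Hypothetical traced outline: a 200 x 100 micron rectangle
outline = [(0, 0), (200, 0), (200, 100), (0, 100)]
density = cells_per_square_micron(50, outline)
print(polygon_area(outline), density)
```

      Expressing counts as a density in this way keeps sections of different size or shape comparable, which is the stated reason for the normalization.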

      Page 12, end of second paragraph: "E13.5 Chx10-Cre;HesTKO eyes had a milder RGC phenotype (Figs 6G, 6N, 6U), but all other mutants were unaffected (Figs 6E, 6F, 6L, 6M, 6S, 6T)". This statement is also rather subjective. The phenotype of Chx10-Cre;HesTKO is quite strong and the other mutants seem to have a phenotype. Some quantifications here will help.

      Response: We agree and provide quantification for both Atoh7 and Rbpms positive cells in the revised Figure 6. This is now in the same figure with quantification of Otx2+, Otx2+Atoh7+ and Crx+ cells. The reviewer is correct that both ROSA-dnMaml and both HesTKO mutants have a statistically significant increase in RGCs. Surprisingly, neither of the Rbpj CKO mutants has this outcome (Fig 6Y).

      1. Page 13, toward the bottom..."...but noted that Chx10-Cre RbpjCKO/CKO eyes were not different from controls (Figs 7E, 7AA)". Again, this statement is questionable, as staining for both CRX and Rbpms seems reduced as compared to controls, as the quantifications in 7AA also seem to indicate (about half?). Did the authors calculate whether there is a statistical difference between controls and Chx10-Cre RbpjCKO/CKO?

      Response: Rbpms+ RGCs and Crx+ photoreceptor precursors were colabeled and quantified on sections for all genotypes. All counts were normalized to area as described above. Upon quantification and ANOVA with pairwise comparisons, there was no statistical difference in Crx+ or Rbpms+ cells between control and Chx10-Cre;Rbpj mutants (new Fig 6Y and Z).

      In Fig 7CC the authors should make the effort of including at least one additional sample, 2 biological replicates seem insufficient to draw a conclusion.

      Response: The Rax-Cre;Hes1CKO/+ X Hes1CKO/CKO matings stopped producing litters in late 2022. While this manuscript was out for review, we obtained younger mice, from which new control and Rax-Cre; Hes1 mutant littermates were collected, stained, imaged and quantified. Upon adding samples, we found that the outcome was unchanged, but the data better support the lack of a statistical difference in rods between genotypes at E17. These data were moved to revised Suppl Fig 5.

      Significance: This is a rather complex study that dissects further the role of the Notch pathway and Hes proteins during eye development, a topic that has been addressed in many previous studies but perhaps not with the details that the authors have used here. In this respect, this study adds to current literature but will likely be of interest to retina aficionados. The manuscript reads well and the figures are of very good quality. However, many of the statements are based on qualitative rather than on quantitative analysis. This should be, at least in some cases, remediated, despite the effort that this may require given the number of mouse lines used in the study.

      Response: To increase the impact of our manuscript, we quantified all markers except Tubb3, since its localization in cell bodies and axons makes it impossible to assign to individual cells. We feel that this additional quantification strongly improves the quality of our findings and allowed us to make well-supported and novel conclusions. While we certainly believe that the retinal development community will find this paper of interest, it will also be of value to the broader Notch pathway scientific community. In this manuscript, we simultaneously compared phenotypes for Notch pathway genes in signal receiving cells. We could find essentially no studies like this for the mouse CNS and only a few from the Kopan lab about the kidney and immune system. Interestingly, one of us (NLB) is a coauthor on a recent paper about Notch signaling in the cortex, in which ROSA-dnMaml behaves analogously to Notch1CKO or RbpjCKO. This emphasizes that findings in one organ may not recapitulate the "rules" for this pathway for other cell types or tissues (doi: 10.1242/dev.201408)(2). Deeper understanding of how the Notch pathway in the retina functions, analogously or differently, is important. We feel our revised study advances understanding of when and where there are "branchpoints" in canonical signaling that may be overlooked in other developing tissues and organs.

      Reviewer #3: I have reviewed a manuscript submitted by Bosze et al., which is entitled "Not all Notch pathway mutations are equal in the embryonic mouse retina". The authors focused on Notch signaling pathway. Notch signaling is deeply conserved across vertebrate and invertebrate animal species: in general, two transmembrane proteins, Delta and Notch, interact as a ligand and a receptor, respectively, which induces proteolytic cleavage of Notch receptors to generate Notch intracellular domain (NICD). NICD is translocated into the nucleus, then forms the transcription factor complex including Rbpj (also referred to as CBF1) and Mastermind-like (Maml), and activates the transcription of Hes family transcription factors. Three Hes proteins, Hes1, 3, and 5, are important for nervous system development. In the vertebrate developing retina, these Hes proteins inhibit neurogenesis to maintain a pool of neural progenitor cells. In addition to their primary role in neurogenesis, the authors recently reported that Hes1 promotes cone photoreceptor differentiation. In the later stages of development, Hes proteins also promote Müller glial differentiation. In addition, Hes1 is highly expressed in the boundary between the neural retina and optic stalk and required for this boundary maintenance. To understand precise regulation of Notch component-mediated signaling network for retinal neurogenesis and cell differentiation, the authors compared retinal phenotypes in the knockdown of three Notch pathway components, that is (1) Hes1/3/5 cTKO, (2) Rbpj KO, and (3) dominant-negative Maml (dnMaml) overexpression, under the control of two Cre drivers; Rax-Cre and Chx10-Cre.
First, the authors found that Hes1 expression in the boundary between optic stalk and neural retina is lost in Rax-Cre; Hes1/3/5 cTKO, but still retained in Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression, suggesting that Delta-Notch interaction is not required for Hes1 expression in the boundary between optic stalk and neural retina. Furthermore, the Hes1-expressing boundary region expands distally at the expense of the neural retina in Chx10-Cre; Hes1/3/5 cTKO. Maintenance of ccd2 expression in this expanded boundary area suggests that Hes1 normally maintains a proliferative state in the optic stalk, which may allow these cells to differentiate into astrocytes in later stages. Second, in addition to precocious RGC differentiation in all the Notch component KO, the authors found that, as compared with wild-type, cone and rod photoreceptor genesis is highly enhanced in Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression and mildly enhanced in Chx10-Cre; dnMaml overexpression. On the other hand, in Rax-Cre; Hes1/3/5 cTKO, cone and rod photoreceptor genesis is not enhanced but similar to wild-type level. Since the authors previously reported that cone genesis is reduced in Rax-Cre; Hes1 cKO and Chx10-Cre; Hes1 cKO, Rax-Cre; Hes1/3/5 cTKO may rescue the decrease in cone genesis seen in single Hes1 cKO. The authors raise the possibility that elevated Hes5 expression in single Hes1 cKO may suppress cone photoreceptor genesis. The authors also found that amacrine cell genesis is significantly suppressed in Rax-Cre; Rbpj KO but not changed in Rax-Cre; dnMaml overexpression and Rax-Cre; Hes1/3/5 cTKO, suggesting that Rbpj is specifically required for amacrine cell genesis. From these observations, the authors propose that there are at least two branchpoints for photoreceptor and amacrine cell genesis in Notch component-mediated signaling network.
Their findings are very interesting and provide some new insight into how Notch signaling components are integrated into other signaling pathways and promote the generation of diverse but well-balanced retinal cell types during retinal neurogenesis and cell differentiation, in addition to the conventional classic view of the Notch signaling pathway. However, one weak point is that, although the authors figured out what kinds of phenotypic difference appear in the KO retinas between these Notch components, the research result is descriptive and less analytical. Most of their conclusions may be supported by their previous works or others; it is still hypothetical. So, it is important to show more analytical data to support their interpretation and to show more clearly what the new conceptual advance for Notch signaling pathways is.

      For example, sustained Hes1 expression in the boundary region between optic stalk and neural retina may be reminiscent of the situation at the brain isthmus. I would like to request the authors to show more direct evidence that Hes1 regulation in optic stalk/retina boundary is independent of Delta-Notch interaction. One possible experiment is whether DAPT treatment phenocopies Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression (Hes1 in optic stalk boundary is normal?).

      Response: Usage of the gamma secretase inhibitor DAPT is an interesting experiment as it can phenocopy the loss of Notch signaling in developing tissues. However, the reviewer's proposed DAPT experiment is problematic for two major reasons. First, DAPT blocks the gamma secretase complex, which has more than 90 protein targets in the cell membrane (3). Therefore, DAPT may not be informative for Hes1 regulation given the myriad of expected off-target effects. Second, it would be difficult to treat embryos at the relevant stages with DAPT. Injections into pregnant mice are lethal and we cannot localize drug to the relevant area during in vivo development. Our direct phenotypic comparisons with two Cre drivers strongly indicate that Hes1 is independent of canonical Notch signaling in the developing optic stalk.

      We include an extra related data figure (Reviewer Fig 1) showing anti-Hes1 immunolabeling of E13.5 Rax-Cre;Notch1CKO/CKO (n=2) and E13.5 Rax-Cre;Notch2CKO/CKO eyes (n=3). The Notch1 mutant lost oscillating Hes1 expression in retinal progenitors, but the uniform Hes1 ONH domain remains. Interestingly, the Notch2 mutant had essentially no effect on Hes1 (oscillating or sustained), or Hes5 mRNA expression. A Notch2 RNA in situ hybridization demonstrates that Notch2 mRNA was lost in the E13 optic cup and RPE (Rax-Cre expressing tissues). These data emphasize: A) the Notch1-specific dependency of oscillating Hes1 expression in retinal progenitors is absent from the ONH; B) although coexpressed in the same tissue, Notch receptors have unequal activities.

      Does Rax-Cre; Rbpj KO; Hes1-cKO phenocopy Rax-Cre; Hes1-cKO (or Rax-Cre; Hes1/3/5 cTKO)?

      Response: This is a good question! The first author tried very hard to produce Rax-Cre; Rbpj CKO;Hes1 CKO double mutant embryos. However, these progeny could not be recovered from E10-E13 embryos, despite collecting more than 10 litters. Thus, it is likely that this genotype is lethal before eye formation.

      Could the authors identify an enhancer element that drives Hes1 transcription in optic stalk/retina boundary, which should be not overlapped with that of NICD/ Rbpj binding motif? Such additional evidence will make their conclusion more convincing.

      Response: Another interesting question. We have been working for >3 years on Hes1 cis regulatory enhancers, but the pandemic greatly delayed progress. The proximal Hes1 600bp upstream region is a generic enhancer that contains Hes1 binding sites for repressing its own expression (4) and has a pair of Rbpj consensus sites for Notch ternary complex activation of Hes1 expression (5,6). Nearby is a binding site occupied by Gli2 in the E16 mouse retina (7). Recently, it was shown that Ikzf4 binds slightly farther away (8). The upstream 1.8 kb region (including the 600bp just described) can drive destabilized GFP or dsRed reporters in early postnatal retinal explants (9). However, this sequence was used to make and analyze a classic Hes1-GFP transgenic reporter mouse, in which GFP was not expressed in the early embryonic mouse optic vesicle or cup (10). Therefore, any early eye-specific enhancer(s) are located farther upstream, in an intron, or downstream (or combination thereof). Public domain epigenetic and chromatin accessibility datasets support this idea. Identifying the gene regulatory logic for Hes1 expression in the eye will be an exciting future story, well beyond this manuscript. We are excited to use live imaging of enhancer reporters to discern oscillating versus sustained activity patterns during early ocular development.

      Regarding the conclusion on new branchpoints on photoreceptor and amacrine cell genesis, a model shown in Figure 9 is still hypothetical. Figure 9B indicates a model in which the increase of Otx2+ cells and Crx+ cells in Rax-Cre; Rbpj KO is mediated by Hes1, which is presumed to be activated in Notch-independent signaling. However, Hes1 expression in the neural retina is markedly reduced in Rax-Cre; Rbpj KO (Fig. 2I), which does not fit in with the model.

      Response: We removed Fig 9B and now present new models about the Notch-dependent versus -independent roles for both Rbpj and Hes1. The new summary is Fig 8.

      So, I would like to request the authors to examine whether the increase of Otx2+ cells and Crx+ cells in Rax-Cre; Rbpj KO, (or Rax-Cre; dnMaml overexpression and Chx10-Cre; dnMaml overexpression) is inhibited by Hes1 KO.

      Response: If we understand this correctly, it would mean generating double mutants, some of which we determined are not viable (see the response above, and Suppl Table 2). Given there is only a partial knockdown of Hes1 or Hes5 in either dnMaml mutant we do not believe repeating this in the Hes1 CKO genetic background to be informative and it would take 3 generations to perform.

      Second, the authors concluded that both cone and rod genesis are enhanced in Rax-Cre; Rbpj KO by showing the data on Crx/Nr2e3 labeling in Rax-Cre; Hes1 cKO in Fig. 7BB. However, as the authors mentioned in the manuscript, Hes5 expression is elevated in Rax-Cre; Hes1 cKO (Fig. 1G). So, since Rax-Cre; Hes1 cKO has residual Hes activity in the retina, Fig. 7BB should be replaced with labeling of Crx/Nr2e3 in Rax-Cre; Hes1/3/5 cTKO.

      Response: Unfortunately, Rax-Cre;HesTKO embryos do not live past E13 (Suppl Table 2). Thus, we cannot evaluate rods, whose genesis starts around E13.5. Revised Fig 1G shows the Hes5 domain is shifted with the expansion of retinal tissue in E13.5 Hes1 single mutants, but importantly, also analogously shifted in Pax2 mutants (Fig 1H). We do not conclude that mRNA levels are "elevated" since mRNA in situ hybridization is not a quantitative technique. Our initial examination of rods in E17 Rax-Cre;Hes1 CKO mutants tested the idea of a fate shift from cones to rods. However, deeper quantification (Suppl Fig 5) does not support such a fate change.

      Furthermore, possibly, it is best to examine labeling of the retinas of Rax-Cre; Rbpj KO with rod and cone-specific markers and confirm that the number of both rods and cones is significantly increased. Third, as for defects in amacrine cells genesis in Rax-Cre; Rbpj KO, I would like to request the authors to show the data on Chx10-Cre; Rbpj KO. Although Rbpj KO is mosaic in Chx10-Cre; Rbpj KO, we can distinguish Rbpj KO cells by GFP expression (Fig. S2C, C', C'). So, the authors can confirm that amacrine cell genesis is inhibited in a cell-autonomous manner in Chx10-Cre; Rbpj KO retinas but not in Chx10-Cre; dnMaml overexpression. Addition of such data will make the authors' conclusion more convincing.

      Response: Suppl Table 1 lists multiple references (two from the NLB lab) that demonstrated both a rod and cone increase in Rbpj loss-of-function conditions. Chx10-Cre;Rbpj CKO animals were evaluated by Zheng et al., who showed an amacrine loss phenotype in these mutants (11). This is equivalent to what we see in our Rax-Cre;Rbpj CKO data, but without the complications of Chx10 mosaic Cre expression upon Rbpj deletion.

      Other comments: 1) Title of this manuscript is "Not all Notch pathway mutations are equal in the embryonic mouse retina". However, this title is quite obscure in what is research advancement of their findings. I suggest the authors to include more concrete and conclusive sentence in the title, for example "Hes and Rbpj differentially promotes retina/optic stalk boundary maintenance and photoreceptor genesis, in parallel with neurogenic inhibition by Notch signaling pathway".

      Response: We appreciate the reviewer's perspective. We are striving for a relatively simple title about a very complex topic, involving the in vivo genetic dissection of a signaling pathway. We modified the title to "Notch pathway mutations do not equivalently perturb mouse embryonic retinal development".

      2) The "Results" section is a bit difficult to follow logics without detailed knowledge on roles of Notch signaling in mouse retinal development. I suggest the authors to improve a writing style of "Results" section for readers without such detailed knowledge on mouse Notch mutant phenotypes to follow logical flow more easily. There are many additional descriptions on research background before start to mention results. Such introductory sentences should be moved to the "Introduction" section, by which logical flow in the Results section should be simpler. In addition, the authors should show a concrete question at the beginning of each result subsection. Furthermore, the authors sometimes jump over from one result subsection and suddenly move to cite another figure panel in a far ahead subsection whose data has not been explained. Such a back-and-forth citation of figure data generally makes it difficult to follow logical flow.

      Response: We now present a considerable amount of new quantified data, reorganized multiple figures, and extensively rewrote the paper. We significantly revised the summary figure to improve clarity. In addition, Suppl Table 1 provides a wealth of background information to orient the reader on this topic. We feel that this extensive revision has greatly improved the quality, logical flow, and readability of the manuscript.

      3) In addition, figure configuration is not well organized. Each figure compared some particular marker expression in wild-type, Rax-Cre; HesTKO, Rax-Cre; Rbpj cKO, Rax-Cre; dn-Maml-GFP, Chx10-Cre; HesTKO, Chx10-Cre; Rbpj cKO, Chx10-Cre; dn-Maml-GFP. For example, Fig. 2 shows Hes1 for inhibition of neurogenesis, Fig. 3 shows Vsx2; Mitf and Pax2; Pax6 for retinal pigmented epithelium and optic stalk, Fig. 6 shows Atoh7, Rbpms, and Tubb3 for retinal ganglion cells. Fig. 7 shows Crx, Otx2, and Thrb2 for photoreceptor differentiation. Fig. 8 shows Prdm1, and Ptf1a for photoreceptors and amacrine cells. Although this figure configuration is convenient to show phenotypic difference between different genetic mutations, it is difficult to know how each differentiation steps are spatially and temporally coordinated during development. At least, I recommend the authors to show one summary figure, which shows spatio-temporal expression profile of retinal markers in wild-type mouse retinas.

      Response: We recognize this point and completely reorganized and combined Figs 6 and 7 to improve clarity. New Figure 6 presents E13 quantification for Atoh7, Otx2, Atoh7/Otx2, Rbpms and Crx expressing retinal populations. E16-E17 data were condensed and moved to a new Suppl Fig 5.

      4a) Page 7, line 7-10 "With earlier deletion using Rax-Cre, hes5 mRNA abnormally extended into the optic stalk": I wonder how the authors define the optic stalk. It is likely that optic stalk area (Pax2+, Vax1+ area) is shifted to more proximal (depart from the optic cup and move toward the brain), and neural retina is expanded accordingly (Fig. 4B, 4F), resulting in expansion of hes5 expression. Thus, it may be better to mention that optic stalk/neural retina boundary is abnormally shifted toward the brain.

      Response: The retina, including the optic nerve head, ends where the adjacent RPE terminates. This is conspicuous morphologically in our sections. We also defined this by colabeling for Pax2 and Pax6, which is now quantified in revised Fig 3. To clarify this further, we added the words "in all panels the brain is to the right" in the Fig 4 legend.

      4b) Page 8, line 14-15, "ONH/OS cells still express it (Hes1), demonstrating that sustained Hes1 is independent of Notch": I presume that Rax-Cre drives Cre in neural retina as well as optic stalk and pigmented epithelium. However, it is likely that Rbpj is not expressed in optic stalk/neural retina boundary area in wild type (Fig. S2A). No expression of Rbpj in optic stalk/neural retina boundary may support that Hes1 expression in this boundary area is Notch-independent. However, Rbpj expression is retained in some vitreal cells near optic nerve head in Rax-Cre; Rbpj-CKO retinas (Fig. S2B). What are these Rbpj+ cells? I would like to request the authors to confirm that Rbpj expression is completely absent in both neural retina and optic stalk in Rax-Cre; Rbpj-CKO mice. Otherwise, this conclusion is still not fully supported.

      Response: We show the Rax-Cre lineage in Suppl Fig 2 via the Ai9 (tomato) reporter. The results are striking, with all of the optic cup derivatives (retina, RPE, ONH, optic stalk, and presumptive ciliary tissue and iris) being tomato positive, while the well-described population of vascular cells in the hyaloid space lack tomato expression. Furthermore, our figure shows that Rbpj expression is absent only from the optic cup derivatives, not from the vascular structures in the vitreous. Vascular cells also depend on the Notch pathway and express Rbpj. Based on considerable evidence from the literature and our lineage experiments, the population of cells the reviewer highlights represents the hyaloid vasculature and associated cell types. It does not represent any population that derives from neuroectoderm.

      4c) Page 9, line 16-18, "Foxg1 had spread into the nasal optic stalk": Is Foxg1 expanded nasal area really "OS" rather than expanded retina? I suggest the authors to confirm molecular markers Pax2 expression is overlapped with Foxg1. Otherwise, it is difficult to conclude that foxg1 is expanded into the optic stalk territory, because foxg1 is normally a marker of retina. Indeed, Fig. 3K shows pax2 expression is shifted into more inside towards the brain, suggesting that neural retina is expanded. Please explain the situation.

      Response: Foxg1 (BF-1) mRNA and protein are found in the nasal retina and are expressed in other brain tissues. Multiple studies show Foxg1 in the nasal side of the E10 optic cup/retina/optic stalk and developing hypothalamus (see extra data figure Reviewer Fig 2; the top row is data from Smith et al., 2017 (12), with Foxg1 mRNA in purple). Also see our new manuscript panel Fig 1C. We include here for reviewers extra data (Reviewer Fig 2) showing E13 ocular cryosections colabeled for Foxg1 and Pax2, highlighting their relationship in the retina, optic stalk and adjacent forming hypothalamus. On page 9 the text now reads "At E13.5 Rax-Cre;HesTKO eyes, the Foxg1 nasal retinal domain was contiguous with the nasal optic stalk (Suppl Fig 4D). This is reminiscent of younger stages (Fig 1C), since normally at E13.5, Foxg1 in the nasal optic cup/retina is separated from expression in the ONH/OS (Suppl Fig 4A). Based on the expansion of Pax6, Vsx2 and Hes5 RPC domains into the optic stalk, we conclude that the change in Foxg1 similarly reflects an extension of retinal tissue."

      4d) Page 10, line 4-5, "In Rax-Cre; Hes1/3/5 cTKO eye, this tissue (RPE) extended into the optic stalk": This description seems to be incorrect. A part of Pax2 area, which is adjacent to the neural retina, contacts with RPE in wild type (Fig. 3AH), so most of RPE covers the neural retina even in Fig. 3DK.

      Response: We disagree with the reviewer’s interpretation. Fig 3D shows Mitf labeling of RPE nuclei. Figure 3K shows the adjacent section labeled with Pax2 and Pax6 (labels both retina and RPE). As the retina extends "towards the brain", the RPE analogously extends and surrounds the retinal domain. We also added higher magnification data panels 3H, 3K and 3N, showing merged and single channels.

      4e) Page 10, line 22-23, "For Chx10-Cre; Hes1/3/5 cTKO, there was a unique presence of ectopic Pax2 within the retinal territories": I wonder if this description is correct. I suspect that proliferative Pax2+ cells expand into regressing territory of Hes KO retinal cells, which undergo precocious neurogenesis and lose proliferative activity, in Chx10-Cre; HesTKO. In this case, it is possible that the Pax2/Pax6 interface may be maintained. Please show red and green channel panels for Fig. 3N to confirm that there are ectopic pax2 and pax6 double positive cells.

      Response: New quantification in revised Fig 3 (see panels O,P) fully supports our original conclusion. Only Chx10-Cre;HesTKO mutants have a statistically significant increase in Pax2+ cells. There are not more Pax2+Pax6+ double labeled cells. Only this particular genotype has an increase in Pax2+ single labeled cells.

      5a) Page 11, line 20-25. There seems to be inconsistency between result description and image data of Fig. 5A-G, and histogram Fig. 5O. Authors mentioned that a modest loss of pH3+ cell fraction in Chx10-Cre; Hes1/3/5 cTKO but not in Rax-Cre; Hes1/3/5 cTKO. However, Fig. 5D indicates severe reduction of pH3+ cell fraction in Rax-Cre; Hes1/3/5 cTKO, which is similar to reduction of pH3+ cell fraction in Rax-Cre; Rbpj (Fig. 5B), but histogram data is different (Fig. 5O). Furthermore, pH3+ cell fraction is severely reduced in Chx10-Cre; ROSA(dn-Maml-GFP) (Fig. 5F) and modestly reduced in Chx10-Cre; Hes1/3/5 cTKO (Fig. 5G). However, pH3+ cell fraction seems to be normal in Chx10-Cre; Rbpj (Fig. 5E). These Chx10-Cre image data do not match the histogram of Fig. 5O. Please check their situation.

      Response: Images in old Figs 5-8 were normalized using area measurements, see methods and above comments (note: old Figs 6&7 were combined into new Fig 6). One-way ANOVA with pairwise comparisons for each mutant genotype compared to control were calculated using Prism. All genotypes except two have a statistically significant loss of M phase cells and we discuss possibilities for this outcome (Fig 5O). A normalization method for the sampled area is an essential component of these studies since morphologic differences are apparent for particular genotypes. The quantitative data are consistent with our original conclusions.
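
      For readers unfamiliar with the statistic behind a one-way ANOVA, the following Python sketch computes the F ratio from per-genotype groups of area-normalized counts. It is only an illustration: the authors report using Prism, the function name `one_way_anova_f` and the numbers are hypothetical, and the sketch does not compute p-values or the pairwise post-hoc comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of normalized counts per genotype.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical normalized counts for a control and two mutant genotypes.
control = [1.0, 2.0, 3.0]
mutant_a = [2.0, 3.0, 4.0]
mutant_b = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([control, mutant_a, mutant_b])
print(f_stat, df_b, df_w)
```

      A large F indicates that the genotype means differ by more than the within-genotype scatter would predict; identifying which genotypes differ from control still requires the pairwise comparisons mentioned in the response.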

      5b) Fig. 5H-N, P: I wonder if the stage E13 is appropriate to evaluate cell death and survival because optic cup already becomes smaller in Rax-Cre; Rbpj, Hes1/3/5 cTKO, or ROSA(dn-MAML-GFP) than in wild-type control. I suggest the authors examine an earlier stage.

      Response: While an earlier effect is possible, we only observed size differences in a subset of the genotypes. Thus, E13 serves as a critical timepoint to examine early developmental phenotypes across the totality of our mutant conditions. It is also the first age at which the ONH is fully formed.

      5c) Page 12, line 19-20, "all other mutants (Chx10-Cre; Rbpj, and Chx10-Cre; ROSA(dn-MAML-GFP)) were unaffected (Fig. 6EF, LM, ST)": It is likely that atoh7 expressing cells are mildly decreased and neuronal marker, Tubb3 and Rbpms-expressing cells are increased in Chx10-Cre; Rbpj, and Chx10-Cre; ROSA(dn-MAML-GFP). I request the authors to evaluate the fraction of these markers in retinal area statistically in all the cases.

      Response: As described above, we quantified Atoh7 and Rbpms nuclear expression by immunohistochemistry. We do not believe that Tubb3+ cells can be reliably quantified. Nonetheless, it is useful to qualitatively show the extent of excess neuron formation. Importantly, we observed that it is not the Atoh7 status that matters for RGC formation, rather it is the Otx2 expression status. This is in good agreement with single cell-RNA transcriptomics data from Wu et al 2021 showing that Atoh7 mRNA in all early transitional RPCs remains fairly constant and its loss does not block the formation of early RGC cell states (13). By contrast Otx2 fluctuates but remains expressed in transitional RPCs that progress to photoreceptor lineages.

      6a) Page 7, line 19 "Ectopic blood vessels protruded from the ONH (Fig. 1K, 1L)": It is difficult to see blood vessel structures in these panels (Fig. 1I-L). Please show some molecular marker of blood vessels to confirm how blood vessel is organized in Hes1/3/5 cTKO.

      Response: These vascular structures are highly conspicuous by morphology in the H&E insets. Nonetheless, we used adjacent P21 sections to immunolabel with Endomucin (14) and Tubb3 antibodies. This colabeling confirms the morphology and position of ectopic blood vessels in the abnormal tissue masses in Chx10-Cre;HesTKO mutant eyes. Ectopic tissue contains only rare Tubb3+ cells or cell processes, suggesting it is overwhelmingly nonneural. All P21 data were moved to a new Suppl Fig 2. A full detailing of vascular phenotypes is beyond the scope of this manuscript and, interestingly, would be potentially attributable to non-autonomous effects of perturbing the Hes genes in the adjacent retina.

      6b) Fig. 5: Increase of pH3 fraction indicates several possibilities, for example (1) increased fraction of mitotic cells due to precocious neurogenesis, (2) increased fraction of mitotic cells due to activated cell proliferation of retinal progenitor cells, (3) increased cell-cycle arrest in M phase due to some stress response of progenitor cells. So, I suggest the authors to examine (1) BrdU percentage of retinal section area, (2) the percentage of pH3+ cells in PCNA+ retinal cells.

      Response: The data listed in Suppl Table 1 present a unified picture that disrupting Notch signaling reduces proliferation. This paradigm extends to other model organisms (e.g., Drosophila, chick, frog, zebrafish and even to nonneural tissues). We included the phospho-histone H3 staining so readers would see how the six mutants evaluated in this study align with this paradigm, providing confidence for the novel findings in other figures. A full evaluation of cell cycle kinetics is interesting, but beyond the scope and focus of this manuscript.

      6c) Fig. 5: It is better that cell death fraction will be evaluated by TUNEL and labeling with anti-activated caspase 3 antibody.

      Response: We disagree. The DNA repair enzyme PARP is inactivated upon cleavage by activated caspase 3. There are currently ~3,600 citations that use it as a marker of apoptosis. PARP also has a separate and very specific role in maintaining the integrity of sperm DNA. This antibody works on all metazoans and is amenable to many tissue preparations and fixatives, making it easy to use, robust and quantifiable.

      7a) Please show red channel (Hes1) image in Fig1BC.

      Response: This was added to Revised Fig 1 (Fig 1A).

      7b) Fig. 1DH should be shown in neighbor. Fig. 1H should be assigned as Fig. 1E.

      Response: The new Fig 1 layout addresses this point.

      7c) Fig. S2D, F, H, J: Please show GFP green channel as well. Otherwise, it is difficult to see non-overlapping expression in optic stalk area.

      Response: In the revision, this is Suppl Fig 3. Chx10-Cre is not expressed by ONH-OS cells (1). The green and fuchsia overlap (coexpression) in RPCs is white; we feel this is fairly clear. If needed, all readers can turn on and off the green channel in the final PDF version of this figure to compare GFP with Hes1 expression for those panels.

      7d) Fig. 9B: It is better to show Rax-Cre: Hes1/3/5 TKO rather than Rax-Cre: Hes1 cKO. 7e) Fig. 9B: Lettering "Rbpj mutant" should be revised as "Rax-Cre: Rbpj KO".

      Response: Fig 9B was removed so these terms are now irrelevant. Our models are presented in new Fig 8.

      Significance: The senior author of this manuscript, Dr. Nadean Brown, is an expert scientist who has investigated the role of the Notch signaling pathway in vertebrate ocular tissue, including the neural retina and lens. In general, the Notch signaling pathway consists of a signaling stream that runs from the interaction of Delta and Notch, through Notch receptor activation by proteolytic cleavage, translocation of the Notch intracellular domain (NICD) into the nucleus, and formation of a transcription factor complex consisting of NICD/Rbpj/Maml, to the transcriptional activation of Notch target genes, the Hes family transcription factors. Finally, Hes suppresses the neurogenic program and maintains a pool of neural progenitor cells. Therefore, Notch is a key factor regulating the balance between neurogenesis and progenitor proliferation. In this manuscript, the authors investigated retinal phenotypes in knockout mice for different Notch signaling components, including Rbpj, Maml, and Hes. They found that the functions of these three factors are not always equal in retinal cell differentiation; rather, each specifically regulates a particular step of retinal development. The authors propose that each Notch signaling component may be modified by other signaling pathways and thereby acquire new roles beyond the conventional frame of the classic Notch signaling pathway. In this respect, this work has the potential to provide a new conceptual advance in the field of developmental and cell biology.

      We fully agree this work is a significant advance for the fields of developmental and cell biology. Our findings provide new information and stimulate fresh ideas for anyone working on signal transduction and signal integration.

      References cited:

      1. Bosze et al., 2020 Journal of Neuroscience Vol 40:1501-13; Bosze et al. 2021 Dev Biol Vol 472:18-29.
      2. Han et al., 2023 Development Vol 150 dev201408.
      3. Kopan and Ilagan, 2004 Nat Rev Cell Biol. Vol 5:499-504
      4. Hirata et al., 2002 Science Vol 298:840-3
      5. Friedmann and Kovall, 2010 Protein Sci. Vol 19:34-46
      6. Ong et al., 2006 JBC Voll24:5106-19
      7. Wall et al., 2009 J Cell Biol. Vol 184:101-12.
      8. Javed et al., 2023 Development Vol 150:dev200436
      9. Matsuda and Cepko 2007 PNAS Vol 104:1027-1032
      10. Ohtsuka et al., 2006 Mol. Cell Neurosci. Vol 31:109-22
      11. Zheng et al., 2009 Molecular Brain Vol 2:38
      12. Smith et al., 2017 Journal of Neuroscience Vol 37:7975-93.
      13. Wu et al., 2021 Nature Communications Vol 12:1465: doi 10.1038/s41467-021-21704-4
      14. Saint-Geniez et al., 2009 IOVS Vol 50: 311-21.
    1. Author Response

      Reviewer #2 (Public Review):

      Associative learning assigns valence to sensory cues paired with reward or punishment. Brain regions such as the amygdala in mammals and the mushroom body in insects have been identified as primary sites where valence assignment takes place. However, little is known about the neural mechanisms that translate valence-specific activity in these brain regions into appropriate behavioral actions. This study identifies a small set of upwind neurons (UpWiNs) in the Drosophila brain that receive direct inputs from two mushroom body output neurons (MBONs) representing opposite valences. Through a series of behavioral, imaging, and electrophysiological experiments, the authors show that UpWiNs are differentially regulated by the two MBONs, i.e., inhibited by the glutamatergic MBON-α1 (encoding negative valence) while activated by the cholinergic MBON-α3 (encoding positive valence). They also show that UpWiNs control the wind-directed behavior of flies. Activation of UpWiNs is sufficient to drive flies to orient and move upwind, and inhibition of UpWiNs reduces flies' upwind movement toward the source of reward-predicting odors (CS+). These results, together with existing knowledge about the function of the mushroom body in memory processing, suggest an appealing model in which reward learning decreases and increases the responses of MBON-α1 and MBON-α3 to the CS+ odor, respectively, and these changes cause UpWiNs to respond more strongly to the CS+ odor and drive upwind locomotion. Interestingly, in the final part of the results, the authors reveal a wind-independent function of UpWiNs: increasing the probability that flies will revisit the site where UpWiNs were activated. Thus, UpWiNs guide learned reward-seeking behavior with and without airflow.

      Although the mushroom body has been extensively studied for its role in learning and memory, the downstream neural circuits that read the information from the mushroom body to guide memory-driven behaviors remain poorly characterized. This study provides an important piece of the puzzle for this knowledge gap.

      Strength

      1) Memory studies have predominantly relied on binary choice (go or no-go) assays as measures of memory performance. While these assays are convenient and efficient, they fall short of providing a comprehensive understanding of underlying behavioral structures. In an effort to overcome this limitation, the current study used video recording and tracking software to delve deeper into memory-guided behavior. This innovative approach allowed the authors to uncover novel neurons and examine their contribution to behavior with a level of detail not possible with binary choice assays.

      2) This study used electron microscopy-based Drosophila hemibrain connectome data to reveal the synaptic connection between UpWiNs and MBON-α1 and MBON-α3. Using this method, the study shows that a single UpWiN receives direct input from both MBON-α1 and MBON-α3, which is confirmed by a functional imaging experiment. The connectome dataset also reveals several neurons downstream of UpWiNs, opening avenues for further research into the neural mechanisms linking memory and behavior.

      Weakness

      1) The authors repeatedly state in the manuscript that MBON-α1 and MBON-α3 convey appetitive or aversive memories, respectively. This assertion may not be entirely accurate. Evidence from sugar reward conditioning experiments suggests that MBON-α3 is potentiated and required for sugar reward memory retrieval. Therefore, the compartmentalization for appetitive and aversive memories appears not as obvious at the level of MBONs.

      What we intended was that activation of DANs in these compartments can induce aversive and appetitive memories, respectively, when paired with odors, and that these are the sole output pathway from these compartments to read out the memories in these compartments. As we previously proposed (Aso et al., 2014a eLife), these MBONs can integrate inputs from MBONs of other compartments and their activity can reflect appetitive memory stored as synaptic plasticity in other compartments. Since DANs in the α3 compartment respond to heat, bitter and electric shock but not sugar, the observation that MBON-α3 acquires an enhanced CS+ odor response after appetitive conditioning is presumably due to these intercompartmental connections rather than plasticity of KC-MBON synapses in the α3 compartment. In any case, the fact that excitatory activity of MBON-α1 and MBON-α3 conveys opposite valence of memory still holds true since appetitive conditioning induces depression and potentiation of odor responses, respectively.

      To clarify this point, we now cite related literature in the following sentence in the final paragraph of the Introduction: “UpWiNs receive inputs from several types of lateral horn neurons and integrate inhibitory and excitatory inputs from MBON-α1 and MBON-α3, which are the output neurons of MB compartments that store long-lasting appetitive or aversive memories, respectively (Aso and Rubin, 2016; Ichinose et al., 2015; Jacob and Waddell, 2022a; Pai et al., 2013; Yamagata et al., 2015).”

      2) This study did not conclusively establish the importance of the MBON-α1/α3 to UpWiN pathways in memory-driven behavior. In the experiments shown in Figure 5, flies were trained to associate the activation of reward-related DANs with a specific odor (CS+). After conditioning, UpWiNs were observed to show enhanced responses to the CS+ odor. However, the results should be interpreted with caution because the driver line used to activate DANs (R58E02-LexAp65) labels not only DANs projecting to the MBON-α1 compartment, but all DANs in the protocerebral anterior medial (PAM) cluster. Thus, it remains unclear to what extent the observed enhanced responses are influenced by changes in inhibitory inputs from MBON-α1. While UpWiNs have been shown to play a critical role in the expression of sugar reward memory (Figure 7), it should be noted that UpWiNs receive inputs from multiple upstream neurons, making it difficult to accurately assess the contribution of MBON-α1/α3 to UpWiN pathways in UpWiN recruitment. Further research is needed to fully address this issue.

      We fully agree with this point and have added a sentence to explain an alternative mechanism: “This enhancement of CS+ response can be most easily explained as an outcome of disinhibition from MBON-α1 whose output had been decreased by memory formation; MBON-α1 is inhibitory to UpWiNs (Figure 4B) and MBON-α1 response to the CS+ is reduced following the same training protocol (Yamada et al. 2023). In addition to such a mechanism, plasticity in the β1 compartment may contribute to the enhanced CS+ response in UpWiNs because the driver R58E02 contains DANs in the β1 and glutamatergic MBON from the β1 directly synapse on the dendrites of MBON-α1 and MBON-α3.”

      3) UpWind neurons (UpWiNs) were so named because their activation promotes upwind locomotion. However, when activated in the absence of airflow, flies show increased locomotor speed and an increased probability of revisiting the same location (Figure 7 and Figure 7-figure supplement 1). The revisiting behavior can be observed during the activation of UpWiNs, which is distinct from the local search behavior that typically begins after a reward stimulus is turned off (e.g., Gr64f-GAL4 results in Figure 7-figure supplement 1).

      Return probability was calculated within a 15-s time window. A high return probability during the LED ON period (10-20 s) in Figure 7-figure supplement 1 does not necessarily mean that flies returned during the LED ON period. If a fly is at position A at t=10 s, then to be counted as “returned” it needs to move more than 10 mm away from A and come back to within 3 mm of A by t=25 s. In the case of sugar sensory neuron activation with Gr64f-GAL4, the peak of return probability is shifted toward a later time point because flies stop and extend their proboscis during the activation period.
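
      The return criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the figures: the function names, the track format (a list of (x, y) positions in mm sampled at a fixed frame rate), and the way flies are pooled are assumptions made here for clarity.

```python
import math

def returned(track, start, fps, window_s=15.0, away_mm=10.0, back_mm=3.0):
    """Check the return criterion for one fly.

    track: list of (x_mm, y_mm) positions sampled at `fps` frames per second
           (hypothetical data format).
    start: index of the frame whose position is the reference point A.
    """
    x0, y0 = track[start]
    end = min(len(track), start + int(round(window_s * fps)) + 1)
    has_left = False
    for x, y in track[start + 1:end]:
        dist = math.hypot(x - x0, y - y0)
        if not has_left:
            # The fly must first move more than `away_mm` from A.
            if dist > away_mm:
                has_left = True
        elif dist < back_mm:
            # Having left, it counts as returned once within `back_mm` of A.
            return True
    return False

def return_probability(tracks, start, fps, **kwargs):
    """Fraction of flies that satisfy the criterion (assumed pooling)."""
    flags = [returned(t, start, fps, **kwargs) for t in tracks]
    return sum(flags) / len(flags)
```

      Under this definition a return scored for t=10 s can be completed at any time up to t=25 s, which is why a high value during the LED ON period need not reflect returns made while the LED is on.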

      Because revisiting a location can also be a consequence of repeated turns, it seems more accurate to describe UpWiNs as controlling the speed and likelihood of turns and promoting upwind movement by integrating with neurons that sense the direction of airflow.

      The return probability plotted in Figure 7E is the probability of returning to the position occupied at the end of the LED period within 15 s after the LED period, when the angular speeds of SS33917>CsChrimson and SS33918>CsChrimson flies are identical to those of empty-split-GAL4>CsChrimson control flies (Figure 7-figure supplement 1). Thus, revisiting behavior cannot be explained by a simple increase in turning probability.

      Although the functions of UpWiNs are not limited to promoting wind-directed walking, we still think that “UpWind Neurons” is a practical name for broad readers and oral communication at the current stage of investigation, because the EM neuron IDs and names (SMP348, SMP353, SMP354, SLP399 and SLP400) are too lengthy and do not contain any functional information. We initially defined a set of 11 neurons labeled by the SS33917 split-GAL4 as “UpWind Neurons (UpWiNs)” based on initial optogenetic screening (Figure 2A). We found other driver lines for mushroom body interneuron cell types that can promote release of dopamine and a more robust returning phenotype (e.g. SS49755), but SS33917 remained the champion driver line for the upwind locomotion phenotype.

      Reviewer #3 (Public Review):

      Aso et al. provide insight into how learned valences are transformed into concrete memory-driven actions, using a diverse set of proven techniques.

      Here the authors use a four-armed arena to evaluate flies' preference for a reward-predicting odor and measure upwind locomotion. This behavioral paradigm was combined with the photoactivation of different memory-eliciting neurons, revealing that appetitive memories stored in different compartments of the mushroom bodies (center of olfactory memory) induce different levels of upwind locomotion. The authors then proceed to a non-exhaustive optogenetic screen of the neurons located downstream of the output neurons of the mushroom bodies (MBONs) and identify a group of 8-11 Cholinergic neurons promoting significant changes in upwind locomotion, the UpWins. By combining confocal immunolabelling of these neurons with electron microscope images, they manage to establish the UpWins' connectome within themselves and with the MBONs. Then, using two in vivo cell recording techniques, electrophysiology, and calcium imaging, they define that UpWins integrate both inhibitory and excitatory synaptic inputs from the MBONs encoding appetitive and aversive memory, respectively. In addition, they show that the UpWins' response to a reward-predicting odor is increased after appetitive training. On a behavioral level, the authors establish that the UpWins respond to wind direction only and are not involved in lower-level motor parameters, such as turning direction and acceleration. Finally, they demonstrate that the UpWins' activity is necessary for long-term appetitive memory retrieval, and even suggest a broader role for the UpWins in olfactory navigation, as their photoactivation increases the probability of revisiting behavior. In the end, the authors state that they provide new insights into how memory is translated into concrete behavior, which is fully supported by their data. Altogether, the authors present a pretty complete study that provides very interesting and reliable data, and that opens a new field of investigation into memory-driven behaviors.

      Strengths of the study:

      • To support their conclusions, the authors provide detailed data from different levels of analysis (behavioral, cellular, and molecular), using multiple sophisticated techniques.

      • The measurement of multiple parameters in the behavioral analysis supports the strong changes in upwind locomotion. In addition, taken individually these parameters provide precise insights into how upwind locomotion changes, and allow the authors to more precisely define the role of the UpWins.

      • The authors use split-Gal4 drivers instead of Gal4, allowing them to better refine neuron labelling.

      • The authors discussed and investigated all possible biases, making their data very reliable. For example, they demonstrated that the phenotypes observed in the behavioral assay were wind-directed behaviors and could not be explained by bias avoidance of the arena's center area.

      Limitations of the study:

      • In the absence of more precise drivers, the UpWins' labelling lacks precision. For example, there is no way to know exactly which UpWin is responding in the electrophysiological experiment presented in Figure 4.

      We have ongoing efforts to generate split-GAL4 and split-LexA driver lines for specific subsets of UpWiN neurons, but the data using those lines are not ready for this manuscript. However, we would like to point out that historically, identification of a group of neurons with striking phenotype has been foundational to promote follow-up studies. A good example is P1 neurons for courtship behavior.

      • The screening of neurons located downstream of the MBONs is not exhaustive, meaning that other groups of neurons might be involved in memory-driven upwind locomotion. This does not, however, diminish the authors' conclusions.

      UpWiNs are certainly not the only cell type mediating memory-driven upwind locomotion, since studies by our group and others (e.g. Matheson et al., 2022; PMCID: PMC9360402) identified a collection of cell types that can promote upwind locomotion upon optogenetic activation.

      In 2021, we released images and driver lines of a larger collection of split-GAL4 driver lines at https://splitgal4.janelia.org. We are preparing a manuscript to provide anatomical descriptions of these lines. This collection of new drivers will help elucidate more comprehensive views of circuits for memory-driven actions.

      • All data were obtained with walking flies. So far, there have been no experiments on flying flies.

      This is an intriguing question and we mentioned in Discussion that “Our study was limited to walking behaviors, and the role of UpWiNs in flight behaviors remains to be investigated.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a PyTorch-based simulator for prosthetic vision. The model takes in the anatomical location of a visual cortical prosthesis as well as a series of electrical stimuli to be applied to each electrode, and outputs the resulting phosphenes. To demonstrate the usefulness of the simulator, the paper reproduces psychometric curves from the literature and uses the simulator in the loop to learn optimized stimuli.

      One of the major strengths of the paper is its modeling work - the authors make good use of existing knowledge about retinotopic maps and psychometric curves that describe phosphene appearance in response to single-electrode stimulation. Using PyTorch as a backbone is another strength, as it allows for GPU integration and seamless integration with common deep learning models. This work is likely to be impactful for the field of sight restoration.

      1) However, one of the major weaknesses of the paper is its model validation - while some results seem to be presented for data the model was fit on (as opposed to held-out test data), other results lack quantitative metrics and a comparison to a baseline ("null hypothesis") model. On the one hand, it appears that the data presented in Figs. 3-5 was used to fit some of the open parameters of the model, as mentioned in Subsection G of the Methods. Hence it is misleading to present these as model "predictions", which are typically presented for held-out test data to demonstrate a model's ability to generalize. Instead, this is more of a descriptive model than a predictive one, and its ability to generalize to new patients remains yet to be demonstrated.

      We agree that the original presentation of the model fits might give rise to unwanted confusion. In the revision, we have adapted the fit of the thresholding mechanism to include a 3-fold cross-validation, in which part of the data was excluded during fitting and used as a test set to calculate the model’s performance. The results of the cross-validation are now presented in panel D of Figure 3. Fitting the brightness and temporal dynamics parameters using cross-validation was not feasible due to the limited amount of quantitative data describing temporal dynamics and phosphene size and brightness for intracortical electrodes. To avoid confusion, we have adapted the corresponding text and figure captions to specify that we are using a fit as a description of the data.
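
      As an illustration of the 3-fold procedure described above, the sketch below holds out one third of the data points per fold, fits on the remainder, and scores the fit on the held-out points. The logistic psychometric function, the grid-search fit, and all function names are hypothetical stand-ins chosen to keep the example self-contained; they are not the simulator's thresholding mechanism.

```python
import math

def k_fold_indices(n, k=3):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def logistic(x, x50, slope):
    """Illustrative psychometric curve: probability of perception at input x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - x50)))

def fit_logistic(xs, ps, x50_grid, slope_grid):
    """Least-squares fit by exhaustive grid search; returns (x50, slope)."""
    best = None
    for x50 in x50_grid:
        for slope in slope_grid:
            err = sum((logistic(x, x50, slope) - p) ** 2 for x, p in zip(xs, ps))
            if best is None or err < best[0]:
                best = (err, x50, slope)
    return best[1], best[2]

def cross_validate(xs, ps, x50_grid, slope_grid, k=3):
    """Return the held-out mean squared error of each fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training points of this fold.
        x50, slope = fit_logistic(
            [xs[i] for i in train_idx], [ps[i] for i in train_idx],
            x50_grid, slope_grid,
        )
        # Score only on the points that were excluded from the fit.
        mse = sum((logistic(xs[i], x50, slope) - ps[i]) ** 2 for i in test_idx)
        scores.append(mse / len(test_idx))
    return scores
```

      Reporting the per-fold held-out error, rather than the error on the data used for fitting, is what separates a prediction from a descriptive fit.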

      We note that the goal of the simulator is not to provide a single set of parameters that describes precise phosphene perception for all patients but that it could also be used to capture variability among patients. Indeed, the model can be tailored to new patients based on a small data set. Figure 3-figure supplement 1 exemplifies how our simulator can be tailored to several data sets collected from patients with surface electrodes. Future clinical experiments might be used to verify how well the simulator can be tailored to the data of other patients.

      Specifically, we have made the following changes to the manuscript:

      • Caption Figure 2: the fitted peak brightness levels reproduced by our model

      • Caption Figure 3: The model's probability of phosphene perception is visualized as a function of charge per phase

      • Caption Figure 3: Predicted probabilities in panel (d) are the results of a 3-fold cross-validation on held-out test data.

      • Line 250: we included biologically inspired methods to model the perceptual effects of different stimulation parameters

      • Line 271: Each frame, the simulator maps electrical stimulation parameters (stimulation current, pulse width and frequency) to an estimated phosphene perception

      • Lines 335-336: such that 95% of the Gaussian falls within the fitted phosphene size.

      • Lines 469-470: Figure 4 displays the simulator's fit on the temporal dynamics found in a previously published study by Schmidt et al. (1996).

      • Lines 922-925: Notably, the trade-off between model complexity and accurate psychophysical fits or predictions is a recurrent theme in the validation of the components implemented in our simulator.
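
      The “95% of the Gaussian” criterion quoted for lines 335-336 can be made concrete with a short calculation. The response does not state whether the 95% refers to a one-dimensional cross-section or to the two-dimensional mass of the blob, so the sketch below shows both conventions as assumptions; the function names are illustrative.

```python
import math
from statistics import NormalDist

def sigma_1d(diameter, mass=0.95):
    """Sigma such that `mass` of a 1D Gaussian lies within +/- diameter/2.

    Uses diameter = 2 * z * sigma, with z the standard normal quantile
    (about 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    return diameter / (2.0 * z)

def sigma_2d(diameter, mass=0.95):
    """Sigma such that `mass` of an isotropic 2D Gaussian lies within a disc.

    The mass inside radius r is 1 - exp(-r^2 / (2 sigma^2)), so
    r = sigma * sqrt(-2 ln(1 - mass)), about 2.45 sigma for 95%.
    """
    k = math.sqrt(-2.0 * math.log(1.0 - mass))
    return diameter / (2.0 * k)
```

      For a phosphene of diameter 1 (arbitrary units), the two conventions give a sigma of roughly 0.26 and 0.20 respectively, so the choice of convention changes the rendered blob width by about a quarter.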

      2) On the other hand, the results presented in Fig. 8 as part of the end-to-end learning process are not accompanied by any sorts of quantitative metrics or comparison to a baseline model.

      We now realize that the presentation of the end-to-end results might have given the impression that we present novel image processing strategies. However, the development of a novel image processing strategy is outside the scope of the study. Instead, the study aims to provide an improved simulation which can be used for more realistic assessment of different stimulation protocols. The simulator needs to fit experimental data, and it should run fast (so it can be used in behavioral experiments). Importantly, as demonstrated in our end-to-end experiments, the model can be used in differentiable programming pipelines (so it can be used in computational optimization experiments), which is a valuable contribution in itself because it lends itself to many machine learning approaches which can improve the realism of the simulation.

      We have rephrased our study aims in the discussion to improve clarity.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck et al., 2022; Ghaffari et al., 2021).

      • Lines 814-819: Ultimately, the development of task-relevant scene-processing algorithms will likely benefit both from computational optimization experiments as well as exploratory SPV studies with human observers. With the presented simulator we aim to contribute a flexible toolkit for such experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) The results seem to assume that all phosphenes are small Gaussian blobs, and that these phosphenes combine linearly when multiple electrodes are stimulated. Both assumptions are frequently challenged by the field. For all these reasons, it is challenging to assess the potential and practical utility of this approach as well as get a sense of its limitations.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects about multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020, Fernandez et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernandez et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996, Bak et al., 1990) or unpredicted phosphene patterns (Fernández et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernandez et al., 2021, Dobelle & Mladejovsky 1974, Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.
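
      The default assumption described above (independent electrodes, linear summation) can be sketched as follows. This is a plain-Python illustration rather than the simulator's PyTorch implementation; the function names, the pixel-grid representation, and the helper that flags electrode pairs closer than the approximately 1 mm critical distance are assumptions introduced here.

```python
import math

def gaussian_blob(size, cx, cy, sigma, brightness=1.0):
    """Render one phosphene as an isotropic Gaussian on a size x size grid."""
    return [
        [
            brightness * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]

def render(size, phosphenes):
    """Sum independent phosphenes linearly.

    phosphenes: iterable of (cx, cy, sigma, brightness) tuples in pixel units.
    """
    image = [[0.0] * size for _ in range(size)]
    for cx, cy, sigma, brightness in phosphenes:
        blob = gaussian_blob(size, cx, cy, sigma, brightness)
        for y in range(size):
            for x in range(size):
                image[y][x] += blob[y][x]
    return image

def close_pairs(positions_mm, min_dist_mm=1.0):
    """Index pairs of electrodes closer than min_dist_mm, i.e. pairs for which
    the independence assumption is most likely to be violated."""
    pairs = []
    for i in range(len(positions_mm)):
        for j in range(i + 1, len(positions_mm)):
            if math.dist(positions_mm[i], positions_mm[j]) < min_dist_mm:
                pairs.append((i, j))
    return pairs
```

      Because the rendering is a plain sum, the image for two electrodes equals the sum of their single-electrode images; any brightening, fusion or displacement reported in the literature would have to be added as an explicit interaction term.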

      4) Another weakness of the paper is the term "biologically plausible", which appears throughout the manuscript but is not clearly defined. In its current form, it is not clear what makes this simulator "biologically plausible" - it certainly contains a retinotopic map and is fit on psychophysical data, but it does not seem to contain any other "biological" detail.

      We thank the reviewer for the remark. We improved our description of what makes the simulator “biologically plausible” in the introduction (line 78): ‘‘Biological plausibility, in our work's context, points to the simulation's ability to capture essential biological features of the visual system in a manner consistent with empirical findings: our simulator integrates quantitative findings and models from the literature on cortical stimulation in V1 [...]”. In addition, we mention in the discussion (lines 611 - 621): “The aim of this study is to present a biologically plausible phosphene simulator, which takes realistic ranges of stimulation parameters, and generates a phenomenologically accurate representation of phosphene vision using differentiable functions. In order to achieve this, we have modeled and incorporated an extensive body of work regarding the psychophysics of phosphene perception. From the results presented in section H, we observe that our simulator is able to produce phosphene percepts that match the descriptions of phosphene vision that were gathered in basic and clinical visual neuroprosthetics studies over the past decades.”

      5) In fact, for the most part the paper seems to ignore the fact that implanting a prosthesis in one cerebral hemisphere will produce phosphenes that are restricted to one half of the visual field. Yet Figures 6 and 8 present phosphenes that seemingly appear in both hemifields. I do not find this very "biologically plausible".

      We agree with the reviewer that contemporary experiments with implantable electrodes usually test electrodes in a single hemisphere. However, future clinically useful approaches should use bilaterally implanted electrode arrays. Our simulator can present phosphene locations in either one or both hemifields.

      We have made the following textual changes:

      • Fig. 1 caption: Example renderings after initializing the simulator with four 10 × 10 electrode arrays (indicated with roman numerals) placed in the right hemisphere (electrode spacing: 4 mm, in correspondence with the commonly used 'Utah array' (Maynard et al., 1997)).

      • Lines 518-525: The simulator is initialized with 1000 possible phosphenes in both hemifields, covering a field of view of 16 degrees of visual angle. Note that the simulated electrode density and placement differs from current prototype implants and the simulation can be considered to be an ambitious scenario from a surgical point of view, given the folding of the visual cortex and the part of the retinotopic map in V1 that is buried in the calcarine sulcus.

      • Lines 546-547: with the same phosphene coverage as the previously described experiment

      Reviewer #2 (Public Review):

      Van der Grinten and De Ruyter van Steveninck et al. present a design for simulating cortical-visual-prosthesis phosphenes that emphasizes features important for optimizing the use of such prostheses. The characteristics of simulated individual phosphenes were shown to agree well with data published from the use of cortical visual prostheses in humans. By ensuring that functions used to generate the simulations were differentiable, the authors permitted and demonstrated integration of the simulations into deep-learning algorithms. In concept, such algorithms could thereby identify parameters for translating images or videos into stimulation sequences that would be most effective for artificial vision. There are, however, limitations to the simulation that will limit its applicability to current prostheses.

      The verification of how phosphenes are simulated for individual electrodes is very compelling. Visual-prosthesis simulations often do ignore the physiologic foundation underlying the generation of phosphenes. The authors' simulation takes into account how stimulation parameters contribute to phosphene appearance and show how that relationship can fit data from actual implanted volunteers. This provides an excellent foundation for determining optimal stimulation parameters with reasonable confidence in how parameter selections will affect individual-electrode phosphenes.

      We thank the reviewer for these supportive comments.

      Issues with the applicability and reliability of the simulation are detailed below:

      1) The utility of this simulation design, as described, unfortunately breaks down beyond the scope of individual electrodes. To model the simultaneous activation of multiple electrodes, the authors' design linearly adds individual-electrode phosphenes together. This produces relatively clean collections of dots that one could think of as pixels in a crude digital display. Modeling phosphenes in such a way assumes that each electrode and the network it activates operate independently of other electrodes and their neuronal targets. Unfortunately, as the authors acknowledge and as noted in the studies they used to fit and verify individual-electrode phosphene characteristics, simultaneous stimulation of multiple electrodes often obscures features of individual-electrode phosphenes and can produce unexpected phosphene patterns. This simulation does not reflect these nonlinearities in how electrode activations combine. Nonlinearities in electrode combinations can be as subtle as the phosphenes becoming brighter while still remaining distinct, or as problematic as generating only a single small phosphene that is indistinguishable from the activation of a subset of the electrodes activated, or that of a single electrode.

      If a visual prosthesis happens to generate some phosphenes that can be elicited independently, a simulator of this type could perhaps be used by processing stimulation from independent groups of electrodes and adding their phosphenes together in the visual field.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects about multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020, Fernandez et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernandez et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996, Bak et al., 1990) or unpredicted phosphene patterns (Fernandez et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernandez et al., 2021, Dobelle & Mladejovsky 1974, Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.
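
      The default assumption described in the adjusted text (independent electrodes, linear summation of percepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, grid size, and Gaussian parameters below are hypothetical and chosen only for illustration.

```python
import math

def gaussian_phosphene(size, cx, cy, sigma, brightness):
    """Render one phosphene as a 2D Gaussian blob on a size x size grid."""
    return [
        [brightness * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]

def sum_percepts(frames):
    """Linearly add independent percepts pixel by pixel (no interaction terms)."""
    size = len(frames[0])
    return [
        [sum(f[y][x] for f in frames) for x in range(size)]
        for y in range(size)
    ]

# Two hypothetical electrodes whose phosphenes lie far apart on a 32 x 32 grid.
a = gaussian_phosphene(32, cx=8, cy=8, sigma=2.0, brightness=0.6)
b = gaussian_phosphene(32, cx=24, cy=24, sigma=2.0, brightness=0.8)
combined = sum_percepts([a, b])
```

      Under this assumption, each pixel of the combined percept is simply the sum of the single-electrode percepts, so any fusion, displacement, or brightness interaction between nearby electrodes would have to be modeled as an additional, non-linear term.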

      2) Verification of how the simulation renders individual phosphenes based on stimulation parameters is an important step in confirming agreement between the simulation and the function of implanted devices. That verification was well demonstrated. The end use of a visual-prosthesis simulation, however, would likely not be optimizing just the appearance of phosphenes, but predicting and optimizing functional performance in visual tasks. Investigating whether this simulator can suggest visual-task performance, either with sighted volunteers or a decoder model, that is similar to published task performance from visual-prosthesis implantees would be a necessary step for true validation.

      We agree with the reviewer that it will be vital to investigate the utility of the simulator in tasks. However, the literature on the performance of users of a cortical prosthesis in visually-guided tasks is scarce, making it difficult to compare task performance between simulated versus real prosthetic vision.

      Secondly, the main objective of the current study is to propose a simulator that emulates the sensory / perceptual experience, i.e. the low-level perceptual correspondence. Once more behavioral data from prosthetic users become available, studies can use the simulator to make these comparisons.

      Regarding the comparison to simulated prosthetic vision in sighted volunteers, there are some fundamental limitations. For instance, sighted subjects are exposed for a shorter duration to the (simulated) artificial percept and lack the experience and training that prosthesis users get. Furthermore, sighted subjects may be unfamiliar with compensation strategies that blind individuals have developed. It will therefore be important to conduct clinical experiments.

      To convey more clearly that our experiments are performed to verify the practical usability in future behavioral experiments, we have incorporated the following textual adjustments:

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) A feature of this simulation is being able to convert stimulation of V1 to phosphenes in the visual field. If used, this feature would likely only be able to simulate a subset of phosphenes generated by a prosthesis. Much of V1 is buried within the calcarine sulcus, and electrode placement within the calcarine sulcus is not currently feasible. As a result, stimulation of visual cortex typically involves combinations of the limited portions of V1 that lie outside the sulcus and higher visual areas, such as V2.

      We agree that some areas (most notably the calcarine sulcus) are difficult to access in a surgical implantation procedure. A realistic simulation of state-of-the-art cortical stimulation should only partially cover the visual field with phosphenes. However, it may be predicted that some of these challenges will be addressed by new technologies. We chose to make the simulator as generally applicable as possible and users of the simulator can decide which phosphene locations are simulated. To demonstrate that our simulator can be flexibly initialized to simulate specific implantation locations using third-party software, we have now added a supplementary figure (Figure 1-figure supplement 1) that displays a demonstration of an electrode grid placement on a 3D brain model, generating the phosphene locations from receptive field maps. However, the simulator is general and can also be used to guide future strategies that aim to e.g. cover the entire field with electrodes, compare performance between upper and lower hemifields etc.

      Reviewer #3 (Public Review):

      The authors are presenting a new simulation for artificial vision that incorporates many recent advances in our understanding of the neural response to electrical stimulation, specifically within the field of visual prosthetics. The authors succeed in integrating multiple results from other researchers on aspects of V1 response to electrical stimulation to create a system that more accurately models V1 activation in a visual prosthesis than other simulators. The authors then attempt to demonstrate the value of such a system by adding a decoding stage and using machine-learning techniques to optimize the system to various configurations.

      1) While there is merit to being able to apply various constraints (such as maximum current levels) and have the system attempt to find a solution that maximizes recoverable information, the interpretability of such encodings to a hypothetical recipient of such a system is not addressed. The authors demonstrate that they are able to recapitulate various standard encodings through this automated mechanism, but the advantages to using it as opposed to mechanisms that directly detect and encode, e.g., edges, are insufficiently justified.

      We thank the reviewer for this constructive remark. Our simulator is designed for more realistic assessment of different stimulation protocols in behavioral experiments or in computational optimization experiments. The presented end-to-end experiments are a demonstration of the practical usability of our simulator in computational experiments, building on a previously existing line of research. In fact, our simulator is compatible with any arbitrary encoding strategy.

      As our paper is focused on the development of a novel tool for this existing line of research, we do not aim to make claims about the functional quality of end-to-end encoders compared to alternative encoding methods (such as edge detection). That said, we agree with the reviewer that it is useful to discuss the benefits of end-to-end optimization compared to e.g. edge detection.

      We have incorporated several textual changes to give a more nuanced overview and to acknowledge that many benefits remain to be tested. Furthermore, we have restated our study aims more clearly in the discussion to clarify the distinction between the goals of the current paper and the various encoding strategies that remain to be tested.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck, Güçlü et al., 2022; Ghaffari et al., 2021).

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      2) The authors make a few mistakes in their interpretation of biological mechanisms, and the introduction lacks appropriate depth of review of existing literature, giving the reader the mistaken impression that this simulator is the only attempt ever made at biologically plausible simulation, rather than merely the most recent refinement that builds on decades of work across the field.

      We thank the reviewer for this insight. We have improved the coverage of the previous literature to give credit where credit is due, and to address the long history of simulated phosphene vision.

      Textual changes:

      • Lines 64-70: Although the aforementioned SPV literature has provided us with major fundamental insights, the perceptual realism of electrically generated phosphenes and some aspects of the biological plausibility of the simulations can be further improved by integrating existing knowledge of phosphene vision and its underlying physiology.

      • Lines 164-190: The aforementioned studies used varying degrees of simplification of phosphene vision in their simulations. For instance, many included equally-sized phosphenes that were uniformly distributed over the visual field (informally referred to as the ‘scoreboard model’). Furthermore, most studies assumed either full control over phosphene brightness or used binary levels of brightness (e.g. 'on' / 'off'), but did not provide a description of the associated electrical stimulation parameters. Several studies have explicitly made steps towards more realistic phosphene simulations, by taking into account cortical magnification or using visuotopic maps (Fehervari et al., 2010; Li et al., 2013; Srivastava et al., 2009; Paraskevoudi et al., 2021), simulating noise and electrode dropout (Dagnelie et al., 2007), or using varying levels of brightness (Vergnieux et al., 2017; Sanchez-Garcia et al., 2022; Parikh et al., 2013). However, no phosphene simulations have modeled temporal dynamics or provided a description of the parameters used for electrical stimulation. Some recent studies developed descriptive models of the phosphene size or brightness as a function of the stimulation parameters (Winawer et al., 2016; Bosking et al., 2017). Another very recent study has developed a deep-learning based model for predicting a realistic phosphene percept for single stimulating electrodes (Granley et al., 2022). These studies have made important contributions to improve our understanding of the effects of different stimulation parameters. The present work builds on these previous insights to provide a full simulation model that can be used for the functional evaluation of cortical visual prosthetic systems.

      • Lines 137-140: Due to the cortical magnification (the foveal information is represented by a relatively large surface area in the visual cortex as a result of variation of retinal RF size) the size of the phosphene increases with its eccentricity (Winawer & Parvizi, 2016, Bosking et al., 2017).

      • Lines 883-893: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968; Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006; McIntosh et al., 2013; Paraskevoudi & Pezaris, 2021; Rassia & Pezaris, 2018; Titchener et al., 2018; Srivastava et al., 2009).
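
      The relation between cortical magnification and phosphene size mentioned in the textual changes above (Lines 137-140) can be made concrete with a small sketch. It assumes an inverse-linear magnification function and a fixed spread of cortical activation; the function names and the parameter values k, a, and spread_mm are illustrative assumptions, not values taken from the manuscript.

```python
def cortical_magnification(ecc_deg, k=17.3, a=0.75):
    """Inverse-linear magnification in mm of cortex per degree of visual angle.

    k and a are illustrative parameters, not values from the manuscript.
    """
    return k / (ecc_deg + a)

def phosphene_size_deg(ecc_deg, spread_mm=0.5):
    """Visual-field extent (deg) covered by a fixed cortical activation spread (mm)."""
    return spread_mm / cortical_magnification(ecc_deg)

# The same cortical spread maps to a larger phosphene at larger eccentricity.
sizes = [phosphene_size_deg(e) for e in (0.5, 2.0, 8.0)]
```

      Because magnification falls with eccentricity, dividing a constant cortical spread by it yields a phosphene size that grows with eccentricity, which is the qualitative behavior the adjusted sentence describes.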

      3) The authors have importantly not included gaze position compensation which adds more complexity than the authors suggest it would, and also means the simulator lacks a basic, fundamental feature that strongly limits its utility.

      We agree with the reviewer that the inclusion of gaze position to simulate gaze-centered phosphene locations is an important requirement for a realistic simulation. We have made several textual adjustments to section M1 to improve the clarity of the explanation and we have added several references to address the simulation literature that took eye movements into account.

      In addition, we included a link to some demonstration videos in which we illustrate that the simulator can be used for gaze-centered phosphene simulation. The simulation models the phosphene locations based on the gaze direction, and updates the input with changes in the gaze direction. The stimulation pattern is chosen to encode the visual environment at the location where the gaze is directed. Gaze contingent processing has been implemented in prior simulation studies (for instance: Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018) and even in the clinical setting with users of the Argus II implant (Caspi et al., 2018). From a modeling perspective, it is relatively straightforward to simulate gaze-centered phosphene locations and gaze contingent image processing (our code will be made publicly available). At the same time, however, seen from a clinical and hardware engineering perspective, the implementation of eye-tracking in a prosthetic system for blind individuals might come with additional complexities. This is now acknowledged explicitly in the manuscript.

      Textual adjustment:

      Lines 883-910: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968, Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006, McIntosh et al., 2013; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018; Srivastava et al., 2009). Note that some of the cited studies implemented a simulation condition where not only the simulated phosphene locations, but also the stimulation protocol depended on the gaze direction. More specifically, instead of representing the head-centered camera input, the stimulation pattern was chosen to encode the external environment at the location where the gaze was directed. While further research is required, there is some preliminary evidence that such a gaze-contingent image processing can improve the functional and subjective quality of prosthetic vision (Caspi et al., 2018; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018). Some example videos of gaze-contingent simulated prosthetic vision can be retrieved from our repository (https://github.com/neuralcodinglab/dynaphos/blob/main/examples/). Note that an eye-tracker will be required to produce gaze-contingent image processing in visual prostheses and there might be unforeseen complexities in the clinical implementation thereof. 
      The study of oculomotor behavior in blind individuals (with or without a visual prosthesis) is still an ongoing line of research (Caspi et al., 2018; Kwon et al., 2013; Sabbah et al., 2014; Hafed et al., 2016).
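
      The two ingredients of gaze-contingent simulation described above (gaze-centered phosphene locations and gaze-contingent image processing) can be sketched as follows. This is a simplified illustration in pixel coordinates, not code from the dynaphos repository; both function names are hypothetical, and a real setup would obtain the gaze position from an eye-tracker.

```python
def gaze_centered_crop(frame, gaze_x, gaze_y, half):
    """Return the square patch of `frame` centered on the gaze point.

    The patch is (2 * half + 1) pixels wide; pixels outside the frame are 0.
    """
    h, w = len(frame), len(frame[0])
    return [
        [frame[y][x] if 0 <= y < h and 0 <= x < w else 0
         for x in range(gaze_x - half, gaze_x + half + 1)]
        for y in range(gaze_y - half, gaze_y + half + 1)
    ]

def phosphene_screen_position(offset, gaze):
    """Shift a gaze-relative (retinotopic) phosphene offset by the current gaze."""
    return (gaze[0] + offset[0], gaze[1] + offset[1])

# Hypothetical 5 x 5 camera frame whose pixel value encodes its position.
frame = [[10 * y + x for x in range(5)] for y in range(5)]
patch = gaze_centered_crop(frame, gaze_x=4, gaze_y=0, half=1)
position = phosphene_screen_position(offset=(2, -1), gaze=(10, 10))
```

      The crop selects the part of the scene to encode, while the position shift places each phosphene relative to where the eye is pointing; the two steps are independent and can be enabled separately.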

      4) Finally, the computational capacity required to run the described system is substantial and is not one that would plausibly be used as part of an actual device, suggesting that there may be difficulties with converting results from this simulator to an implantable system.

      The software runs in real time with affordable, consumer-grade hardware. In Author response image 1 we present the results of performance testing with a 2016 model MSI GeForce GTX 1080 (priced around €600).

      Author response image 1.

      Note that the GPU is used only for the computation and rendering of the phosphene representations from given electrode stimulation patterns, which will never be part of any prosthetic device. The choice of encoder to generate the stimulation patterns will determine the required processing capacity that needs to be included in the prosthetic system, which is unrelated to the simulator’s requirements.

      The following addition was made to the text:

      • Lines 488-492: Notably, even on a consumer-grade GPU (e.g. a 2016 model GeForce GTX 1080) the simulator still reaches real-time processing speeds (>100 fps) for simulations with 1000 phosphenes at 256x256 resolution.
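
      A frame-rate figure of this kind can be obtained with a simple timing loop. The sketch below is an assumption about how such a measurement might be set up, not the authors' benchmarking code; `measure_fps` and `dummy_render` are hypothetical names, and the dummy renderer stands in for the actual phosphene rendering step.

```python
import time

def measure_fps(render, n_frames=50):
    """Average frames per second of `render` over n_frames consecutive calls."""
    start = time.perf_counter()
    for _ in range(n_frames):
        render()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return n_frames / max(elapsed, 1e-9)

def dummy_render(size=64):
    """Stand-in for one simulator frame: fills a size x size grid."""
    return [[(x * y) % 255 for x in range(size)] for y in range(size)]

fps = measure_fps(dummy_render)
real_time = fps > 100  # threshold quoted in the text above
```

      For GPU code, the device would typically need to be synchronized before reading the timer, otherwise asynchronous kernel launches can make the measured rate look higher than it is.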

      5) With all of that said, the results do represent an advance, and one that could have wider impact if the authors were to reduce the computational requirements, and add gaze correction.

      We appreciate the kind compliment from the reviewer and sincerely hope that our revised manuscript meets their expectations. Their feedback has been critical to reshape and improve this work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer # 1

      Specific comments

      1) Figure 1: it is unclear how many mice were used for the described phenotypic analyses (panels D and E). Please clarify.

      We acknowledge that we failed to clearly describe the phenotypic analyses. In Figure 1D and E, we performed statistical analysis on the number of TEBs in mammary whole mounts. One mammary whole mount from each mouse was stained with Carmine-alum. Thus, “n” represents the 10 mice we analyzed. We have modified the legend of Figure 1 to "D, E. Quantification of the average number of TEBs and bifurcated TEBs in littermate Crb3fl/fl (n=10) and Crb3fl/fl;MMTV-Cre (n=10) mice at 8 weeks old" in lines 909-911.

      2) Figure 2: in panels B and C it is unclear how the data was quantified; the legend states "n=10", does this mean the experiment in B was done 10 times? And that 10 acini per condition were measured in panel C? In panel D a difference in 0.3% between NC and shCRB3 seems miniscule; do the authors mean 30% instead? And how many acini were counted per condition per (how many) experiments? Same applies to panels G and H, it is unclear how many cells were analyzed per (how many) experiments.

      Thanks for your suggestions. We failed to describe the details of the statistical analysis well in the experimental method. To provide a brief overview of our statistical analysis method, we took 3-4 random bright-field micrographs of each well in the chamber slide system and repeated the experiment three times. We then counted the number of acini in all micrographs (Figure 2B) and examined the diameter of all acini in each photograph, averaging the values as data (Figure 2C). We also determined the percentage of aberrant acini in each photograph, which was used as an analysis value (Figure 2D). We carefully confirmed that the vertical axis of Figure 3D was indeed mislabeled and should mean 30%, and revised the original figure. For IF analysis of the mitotic spindle orientation during lumen formation, we examined the division angle of one cell in one acinus that was mitotically dividing, 3-4 acini were randomly examined in each well in the chamber slide system, and this experiment was repeated three times (Figure 2G and H). Therefore, we have provided a detailed description of these issues in the Figure 2 legend. The revised parts are found in lines 922-924, lines 926-927, lines 929-930, and line 932.
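
      The quantification scheme described above (a percentage per micrograph, then averaged) can be written out as a short sketch. The counts below are hypothetical and serve only to show the arithmetic, including why a fraction of 0.3 on the axis corresponds to 30%.

```python
from statistics import mean

def percent_aberrant(aberrant, total):
    """Percentage of aberrant acini in one micrograph."""
    return 100.0 * aberrant / total

# Hypothetical (aberrant, total) acini counts for three micrographs.
micrographs = [(3, 10), (2, 8), (4, 12)]
values = [percent_aberrant(a, t) for a, t in micrographs]
average = mean(values)

# A fraction of 0.3 and a percentage of 30% describe the same proportion.
fraction = 3 / 10
```

      Averaging per-micrograph percentages weights every micrograph equally, which differs slightly from pooling all acini and computing one overall percentage when the micrographs contain different numbers of acini.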

      3) Figure 2: it would be desirable if authors were able to quantify the data in panels E and I.

      Thank you for your comments. According to your suggestions, we performed the quantitative analysis of Figure 2E and I, which is now presented in the new Figure 2D and H.

      4) For all cell-based assays using shRNA to knock down CRB3 (Fig. 2A-H; Fig. 3A-F; Fig. 4C-E; Fig. 5G-J; Fig. 6C; Fig. 7C, D; Fig. 8E-G), it would be desirable to perform rescue experiments to ensure that the observed phenotype of CRB3 depleted cells is specific and not due to off-target effects of the shRNA.

      Yes, rescue experiments involving overexpression of CRB3 in CRB3 depleted cells can accurately account for the specific phenotype as well as eliminate the off-target effects of shRNA. However, our group has long focused on the role of the cell polarity protein CRB3 in contact inhibition and tumorigenesis. Our previous studies have ruled out the off-target effects of shRNA and reported that CRB3 regulates contact inhibition and tumorigenesis through Hippo or Wnt signaling pathways (Cell Death Dis 2017;8(1):e2546, Oncogenesis 2017;6(4):e322, J Cell Mol Med 2018;22(7):3423-33). Therefore, we will pay close attention to rescue experiments to ensure experimental integrity and phenotypic specificity in our subsequent studies.

      5) Figure 3: how many cells were counted/measured per condition (in how many experiments) in panels B, D, H, F, G and H? In panels C and D, what is the CRB3 protein level in these cells? This is of relevance as protein overexpression per se could impinge on ciliation frequency. This question could be addressed by performing a western blot analysis with CRB3 antibody.

      We did not clearly describe the measurement and statistical analysis methods in the previous manuscript. As before, we took 3-4 random IF and SEM micrographs of each sample in one experiment, and this experiment was repeated three times. Subsequently, the number of ciliated cells and total cells were counted, and the proportion of ciliated cells was calculated (Figure 3B, D and F). In these figures, the cilium length of representative ciliated cells was measured in each photograph. In the knockout mouse model, we needed to find the intact mammary ductal lumen and renal tubule in IF staining of mouse mammary and renal tissue sections, with 5-6 random-field micrographs taken per slice, and the proportion of ciliated cells was measured by counting and taking the average. These experiments were repeated in a total of ten mice (Figure 3G and H). Therefore, the legend of Figure 3G and H has been partially modified and a detailed description has been added to the Figure 3 legend. The revised parts are in lines 945-946, lines 950-951, line 953.

      Thank you for suggesting that we perform a western blot analysis with CRB3 antibody for Figure 3C and D. We have added the western blot analysis of CRB3 in the new Supplementary Figure 3A.

      6) Figure 3G: it is very difficult to see that the red stained structures are primary cilia.

      Yes, the primary cilia in the mammary ductal lumen are stained less clearly than those in individual cells and in the renal tubule in Figure 3G. We used the established markers acetylated tubulin and γ-tubulin to stain the primary cilia, which were clearly labeled in individual cells. However, the labeled primary cilia in the renal tubule were longer and showed a more pronounced structure than those in the mammary ductal lumen. In the mammary ductal lumen of the 10 mice we analyzed, the primary cilia were shorter and less distinctly stained than the others shown in Figure 3G. This difference may be due to the distinct characteristics of primary cilia in different tissues.

      7) Figure 4B: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We randomly selected 3-4 IF micrographs of each sample in one experiment, and this experiment was repeated three times. Subsequently, the number of colocalization cells and total cells were counted, and the proportion of cells with pericentrin and CRB3 colocalization was calculated (Figure 4B). The detailed description has been added to the Figure 4 legend. The revised part is in lines 962-963.

      8) Lines 217-219: since the cells were not stained with a cilia marker, only a centrosome marker, the claim that CRB3 localizes to the base of cilia is unsubstantiated.

      Thank you for your comments. The base of cilia is the basal body, which develops from the mother centriole of the centrosome (Cancer Res. 2006;66(13): 6463-7). Firstly, we found colocalization of CRB3 and pericentrin, a centrosome marker, in MCF10A cells (Figure 4A and B). Secondly, we verified the colocalization of CRB3 with γ-tubulin, a marker of the basal body in primary cilia, in confluent quiescent cells (Figure 4C and D). In addition, we found that CRB3 was localized at the base of primary cilia labeled with acetylated tubulin (Figure 4E and F). Owing to the host species of the commercially available CRB3 antibody, we could only claim indirectly, through these experiments, that CRB3 localizes to the base of cilia.

      9) Figure 3 and Figure 4: is it problematic to use gamma tubulin as centrosome marker if CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components? Also, in Figure S3A no gamma tubulin staining can be seen in the lower panel, why?

      Thank you for your positive comments. As is well known, γ-tubulin is a marker of the centrosome, and we found that CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components. However, Figure 3 illustrates the effect of CRB3 on ciliary assembly, and Figure 4 analyzes the localization of CRB3 in primary cilia. In some reports on ciliary assembly, fluorescent double staining of acetylated tubulin and γ-tubulin has been used to label primary cilia, and the effects of target genes on ciliary number and assembly were analyzed with these markers (Nature. 2013;502(7470): 254-7, Cell. 2007;130(4): 678-90 and so on). Although CRB3 affects the recruitment of gamma tubulin ring complex components, it does not affect the analysis of ciliary number and localization in Figures 3 and 4.

      In Figure S3A, green staining labeled with γ-tubulin could be clearly found in the lower left panel. The representative area from the left amplification may have been poorly selected, resulting in no γ-tubulin staining on the right side. We have updated the lower right panel in the new Supplementary Figure 3B.

      10) Figure S4A: the grouping of indicated proteins is factually wrong. For example, FBF1, SCLT1 and ODF2 are not IFT-B components, and several of the proteins indicated as localizing to the basal body also localize to (unciliated) centrioles. In contrast, CP110 is usually only found on unciliated centrioles and not mature basal bodies. Authors should consult the relevant literature and correct the figure accordingly. Alternatively, this misleading text/grouping could be removed from the figure. Furthermore, in the legend to Figure S4 there is no information provided about this quantitative analysis (how many independent experiments, which cells were analyzed etc.).

      Thank you for your helpful suggestions. We have taken your advice and removed this misleading information from the manuscript, Supplementary Figure 4A and its corresponding legend. We have also added detailed information on this quantitative analysis to the legend of Supplementary Figure 4A. The revised legend is shown in lines 1098-1100.

      11) Figure S4B: how do authors know which of the bands correspond to CRB3 fusion protein?

      Based on the construction strategy of the CRB3-GFP fusion protein (Figure 6D) and its base sequence, we were able to calculate its molecular weight. The molecular weight of the CRB3-GFP fusion protein was then verified by western blotting (Figure 6F and 7A). In addition, exogenous overexpression produced the CRB3-GFP fusion protein in large quantities. Based on these features, we conclude that the band indicated by the black arrow is most likely the CRB3-GFP fusion protein. To allow the molecular weight to be checked, we have labeled the key molecular weight markers in the new Supplementary Figure 4B.

      12) Lines 251-253: this seems like data overinterpretation.

      Thank you for your comments. We have revised this sentence in lines 252-254.

      13) Lines 260-261: the data showing perturbed gamma tubulin localization is not convincing as data was not quantified.

      According to your suggestions, we performed the quantitative analysis of Figure 4C, which is now presented in the new Figure 4E.

      14) Figure 5H and Figure 6C: to show that the GCP6 IP actually worked, these blots should be probed also for GCP6.

      Thank you for your good suggestions. We have added these blots probed for GCP6 in new Figure 5H and 6C.

      15) Figure 5I: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We took 3-4 random IF micrographs of each sample in one experiment, and this experiment was repeated three times. The detailed description has been added to the Figure 5 legend. The revised part is in lines 992-994.

      16) Figure S5: it looks like GPC6 and Rab11 are localizing all over the cell, are the antibodies used for the IFMs specific for these proteins?

      After checking the specificity of these antibodies used for the IFMs, we have decided to delete the corresponding results in the Supplementary Figure 5 and their description in the original manuscript.

      17) Lines 43, 89, and 314-315: the claim that CRB3 directly binds Rab11 is not supported by the data. The data provided only shows that these proteins interact indirectly. To show direct interaction, yeast-2-hybrid analysis or pull-down assays with purified proteins would be required.

      Thank you for your positive comments. Since we were unable to complete the experiments needed to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We have replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      18) Figure 6G and lines 314-315: this result is surprising as it indicates GTP- and GDP-locked versions of Rab11 have the same inhibitory effect on CRB3 binding? Please comment, and also indicate how data in Figure 6G was quantified (and how many independent experiments were used for the quantification).

      We were also puzzled by the results shown in Figure 6G. Based on the western blotting bands, we suspected that there may have been some issues with the experiment. Specifically, we believed that the inefficient transfection of Flag-Rab11aWT, Flag-Rab11a[Q70L], Flag-Rab11a[S20V], and Flag-Rab11a[S25N] plasmids, as well as the insufficient amount of GFP antibody used in the co-IP experiment, led to the corresponding bands being too weak and masking the true differences.

      To address this, we optimized the experimental conditions, tightened the experimental controls, and repeated the experiment in triplicate. The new results are shown in the revised Figure 6G. The statistics from the three independent experiments revealed that CRB3b had a stronger interaction with Rab11a[Q70L] and Rab11a[S20V], and a weaker interaction with Rab11a[S25N], compared to Rab11aWT. As a result, we revised the original manuscript in lines 308-310 and added a detailed description to the Figure 6 legend in lines 1012-1013.

      19) Figure 8G: data needs to be quantified.

      Thank you for your comments. We replaced the unattractive bands in the western blotting of Figure 8G with better quality ones. The statistical analysis of the Figure 8G data is shown in Supplementary Figure 6.

      Further minor comments

      1) Abstract should indicate that this study describes conditional knockout of Crb3 in mouse mammary gland epithelial cells.

      This is good writing advice. We have added the relevant description in lines 40-42.

      2) Line 87: specify which gland (mammary?).

      We have changed this to "mammary gland" in line 87.

      3) Line 140: sentence states that knockout of Crb3 is essential for branching morphogenesis in mammary gland development, I do not think this is correct.

      We have removed the inappropriate finding.

      4) Line 152: "formed more number" should be "formed more" or "formed higher number of".

      We modified "formed more number" to "formed more" in line 154.

      5) Lines 157-163: text and logic are difficult to follow for a non-expert.

      We have modified the logic of this paragraph, as detailed in lines 158-165.

      6) Figure 4A, C: figure resolution could be improved. It is difficult to see what the authors claim these figures are showing.

      The clarity of the original images in Figure 4A and C is acceptable, while the images on the right are electronically enlarged. Although there is a decrease in pixels, it can still display our findings.

      7) Figure 7D, E: images look pixelated.

      The clarity of the original images in Figure 7D and E is acceptable using a laser confocal microscope, while the images on the right are electronically enlarged.

      8) Line 222: unclear what authors mean by "detected a series".

      We modified "detected a series" to "some important" in line 226.

      9) Lines 221-225: which cells were used for the analysis in Fig. S4?

      We used MCF10A cells for the analysis in Supplementary Figure 4, and modified its legend in line 1098.

      10) Line 245: what is "cytomembrane"?

      We modified "cytomembrane" to "cell membrane" in lines 246-247.

      11) Lines 246-250: wording is unclear/difficult to understand.

      We have modified this paragraph, as detailed in lines 248-251.

      12) Line 273: should "regimented" be "sedimented"?

      We modified "regimented" to "sedimented" in line 274.

      13) Line 287-288: sentence does not make sense.

      We have removed this sentence.

      14) Figure 5A: it would be desirable to show the original dataset (Excel file) used for generating this figure.

      To maintain data integrity, we should provide the original dataset (Excel file). However, there are some unpublished data in this file that we must withhold for the time being. If needed, the corresponding author can be requested to provide the file.

      15) Lines 298-299: wording is unclear.

      We have modified this sentence, as detailed in lines 296-298.

      16) Lines 285-287: replace "instead of" with "but not".

      We modified "instead of" to "but not" in line 286.

      17) For all IFMs showing merged images of the green and red channel, please also show the red and green channel separately.

      Most of our fluorescence images are presented separately for each channel in this manuscript, with only a few merged images due to space limitations. This type of presentation is commonly used in published papers.

      18) Lines 326 and 327: replace "bonded" with "bound".

      We have made this change in lines 322-323.

      19) Lines 327-328 and 361-364: wording is unclear/grammatically incorrect.

      We have modified these paragraphs, as detailed in line 323 and lines 357-360.

      20) Line 342: what is meant by "the combination of"?

      We modified "the combination of" to "the binding of" in line 338.

      21) Line 365: localization of what?

      This means "subcellular localization" in lines 360-361.  

      Reviewer # 2

      Major points

      1) CRB3 is present in mammals as 2 isoforms, A and B, originating from alternative splicing. In this study, the authors never mention this fact and when using approaches to KO or KD CRB3A/B they are likely to deplete both isoforms which have been shown to have different C-terminal domains and functions (Fan et al., 2007). This is also important for the CRB3 antibodies used in the study since according to the material and methods section they are either against the extracellular domain common to both isoforms or the intracellular domain which is only similar in the domain close to transmembrane between the 2 isoforms. Since the antibodies used in each figure are not detailed it is impossible to know if the authors are detecting CRB3A or B or both. Please provide the information and correct for the actual isoform detected in the data and conclusions.

      Thanks for your positive comments. In mammals, CRB3 has two isoforms, CRB3a and CRB3b, distinguished by alternative splicing within the fourth exon of the CRB3 gene, which in turn produces a protein with 23 amino acid differences at the C terminus. Both CRB3a and CRB3b have mostly identical amino acid sequences, and have indistinguishable molecular weight sizes. As a result, the knockout mouse construction strategy and the design principles of RNAi sequences target both CRB3a and CRB3b. This is described in lines 100-104 and lines 149-150. Additionally, commercially available antibodies detect both CRB3a and CRB3b, as mentioned in line 123 and lines 636-637 in revised manuscript.

      However, it should be noted that our CRB3 overexpression, as shown in the CRB3 structural domain in Figure 6D, refers specifically to the sequence of CRB3b. As a result, we have updated the original manuscript as well as the legends of Figures 3C, 3E, 4A, 5A, 5B, 6D-G, 7A, 7B and Supplementary Figure 2F-H, 3A, 4B, 6B to reflect this change. All instances of overexpressed CRB3 have been changed to CRB3b.

      2) CRB3A and B have been localized in the cilium itself (Fan et al., 2004; 2007) but in the study CRB3A/B does not enter the cilium but is localized in the basal body (figure 4). How the authors reconcile these different localizations?

      Indeed, we found that CRB3 is mainly localized at the basal body of the primary cilium, which differs from previous reports in the literature (Curr Biol. 2004;14(16):1451-61 and J Cell Biol. 2007;178(3):387-98). However, upon closer examination of one of these reports (Curr Biol. 2004;14(16):1451-61), it appears that CRB3 was actually scattered on the primary cilia, with a strong focus at the basal body. Additionally, in rat kidney collecting ducts, the localization of CRB3 on primary cilia was significantly reduced, with obvious localization at the basal body. Another study (J Cell Biol. 2007;178(3):387-98) also reported the co-localization of CRB3b and γ-tubulin in MDCK cells, which is consistent with our conclusion. We further verified the co-localization of CRB3 with the centrosome by overexpressing CRB3b in mammary epithelial cells, indicating that CRB3 mainly localizes to the basal body of the primary cilium. This information is discussed in the Discussion section of the manuscript (lines 400-410).

      3) The authors use GFP-CRB3A/B, it is not stated which isoform, over-expression to localize CRB3A/B in MCF10A cells (figure 4A). The levels of expression appear to be very high in the GFP panel and it is likely that the secretory pathway of the cells is clogged with GFP-CRB3A/B in transit from the ER to the plasma membrane. Thus, the colocalization with pericentrin might be due to the accumulation of ER and Golgi around the centrosome. This colocalization should be done with the endogenous CRB3A/B and with a better resolution.

      Thank you for your comments. We were also interested in the co-localization of endogenous CRB3 and centrosome proteins. However, the only commercial CRB3 antibody available is raised in rabbit, and the very useful pericentrin antibody (Abcam, ab4448) is also raised in rabbit. We had difficulty finding commercial centrosome-associated antibodies raised in other species. Therefore, we examined the co-localization of endogenous CRB3 with γ-tubulin in Figure 4C and combined the results with those of exogenous CRB3 to illustrate the co-localization of CRB3 with centrosomes.

      4) The staining for CRB3A/B in figure 4C (red) is striking with a very strong accumulation in an undefined intracellular structure and the authors do not provide any explanation for such a difference with the GFP-CRB3A/B just above.

      Thank you for your good suggestions. The immunofluorescence images of GFP-CRB3 in Figure 4a were obtained using a fluorescence microscope, while the images of endogenous CRB3 were obtained using a laser confocal microscope. The fluorescence microscope excites a fluorescent dye to emit a signal, which is amplified into a visible light signal and presents a full fluorescent signal. In Figure 4a, we can clearly see the full distribution of exogenous CRB3 in MCF10A cells, including its tight junctional localization consistent with previous reports in the literature and its co-localization with centrosomal proteins. On the other hand, laser confocal microscopy uses a laser as the light source to excite the fluorescence within the sample point by point. It employs a precision pinhole filtering technique with strong laminar imaging capabilities. In the specific analysis of endogenous CRB3 co-localization studies with centrosomes and primary cilium, signals at tight junctions must be excluded. Therefore, Figure 4c represents the fluorescence signal at the level of intracellular CRB3 co-localization with γ-tubulin. The two methods use different detection means and techniques, and are not directly comparable.

      5) The staining in figure 4E is also different from those shown in figure 4F in which the CRB3A/B staining is right at the base of the axoneme while it is not the case in figure 4E where we can see a red dot close to but not right at the base of the axoneme.

      Thank you for your comments. The new Figure 4F displays the localization relationship between CRB3 and primary cilium, analyzed using laser confocal microscopy. With the unique single-level detection function of this microscope, the problem of level selection may cause the red dots to appear close to, rather than right at the basal body of the primary cilium. However, the new Figure 4G, based on the use of 3D reconstruction scanning technique, clearly demonstrates the localization of CRB3 at the basal body of the primary cilium under the same cells and conditions.

      6) The authors claim that CRB3A/B interacts directly with Rab11 but they only show co-immunoprecipitation experiments from cell lysates which do not support direct interactions. The only way to show a direct interaction is to produce both proteins in vitro. Thus, the term direct interaction should be removed.

      Thank you for your positive comments. Since we were unable to complete the experiments needed to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We have replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      7) In addition, the authors claim (Line 251/252) that Rab11 is necessary for the transport of CRB3A/B but they should KD Rab11 to show this.

      Thank you for your good suggestions. It is essential to observe CRB3 trafficking after Rab11 knockdown. However, in Figure 5C, we used the endocytosis inhibitor dynasore, which also inhibits Rab11-positive endosomes. This result shows that dynasore can significantly inhibit CRB3 trafficking in MCF10A cells. We believe that this experiment partially demonstrates that inhibiting Rab11 function can affect CRB3 trafficking.

      8) The domain of CRB3A/B that is necessary for the interaction with Rab11 is the N-terminal part of the extracellular domain. This domain is thus inside the transport vesicles and not accessible from the cytoplasm. Given that Rab11 is a cytoplasmic protein, how the 2 proteins could interact across the membrane? The authors do not even discuss this essential point for their hypothesis.

      Thank you for your positive comments. As shown in the schematic model in Figure 9, we believe that when cells form tight junctions, CRB3 is primarily located on the cell membrane. Subsequently, endosomes are involved in the intracellular degradation process of CRB3 on the cell membrane. Intracellular CRB3 can bind to Rab11 through the extracellular domain, which in turn participates in primary cilia assembly. We have made detailed modifications to lines 418-421.

      9) Figures are not numbered.

      Thank you for your comments. We have updated the numbers in the original manuscript as well as the legends of Figures 1D, 1E, 2B, 2D, 2F, 2G, 3B, 3D, 3F-H, 4B, 4E, 5I, 6, 8G and Supplementary Figure 1E, 2, 3C, 4A, 5B, 6.

      Minor points

      1) The authors cite several studies showing that a down regulation of CRB3A/B in human cells promotes cancer but other studies show the contrary: Lin et al., 2015 for example. Please discuss these discrepancies.

      Thanks for your good suggestion. We have included additional studies with contrasting results in the discussion section, specifically in lines 378-380.

      2) Line 98: "exhibit smaller" smaller than what?

      We modified "exhibit smaller" to "exhibit smaller size" in line 97.

      3) Line 152: "form more number, ..." ???

      We modified "formed more number" to "formed more" in line 154.

      4) Line 180: "Compared with the control, the number of cells with primary cilium was significantly increased". To me it is the contrary! This part is not clear at all. Please rewrite.

      We have revised the sentence in lines 183-185.

      5) Authors should check and review extensively for improvements to the use of English.

      Thanks for your good writing advice. We have carefully reviewed and revised the entire manuscript to improve its readability.

    1. Reviewer #1 (Public Review):

      In principle a very interesting story, in which the authors attempt to shed light on an intriguing and poorly understood phenomenon; the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time provided by the "timer" it will maintain stability, if not then stability is compromised. If this conclusion can be supported by solid data, the paper would make a very important contribution to the field.

      However, the story in its present form suffers from a number of major gaps that will need to be addressed before we can conclude that MASTL is the "timer" that is proposed here. The primary concern being that altered MASTL regulation seems to be doing much more than simply acting as a timer in control of recovery after DNA damage. There is data presented to suggest that MASTL directly controls checkpoint activation, which is very different from acting as a timer. The authors conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment. Do E6AP/MASTL control checkpoint signaling or do they control recovery, which is it?

      Also, there is data presented that suggest that MASTL does more than just controlling mitotic entry after DNA damage, while the conclusions of the paper are entirely based on the assumption that MASTL merely acts as a driver of mitotic entry, with E6AP in control of its levels. This issue will need to be resolved.

      And finally, the authors have shown some very compelling data on the phosphorylation of E6AP by ATM/ATR, and its role in the DNA damage response. But the time resolution of these effects in relation to arrest and recovery has not been addressed.

      Revised manuscript:
      I think the authors did a good job in revising the paper, and provide compelling support for a timer function in the checkpoint. I do think they have still missed one important point: how MASTL could act as a timer to control recovery. The data clearly show that MASTL somehow controls ATM/ATR activity, whilst their final model (fig.9) places MASTL upstream of CDK activity, without mentioning its feedback on ATM/ATR. I think there are 2 possible explanations for the timer function of MASTL they have discovered here, and both may be relevant. The first is enhanced CDK activation by direct control of CDK phosphorylation through MASTL/B55/PP2A. The second is through MASTL-mediated shut-down of ATM/ATR activation (mechanism to be determined), which is also reported here. Their final model and discussion do not display sufficient appreciation for this latter option, and I would argue that the HU-recovery experiment shown in Fig.5B is actually in strong support of the second explanation, rather than the first.

    1. Public Review:

      In countries endemic for P vivax the need to administer a primaquine (PQ) course adequate to prevent relapse in G6PD deficient persons poses a real dilemma. On one hand PQ will cause haemolysis; on the other hand, without PQ the chance of relapse is very high. As a result, out of fear of severe haemolysis, PQ has been under-used.

      In view of the above, the Authors have investigated in well-informed volunteers, who were kept under close medical supervision in hospital throughout the study, two different schedules of PQ administration: (1) escalating doses (to a total of 5-7 mg/kg); (2) single 45 mg dose (0.75 mg/kg).

      It is shown convincingly that regimen (1) can be used successfully to deliver within 3 weeks, under hospital conditions, the dose of PQ required to prevent P vivax relapse.

      As expected, with both regimens acute haemolytic anaemia (AHA) developed in all cases. With regimen (2), not surprisingly, the fall in Hb was less, although it was abrupt. With regimen (1) the average fall in Hb was about 4 G. Only in one subject the fall in Hb mandated termination of the study.

      Since the data from the Chicago group some sixty years ago, there has been no paper reporting a systematic daily analysis of AHA in so many closely monitored subjects with G6PD deficiency. The individual patient data in the Supplementary material are most informative and more than precious.

      Having said this, I do have some general comments.
      1. Through their remarkable Part 1 study, the Authors clearly wish to set the stage for a revision of the currently recommended PQ regimen for G6PD deficient patients. They have shown that 5-7 mg/kg can be administered within 3 weeks, whereas the currently recommended regimen provides 6 mg/kg over no less than 8 weeks.
      2. Part 2 aims to show that, as was known already, even a single PQ dose of 0.75 mg/kg causes a significant degree of haemolysis: G6PD deficiency-related haemolysis is characteristically markedly dose-dependent. Although they do not state it explicitly in these words (I think they should), the Authors want to make it clear that the currently recommended regimen does cause AHA.
      3. Regulatory agencies like to classify a drug regimen as either SAFE or NOT-SAFE; they also like to decide who is 'at risk' and who is 'not at risk'. A wealth of data, including those in this manuscript, show that it is not correct to say that a G6PD deficient person when taking PQ is at risk of haemolysis: he or she will definitely have haemolysis. As for SAFETY, it will depend on the clinical situation when PQ is started and on the severity of the AHA that will develop.

      The above three issues are all present in the discussion, but I think they ought to be stated more clearly.
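
      The dose figures quoted in points 1 and 2 can be checked with a short calculation. The sketch below uses only numbers stated in this review; the assumption that the currently recommended regimen consists of eight weekly 0.75 mg/kg doses is an inference from the "6 mg/kg over no less than 8 weeks" figure and is not spelled out here, and the variable and function names are illustrative.

```python
# Illustrative dose arithmetic using only figures quoted in the review.
SINGLE_DOSE_MG = 45.0          # single primaquine dose, regimen (2)
SINGLE_DOSE_MG_PER_KG = 0.75   # the same dose expressed per kg body weight

# Body weight implied by equating 45 mg with 0.75 mg/kg.
implied_weight_kg = SINGLE_DOSE_MG / SINGLE_DOSE_MG_PER_KG

# ASSUMPTION: the recommended regimen is one 0.75 mg/kg dose per week for 8 weeks.
WEEKLY_DOSES = 8
weekly_total_mg_per_kg = WEEKLY_DOSES * SINGLE_DOSE_MG_PER_KG


def mean_daily_dose(total_mg_per_kg: float, weeks: int) -> float:
    """Average dose per day (mg/kg/day) when a total is spread over `weeks`."""
    return total_mg_per_kg / (weeks * 7)


current_rate = mean_daily_dose(weekly_total_mg_per_kg, 8)
escalating_low = mean_daily_dose(5.0, 3)   # lower bound of the 5-7 mg/kg range
escalating_high = mean_daily_dose(7.0, 3)  # upper bound of the 5-7 mg/kg range

print(f"implied body weight: {implied_weight_kg:.0f} kg")
print(f"weekly regimen total: {weekly_total_mg_per_kg:.1f} mg/kg")
print(f"mean daily dose, weekly regimen: {current_rate:.3f} mg/kg/day")
print(f"mean daily dose, escalating regimen: {escalating_low:.3f}-{escalating_high:.3f} mg/kg/day")
```

      Under these assumptions the escalating regimen delivers a comparable total dose at more than twice the average daily rate, which is the trade-off the comments above ask the Authors to state more explicitly.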

      Finally, by the Authors' own statement on page 15, the main limitation is the complexity of this approach. The authors suggest that blister packed PQ may help; but to me the real complexity is managing patients in the field versus the painstaking hospital care in the hands of experts, of which volunteers in this study have had the benefit. It is not surprising that a fall in Hb of 4 g/dl is well tolerated by most non-anaemic men; but patients with P vivax in the field may often have mild to moderate to severe anaemia; and certainly they will not have their Hb, retics and bilirubin checked every day. In crude approximation, we are talking of a fall in Hb of 4 G with regimen (1), as against a fall in Hb of 2 G with regimen (2), that is part of the currently recommended regimen: it stands to reason that, in terms of safety, the latter is generally preferable (even though some degree of fall in Hb will recur with each weekly dose). In my view, these difficult points should be discussed deliberately.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements [optional]

      Reply to general assessment of referee #2:

      1. General assessments: The current study adds some to these observations…some of these observations are incremental…biological significance is limited. While this reviewer does not suggest additional experimentation, this manuscript would be suitable as a resource paper.

      Reply: It appears we were not clear enough in explaining the novel aspects of our study.

      The starting points are two published studies from our lab demonstrating a global increase of ISGF3 association with ISG promoters in IFNγ-treated cells and a remarkable similarity of IFNγ- and type I IFN-induced early transcriptome changes. These findings challenge the notion in the field (as mentioned by the referee) that IFNγ specificity is produced by the predominant deployment of STAT1 homodimers. We thus tested the hypothesis that the specificity of the IFNγ-induced transcriptome is generated over time, rather than during the early response, and relies on secondary responses to transcription factors such as IRF1. In contrast, IRF1 plays no or only a small role in the type I IFN response that utilises ISGF3 and/or unknown secondary factors in the delayed response. We tested this hypothesis with PRO-seq technology to rule out confounding effects of mRNA processing over a 48h period. The data are clear in showing that many genes associated with the antibacterial or anti-parasite profile of activated macrophages are indeed much more abundant in late-stage rather than briefly IFNγ-treated macrophages and these delayed changes are to a large extent dependent on IRF1. Our findings are based on the best available technologies, a combination of nascent transcript analysis with genetics and protein interaction studies. In addition, our findings rule out alternative models of sustained or secondary ISG transcription, such as the employment of alternative ISGF3 complexes (such as STAT2-IRF9) or of ISGF3 complexes formed with unphosphorylated STAT1 and STAT2. We provide evidence for higher order waves of transcription caused by unknown transcription factors that are produced by transcriptional activation of ISGF3 or IRF1 target genes and identify candidates among the AP1 and Ets transcription factor families. We agree that some of the data are confirmatory rather than novel (i.e. some of the genes we describe were known from previous literature to be IRF1 targets), but it is the systems approach of our study, and particularly the delineation of conditions under which the largely neglected delayed response diverts the IFNβ and IFNγ-induced transcriptomes, that generates a comprehensive and conclusive view of IFNγ acting predominantly as a macrophage activating factor, and IFNβ being an essential antiviral cytokine. We do think this main outcome is immunologically meaningful and not incremental. For this reason, we would prefer to publish the paper as a relevant contribution to innate immunology rather than a resource. Emphasizing our point, a paper appeared in ‘Cell’ while our study was under review, showing that human IRF1 mutations cause Mendelian susceptibility to mycobacterial disease (MSMD), a term coined by JL Casanova and colleagues for immunological defects that reduce the ability of macrophages to cope with intracellular bacteria (new ref. 65). This important study emphasizes the main conclusions of our study about the relevance of IRF1 for macrophage activation. We discuss this paper on p. 14 lines 9-14.

      Revision: We tried to better explain the scientific motivation for this study and the significance of the results (p. 4, lines 12-25).

      Revision plan: n. a.

      2. Description of the planned revisions

      Referee #3; major comment 1:

      Fig. 1d is difficult to interpret and misleading for many reasons. First, the cluster numbering is disconnected from the cluster order; why not number them based on the hierarchical clustering and write the cluster number beside the cluster itself? Second, having a 2-color gradient is misleading; negative values shouldn't be in the same color tone as the positive values. Third, the authors did not provide adequate rationale for using only the top 1,000 most expressed genes. Why not use all the genes differentially expressed in at least one of the conditions to provide a comprehensive analysis? Could this potentially lead to bias in the data, and is there any information lost by not using the lower-expressed gene fraction? Fourth, it is not clear what the color scale is representing and how the data was transformed. Was a mean centering of the expression values of the log2FC applied to the RNA-seq data to facilitate clustering? Mean centering and z-scoring is a common technique used to adjust expression data, but it can potentially exaggerate differences between samples. More information about the data and analysis should be provided, as it is difficult to determine whether this was a valid approach or not.

      Reply:

      • To create the heatmap, we used the pheatmap package from R and the cutree_rows option to separate 11 clusters with strikingly different patterns of gene expression based on visual exploration. The numbering was autogenerated by the program.
      • The data is now shown in red-blue.
      • We restricted our list to only 1000 genes from each comparison as we aimed to analyze the prominent patterns of gene expression across timepoints. Considering all differentially expressed genes based on a padj value would also include genes expressed at very low levels as evident from the low baseMean values obtained from DESeq2. Hence, we applied a selection of 1000 genes which effectively represented the major patterns of gene expression across timepoints.
      • A variance-stabilizing transformation was applied to read counts obtained from PRO-seq using the DESeq2 package. The transformed reads were z-score normalized and used for hierarchical clustering by the “Ward.D2” method using the pheatmap package in R. A total of 3126 genes were used for this analysis. 11 distinct clusters were defined using the cutree_rows option. The color scale represents z-score normalized counts. The genes represented in the heatmap were selected based on the following criteria: each timepoint of interferon treatment was compared to the homeostatic condition (untreated sample) in wildtype BMDMs. The differentially expressed genes from each comparison were selected based on the filtering criteria: absolute log2FoldChange >=1 and adjusted p value <0.01 by Wald test. Following the differential analysis, the first 1000 differentially expressed genes in each treatment condition (ordered by adjusted p value) were selected for both IFN types and combined into a list of 3126 unique genes. The scale in the heatmap represents z-scores of variance-stabilized reads, calculated across all genotype and treatment conditions, separately for each IFN type.
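
      The selection and scaling steps described in the reply can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names, the gene labels and all numbers are made up, the variance-stabilizing transformation and Ward.D2 clustering are left out, and the use of the sample standard deviation for the z-score is an assumption.

```python
from statistics import mean, stdev


def select_top_genes(results, lfc_cutoff=1.0, padj_cutoff=0.01, top_n=1000):
    """Pick the top_n genes of one comparison (treated vs untreated).

    `results` is a list of (gene, log2_fold_change, adjusted_p) tuples.
    Genes must pass |log2FC| >= lfc_cutoff and padj < padj_cutoff, and are
    then ranked by adjusted p value, smallest first.
    """
    passing = [r for r in results if abs(r[1]) >= lfc_cutoff and r[2] < padj_cutoff]
    passing.sort(key=lambda r: r[2])
    return [gene for gene, _, _ in passing[:top_n]]


def union_of_selections(comparisons, **kwargs):
    """Combine the per-comparison selections into one set of unique genes."""
    genes = set()
    for results in comparisons.values():
        genes.update(select_top_genes(results, **kwargs))
    return genes


def row_zscores(values):
    """Z-score one gene's values across samples (sample standard deviation)."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical toy input: two comparisons with overlapping hits.
comparisons = {
    "IFN_type_I_early": [("geneA", 5.2, 1e-30), ("geneB", 3.1, 1e-12), ("geneC", 0.1, 0.8)],
    "IFN_gamma_late": [("geneB", 4.0, 1e-20), ("geneD", 1.5, 0.005), ("geneE", 0.9, 1e-4)],
}
selected = union_of_selections(comparisons, top_n=2)
print(sorted(selected))
print(row_zscores([1.0, 2.0, 3.0]))
```

      Because the same gene can rank among the top hits of several comparisons, the combined list is shorter than the sum of the per-comparison lists, which is consistent with the union in the reply containing 3126 unique genes.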

      Revision plan: We will label the clusters with the cluster number next to it in addition to the color codes.

      Referee #3; major comment 3:

      The large standard deviation bars in the claim that ChIP data confirmed the binding of ISGF3 components to the promoter of Mx2 cast doubt on the validity of the results and conclusions. The authors should consider additional experiments or complementary analyses to validate their findings. Or alternative, to adjust their claims accordingly.

      Reply: To demonstrate sufficient quality of the data, the ratio of Stat1/Stat2 was calculated separately for the early (1.5 h) and late (48 h) time points. An unpaired two-tailed t test comparing this ratio between 1.5 h and 48 h shows that the two are not significantly different. This indicates that all ISGF3 components are associated with ISG during both early and delayed responses, i.e., that STAT2/IRF9 complexes are unlikely to contribute to delayed ISG control. However, we agree with the referee that the standard deviations of the kinetic ChIP experiment are high and that it would be good to generate additional data.
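
      For illustration, the test statistic behind such a comparison can be sketched as follows. This assumes the equal-variance (Student) form of the unpaired t test, which the reply does not specify. The function name is hypothetical and the ratios are placeholders, not measured ChIP values.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, n1 + n2 - 2


# Placeholder Stat1/Stat2 ratios for three replicates each (invented values).
early_ratios = [1.0, 2.0, 3.0]
late_ratios = [2.0, 3.0, 4.0]
t_stat, dof = unpaired_t(early_ratios, late_ratios)
```

      With three replicates per group there are 4 degrees of freedom, and the two-tailed 5% critical value is about 2.776, so an absolute t statistic below that value would be read as no significant difference. A statistics package would normally supply the exact p value.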

      Revision plan: We will perform additional ChIP experiments to improve the statistical power of the results in fig. S2c.

      Referee #3, major comment 6:

      The authors interpret their ATAC-seq and ChIP-seq results based on a 2kb window to the TSS of genes, not considering relatively close enhancers or longer range cis-regulatory interactions in their interpretation. For example, they mention on p.7 "Contrasting the strong binding of IRF9 and IRF1 to the Mx2 (cluster 2) and Gbp2 (cluster 9) promoters, respectively, we saw no evidence for direct binding to Lrp11 (cluster 3) and Ptgs2 (cluster 10)", but on Fig 3d they show only the proximal regions. No scale bars are shown either. Moreover, exploring the same published IRF1 ChIP-seq dataset, there is a clear IRF1 binding site at the promoter of Ptgs2, while the authors report none.

      Reply:

      • According to the literature (e. g. refs. 11, 27), most IFN-induced accessibility changes occur in the vicinity of the TSS of ISG. This is further strengthened by the data shown in this manuscript. In addition, most functionally validated GAS and ISRE sequences are in the DNA interval chosen for our analysis. While distal ISG enhancers have been reported (e. g. DOI: 10.26508/lsa.202201823), an analysis beyond the region containing most control elements increases the risk of wrongly assigning regulatory elements to ISG, and hence of misjudging the causality between transcription factor binding and accessibility changes.
      • We extended the regions for the analysis of the Lrp11 and Ptgs2 regulatory regions and found no evidence for the binding of ISGF3 or IRF1. We find no evidence for a clear peak in the Ptgs2 promoter. There is a peak called by the MACS2 algorithm, but visual inspection of the track (bigWig file) shows that it consists of a minor increase in reads above background that does not suggest a bona fide IRF1 binding site (see below). This view is supported by our inability to find an IRF binding site in the vicinity of the peak.

      IRF1 binding indicated by bigWig browser tracks and corresponding peak files detected at the locus. We obtained the peak file from Langlais et al., 2016 and called peaks using MACS2, but with the mm10 genome, whereas the analysis in the original paper was done with the mm9 genome. The peak identified here appears to be an artefact of the MACS2 program, as there is no evident enrichment at the gene promoter region upon inspection of the bigWig files.

      Revision plan: Scales will be added to the browser tracks as requested.

      Referee #3, major comment 7:

      Lack of statistical analysis on chromatin accessibility claims: The authors claim that ATAC-seq data in BMDMs stimulated with IFNβ or IFNγ for a short (1.5 hours) or long (48 hours) period reveals a striking similarity between transcription and the general trends of chromatin accessibility at regions up to 1000 bp upstream of the TSS (Fig. 2a), suggesting continuous chromatin remodeling during the transcriptional response. However, I would like to know if this conclusion is well-supported by the correlation between the chromatin accessibility from ATAC-seq data from only one sample and the PRO-seq data.

      Reply: See revision plan.

      Revision plan: We will analyze whether single experiments support the conclusions derived from the z-scores of the triplicate samples.

      Referee #3, major comment 8:

      The need for additional experiments to verify claims such as the dependence of Ifi44 on IRF1 for gaining ATAC signal, as stated in the claim, "Expression required IRF1 for both, but accessibility of the Ifi44 regulatory region depended upon IRF1 whereas that of Gbp2 acquired an open structure independently of IRF1 (Fig. 5c).

      Reply: We think the lack of clarity might be related to the size of figures 5a and 5b and the density of the dots in some areas of the plot. We agree it is very difficult to assign our gene labels unambiguously to a single dot.

      Fig. 5a combines ATAC-seq data from wt and IRF1 knockout cells with the expression data from the PRO-seq experiment. Fig. 5b uses the same set-up, but analyzes IRF9-deficient macrophages.

      Blue dots show ATAC-seq signals induced by IFN treatment. Violet dots represent genes that require IRF1 (Fig. 5a) or IRF9 (Fig. 5b) for transcriptional induction. Yellow dots mark genes such as Ifi44 that require IRF1 (Fig. 5a) or IRF9 (Fig. 5b) for both expression and the accessibility change in the promoter region. Fig. 5c visualizes representative examples of genes whose accessibility is coupled to the transcription factor dependence of the transcriptional induction (Ifi44), or not (Gbp2). Thus, Fig. 5c must be interpreted based on the dot color code in fig. 5a, and we admit this has been difficult with the figure in its present form.

      Revision plan: We will improve the clarity of figs 5a and 5b in several ways:

      • We will label the panels to better indicate the intersected data sets.
      • We will increase the size of the panels and figure legends and make sure that the correspondence between gene names and dots is unambiguous.
      • We will include trend lines of the Ifi44 and Gbp2 genes to visualize their induction and IRF1 dependence.

      Referee #3, major comment 13 (see also section 3):

      The authors have not adequately addressed the methodological limitations in their discussion, which extends beyond the aforementioned comments. It is suggested they include a comprehensive discussion of the claims made pertaining to the necessity of IRF1 for accessibility and the potential biases in the interactomes, along with their associated consequences.

      Reply: The contribution of IRF1 to the accessibility of ISG promoters emerges from the data in figure 5a, whose clarity will be improved (see reply to point 8). We do not interpret the impact of IRF1 beyond the data; in fact, we state a relatively minor effect of IRF1 in the control of promoter accessibility (p. 10, lines 20-22), and we have added a reference in agreement with an impact of IRF1 on basal expression of antiviral genes (ref. 39, as suggested by the referee).

      We have added discussion on potential limitations of the TurboID approach (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision plan: Improvement of fig 5a (see ref. #3, point 8).

      Referee #3, minor comment 2

      Fig 1e. The color scales on the GO enrichment graphs are misleading since they use the same blue-to-red gradient for adj p-values ranging from 10-25 to 10-49 and 0.008 to 0.016, which could be considered non significant.

      Reply: We agree that this is confusing. It results from automated assignments of the color gradients by the software.

      Revision plan: We will investigate possibilities to change color codes for different ranges of p values.

      Referee #3, minor comment 4

      The incomplete schema in Figure 1a, which only focuses on PRO-seq and does not include the ATAC-seq element.

      Reply: We will add a new figure to visualize the set-up of the ATAC-seq experiments and their intersection with the PRO-seq data.

      Revision plan: We will add a new figure in accordance with the referee’s request.

      Referee #3, minor comment 6

      The clearer labeling of Figure 5a and 5b.

      Reply: Please refer to our reply to major point 8.

      Referee #3, minor comment 10

      Fig S1b, S3b. The PRO-seq was generated in triplicates, hence these graphs should include the Log2FC for the individual data points.

      Reply: The Log2FC values from DESeq2 were calculated from the triplicates; the software does not compute Log2FC from individual replicates.

      Revision plan: We mention the p-values for the Log2FC to show the degree of consistency (figure legends). We will provide a table with log2FC and corresponding padj values of the genes represented at each timepoint (table_showing_padj_values_and_log2fc).

      Referee #3, minor comment 12

      In the genomic snapshot shown, only bars or fading triangles are shown in place of the gene body. The authors should provide an accurate gene structure; i.e., exons and introns.

      Reply: We will try to include the exon-intron structure wherever the size of the figure allows this.

      Revision: n. a.

      Revision plan: If figure size permits, we will add the exon-intron structure of the genes in browser tracks as requested.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Referee #1, major comment 1

      Figure 2. Difficult to interpret data as it is presented. Consider quantifying figure 2C in order to make "changes in Pol II pausing were more pronounced during IFNb signaling" statement more apparent.

      Reply: We presented the pausing data in two different graphic representations (figures 2c and S2) to make the understanding of the information content easier. In hindsight we may have generated more confusion than clarity.

      Revision: We removed the original figure 2c and replaced it with original figure S2. This representation is quite intuitive, as the graphs provide a direct, quantitative logarithmic display of whether and by how much the relative amount of paused polymerase changes when comparing IFN-treated and untreated cells. The calculation of these ratios is now explained better in the legend to figure 2.
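
      As a rough illustration of what such a logarithmic display encodes, the sketch below assumes a commonly used definition of the pausing index (promoter-proximal read density divided by gene-body read density) and a log2 ratio of treated over untreated cells. The exact calculation is the one given in the legend to figure 2. The function names and numbers here are hypothetical, and all inputs are assumed to be non-zero.

```python
import math


def pausing_index(promoter_reads, promoter_len, body_reads, body_len):
    """Promoter-proximal read density divided by gene-body read density."""
    return (promoter_reads / promoter_len) / (body_reads / body_len)


def log2_pausing_ratio(treated_index, untreated_index):
    """Positive: relatively more paused polymerase after IFN; negative: less."""
    return math.log2(treated_index / untreated_index)


# Invented read counts and window lengths for a single gene.
untreated = pausing_index(100, 100, 1000, 1000)
treated = pausing_index(200, 100, 1000, 1000)
change = log2_pausing_ratio(treated, untreated)
```

      On this scale a value of 0 means no change in the relative amount of paused polymerase, and each unit corresponds to a two-fold change.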

      Referee #1, major comment 2

      How are you distinguishing autocrine signaling in the BMDMs driven by IFN treatment from late transcripts (for example, at 48 hours are differential genes due to autocrine cytokine signaling or are they truly late transcripts)?

      Reply: We do not exclude autocrine effects. In the case of ISG, the most likely autocrine factor would be secreted interferon. According to our PRO-seq data, the differentially expressed genes do not include any interferon genes. That being said, it is possible that the transcription factors from the AP1 family we hypothesize as drivers of secondary or tertiary waves of transcription are activated by non-IFN cytokines secreted from IFN-treated cells (see also reply to comment 3).

      Revision: We now mention that enhanced IFN production is not sustaining ISG responses (p.5 lines 18/20). We mention the possibility that secreted factors may drive secondary or tertiary waves of ISG transcription (p. 8, lines 21/23).

      Referee #1, major comment 3

      Figure 3D. Authors choose Gbp2 (as positive control for IFNg driven gene), but don't show that Gbp2 is a IFNb independent gene. Consider using IRF1 KO BMDMs in this data as well.

      Reply: This is a misunderstanding. Gbp2 is not shown as an IFNγ-specific gene (its induction by both IFN types has been shown previously and emerges from our PRO-seq analysis, see also response to minor issue no. 2). It represents the cluster of genes that are sustained specifically after IFNγ treatment in an IRF1-dependent manner. The purpose of fig. 3D is to show that not all ISGF3/IRF9-dependent genes have promoter binding sites for ISGF3 and not all IRF1-dependent genes have binding sites for IRF1. This suggests indirect effects of both transcription factors in sustaining IFN-induced transcription (in line with the referee’s comment 1).

      Previous figure S3e (now S2f) confirms binding of IRF1 to the Gbp2 promoter by ChIP with kinetics correlating to its transcriptional effect. This experiment is normalized with an IgG control. IRF1 knockout cells did not produce a ChIP signal with IRF1 antibody, as expected (data not shown).

      Revision: We better explain the rationale behind the experiments shown in figure 3D (text on p8, lines 12-16). In addition, we show the trend line of Gbp2 expression in WT vs IRF1KO as well as that of additional genes showing delayed/sustained responses in the new Figure S3.

      Referee #1, minor comment 2

      Define known IFNg and IFNb driven genes when they are introduced in figure 2 rather than in discussion.

      Reply: Following the referee’s suggestion we provide the examples of IFNβ and IFNγ-controlled genes and the characteristics of their regulation in the context of our description of the results displayed by fig. 2 (p.6 lines 15-21). This includes Gbp2 (see major issue no. 3).

      Revision: The text on p. 6 lines 15-21 has been modified in accordance with the request.

      Referee #1, minor comment 4

      Unclear whether IRF1 expression in figure 3A is from whole cell lysate or nuclear fraction.

      Reply: We indicate in the figure legend that whole cell lysates were used.

      Revision: We added a sentence with the relevant information in the legend of figure 3.

      Referee #1, minor comment 5

      Authors suggest IFNb treatment induces less IRF1 at later time points, however loading control also seems slightly lower than other considerations. Is it possible that IFNb treated cells are dying at later time points, given that type I IFN signaling can be pro-apoptotic.

      Reply: The graph below the blot represents quantified IRF1 signals, normalized to the loading control. It shows that the differences are not generated by unequal loading of the blotted gel. We and others have shown that IFNβ may indeed enhance macrophage death, however only when the cells are simultaneously infected with an intracellular pathogen (e.g. new ref. 25). These studies also show that treatment with IFNβ alone over periods used in the present study does not affect macrophage viability.

      Revision: We added a sentence about the viability of IFN-treated macrophages (p. 4, lines 31-32).

      Revision plan: n. a.

      Referee #2, major comment 3

      The sequencing and BioID data are not submitted to public databases.

      Reply: An accession number has been added.

      Revision: The accession number was added on p.29, line 25.

      Referee #3, major comment 1 (see also revision plan, section 2):

      Revision: The rationale for using the top 1000 genes is explained (p.5, lines 7-9). The description of the PRO-seq read count processing has been extended, in accordance with our reply to the referee, in the legend of figure 1d and in the methods section (p. 33, lines following line 10).

      Referee #3, major comment 2

      Fig 2c. The authors claim that RNA Pol II pausing is a major factor in controlling the dynamics of ISG transcription. However, they did not provide sufficient explanation of the results, and in all fairness there is not much variation between the clusters to sustain the claim that this is a major factor in ISG transcriptional control.

      Reply: We agree with the referee that we cannot posit RNA pol II pausing as a major factor for the differences of transcriptional control of ISG in individual clusters. We have made sure to remove any statements suggesting this possibility. We also try to better integrate our findings with RNA pol II pausing into the existing literature.

      Revision: We added relevant literature on p. 6 lines 28-30 and p. 7, lines 4-6.

      Referee #3, major comment 4

      On p.5, the authors mention "Representative browser tracks from the Gbp2 and Slfn1 genes further validate this observation" but they are simply referring to genome browser snapshot, i.e., specific genomic examples, extracting from the same single dataset. Without using an independent dataset, this can not "further validate" the initial findings.

      Reply: We agree the wording is incorrect.

      Revision: We changed the paragraph describing this experiment (p. 6, lines 15-21).

      Referee #3, major comment 5

      IRF1 was successfully pulled down with STAT1 bait but not in the reciprocal experiment. The author should discuss this point as it is important for the conclusions. Could it potentially indicate issues with the technique used, and if this could introduce any bias into the results. The statement, "In contrast, interactors of the IRF1 bait did not include STAT1. This discrepancy could result from steric constraints of the tagged proteins due to the limitation of the 10nm distance reached by the biotin ligase," does not seem to be sufficient to explain this discrepancy.

      Reply: STAT1 was present in the IRF1 pull-down and the interaction increased significantly after IFN treatment, but after normalization to the NLS control it did not conform to our criterion of a 95% confidence interval for the FDR. To be consistent, we did not include it in the list of IRF1 interactors. We have observed on several occasions that the significance of proximity is not reciprocal, even for well-documented physical interactions. A prime example of this is the interaction between STAT1 and IRF9 in IFN-treated cells, which is recorded in the STAT1 pull-down, but not in that with IRF9 (ref. 10). Apart from steric reasons, the lack of reciprocity may result from different signal/noise ratios in pull-downs with different baits.

      Revision: We mention that IRF1 was a STAT1 interactor below the statistical cut-off (p. 11, lines 26-28) as well as the possibility of different signal/noise ratios in the IRF1 and STAT1 pull-downs on p.11, lines 22-24.

      Referee #3, major comment 9

      In the figure legends, there is missing information about the number of times experiments were replicated, suggesting that some were done a single time. Moreover, some graphs are missing statistical analysis, e.g., in Fig S3cS3e, S3f, the ChIP-qPCR experiments were done on biological triplicates, there is no mention of statistical test performed, it is not mentioned what the error bars represents (SD, SEM, etc.) and the variance is large, but the authors still interpret these results as significant enrichment of the transcription factors to the Mx2 promoter.

      Reply: Where missing, the relevant information has been added to figure legends. In brief, all experiments represent at least three biological replicates. The only exception is the western blot shown in figure S3a (now S2a), which represents two independent replicates. Here, the clarity of the difference in IRF1 expression and the fact that the only purpose is to show that Raw264.7 macrophages behave like bone marrow-derived macrophages in fig. 3a justify the omission of another replicate (please see also answer to point 3).

      Revision: The relevant information has been added to figure legends where necessary (figs. 1, a, 3a, 6a-f, S1, S4, S5).

      Referee #3, major comment 10

      Another example are the RNA Pol II pausing index ratios, which show minor variations and not are supported by statistics to support a possible significance. Proper description, replication and statistical analyses of the results are critical.

      Reply: We agree.

      Revision: Statistics underlying the RNA Pol II pausing data are included in supplementary data 2.

      Referee #3, major comment 11

      The authors used CRISPR-Cas9 genome editing to generate knockout cell lines. However, they did not verify the knockouts at the protein level. Further experiments could confirm that the targeted proteins are not expressed in the knockout cell lines.

      Reply: We included a western blot showing the lack of IRF1 and STAT1 expression in the respective cell lines.

      Revision: New figure S6.

      Referee #3, major comment 12

      On p.9, it is mentioned "IRF1 affects chromatin structure ...". Here chromatin structure is related to minor changes in chromatin accessibility, this can not be qualified as changes in chromatin structure.

      Reply: ‘structure’ has been changed in accordance with the request.

      Revision: ‘structure’ has been replaced with ‘accessibility’ (p. 10, lines 19 and 21).

      Referee #3, major comment 13 (see also section 2, revision plan, major comment 8)

      The authors have not adequately addressed the methodological limitations in their discussion, which extends beyond the aforementioned comments. It is suggested they include a comprehensive discussion of the claims made pertaining to the necessity of IRF1 for accessibility and the potential biases in the interactomes, along with their associated consequences.

      Reply: The contribution of IRF1 to the accessibility of ISG promoters emerges from the data in figure 5a, whose clarity will be improved (see reply to point 8). We do not interpret the impact of IRF1 beyond the data; in fact, we state a relatively minor effect of IRF1 in the control of promoter accessibility (p. 10, lines 20-22), and we have added a reference in agreement with an impact of IRF1 on basal expression of antiviral genes (ref. 39, as suggested by the referee).

      We have added discussion on potential limitations of the TurboID approach (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision: Change of the discussion section (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision plan: Improvement of fig 5a (see ref. #3, point 8).

      Referee #3, major comment 15

      The work should be discussed in the context of the demonstrated physiopathological evidence of the IRF1 and IRF9 functions. IRF9 (Hernandez et al., JEM 2018) and more recently IRF1 (Rosain et al Cell, 2023) were identified as causing non overlapping phenotypes in human patients carrying loss-of-function mutations for these genes. The authors must interpret their results in this context.

      Reply: We thank the referee for reminding us about the importance of these papers for our work.

      Revision: The papers have been mentioned and discussed (p. 13 lines 19-28 and p.14, lines 9-14).

      Referee #3, minor comment 3

      The inconsistency in the title referring to IFNb as Type 1 but using IFNg instead of Type 2 nomenclature, perhaps consistency is best.

      Reply: We agree about the importance of consistency but find ourselves in yet another quandary. While the use of ‘type I IFN’ is clearly indicated and widely used as a collective name for this group of cytokines, the use of ‘type II IFN’ for IFNγ is rare because it is the only member of this type. Hence, we decided to stick with convention at the expense of a bit of consistency. We agree about the title, though, and have changed type I IFN to IFNβ.

      Revision: We adapted the title in agreement with the referee’s comment.

      Referee #3, minor comment 5

      Figure 6d includes a color scale of -1 to +3, but it is unclear what these values represent and how they were calculated per interactor. The figure legend should be revised to clarify this information.

      Reply: We agree. The relevant information has been added to the figure legend.

      Revision: We added information (log2FC with regard to the NLS control) to the legend of fig. 6d.

      Referee #3, minor comment 9

      Fig 1e, S1c. Graphs having circles of varying sizes in function of a value are named "bubble plots" and not "dot plots".

      Reply: Thank you for pointing this out, we corrected our mistake.

      Revision: We changed dot plot to bubble plot in legend to figure S1c.

      Referee #3, minor comment 11

      Fig S3c legend. It is mentioned "Graph represents RT-qPCR of genomic Mx2". RT-qPCR usually stands for reverse transcription quantitative PCR, hence we suggest to change to "ChIP-qPCR" or qPCR. Confusingly, in the literature the term "RT-PCR" is used for real-time PCR and "qPCR" for quantitative PCR. Also, the authors should be specific about the "genomic" region targeted; the graphs mention "promoter", hence it would be appropriate to use the same designation in the legend.

      Reply: We agree and thank the referee for correction of the terminology.

      Revision: We changed RT-PCR to qPCR throughout the manuscript. Moreover, we specifically refer to ‘promoter region’ as the amplified DNA.

      Referee #3, minor comment 12

      Fig S3e. The y-axis names are missing.

      Reply: Thanks for spotting this.

      Revision: The y axis in the figure received its proper label.

      Referee #3, minor comment 14

      Raw cells are sometimes spelled as "Raw" and other times as "RAW". Please choose one for consistency.

      Revision: This inconsistency has been corrected.

      Referee #3, minor comment 15

      In p.10 l.20, the figure number is missing.

      Revision: We corrected this mistake.

      4. Description of analyses that authors prefer not to carry out

      Referee #1, minor comment 1

      Simplify figure 4B- consider focusing on most differentially expressed genes between clusters

      Reply: The purpose of fig. 4B is to provide a visual overview of the kinetics of eRNA transcription in response to both IFN types and of the effects of IRF9 and IRF1 knockouts. This information needs to be given to demonstrate the similarities and differences between the control of eRNA and the corresponding ISG transcripts in the different regulatory clusters (as shown in figs. 1d and 2a).

      Simplifying the figure would mean separating it according to time point, IFN type treatment or knock-out effect. We think this would require the reader to mentally reassemble the figure to understand the interrelationships between these parameters. In our opinion, the visual display of the data interrelationships in fig. 4B makes the information content easier to grasp.

      Revision: n. a. - we hope our reasoning has become sufficiently clear.

      Revision plan: n. a.

      Referee #1, minor comment 3

      Clarify which cell types (IRF1 KO vs IRF9 KO) are used in figure 5 A/B.

      Reply: The cell type (bone marrow-derived macrophages) is mentioned in the first sentence of the figure legend. Since all experiments except the Bio-ID experiment were performed with this cell type we decided not to label each figure.

      Revision: n. a.

      Revision plan: n. a.

      Referee #2, major comment 2 and referee #3, major comment 14

      Ref #2: Biological significance is limited as this study is largely descriptive and they do not test the hits obtained from BioID.

      Ref #3: Although the TurboID experiments identify known STAT1 and IRF1 interactors, the proposed new interactors are numerous, and none are validated through independent co-IP experiments. Moreover, the results are very noisy, with little differences between untreated BMDMs (where IRF1 is barely expressed) and IFN-treated conditions.

      Reply: The big advantage of BioID or TurboID is the ability to score proximity and very transient interactions. Validating BioID hits with technologies such as coIP is not particularly useful as the two technologies will obviously produce different interactomes. In fact, we show in this manuscript that IRF1 and STAT1 show proximity, but they do not form a stable complex under co-IP conditions. This leaves genetic approaches (LOF or GOF) as alternatives. However, apart from the workload (> 100 genes would have to be knocked out or their products overexpressed), most of our hits are expected to produce very broad effects in such experiments, hard to interpret regarding ISGF3 and IRF1 activities.

      In view of this situation, we publish exclusively the high confidence nuclear interactors identified in our screen: biological replicates were performed in triplicate, a stringent internal control (TurboID-NLS) was used, and a stringent statistical cut-off for high-confidence interactors (95% FDR between groups) was applied. We further account for the experimental situation by limiting interpretation of the data to confirmed molecular events. For example, STAT1 dimers and the ISGF3 complex are required for histone acetylation in response to IFN, and ISGF3 is known to contribute to the exchange of the H2AZ histone variant (refs 11, 14, 71, 72). Our data show that IRF1 contributes to promoter accessibility changes and this is in line with its proximity to a remodelling complex. Thus, the BioID data indeed validate previous findings. However, in agreement with the referee’s comment, some of the data remain descriptive (such as the intriguing proximity of both STAT1 and IRF1 to nuclear products of ISG). To determine the importance of this molecular proximity is a major undertaking and beyond the scope of this study.

      Revision: We added discussion to state the difficulty of validating TurboID-based interactions and the limitations of the TurboID experiments (p.15 lines 3-11).

      Referee #3, minor comment 1

      In most graphs the expression values or log2FC are shown separately for IFNb and IFNg, however in the heatmaps (Fig 1d, S1d) the IFNb and IFNg results are intercalated keeping them side-by-side for each time point, which makes them more difficult to interpret.

      Reply: We are in a quandary about the design of the figure. On the one hand, our goal is to visualize gene clusters with distinct behaviors for each IFN type. For this purpose, it would be advantageous to separate the IFN types. On the other hand, we aim at showing similarities and differences between genes induced by each IFN type; for this purpose, it is better to maintain the current sample order. While understanding the referee’s point, we prefer to keep the figure as it is, because the suggested change will not increase its overall clarity.

      Revision: n. a.

      Revision plan: n. a.

      Referee #3, minor comment 7

      The statement that "IFN-I are the more important mediators of antiviral immunity" is not entirely accurate and may be an oversimplification, as there are certainly articles which suggest a larger role for type ll IFN elements than type l (ref: Yamane D et al., 2019 Nature microbiology). While yes, IFN-I plays a critical role in the innate immune response to viral infections, IFNγ also has antiviral activity and is involved in the adaptive immune response to viral infections, and in some instances to a larger extent than IFN l.

      Reply: The Yamane et al. study (now mentioned on p 10, lines 22-25 and referenced) agrees with our findings because it shows that IRF1 contributes to the basal expression of an ISRE-driven ISG subset. Our statement about the predominant role of type I IFN versus IFNγ refers to genetic data in both humans (mainly Casanova’s work including effects of autoantibodies against type I IFN, see also the paper about human STAT2 deficiency in the June 15th issue of the JCI, https://doi.org/10.1172/JCI168321) and mice (hundreds of papers) showing that disruption of type I IFN synthesis or response causes profound effects on antiviral immunity (i.e. resulting susceptibilities are first and foremost to viral pathogens), whereas susceptibilities as a consequence of disrupting the IFNγ pathway are first and foremost to intracellular nonviral pathogens such as mycobacteria. In fact, the term Mendelian susceptibility to mycobacterial disease (MSMD) was coined by Casanova and colleagues to describe a variety of human mutations that include those of the IFNγ, but not the type I IFN pathway.

      Maybe more importantly, the Rosain et al. paper mentioned by the referee which appeared in ‘Cell’ while our study was under review, shows that human IRF1 mutations also fall into the MSMD category (new ref. 65). In contrast, the authors did not observe diminished antiviral immunity. This emphasizes the main conclusions of our study about the relevance of IRF1 for macrophage activation. We discuss this paper on p 14. lines 9-14.

      Obviously, this does not exclude a role of type I IFN in nonviral infection or of IFNγ in viral infection, in fact much of our own work has been dedicated to a role of type I IFN in infections with L. monocytogenes. Nevertheless, we think that in a generic statement about the difference between type I IFN and IFNγ it is correct to label the former as predominantly antiviral and the latter predominantly as a macrophage activating factor against nonviral, intracellular pathogens.

      Revision: We added discussion of Rosain et al. (ref. 65) on p 14. lines 9-14.

      Referee #3, minor comment 8

      The authors claim that a significant portion of ISG promoters is associated with ISGF3 upon IFNγ receptor engagement and that the transcriptomes of macrophages treated briefly with IFNβ or IFNγ exhibit remarkable similarity and sensitivity to Irf9 deletion. However, I am uncertain about the extent of consensus on this claim.

      Reply: The data were surprising but supported by ChIP-seq and RNA-seq in wt and IRF9 ko macrophages (ref 10). Data in a follow-up study (ref. 11) and in this manuscript support our original conclusion by demonstrating the impact of the IRF9 ko on IFNγ responses. Importantly, we don’t claim this is true in all cell types, it may well depend on STAT/IRF9 expression levels and tonic IFN signaling.

      Revision: n. a.

      Revision plan: n. a.

    1. Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid in our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, the risk of inter-cohort sample overlap and close relatedness increases. Ideally sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can lead to inflation of type 1 error and, thus, erroneous conclusions in published work.

      Results: Here, for the first time, we report the scale of the sample overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data to mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation in the observed PRS-trait association, coefficient of determination (R²) and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small if this makes up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), a software for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples. A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, making it powered in settings where both have sample sizes over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated using EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios and are even robust to high levels of residual genetic and environmental stratification.

      Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be considered with caution given high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all future such PRS studies to mitigate the potential effects of inter-cohort overlap and close relatedness.

      This work has been peer reviewed in GigaScience (see Description), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      ** Jack Pattee**

      Overall, I think that this manuscript is strong and describes a well-formulated method to address a relevant problem. There are a few outstanding questions about the performance of the EraSOR method from my perspective, which I'll detail as follows.

      My understanding of reference [16] indicates that equation (3) of this manuscript only holds for null SNPs, i.e. if SNP g is not associated with the outcome Y. If this is the case, then this should be discussed in the manuscript. I wonder if this can partially explain the 'under-estimation' behavior we see in the application to real data in Supplementary Figure 3. In particular, I am referencing the behavior where the EraSOR correction will under-estimate the predictive accuracy of the PRS in the target data, i.e. where delta-R^2 is negative. This behavior is not seen in the simulation and warrants further investigation and discussion. While the bias appears small, for some cases delta-R^2 approaches -.025, which corresponds to an under-estimation of Pearson's r by roughly .15; this is substantial. Could it be the case that, for highly polygenic traits such as height and BMI, the null-SNP assumption is unreliable and the performance of EraSOR is degraded? Does a fundamental assumption of sparse genetic association underlie EraSOR?

      I recommend that the real data application play a larger role in the manuscript narrative and be moved out of the supplementary. The simulations are appreciated and helpful, but there is nuance in the analysis of real data that cannot be replicated in simulation.

      I believe the reference to "Supplementary Figure 2" on line 346 should actually be "Supplementary Figure 3". I believe that the axis labels in Supp Figure 3 are flipped.

      Lines 82 and 83 reference genetic stratification and subpopulations; I think the relevance of these concepts should be introduced more clearly and they should be defined in this context. EraSOR concerns the overestimation of predictive accuracy and association incurred by sample overlap between the base and target GWASs; to this reader, it's not clear what this central issue has to do with population stratification. I realize that the derivation of the LD score method is motivated heavily by correcting for stratification; however, these concepts should be introduced more clearly in this manuscript.

      Line 88: consider defining LD score l_j.

      Lines 94-96: consider outlining the mathematical consequence of the assumption that "the two outcomes and cohorts are identical." It's the case that N_1 = N_2 = N_c = N, correct?

      Line 109 / equation (11): My understanding is that the relevant quantity of this derivation is N_c / sqrt(N_1 N_2), which allows us to define the correct matrix C in expression (4). If this is the case, perhaps the quantity of interest should be moved to the LHS of the equation in the final line of the expression, for clarity.

      As discussed in the manuscript, the estimated heritability is in the denominator of the expression for N_c / sqrt(N_1 N_2). The authors correctly discuss that the method should not be applied when there is doubt as to whether the heritability is different from zero. I would take this a step further; in cases where the heritability is zero, we cannot meaningfully apply the EraSOR correction, and thus I am not sure of the utility of the 'type I error' simulations in the manuscript. Perhaps an explicit test for h^2 > 0 should be worked into the EraSOR workflow?

      Line 148 / expression (12): If beta has a normal distribution here, it is the case that all SNPs in the simulation are associated with the outcome Y. This is a somewhat unusual choice for the distribution of SNP effects in a simulation; other applications such as LDPred (Vilhjalmsson et al, AJHG 2015) and LassoSum (TSH Mak et al, Genetic Epi 2017) use a point-normal distribution for simulated SNP effects, which effectively simulates the sparsity frequently observed in nature. Is there a reference or justification for the non-sparse simulation structure here?

      Line 215: there may be a typo in the expression for the variance of the residual term. Is it the case that the variance of the residual depends on the variance of a covariance term? If so, I am confused as to the derivation.

      Line 241: 'triat' should be 'trait'.

      The simulation results in this paper are based on clumping and thresholding for PRS, which does not estimate joint SNP effects i.e. account for LD. Methods such as LDPred and LassoSum do so. Is there any reason to believe the results would be different for a method such as LassoSum?

      I am confused by the very low Fst between the simulated Finnish and Yoruban samples in simulation. As detailed on line 385: the reported Fst is > .1, but the simulated Fst is essentially zero. This seems likely to be an undesirable simulation artefact, and potentially invalidates the simulation study (or, at least, doesn't provide evidence that EraSOR functions correctly when Fst is large, which was the ostensible motivation for this simulation). Is there no way to effectively simulate populations with a larger Fst?
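
      The conversion from delta-R^2 to Pearson's r quoted in the report can be checked with a short calculation. The sketch below assumes that the "roughly .15" figure comes from taking the square root of the magnitude of delta-R^2; that reading is an assumption, not something the report states. The baseline R^2 of 0.10 is a hypothetical value, used only to show that the shift in r also depends on the unadjusted R^2.

```python
import math

delta_r2 = 0.025  # magnitude of the largest under-estimation quoted in the report

# Assumed reading of the report's arithmetic: square root of the difference itself.
r_from_delta = math.sqrt(delta_r2)

def shift_in_r(baseline_r2, delta):
    """Change in Pearson's r when R^2 drops from baseline_r2 to baseline_r2 - delta."""
    return math.sqrt(baseline_r2) - math.sqrt(baseline_r2 - delta)

baseline_r2 = 0.10  # hypothetical unadjusted R^2, not taken from the manuscript
print(f"sqrt(|delta R^2|) = {r_from_delta:.3f}")
print(f"shift in r from a baseline R^2 of {baseline_r2}: {shift_in_r(baseline_r2, delta_r2):.3f}")
```

      Under these assumptions the first quantity is close to the report's figure, while the second is smaller, so the size of the bias in r is best judged against the unadjusted R^2 of each trait.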

    1. Author Response

      Reviewer #1 (Public Review):

      Point 1: Many of the initial analyses of behavior metrics, for instance predicting reaction times, number of fixations, or fixation duration, use value difference as a regressor. However, given a limited set of values, value differences are highly correlated with the option values themselves, as well as the chosen value. For instance, in this task the only time when there will be a value difference of 4 drops is when the options are 1 and 5 drops, and given the high performance of these monkeys, this means the chosen value will overwhelmingly be 5 drops. Likewise, there are only two combinations that can yield a value difference of 3 (5 vs. 2 and 4 vs 1), and each will have relatively high chosen values. Given that value motivates behavior and attracts attention, it may be that some of the putative effects of choice difficulty are actually driven by value.

      To address this question, we have adapted the methods of Balewski and colleagues (Neuron, 2022) to isolate the unique contributions of chosen value and trial difficulty to reaction time and the number of fixations in a given trial (the two behaviors modulated by difficulty in the original paper). This new analysis reveals a double dissociation in which reaction time decreases as a function of chosen value but not difficulty, while the number of fixations in a trial shows the opposite pattern. Our interpretation is that reaction time largely reflects reward anticipation, whereas the number of fixations largely reflects the amount of information required to render a decision (i.e., choice difficulty). See lines 144-167 and Figure 2.
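
      The confound raised in Point 1 can be illustrated by enumerating the possible offers. This is a minimal sketch under stated assumptions: offers are integer amounts from 1 to 5 drops, only unequal pairs are considered, each pair is equally frequent, and the larger option is always chosen. None of these details are taken from the task description beyond the examples in the comment, and the helper `pearson` is written here for illustration.

```python
from itertools import combinations
from statistics import mean

values = range(1, 6)  # assumed juice amounts: 1 to 5 drops

# Every unequal pair of offers as (low, high, value difference, chosen value);
# the chosen value is assumed to be the larger of the two.
trials = [(low, high, high - low, high) for low, high in combinations(values, 2)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

differences = [t[2] for t in trials]
chosen = [t[3] for t in trials]
r = pearson(differences, chosen)

# Group the offer pairs by their value difference.
pairs_by_difference = {}
for low, high, diff, _ in trials:
    pairs_by_difference.setdefault(diff, []).append((low, high))

print(pairs_by_difference[4])  # the only pair with a difference of 4
print(pairs_by_difference[3])  # the two pairs with a difference of 3
print(round(r, 3))             # correlation between value difference and chosen value
```

      Under these assumptions value difference and chosen value are positively correlated by construction, which is why an analysis that separates their unique contributions is needed.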

      Point 2: Related to point 1, the study found that duration of first fixations increased with fixated values, and second (middle) fixation durations decreased with fixated value but increased with relative value of the fixated versus other value. Can this effect be more concisely described as an effect of the value of the first fixated option carrying over into behavior during the second fixation?

      This is a valid interpretation of the results. To test this directly, we now include an analysis of middle fixation duration as a function of the not-currently-viewed target. Note that the vast majority of middle fixations are the second fixation in the trial, and therefore the value of the unattended target is typically the one that was viewed first. The analysis showed a negative correlation between middle fixation duration and the value of the unattended target, which is consistent with the first fixated value carrying over to the second fixation. See lines 243-246.

      Point 3: Given that chosen (and therefore anticipated) values can motivate responses, often measured as faster reaction times or more vigorous motor movements, it seems curious that terminal non-decision times were calculated as a single value for all trials. Shouldn't this vary depending at least on chosen values, and perhaps other variables in the trial?

      In all sequential sampling model formulations we are aware of, nondecision time is considered to be fixed across trial types. Examples can be found for perceptual decisions (e.g., Resulaj et al., 2009) and in the “bifurcation point” approach used in the recent value-based decision study by Westbrook et al. (2020).

      To further investigate this issue, we asked whether other post-decision processes were sensitive to chosen value in our paradigm. To do so, we measured the interval between the center lever lift and the left or right lever press, corresponding to the time taken to perform the reach movement in each trial (reach latency). We then fit a mixed effects model explaining reach latency as a function of chosen value. While the results showed significantly faster reach latencies with higher chosen values, the effect size was very small, showing on average a ~3ms decrease per drop of juice. In other words, between the highest and lowest levels of chosen value (5 vs. 1), there is only a difference of approximately 12ms. In contrast, the main RT measure used in the study (the interval between target onset and center lever lift) is an order of magnitude more sensitive to chosen value, decreasing ~40ms per drop of juice. These results are shown in Author response image 1.

      Author response image 1.

      This suggests that post-decision processes (NDT in standard models and the additive stage in the Westbrook paper) vary only minimally as a function of chosen value. We are happy to include this analysis as a supplemental figure upon request.

      Point 4: The paper aims to demonstrate similarities between monkey and human gaze behavior in value-based decisions, but focuses mainly on a series of results from one group of collaborators (Krajbich, Rangel and colleagues). Other labs have shown additional nuance that the present data could potentially speak to. First, Cavanaugh et al. (J Exp Psychol Gen, 2014) found that gaze allocation and value differences between options independently influence drift rates on different choices. Second, gaze can correlate with choice because attention to an option amplifies its value (or enhances the accumulation of value evidence) or because chosen options are attended more after the choice is implicitly determined but not yet registered. Westbrook et al. (Science, 2020) found that these effects can be dissociated, with attention influencing choice early in the trial and choice influencing attention later. The NDTs calculated in the present study allot a consistent time to translating a choice into a motor command, but as noted above don't account for potential influences of choice or value on gaze.

      The two-stage model of gaze effects put forth by Westbrook et al. (2020) is consistent with other observations of gaze behavior and choice (i.e., Thomas et al., 2019, Smith et al., 2018, Manohar & Husain, 2013). In this model, gaze effects early in the trial are best described by a multiplicative relationship between gaze and value, whereas gaze effects later in the trial are best described with an additive model term. To test the two-stage hypothesis, Westbrook and colleagues determined a ‘bifurcation point’ for each subject that represented the time at which gaze effects transitioned from multiplicative to additive. In our data, trial durations were typically very short (<1s), making it difficult to divide trials and fit separate models to them. We therefore took a different approach: We reasoned that if gaze effects transition from multiplicative to additive at the end of the trial, then the transition point could be estimated by removing data from the end of each trial and assessing the relative fit of a multiplicative vs. additive model. If the early gaze effects are predominantly multiplicative and late gaze effects are additive, the relative goodness of fit for an additive model should decrease as more data are removed from the end of the trial. To test this idea, we compared the relative model fit of additive vs. multiplicative models in the raw data, and for data in which successively larger epochs were removed from the end of the trial (50, 100, 150, 200, 300, and 400ms). The relative fit was assessed by computing the relative probability that each model accurately reflects the data. In addition, to identify significant differences in goodness of fit, we compared the WAIC values and their standard errors for each model (Supplemental File 3). 
As shown in Figure 4, the relative fit probability for both models is nonzero in the raw data (0 truncation), indicating that neither model provides a definitive best fit, potentially reflecting a mixture of the two processes. However, the relative fit of the additive model decreases sharply as data is removed, reaching zero at 100ms truncation. 100ms is also the point at which multiplicative models provide a significantly better fit, indicated by non-overlapping standard error intervals for the two models (Supplemental File 3). Together, this suggested that the transition between early- and late-stage gaze effects likely occurs approximately 100ms before the RT.
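
      The truncation sweep described above can be sketched in a few lines. The response does not state how the relative fit probability was computed, so the Akaike-style weighting of WAIC values used here is an assumption, as are the function names, the 0.01 threshold and the WAIC numbers, which are hypothetical and are not the study's results.

```python
import math

def relative_fit(waic_by_model):
    """Akaike-style weights from WAIC values (deviance scale, lower is better)."""
    best = min(waic_by_model.values())
    raw = {name: math.exp(-0.5 * (waic - best)) for name, waic in waic_by_model.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def smallest_truncation(waic_by_truncation, model="additive", threshold=0.01):
    """Smallest truncation (ms) at which `model`'s weight falls below `threshold`."""
    for truncation in sorted(waic_by_truncation):
        if relative_fit(waic_by_truncation[truncation])[model] < threshold:
            return truncation
    return None

# Hypothetical WAIC values keyed by truncation in ms; for illustration only.
example = {
    0: {"additive": 1000.0, "multiplicative": 1001.0},
    50: {"additive": 1004.0, "multiplicative": 1000.0},
    100: {"additive": 1030.0, "multiplicative": 1000.0},
}

for truncation in sorted(example):
    weights = relative_fit(example[truncation])
    print(truncation, {name: round(w, 3) for name, w in weights.items()})
print(smallest_truncation(example))
```

      With real model fits, the dictionary would hold the WAIC of each model refit on the truncated data, and the standard errors of the WAIC values would still need to be compared separately, as the response describes.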

      To minimize the influence of post-decision gaze effects, the main results use data truncated by 100ms. However, because 100ms is only an estimate, we repeated the main analyses over truncation values between 0 and 400ms, reported in Figure 6 - figure supplement 1 & Figure 7 - figure supplement 1. These show significant gaze duration biases and final gaze biases in data truncated by up to 200ms.

      Reviewer #2 (Public Review):

      Recommendation 1: The only real issue that I see with the paper is fairly obvious: the authors find that the last fixations are longer than the rest, which is inconsistent with a lot of the human work. They argue that this is due to the reaching required in this task, and they take a somewhat ad-hoc approach to trying to correct for it. Specifically, they take the difference between final and non-final, second fixations, and then choose the 95th percentile of that distribution as the amount of time to subtract from the end of each trial. This amounts to about 200 ms being removed from the end of each trial. There are several issues with this approach. First, it assumes that final and non-final fixations should be the same length, when we know from other work that final fixations are generally shorter. Second, it seems to assume that this 200ms is "the latency between the time that the subject commits to the movement and the time that the movement is actually detected by the experimenter". However, there is a mismatch between that explanation and the details of the task. Those last 200ms are before the monkey releases the middle lever, not before the monkey makes a left/right choice. When the monkey releases the middle lever, the stimuli disappear and they then have 500ms to press the left or right lever. But, the reaction time and fixation data terminate when the monkey releases the middle lever. Consequently, I don't find it very likely that the monkeys are using those last 200ms to plan their hand movement after releasing the middle lever.

      Thanks for the opportunity to clarify these points. There are three related issues:

      First, with regards to fixation durations, in the updated Figure 3 we now show durations as a function of both the absolute order in the trial (first, second, third, fourth, etc.) and the relative order (final/nonfinal). We find that durations decrease as a function of absolute order in the trial, an effect also seen in humans (see Manohar & Husain, 2013). At the same time, while holding absolute order constant, final fixations are longer than non-final fixations. To explain the discrepancy with human final fixation durations, we note that monkeys make many fewer fixations per trial (~2.5) than humans do (~3.7, computed from publicly available data from Krajbich et al., 2010.) This means that compared to humans, monkeys’ final fixations occur earlier in the trial (e.g., second or third), and are therefore comparatively longer in duration. Note that studies with humans have not independently measured fixation durations by absolute and relative order, and therefore would not have detected the potential interaction between the two effects.

      Second, the comment suggests that the final 200ms before lever lift is not spent planning the left/right movement, given that the monkeys have time after the lever lift in which to execute the movement (400 or 500ms, depending on the monkey). The presumption appears to be that 400/500ms should be sufficient to plan a left/right reach. However, we think that these two suggestions are unlikely, and that our original interpretation is the most plausible. First, the 400/500ms deadline between lift and left/right press was set to encourage the monkeys to complete the reach as fast as possible, to minimize deliberations or changes of mind after lifting the lever. More specifically, these deadlines were designed so that on ~0.5% of trials, the monkeys actually fail to complete the reach within the deadline and fail to obtain a reward. This manipulation was effective at motivating fast reaches, as the average reach latency (time between lift and press) was 165 SEM 20ms for Monkey K, and 290 SEM 100ms for Monkey C.

      Therefore, given the time pressure imposed by the task, it is very unlikely that significant reach planning occurs after the lever lift. In addition to these empirical considerations, the idea that the final moments before the RT are used for motor planning is a standard assumption in many theoretical models of choice (including sequential sampling models, see Ratcliff & McKoon 2008, for review), and is also well-supported by studies of motor control and motor system neurophysiology. Based on these, we think the assumption of some form of terminal NDT is warranted.

      Third, we have changed our method for estimating the NDT interval. In brief we sweep through a range of NDT truncation values (0-400ms) and identify the smallest interval (100ms) that minimizes the contribution of “additive” gaze effects, which are thought to reflect late-stage, post-decision gaze processes. See the response to Point 4 for Reviewer 1 above, Figure 4 and lines 267-325 in the main text. In addition, we report all of the major study results over a range of truncation values between 0 and 400ms.

    1. And I bid you all do likewise. In an ordinary crime, how does one defend the accused? One calls up witnesses to prove his innocence. But witchcraft is ipso facto, on its face and by its nature, an invisible crime, is it not? Therefore, who may possibly be witness to it? The witch and the victim. None other. Now we cannot hope the witch will accuse herself; granted? Therefore, we must rely upon her victims – and they do testify, the children certainly do testify. As for the witches, none will deny that we are most eager for all their confessions. Therefore, what is left for a lawyer to bring out? I think I have made my point. Have I not?

      The logic Danforth uses to justify and explain to Hale why a lawyer is not necessary in this instance is flawed.

      He states that witchcraft is "an invisible crime" at which only the witch and the victim are present, and that, since the witch will hardly accuse herself, the court must rely upon the victim to testify by identifying the witch in question.

      BUT what he fails to take into account here is that he is assuming that the "victims" are actually victims in the first place and that their accusations are true. He has no real evidence of this other than the girls' confessions. Danforth thus makes a big mistake in assuming that their accusations are valid and to be believed.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1* (Evidence, reproducibility and clarity (Required)): *

      * Srinivasan et al. present a comprehensive study on systematizing the structure-dynamics-function relation of lipid transfer proteins (LTPs), combining extensive molecular simulations and complementary experiments. Indeed, the current state-of-the-art in the field is quite chaotic and fractional, and such systematic studies are necessary to advance our general and conceptual understanding of the mechanisms of action of LTPs. The selected techniques and research strategies are all suitable, their description is sufficient and enables reproducibility; the obtained results are carefully presented and discussed; the conclusions are adequately supported by the data.

      Given my primarily computational background, I evaluated mainly the simulation part of the manuscript. Considering experiments, I do not see any significant flaws or deficiencies that could diminish the value of the data and following conclusions given in the manuscript. I would even suggest improving the abstract by more explicitly saying that this work includes experimental measurements because it currently reads like purely computational work was performed. *

      We thank Reviewer #1 for the positive evaluation of our work. The abstract has now been updated to include that our work allows us to interpret existing data but also to design and perform new experimental measurements.

      * Major comments: *

      1) Although I like the central message of the paper and have no objections, I am curious whether the conclusion "a more "dynamic" or/and "mobile" part of the protein interacts with the membrane or any other (macro)(bio)molecule" makes sense globally and is not limited to LTPs. For example, it is a reasonable assumption that a more flexible part of the protein, i.e., capable of adopting necessary binding configurations, would be a more likely interacting spot. Locking in a less flexible and more specific configuration upon binding with a target molecule is also anticipated and quite typical, e.g., when ligands interact with target proteins, thereby blocking their function. The authors themselves recognize this paradigm as referring to the enzymes' dynamics. It would be great if authors could comment more on dynamics-function relation, referring to the existing literature, where such observations were/were not observed for different protein families. Performing simulations on proteins that do not exhibit such a feature and do not belong to LTPs, but, e.g., structurally similar to some of the studied LTPs, would be an excellent addition too, highlighting this signature characteristic of LTPs.

      We have now added a discussion comparing the mechanism we observe with those described for other proteins such as membrane transporters and receptors. Since those proteins are very different and have been already thoroughly characterized (including with molecular simulations) we don’t think that additional simulations are required. Also, concerning protein binding dynamics, we refer to the excellent review of Wade and coworkers: "Acc. Chem. Res. 2016, 49, 5, 809–815"

      "Notably, the conformational plasticity we observe for LTPs is reminiscent of other, previously described, functional protein mechanisms, including enzyme dynamics during catalysis (DOI: 10.1126/science.1066176), the alternating-access model of membrane transporters (https://doi.org/10.1038/nsmb.3179) or GPCR dynamics (https://doi.org/10.1021/acs.chemrev.6b00177). In all these cases, protein dynamics is strongly coupled to ligand binding (https://doi.org/10.1021/acs.accounts.5b00516) and protein function, be it for signaling, transport or enzymatic activity. Unlike for these fields, however, the contribution of structural and spectroscopic studies to uncover LTP dynamics remains quite limited, and our simulations provide an important contribution to fill this gap. We hope that our results will motivate researchers to increase efforts to experimentally quantify LTPs conformational plasticity, e.g. by structural determination of LTPs in different states (or bound to different lipids) or by single-molecule spectroscopy studies."

      *Minor comments: *

      *

      1) Fig 1d. What is so special in Lysine compared to Arginine? Is there any disbalance in their presence in studied proteins? Any correlations between the binding affinity of certain amino acids and their overall presence on the protein surface? *

      Indeed, there is an imbalance in the presence of lysine and arginine residues in our proteins. The ratio between the numbers of these residues in our dataset is Lys:Arg = 1.6:1. On top of that, and as described in (Tubiana T et al PLoS Comput Biol. 2022;18(12):e1010346), lysine is preferred over arginine in peripheral membrane proteins, likely because it induces fewer perturbations in the lipid bilayer. Our data also agree with Tubiana et al. concerning the correlation between abundance of specific residues on the protein surface and membrane binding.

      * 2) Fig S1. GM2A and TTPA seem to be irreversibly adsorbed to the membrane on the microsecond timescale in most replicas. Is anything special in these proteins? Did this affect the sampling of a claimed membrane-binding interface?*

      Our interpretation of the different adsorption profile of GM2A and TTPA is that these two proteins appear to have higher membrane affinity in our computational assay in comparison with the other proteins in our dataset. However, this has no effect on the membrane-binding interface as the proteins are still able to undergo significant tumbling before binding to the lipid bilayer, as demonstrated by the angle between the two main protein axes and the bilayer normal before membrane binding (Fig. S8 in Supplementary Information).
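
      The tumbling measure mentioned in this reply, the angle between a principal protein axis and the bilayer normal, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes the membrane normal is the z axis, weights all atoms equally, treats only the longest principal axis, and uses toy rod-shaped coordinates. The function names are hypothetical.

```python
import math

def long_axis(coords, iterations=100):
    """Unit vector along the direction of largest coordinate variance."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    # Covariance of the centred coordinates; all atoms weighted equally (an assumption).
    cov = [[sum((p[i] - center[i]) * (p[j] - center[j]) for p in coords) / n
            for j in range(3)] for i in range(3)]
    v = [1.0, 0.9, 0.8]  # arbitrary starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def angle_to_normal(axis, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees (0 to 90) between an axis and the assumed unit membrane normal."""
    # The sign of a principal axis is arbitrary, so the absolute dot product is used.
    dot = abs(sum(a * b for a, b in zip(axis, normal)))
    return math.degrees(math.acos(min(1.0, dot)))

# Toy rod-shaped "proteins": one lying in the membrane plane, one upright, one tilted.
flat_rod = [(float(t), 0.0, 0.0) for t in range(10)]
upright_rod = [(0.0, 0.0, float(t)) for t in range(10)]
tilted_rod = [(float(t), 0.0, float(t)) for t in range(10)]

for rod in (flat_rod, upright_rod, tilted_rod):
    print(round(angle_to_normal(long_axis(rod)), 1))
```

      Applied frame by frame to a trajectory, a time series of this angle that spans a wide range before binding would indicate that the protein samples many orientations before it reaches the bilayer.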

      * 3) A related follow-up question. Multiple replicas were performed to identify the membrane-binding interface. However, if I understand well, the initial orientation of the protein with respect to the membrane was always the same. I found it a pity since performing multiple replicas starting from different initial geometries (e.g., rotating the protein in a somewhat systematic way) would likely result in a more efficient exploration of the conformation space. Can the authors comment on whether this predefined initial configuration could negatively affect the results? Performing a few additional simulations for the most problematic proteins I mentioned earlier (GM2A and TTPA) could be a nice opportunity to apply this strategy. *

      In our protocol, all proteins start from the same initial orientation but undergo significant tumbling in solution before interacting with the lipid bilayer, including for the two most extreme cases, GM2A and TTPA (Fig. R1). Hence, we think that there is no bias for what pertains to the final membrane interacting region. We have added the Fig. R1 in Supplementary Information (Fig. S8) and added the following text in the Methods Section:

      "Despite starting from a single orientation, all proteins undergo extensive tumbling before binding to the bilayer, as illustrated by the angle between the two principal protein axes and the membrane normal for the two proteins that display the highest binding propensity, GM2A and TTPA (Fig. S8)."

      * 4) How was the volume of the cavity affected by mutations in STARD11 and Mdm12? Do these data somehow correlate with the experimentally observed reduced efficiency of the lipid transfer? *

      Our data on the volume of the cavity in STARD11 and Mdm12 are inconclusive. In any case, we caution against such a simplistic interpretation, since it completely neglects the lipid-bound conformation, which normally has a much larger cavity than the apo form (Fig. 3).

      *5) I would appreciate it if the authors considered playing with the templates of the main Figures at later stages because in the current version, and when printed on A4 paper, the readability of certain graphs and pictures is uncomfortable and sometimes even impossible. Obviously, the final schematics would depend on the journal and its formatting. *

      We will modify the templates of the main Figures to improve readability according to journal formatting.

      * **Referees cross-commenting** *

      * I would like to acknowledge the thoughtful and detailed reviews provided by other reviewers. I do like their reports, and I believe that by addressing the reviewers' comments and incorporating their revisions, the article will significantly improve in terms of scientific rigor and contribution to the field. *

      *Reviewer #1 (Significance (Required)):

      This manuscript is a solid scientific work addressing gaps in our knowledge about Lipid Transfer Proteins by employing state-of-the-art methods. It advances the field on conceptual and fundamental levels. This study is of interest to both computational biophysicists and physical chemists (to whom I belong myself) as well as experimentalists, who seek a rational explanation of the experimental observations. *

      We thank the reviewer again for the positive evaluation of the significance of our work.

      Reviewer #2* (Evidence, reproducibility and clarity (Required)): *

      * Summary:

      In a combined computational and experimental study, the authors provide insights into general features of lipid transfer proteins (LTPs), which play key roles in lipid trafficking: Through molecular dynamics simulations of a diverse set of 12 shuttle-like LTPs, they demonstrate that LTPs consistently exist in an equilibrium between two or more conformations, whose populations are modulated by a bound lipid, and that residues significantly involved in these collective conformational changes typically interact with a membrane. Their simulations indicate that conformational plasticity is a general feature of LTPs, leading them to suggest that the ability to change conformations is essential for LTP function. They test the generality of this hypothesis through in cellulo assays of two LTPs (STARD11 and Mdm12) that were not originally simulated. While experiments of STARD11 support their hypothesis, those presented for Mdm12 provide ambiguous results. *

      *

      Major comments: *

      * Throughout the manuscript, it's stated that common 'dynamical features' correlate with LTP function. The accuracy of this statement is unclear since 'dynamical features' are never precisely defined and, while equilibrium conformational ensembles are characterized, dynamics (ie kinetics or time-dependent observables) are not. Please clarify.*

      We plan to improve the scholarly presentation of our article to clarify this issue. In short, two distinct properties modulate protein function: 1. Conformational plasticity, i.e. the (thermodynamic) ability of the protein to adopt different conformations (and with different populations depending on the bound substrate). 2. Conformational “dynamics”, i.e. the propensity to exchange between these different thermodynamic states. This ability depends on the free energy barriers between different states and it is intrinsically a kinetic (rather than thermodynamic) property.

      *More importantly, further evidence is needed to determine a correlation with *function*. LTPs are suggested to have faster transfer rates (a measure of function) if the apo form adopts a substantial population of holo-like conformations, akin to enzyme preorganization. This is further tested by rationally mutating STARD11 and Mdm12. However, the support for this conclusion and if these mutations alter the LTPs conformational ensembles as desired is unclear: *

      In our opinion, the interpretation suggested by Reviewer #2 that there is a “correlation” between transfer rates and the overlap of apo-like and holo-like conformations, though fascinating, cannot be derived from the available data at this stage, and we did not mean to imply it. Rather, lipid transport is a complex phenomenon that involves several steps (membrane binding/unbinding, lipid uptake/release,…). Our simulations indicate that protein conformational plasticity, including potentially the overlap between apo-like and holo-like conformations, also influences lipid transfer rates. We will clarify this aspect in the text.

      * Is there a quantitative correlation between the overlap of apo and holo conformational distributions (as could be quantified by KL divergence or Wasserstein distance, for example) and difference in transfer rates as suggested by Fig S6?*

      We plan to quantify the overlap between the apo and holo conformational distributions for Fig. S6 and for the mutant simulations (see answer below) but, as discussed above, we are skeptical that we will observe a clear correlation with transfer rates.
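
      As an illustration of one of the overlap measures suggested by the reviewer, the sketch below computes the first-order Wasserstein distance between two one-dimensional samples (for instance, cavity volumes sampled in apo and holo trajectories) by integrating the absolute difference between their empirical cumulative distributions. The function name and the numbers are hypothetical, and this is not a description of the analysis we will eventually report.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """First-order Wasserstein distance between two 1D empirical samples."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a, left) / len(a)   # empirical CDF of a on [left, right)
        cdf_b = bisect_right(b, left) / len(b)   # empirical CDF of b on [left, right)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical cavity volumes (arbitrary units) for an apo and a holo ensemble.
apo = [1.0, 2.0, 3.0, 4.0]
holo = [4.0, 5.0, 6.0, 7.0]
print(wasserstein_1d(apo, holo))
```

      A distance of zero means identical distributions; larger values indicate that the apo ensemble would have to shift further to resemble the holo one. Unlike the KL divergence, this measure stays finite when the two samples do not overlap at all.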

      * The conclusion and the generality of the findings would be greatly strengthened if a correlation can be shown for other LTPs through additional simulations of mutants whose transfer rates have been previously characterized experimentally in the literature. (For example: Ryan 2007 PMID 17344474, Grabon 2017 PMID 28718450, Iaea 2015 PMID 26168008, among many others)*

      We are currently running simulations of several mutants to address this point and provide additional data/context.

      * While differences in the apo conformational ensembles of the WT and mutants are observed in Fig S7b and d, if these mutations reduce overlap with holo-like conformations is not determined. Simulations of the WT holo forms are needed to properly test this hypothesis. *

      We are currently performing these simulations.

      • For Mdm12, mutations are specifically made to "lock the protein in the apo-like state;" however, the mutant adopts conformations distinct from the apo form as shown in Fig S7d. How do the authors interpret the results of the cellular assays considering this and could it help explain why the mutant has similar kinetics to WT? What may explain the puzzling results of similar transfer kinetics but differing mitochondrial morphology? *

      As discussed above, interpretation of lipid transport rates based exclusively on apo and holo conformational populations is premature, as this is a complex mechanism that depends on many variables. Concerning the experimental results, we think three explanations are possible: 1. Mitochondrial morphology could be more sensitive to small variations in lipid composition than our METALIC assay. 2. Our assay only quantifies transport of unsaturated PC and PE species, and we cannot quantify variations in transport of other lipid species that are likely to also be transported by ERMES, such as PS and PA. 3. According to a recent structural model (Wozny et al, Nature 618, 188–192, 2023), Mdm12 might be part of a tunnel-like LTP complex in which it doesn't establish direct interactions with nearby organellar membranes. As such, its mechanism might be different from the one described here for other shuttle-like lipid transport domains. We will discuss these possibilities in the main text.

      • Confounding factors potentially complicate the interpretation of the in cellulo experiments. Simpler in vitro experiments may be better suited to determine if altering LTP's biophysical properties, namely rationally altering the population of apo- vs holo-like configurations, quantitatively affects transport rates as suggested.*

      We agree with Reviewer #2 that this information could be useful. However, this is beyond our technical abilities, and it would require lengthy and expensive experiments that are unlikely to be completed within a reasonable time frame for a revision (3 months). We have instead opted to better discuss our model in the context of published in vitro lipid transport experiments.

      • The abstract, intro, and title highlight that the manuscript's findings are indicative of and correlated with *function* but on p. 12 it's foreseen "that future studies will focus on the functional consequence of such observation." Please reconcile these conflicting statements and ensure connections to function are accurately described. The current title is rather bold. *

      We will rewrite and clarify the extent of our hypotheses and validations.

      * All mentions of "correlation" throughout the manuscript need to be quantitatively evaluated or properly qualified. In addition to that mentioned above regarding Fig S6, what is the correlation coefficient between residues' contribution to PC1 and membrane interaction frequency (Fig 2)? *

      To address this point, we will quantify the correlation between residues' contribution to PC1 and membrane interaction frequency. However, we expect a low correlation, for at least two main reasons. First, not all residues contributing to PC1 interact with membranes, but only a subset, as discussed above. Second, our methodology to compute membrane binding, based on the geometric distance between residues and bilayer, is intrinsically quite noisy (since residues in proximity of bona fide membrane binding regions will also appear as involved in membrane binding), thus making quantification of correlations somewhat inaccurate. Rather, we will try to explain in the text that our observations are not of "correlation" but rather of dependence/association, and we will use quantitative measures to quantify these properties (such as rank correlation coefficients or multivariate analyses).
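
      As an example of the kind of rank-based measure mentioned above, the sketch below computes a Spearman rank correlation coefficient (Pearson correlation of the ranks, with tied values sharing their average rank). The per-residue values are invented for illustration, and the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-residue values: PC1 contribution and membrane contact frequency.
pc1_contribution = [0.05, 0.30, 0.10, 0.45, 0.20]
contact_frequency = [0.00, 0.60, 0.05, 0.40, 0.10]
print(round(spearman(pc1_contribution, contact_frequency), 2))
```

      Because only the ordering of the values enters, such a coefficient is less sensitive than a linear correlation to the noise of a distance-based contact definition.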

      * Residue's contributions to collective conformational changes are found to be indicative of membrane binding. Yet, membrane interacting residues are identified from CG simulations that cannot capture such collective conformational changes due to the use of an elastic network. Given that the CG simulations agree with previous experimental findings, this suggests that collective conformational changes are not important for membrane binding. *

      We disagree with this interpretation by Reviewer #2 of our data: we do not claim that residues' contributions to collective conformational changes are indicative of membrane binding. Rather, membrane binding happens at protein regions displaying high contribution to collective conformational changes. This distinction is subtle but important: protein motion does not determine membrane binding regions. Rather, it appears that, for LTPs, membrane binding regions are also characterized by collective motions (suggesting function). We will clarify this in the main text.

      *Are similar conclusions drawn from residues' RMSFs? In other words, are local conformational fluctuations just as indicative of membrane binding? *

      We will compute the protein residues’ RMSFs and compare them with the membrane binding data. However, given that RMSF is representative of thermal fluctuations, we again expect a poor correlation between RMSF and membrane binding. On the other hand, we indeed observe that most membrane binding regions are protein loops, but this is not unexpected (e.g. Tubiana et al, PLoS Comput Biol. 2022 Dec; 18(12): e1010346.). However, such an observation does not provide any information on lipid transport, but only on the mechanism of membrane binding. Rather, the observation of a relationship between membrane binding and global motion is more interesting, since the latter is often indicative of protein function.
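
      For reference, the RMSF of each atom is the root of its mean squared deviation from its time-averaged position. A minimal sketch follows, assuming a trajectory array that has already been superposed on a reference structure; the array layout and the function name are assumptions made for illustration.

```python
import numpy as np


def rmsf(traj):
    """Per-atom RMSF for a trajectory of shape (n_frames, n_atoms, 3).

    The frames are assumed to be superposed on a common reference beforehand.
    """
    traj = np.asarray(traj, dtype=float)
    mean_positions = traj.mean(axis=0)                        # (n_atoms, 3)
    squared_dev = ((traj - mean_positions) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    return np.sqrt(squared_dev.mean(axis=0))                  # (n_atoms,)


# Toy trajectory (hypothetical): atom 0 is fixed, atom 1 oscillates by +/- 1 along x.
toy = [[[0, 0, 0], [1, 0, 0]],
       [[0, 0, 0], [-1, 0, 0]]]
print(rmsf(toy))
```

      Averaging the per-atom values over each residue gives a profile that can be compared directly with per-residue membrane contact frequencies.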

      *The stated correlation may in fact be spurious and instead arise because residues at the entrance to LTP's hydrophobic cavities need to be positioned at the membrane surface for productive lipid uptake and these same residues must undergo significant conformational changes to allow lipid entry. *

      This is exactly what we think is happening and what our data suggest. However, one must remember that our simulations allow us to predict the membrane binding interface, which is often difficult to determine experimentally (and is often inferred only from indirect evidence). Hence, our data provide novel evidence in this direction.

      *Is proximity to cavity entrance more or less correlated with membrane binding than 'dynamics'? *

      If we consider that, as discussed before, dynamics does not correlate with membrane binding (there are many dynamical regions that are not at the membrane interface), it is safe to assume that proximity to cavity entrance would correlate more with membrane binding. However, we have to consider that often we do not know where the cavity entrance in LTPs is located simply based on structure alone, and hence our approach provides important clues into this process.

      p.12 speculatively suggests "the high degree of protein dynamics we observed in membrane proximal regions could potentially facilitate the energetically unfavorable reaction that involves the extraction of a lipid from a membrane." Yet, the logic behind this idea does not make sense since a free energy barrier, an equilibrium thermodynamic quantity, cannot be lowered by changes in dynamics. Please explain.*

      Our current understanding of the mechanism of lipid extraction is quite poor. However, both using chemical intuition and following a recent MD study on one LTP (Rogers et al, 2023, Plos Comp Biol), it is safe to assume that the hydrophobic environment around the lipid is important for its stabilization in the lipid bilayer. Hence, reducing the number of hydrophobic contacts between the lipid and its environment could facilitate transport. A highly dynamic protein, by cycling between different conformations, could “stir” the bilayer, and hence decrease the number of contacts between the lipid and its environment favoring transport. We will clarify this point in the text.

      *Examining how the LTPs impact membrane properties would offer insight into the functional relevance of such residues for lipid extraction. *

      Indeed, our point above is connected to this one. We are performing simulations to compute hydrophobic contacts in the bilayer as proposed in (Rogers et al, 2023, Plos Comp Biol).

      The authors highlight that a bound lipid alters LTPs' conformational ensembles akin to "conformational selection" or "induced fit." How sensitive are these findings to the bound lipid species? Do LTPs with multiple known substrates exhibit an increasing diversity of holo conformations and are different conformations stabilized by different substrates? Would similar observations (Fig 3) be made with a lipid that is not known to be transferred by a given LTP? An interesting future direction would be to examine if lipid substrate specificity could be assessed by comparing conformational ensembles to that of a known substrate and/or by overlap with the apo ensemble.

      We deem that the role of lipid specificity on LTP conformational plasticity is beyond the scope of the current work. While this topic is certainly worth future investigations, we must point out that (i) not all proteins bind/transport multiple lipids (at least according to current knowledge) and (ii) only few LTPs have been structurally characterized bound to different lipids (Osh4, Osh6, …). This limitation prevents a wide generalization, and we prefer not to speculate on this topic. So far, we have tested our approach for Osh4 bound to cholesterol or PI(4)P and found that indeed the protein exhibits different holo conformations (in agreement with the experimental data) when bound to different substrates. We have added a short comment on this topic in the Discussion section.

      "We foresee that future studies will focus on the functional consequence of such observation, and most notably on the characterization of the extent to which such conformational changes affect multiple steps of protein function, including membrane binding or lipid extraction and release, and whether these are further modulated when different lipids are being transported."

      For LTPs to transfer lipids between membranes, transitions between apo and holo forms ought to occur when LTPs are membrane bound. How does membrane binding influence the conformational ensembles observed in solution? Does it promote conformational changes between apo- and holo-like structures, as suggested to regulate lipid uptake and release by previous studies of Osh/ORP, Ups/PRELI, and START family members? (For example: Miliara 2019 PMID 30850607, Watanabe 2015 PMID 26235513, Grabon 2017 PMID 28718450, Iaea 2015 PMID 26168008, Kudo 2008 PMID 18184806, Dong 2019 PMID 30783101) While answering these questions would require further computational effort, doing so will allow more accurate assessment of the role of conformational changes in LTP function.

      Unfortunately, we cannot currently quantify how membrane binding influences the conformational ensembles observed in solution, as the slowdown in diffusion at the water-membrane interface makes this task computationally challenging (and certainly not feasible within the time frame of a revision). We have so far tested two different proteins and have not succeeded in converging their conformational distributions when membrane-bound, despite long MD simulations that lasted several months (even though the non-converged data indicate sampling of both “open” and “closed” conformations). Interestingly, our observations are in qualitative agreement with a recent study on CPTP (Rogers et al, PLOS Comp Biol, 2023), where membrane-bound CPTP is able to sample different conformations (“open” and “closed”) but not to transition between the two states in 300 ns-long MD simulations.

      * The authors motivate the study with the *assumption* that a common molecular mechanism of LTP function exists. Yet LTPs have evolved diverse sequences, structures, and substrate preferences; thus there seems to be no a priori requirement (or even necessarily a benefit) for a single molecular mechanism. What evidence then supports this premise? While previous studies are limited to individual LTPs, when viewed altogether retrospectively, they suggest features that could be shared among LTPs. Synthesizing previous studies and more thoroughly referencing them (only 5 are cited in the intro on p. 3) would strengthen both the premise and findings of the manuscript. *

      Indeed, despite LTPs having different structures, substrates and the ability to target distinct organelles, previous evidence seems to suggest a potential role for protein conformational plasticity in their function, e.g. for Osh/ORP (Jun Im et al, Nature 2005; Canagarajah et al, JMB 2008; Moser von Filseck et al, Nat Comm, 2015; Lipp et al, Nat Comm. 2019,...), StART (Arakane et al, PNAS, 1996; Feng et al, Biochemistry, 2000; Grabon et al, JBC, 2017; Khelashvili et al, eLife, 2019;...) and PITP domains (Tremblay et al, Archives of Biochemistry and Biophysics, 2005; Ryan et al, MBOC, 2007; …). Our simulations provide additional evidence in this direction and allow these observations to be generalized, drawing parallels with “enzyme-like” or “transporter-like” features that could be exploited for further design of testable hypotheses. We will rewrite our text to better contextualize/acknowledge previous findings and to clarify these points.

      *The LTPs investigated are known to target distinct membranes. Should they then be expected to share structural or sequence-based features predictive of membrane binding interfaces, as motivates the analysis in Fig 1d, 1e, and S3? Or is it beneficial for LTPs to recognize membranes in different ways? *

      Since membrane binding is membrane/organelle-specific, it is possible that residue diversity at membrane binding interfaces is indeed beneficial for this specificity. We will add this comment as a potential explanation of our finding of a lack of conserved sequence-based features for membrane binding interfaces.

      *

      Minor comments:*

      * 2 "making lipid transfer across the cytoplasm a potentially energetically favorable process": Is it meant that it is less energetically costly than transfer without a LTP? Why it would be energetically favorable is unclear (and would indicate that the LTP sequesters lipids away from membranes instead of transferring them between membranes). *

      Yes, this is what we meant. We will rewrite this appropriately.

      * 3 "The excellent agreement between the membrane interface determined from the simulations and the experimentally-proposed one available for... Osh6" is missing a citation. *

      We have now added the relevant citation.

      * The plots in Fig 1d and S3 are difficult to interpret. Bar plots, for example, would allow easier comparison and evaluation. Currently, it seems that most proteins individually exhibit some of the same trends observed among the whole set, counter to the conclusion on p 5. *

      We will improve the presentation of our Figures.

      * Negatively charged residues engage in a number of membrane interactions (Fig 1d and S3). What is a potential explanation for this unconventional observation? *

      One possible interpretation is that negatively charged residues could interact with positively charged moieties (ethanolamine, choline) of PC and PE lipids.

      * How much variance is captured by PC1, and how many PCs are needed to capture most of the variance in the conformations? *

      PC1 explains 38 % of the total variance on average, whereas PC2 accounts for 17 % of it. Therefore, PC1 and PC2 together capture most of the variance in almost all cases.

      We have also added this to the text:

      "We specifically focused on PC1 as it explains the largest share of the variance in the dynamics (38% on average for all the proteins in our dataset, see Supplementary Table 2)."

      We have computed this variance and we have added this analysis in Supplementary Information.
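
      For clarity, the fraction of variance explained by each PC is the corresponding eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The sketch below illustrates this on a toy feature matrix; the input layout (frames as rows, features as columns), the function names and the data are assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(features):
    """Fraction of the total variance carried by each PC, largest first.

    features: array of shape (n_frames, n_features) with at least two features.
    """
    features = np.asarray(features, dtype=float)
    cov = np.cov(features, rowvar=False)        # feature-by-feature covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]     # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)       # remove tiny negative round-off
    return eigvals / eigvals.sum()


def n_components_for(ratio, threshold):
    """Smallest number of PCs whose cumulative variance reaches `threshold`."""
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)


# Toy data (hypothetical): two uncorrelated features with a 4:1 variance ratio.
toy = np.array([[2, 1], [-2, 1], [2, -1], [-2, -1]], dtype=float)
ratio = explained_variance_ratio(toy)
print(ratio, n_components_for(ratio, 0.9))
```

      With the average values quoted above, PC1 and PC2 together account for 38 % + 17 % = 55 % of the total variance.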

      * Plots in Fig 3, especially panels c and d are difficult to see. Please make the panels larger (perhaps a 3 x 4 layout instead of 2 x 6 would work better). *

      We will improve the presentation of our Figures.

      * 8 "these conformational changes are localized in protein regions that interact with the lipid bilayer" is contradicted by the results in Fig 2b showing that all residues with large contributions to PC1 do not interact with the membrane and discussed on p 5. *

      As discussed above, we don’t observe “correlation” between membrane binding and conformational plasticity, but we rather observe that membrane binding regions display high conformational plasticity (the opposite is not true). We will further clarify in the text.

      *

      8 "in the absence of bound lipids, it is able to sample multiple conformations" is not supported by the orange distributions in Fig 3d that appear unimodal. Is it instead meant that the apo form exhibits larger variance in cavity volume? *

      Yes, this is what we meant. We’ll clarify.

      *

      Please clarify if the elastic network was constructed to maintain the holo or apo structures of each protein and if a bound lipid was used in the CG simulations. *

      For membrane binding CG simulations, we used the apo structure and no bound lipid was used in the simulations. However, analogous simulations in the holo form (not shown) have essentially identical membrane binding interfaces.

      *

      Was *CHARMM* TIP3P used? *

      Yes.

      * Please clarify how membrane interacting residues were defined and how interaction frequency was calculated from the longest duration of interaction. *

      We will add this explanation to the Methods. The method is identical to that of (Srinivasan et al, Faraday Discussions, 2021).
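
      Purely as an illustration of what a "longest duration of interaction" means for a distance-based criterion, the sketch below returns the longest uninterrupted run of frames in which a residue stays within a cutoff of the bilayer. The cutoff value, the function name and the distances are hypothetical; the actual definition is the one given in the reference above and is not reproduced here.

```python
def longest_contact_run(distances, cutoff):
    """Length, in frames, of the longest uninterrupted stretch with distance <= cutoff."""
    best = current = 0
    for d in distances:
        current = current + 1 if d <= cutoff else 0
        best = max(best, current)
    return best


# Hypothetical residue-bilayer minimum distances (nm), one value per frame.
distances = [1.0, 0.4, 0.3, 0.9, 0.2, 0.2, 0.2, 1.5]
print(longest_contact_run(distances, cutoff=0.5))
```

      Dividing such a run length by the total number of frames gives a number between 0 and 1 that can be compared across residues.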

      * Refs 16 and 45 refer to the same paper. *

      Thanks, it is now corrected!

      * Reviewer #2 (Significance (Required)): *

      * General assessment: *

      * The work aims to tackle a grand question regarding membrane homeostasis mechanisms-what are universal principles underlying LTP function-and offers initial insights; however, further evidence is needed to support the conclusions as written, and some key results require further investigation and explanation. *

      *Advance and audience: *

      *

      By concurrently investigating the largest number of lipid transfer proteins to-date, the authors provide data invaluable for uncovering general mechanisms of non-vesicular lipid transport and advancing our understanding of membrane homeostasis mechanisms. By illuminating the wide-spread importance of conformational plasticity among lipid transfer proteins, the work presents a conceptual advance in our understanding of lipid transfer mechanisms and unifies previous studies. Because the manuscript emphasizes common biophysical principles and draws connections to enzyme biophysics, it ought to be of interest not only to membrane biologists but biochemists and molecular biologists more broadly.*

      We thank Reviewer #2 for the very positive evaluation of the significance of our work and for the in-depth analysis provided that will certainly help improve the quality of our work.

      Reviewer #3* (Evidence, reproducibility and clarity (Required)): *

      *The article "Conformational dynamics of lipid transfer domains provide a general framework to decode their functional mechanism." by Sriraksha Srinivasan, Andrea DiLuca, Arun Peter, Charlotte Gehin, Museer Lone, Thorsten Hornemann, Giovanni D'Angelo and Stefano Vanni study the interaction of Lipid transport Domains with membranes. This is done mainly by molecular modelling but also with selected experimental validations. *

      * Major comments: *

      * - The key conclusions are generally well supported by the analysis. - The authors could however analyze in more details some aspects in which specific cases appear. For example, p3 "multiple binding and unbinding events, as shown by the minimum distance curves" does not give an entire description of the variability seen in Fig S1, e.g. LCN1 versus GM2A.*

      We now discuss in more detail the variability seen in Fig. S1 and attribute it to different membrane binding affinities of the proteins in our dataset. We also discuss how this variability could reflect the diversity of organellar membranes to which these proteins bind in vivo.

      "Notably, the proteins in our dataset display distinct binding affinities, with some proteins showing very transient binding while others remain membrane-bound for most of the simulation trajectory (Fig. S1). This behavior could be, in part, attributed to the wide diversity of organellar membranes to which the LTDs in our dataset bind in vivo, and to the comparative simplicity of our in silico model DOPC lipid bilayers."

      • Later the "excellent agreement" for the data in Fig S2 is not quantified which does not allow the reader to know whether it better than would have been with other methods (SASA, OPM, DREAM). *

      We have explicitly quantified this agreement by providing a direct comparison between the experimental results and our in silico assay, and we further compared it against two alternative methods: OPM and DREAMM. In detail, we have identified 12 experimentally-characterized spots suggested to be involved in membrane binding in our protein dataset (see shaded blue regions in Fig. S2). Of those 12, our method identifies all of them (100%), while DREAMM identifies 7 of them (58 %) and OPM 4 out of 8 (50 %), since of the 12 proteins we tested, only 7 are available in the OPM database. Overall, even if our approach is much noisier than the others, and thus suggests multiple binding regions that are not currently supported by experimental observations, using physics-based methodologies appears to remain a preferable strategy to characterize the binding of peripheral proteins to lipid bilayers. Given the limited size of our dataset, we prefer not to make a direct comparison between our assay and OPM/DREAMM in the main text, as this would not be representative of the various methodologies.
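
      The percentages above follow directly from the counts given in the text, as the short worked example below shows. The helper name and the method labels are chosen for illustration only.

```python
def recovery_rate(found, total):
    """Percentage of experimentally characterized spots recovered by a method."""
    return 100.0 * found / total


# Counts as stated in the response: (spots identified, spots that could be assessed).
counts = {"this work (CG-MD)": (12, 12), "DREAMM": (7, 12), "OPM": (4, 8)}
for method, (found, total) in counts.items():
    print(f"{method}: {found}/{total} = {recovery_rate(found, total):.0f} %")
```

      Note that the denominator for OPM is smaller because only a subset of the proteins is available in that database, so the three percentages are not computed over the same set of spots.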

      *p5 commenting on Fig2b the case of Osh6 that appears to disagree should probably be mentioned. *

      We now discuss this case, and attribute this disagreement to insufficient sampling for the peculiar case of Osh6:

      "One interesting exception in our database appears to be Osh6, where the experimentally determined membrane-binding region at the N-terminus (https://doi.org/10.1038/s41467-019-11780-y) is only marginally binding to the lipid bilayer in silico and it also appears to have limited contribution to PC1. However, our simulations are unable to sample the large conformational changes that the N-terminal lid of Osh6 has been proposed to undergo from its lipid-bound to its apo state, indicating that insufficient sampling could be the reason for this apparent discrepancy."

      *

      -The data and the methods are generally well presented allowing to be reproduced.

      • The experiments adequately replicated with adequate statistical analysis. *

      * Minor comments: *

      * - When presenting the dataset the authors could probably detail a bit more the protocol undertaken to chose the cases. In particular it is unclear whether the chosen proteins have any membrane selectivity, which in principle could be affected by the choice of lipid used here.*

      We have now added to Table 1 a column listing the organelles to which the different LTPs have been shown to localize (source: UniProt). As a model membrane, we opted for a pure DOPC bilayer, both for simplicity and to compare membrane binding in a uniform setting. We foresee that future studies investigating the membrane specificity of the various proteins will shed further light on the molecular mechanism of LTPs. Finally, we also indicate that our choice of proteins was mainly driven by the availability of lipid-bound structures in the Protein Data Bank. We have added the following sentences to the main text:

      "Specifically, we selected all LTPs for which a crystallographic structure in complex with a lipid was available at the start of our project, plus two additional proteins (GM2A and LCN1) to increase the structural diversity of our dataset (Fig. 1a)."

      and

      "Notably, the proteins in our dataset display distinct binding affinities, with some proteins showing very transient binding while others remain membrane-bound for most of the simulation trajectory (Fig. S1). This behavior could be, in part, attributed to the wide diversity of organellar membranes to which the LTDs in our dataset bind in vivo, and to the comparative simplicity of our in silico model DOPC lipid bilayers."

      *- The authors could probably give some indication of how much of the variance is explained by PC1 and comment briefly on the choice to ignore other PCs. *

      PC1 explains 38 % of the total variance, on average. This means that PC1 makes a large contribution to the variance, especially in comparison to the other PCs. For instance, PC2 only accounts for 17 % of the total variance. This is the reason we limited our discussion to PC1. We have added a table to the Supplementary Information quantifying the variance explained by PC1 and PC2 and added the following sentence to the main text:

      "____We specifically focused on PC1 as it explains most of the variance in the dynamics (38% on average for all the proteins in our dataset)____. "

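      As a rough illustration of how such percentages are obtained, the fraction of variance explained by each PC is its eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below uses NumPy on synthetic data; the input layout, the random data and the function name `explained_variance_ratio` are assumptions made for illustration and are not taken from the authors' analysis pipeline.

```python
import numpy as np

def explained_variance_ratio(coords: np.ndarray) -> np.ndarray:
    """Return the fraction of total variance captured by each PC.

    coords: (n_frames, n_features) array, e.g. flattened coordinates per
    trajectory frame (hypothetical input layout).
    """
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to get PC1 first.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return eigvals / eigvals.sum()

# Synthetic stand-in for trajectory data: 500 frames, 6 features with unequal spread.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 6)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5, 0.5])

ratios = explained_variance_ratio(frames)
print(f"PC1: {ratios[0]:.1%}, PC2: {ratios[1]:.1%}")
```

      Averaging the first entry of `ratios` over all proteins would give a per-dataset figure comparable to the 38% quoted above, assuming the same coordinate selection is used for every protein.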
      * - When analyzing the residues involved in the interaction with the membrane the results could probably be compared with that of the systematic analysis performed recently: Tubiana, T., Sillitoe, I., Orengo, C., & Reuter, N. (2022). Dissecting peripheral protein-membrane interfaces. PLOS Computational Biology, 18(12), e1010346. *

      We have added in the text a reference to the work by Tubiana et al. and we have further stressed that our results agree with previous observations (including theirs). This includes the preference for Lys over Arg and the importance of protruding hydrophobes:

      "Concomitant analysis of all LTDs (Fig. 1d) indicates that the membrane binding interface of LTDs is enriched in the positively charged amino acid Lysine, as this amino acid is less membrane-disruptive than Arginine22, and aromatic/hydrophobic ones (Phe, Leu, Val, Ile). This confirms previous observations, as (i) binding of negatively charged lipids via positively charged residues and (ii) hydrophobic insertions are two of the main mechanisms involved in membrane binding by peripheral proteins22-27."

      *- The discussion on allostery/conformational selection might not be centered so much on enzymes.*

      We thank the reviewer for this important observation. We have now included in the Discussion the following paragraph that provides additional references and discussion of membrane transporters and receptors.

      "Notably, the conformational plasticity we observe for LTPs is reminiscent of other, previously described, functional protein mechanisms, including enzyme dynamics during catalysis (DOI: 10.1126/science.1066176), the alternating-access model of membrane transporters (https://doi.org/10.1038/nsmb.3179) or GPCR dynamics (https://doi.org/10.1021/acs.chemrev.6b00177). In all these cases, protein dynamics is strongly coupled to ligand binding and protein function, be it for signaling, transport or enzymatic activity. Unlike in these fields, however, the contribution of structural and spectroscopic studies to uncover LTP dynamics remains quite limited, and our simulations provide an important contribution to fill this gap. We hope that our results will motivate researchers to increase efforts to experimentally quantify LTP conformational plasticity, e.g. by structural determination of LTPs in different states (or bound to different lipids) or by single-molecule spectroscopy studies."

      *Reviewer #3 (Significance (Required)):*

      *The article shows convincing results on the debated issue of the mechanism of lipid transport by lipid transfer proteins.*

      First the study employs molecular modelling to allow a rather large test on 12 cases. The molecular dynamics experiments allow the authors to draw clear hypotheses on the role of protein dynamics in the interaction with membranes and on the effect of bound lipids on these dynamics.

      *Then the authors use this knowledge to design experiments that largely confirm those hypotheses. The results should therefore be interesting for a large audience of biochemists and cell biologists interested in lipid transport in the cell. *

      We thank Reviewer #3 for their very positive evaluation and contextualization of our work.

    1. Since the family is the site where biology, society and psychology converge most evidently, Freud's rooting of sexuality in a determinate way in the family makes perfect sense. Sexual desire may indeed be deeply structured by infantile experience, internal conflicts not fully resolved, and repressions of instincts in early life. But the drive model also has blind spots: it obscures the importance of later development and adult experience, understates the impact of the social milieu that shapes those experiences, and retains a telos of normal sexual development, even as it expands the meaning of the word "sexual". In the final analysis, it can be argued that Freud rendered nature partly social, moving beyond the biological determinism of sexology to begin to understand how desire is constituted intersubjectively. But lacking a theory of social structure beyond the family, the drive model of sexuality tended to downplay the actual links between social structure and sexual behavior.

      Some people believe that our feelings about love and our bodies are shaped when we are very young and that can affect us as we grow up. But this idea only focuses on the family and doesn't think about how other things and experiences in life can also shape our feelings. So, it's important to remember that there are many things that can make us feel different and that how we feel is not only because of our family.

    Annotators

  3. Jun 2023
    1. Author Response:

      Reviewer #1 (Public Review):

      […] The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypotheses and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that the AD and mean confidence in a perceptual task covary longitudinally. I think this study makes an important contribution to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      Response: We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review):

      […] The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      Response: We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e. the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g. as plotted in Figure 3D) and (2) by doing a very similar regression-based repeated-measures analysis, i.e. mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints, one for pre- and one for post-treatment (i.e. within-person), and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, line 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).

      The analyses have also been reported as regression in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: “To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated-measures regression analysis with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (b=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).
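      To make the relation between the two approaches concrete, here is a minimal sketch on synthetic data. With time coded 0 (pre) and 1 (post), the time-by-change interaction coefficient of an ordinary least-squares fit equals the slope obtained by regressing the per-person confidence change on the symptom change score. The data, effect sizes and the function name `fit_time_by_change` are invented for illustration. Plain OLS also ignores the within-person pairing when computing standard errors, so this reproduces only the point estimate and not necessarily the authors' repeated-measures model.

```python
import numpy as np

def fit_time_by_change(pre, post, change):
    """OLS fit of confidence ~ time * change, with time coded 0 (pre) / 1 (post).

    Returns coefficients [intercept, time, change, time:change].
    """
    pre, post, change = (np.asarray(a, dtype=float) for a in (pre, post, change))
    n = len(change)
    y = np.concatenate([pre, post])                    # two datapoints per person
    time = np.concatenate([np.zeros(n), np.ones(n)])
    d = np.concatenate([change, change])               # one change score per person
    X = np.column_stack([np.ones(2 * n), time, d, time * d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data only; the numbers below are not from the study.
rng = np.random.default_rng(1)
n = 200
change = rng.normal(size=n)
pre = rng.normal(loc=3.0, scale=0.5, size=n)
post = pre + 0.2 - 0.1 * change + rng.normal(scale=0.3, size=n)

beta = fit_time_by_change(pre, post, change)
slope = np.polyfit(change, post - pre, 1)[0]           # change-score regression
print(f"time:change = {beta[3]:.3f}, change-score slope = {slope:.3f}")
```

      The two printed values agree up to floating-point error, which is why the interaction test and the change-score correlation described above lead to the same conclusion.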

      This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Response: Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeating testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      […] I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      Response: We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (b=-2.02, SE=0.44, p<0.001, r²=0.01)”.)

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for a higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Response: Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined if excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (b=0.16, SE=0.02, p<0.001), change in anxious-depression (b=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Response: Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e. the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating confidence is the clinically relevant construct for change in symptoms.

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).
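      The ‘two-down one-up’ rule described above can be sketched as follows. This is a minimal illustration, not the task code: the step size, starting dot difference, floor value and the simulated observer's accuracy curve are all hypothetical. A rule of this kind settles where two consecutive correct responses are as likely as not, i.e. at an accuracy of about 70.7%, which is consistent with the "accuracy around 70%" mentioned in the responses.

```python
import math
import random

def staircase_step(dot_diff, streak, correct, step=1, min_diff=1):
    """Apply one two-down one-up update and return (new_dot_diff, new_streak).

    A lower dot difference means a harder trial.
    """
    if not correct:
        return dot_diff + step, 0                      # one error: make it easier
    if streak + 1 == 2:
        return max(min_diff, dot_diff - step), 0       # two correct in a row: make it harder
    return dot_diff, streak + 1                        # first correct: no change yet

def simulate(n_trials=2000, start=40, scale=30.0, seed=0):
    """Run the staircase on a hypothetical observer; return (accuracy, final dot difference)."""
    rng = random.Random(seed)
    dot_diff, streak, n_correct = start, 0, 0
    for _ in range(n_trials):
        # Assumed psychometric curve: chance (0.5) at zero difference, rising towards 1.
        p_correct = 0.5 + 0.5 * (1.0 - math.exp(-dot_diff / scale))
        correct = rng.random() < p_correct
        n_correct += correct
        dot_diff, streak = staircase_step(dot_diff, streak, correct)
    return n_correct / n_trials, dot_diff

accuracy, final_diff = simulate()
print(f"accuracy = {accuracy:.2f}, final dot difference = {final_diff}")
```

      In such a procedure an observer who becomes better at discrimination ends up at a lower mean dot difference while accuracy stays roughly constant, which is the distinction between task difficulty and accuracy drawn in the response.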

      To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Response: Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      Response: We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1:

      1-Localization of ESYT1 and SYNJ2BP

      The claim of a localization at ER-mitochondria contacts relies on two types of assays: light microscopy and subcellular fractionation. Concerning microscopy, while the staining pattern is obviously colocalizing with the ER (a control of specificity of staining using KO cells would nevertheless be desirable)

      the idea that ESYT1 foci "partially colocalized with mitochondria" is either trivial or unfounded

      Every cellular structure is "partially colocalized with mitochondria" simply by chance at the resolution of light microscopy

      If the meaning of the experiment is to show that ESYT1 'specifically' colocalizes with mitochondria, then this isn't shown by the data

      There is no quantification that the level of colocalization is more than expected by chance

      nor that it is higher than that of any other ER protein

      Moreover, the author's model implies that ESYT1 partial colocalization with mitochondria is, at least partially, due to its interaction with SYNJ2BP. This is not tested.

      • To analyze and measure MERCs parameters and functions, we used a set of validated methods described in the following specialized review articles (Eisenberg-Bord, Shai et al. 2016, Scorrano, De Matteis et al. 2019).
      • To support and confirm the localization of the ESYT1-SYNJ2BP complex at MERCs, we performed supplementary BioID analysis using ER-targeted BirA*, OMM-targeted BirA* and ER-mitochondria tether BirA* (Table S1, Figure S1 and Figure 1 A and B). These results confirmed the specificity of the interaction of the 2 partners: ESYT1 is not identified as a prey in OMM BioID and SYNJ2BP is not identified in ER BioID, whereas both partners are identified in the ER-mitochondria tether BioID.
      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria positive for ESYT1 (Figure 1E).
      • To demonstrate that ESYT1 partial colocalization with mitochondria is, at least partially, due to its interaction with SYNJ2BP, we performed a quantitative analysis using confocal microscopy. Human control fibroblasts, KO SYNJ2BP fibroblasts and SYNJ2BP overexpressing fibroblasts were labelled with ESYT1, TOMM40 for mitochondria and CANX for ER. We measured the % of ESYT1 signal colocalizing with mitochondria in each condition (Figure 3C).

      Membranes (MAM) can be purified and are enriched for proteins that localize at ER-mitochondria contacts. This idea originated in the early 90's and since then, a myriad of papers have used MAM purification, and whole MAM proteomes have been determined. Yet the evidence that MAM-enriched proteins represent bona fide ER-mitochondria-contact-enriched proteins (as can nowadays be determined by microscopy techniques) remains scarce. Here, anyway, ESYT1 fractionation pattern is identical to that of PDI, a marker of general ER, with no indication of specific MAM accumulation.

      • To highlight the enrichment of ESYT1 in the MAM fraction, we quantified the ESYT1 signal in each fraction. Those results show a fractionation pattern similar to that of the MAM resident protein SIGMAR1 (Figure 1F).

      For SYNJ2BP, it is different as it is more enriched in the MAM than the general mitochondrial marker PRDX3. However, PRDX3 is a matrix protein, making it a poor comparison point, since SYNJ2BP is an OMM protein.

      • To confirm the partial enrichment of SYNJ2BP in the MAM fraction compared to another outer mitochondrial membrane protein, we added the signal of the well characterized OMM protein CARD19 (Rios, Zhou et al. 2022).

      Again, the model implies that ESYT1 and SYNJ2BP accumulation in the MAM should be dependent on each other. This is not tested.

      • As described above, we demonstrated in Figure 3C that the accumulation of ESYT1 at mitochondria is, at least partially, dependent on the quantity of SYNJ2BP.

      • We moreover showed a reciprocal effect in Figure 3E. A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERCs formation is partially dependent on the presence of ESYT1.

      2-ESYT1-SYNJ2BP interaction.

      The starting point of the paper is a BioID signal for SYNJ2BP when BioID is fused to ESYT1. One confirmation of the interaction comes in figure 4, using blue native gel electrophoresis and assessing comigration. Because BioID is promiscuous and comigration can be spurious, better evidence is needed to make this claim. This is exemplified by the fact that, although SYNJ2BP is found in a complex comigrating with RRBP1, according to the BN gel, this slow migrating complex isn't disturbed by RRBP1 knockdown, but is somewhat disturbed by ESYT1 knockdown. More than a change in abundance, a change in migration velocity when either protein is absent would be evidence that these comigrating bands represent the same complex.

      • We showed in Figure 4C that the presence of SYNJ2BP in a complex of a molecular weight similar to that of ESYT1 (410 kDa) is totally dependent on the presence of ESYT1, suggesting an interaction of the 2 proteins.
      • To confirm this interaction, in Figure 4A we analyzed, by BN gel electrophoresis, cells overexpressing SYNJ2BP together with a 3xFlag-tagged version of ESYT1. As a result of the addition of the Flag tag, the complex positive for ESYT1 shifted to a higher molecular weight. The complex positive for SYNJ2BP shifted to a similar molecular weight, demonstrating the interaction and dependence of the 2 partners.

      ESYT1-SYNJ2BP interaction needs to be tested by coimmunoprecipitation of endogenous proteins, yeast-2-hybrid, in vitro reconstitution or any other confirmatory methods.

      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein that we showed in Figure 1H to form complexes similar to the endogenous protein. SYNJ2BP is found as the strongest prey, followed by ESYT2 and SEC22B, two described interactors of ESYT1, confirming the quality of the analysis (Table S2) (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      3-Tethering by ESYT1-SYNJ2BP.

      This is assessed by light and electron microscopy. Absence of ESYT1 decreases several metrics for ER-mitochondria contacts (whether absence of SYNJ2BP has the same effect isn't tested).

      • Using PLA (proximity ligation assay) we demonstrated that the loss of SYNJ2BP leads to a decrease in MERCs (Figure 7 H and I), confirming previous studies (Ilacqua, Anastasia et al. 2022, Pourshafie, Masati et al. 2022).

      This interesting phenomenon could be due to many things, including but not limited to the possibility that "ESYT1 tethers ER to mitochondria".

      This statement and the respective subheading title are therefore clearly overreaching and should be either supported by evidence or removed.

      Indeed, absence of ESYT1 ER-PM tethering and lipid exchange could have knock-on effects on ER-mito contacts, therefore strong statements aren't supported.

      Moreover, the effect on ER-mitochondria contact metrics could be due to changes in ER-mitochondria contact indeed but may also reflect changes in ER and/or mitochondria abundance and/or distribution, which favour or disfavour their encounter. Abundance and distribution of both organelles are not controlled for.

      • The mitochondrial phenotypes caused by the loss of ESYT1 are all rescued by the introduction of an artificial mitochondrial-ER tether, demonstrating that they are due to loss of the tethering function of ESYT1.

      Finally, the authors repeat a finding that SYNJ2BP overexpression induces artificial ER-mitochondria tethering. Again, according to the model, this should be, at least in part, due to interaction with ESYT1. Whether ESYT1 is required for this tethering enhancement isn't tested.

      • As described above, we demonstrated in Figure 3C that the accumulation of ESYT1 at mitochondria is, at least partially, dependent on the quantity of SYNJ2BP.

      • We moreover showed a reciprocal effect in Figure 3F. A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERC formation is partially dependent on the presence of ESYT1.

      4-Phenotypes of ESYT1/SYNJ2BP KD or KO.

      The study goes into detail to show that downregulation of either protein yields physiological phenotypes consistent with decreased ER-mitochondria tethering. These phenotypes include calcium import into mitochondria and mitochondrial lipid composition.

      Figure 5 shows that histamine-evoked ER-calcium release causes an increase in mitochondrial calcium, and this increase is reduced in absence of ESYT1, without detectable change in the abundance of the main known players of this calcium import. This is rescued by an artificial ER-mitochondria tether. However, Figure 5D shows that the increase in calcium concentration in the cytosol upon histamine-evoked ER calcium release is equally impaired by ESYT1 deletion, contrary to expectation. Indeed, if the impairment of mitochondrial calcium import was due to improper ER-mitochondria tethering in ESYT1 mutant cells, one would expect more calcium to leak into the cytosol, not less.

      The remaining explanation is that ESYT1 knockout desensitizes the cells to histamine, by affecting GPCR signalling at the PM, something unexplored here.

      In any case, a decreased calcium discharge by the ER upon histamine treatment, explains the decreased uptake by mitochondria.

      The authors argue that ER calcium release is unaffected by ESYT1 KO, but crucially use thapsigargin instead of histamine to show it. Thus, the most likely interpretation of the data is that ESYT1 KO affects histamine signalling and histamine-evoked calcium release upstream of ER-mitochondria contacts.

      • Silencing ESYT1 impairs SOCE efficiency in Jurkat cells (Woo, Sun et al. 2020), but not in HeLa cells (Giordano, Saheki et al. 2013, Woo, Sun et al. 2020). Analysis of the role of ESYT1 in HeLa cells prevents confounding effects due to the loss of ESYT1 at ER-PM. In this model, knock-down of ESYT1 led to a decrease of mitochondrial Ca2+ uptake from the ER upon histamine stimulation, as monitored by a genetically encoded Ca2+ indicator targeted to the mitochondrial matrix (Figure 5A and B). ESYT1 silencing in HeLa cells did not impact the ER Ca2+ store measured by the ER-targeted R-GECO Ca2+ probe (Figure 5C and D). The expression of the artificial mitochondria-ER tether was able to rescue mitochondrial Ca2+ defects observed in ESYT1 silenced cells (Figure 5B), confirming that the observed anomalies are specifically due to MERC defects.
      • In contrast loss of ESYT1 impaired SOCE efficiency in fibroblasts (Figure 6 A and B). This phenotype was fully rescued by re-expression of ESYT1-Myc but not the artificial tether. We therefore investigated the influence of ESYT1 loss on cytosolic Ca2+ concentration following ATP (Figure 6F to H) or histamine stimulation (Figure S3 D to F), both of which showed a reduced cytosolic Ca2+ concentration and uptake in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Measurment of cytosolic Ca2+ after tharpsigargin treatment in Ca2+-fee media, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER, showed that ESYT1 KO does not influence the total ER Ca2+ pool (Figure 6K and L). However, ER-Ca2+ release capacity upon histamine stimulation (Figure 6I and J) is decreased in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Loss of ESYT1 decreased the Ca2+ uptake capacities of mitochondria after activation with histamine (Figure S3 A to C) or ATP (Figure 6 C to E). This phenotype was rescued by re-expression of ESYT1-Myc and also the engineered ER-mitochondria tether. Thus, despite the ER-Ca2+ release defect observed after ESYT1 loss, the artificial tether fully rescued the mitochondrial phenotype.
      • These results highlight the distinct and dual roles of ESYT1 in Ca2+ regulation at the ER-PM and at MERCs.

      The data with SYNJ2BP deletion are more compatible with decreased ER-mito contacts, as no decrease in cytosolic calcium is observed. This is compatible with the previously proposed role of SYNJ2BP in ER-mitochondria tethering, but the difference with ESYT1 rather argues that both proteins affect calcium signaling by different means, meaning they act in different pathways.

      • We explain the different results concerning cytosolic calcium by the fact that ESYT1 is a bi-localized protein with dual functions in cellular calcium regulation, implicated both in SOCE at ER-PM and in mitochondrial calcium uptake at MERCs. On the other hand, SYNJ2BP is only present at MERCs and its loss does not influence PM-ER signaling or ER-Ca2+ release.

      Finally, the study delves into mitochondrial lipids to "investigated the role of the SMP-domain containing protein ESYT1 in lipid transfer from ER to mitochondria". In reality, it is not ER-mitochondria lipid transport that is under scrutiny, but general lipid homeostasis, and changes in ER-PM lipids could have knock-on effects on mitochondrial lipids without the need to invoke disruptions in ER-mitochondria transfer activity.

      • The fact that the artificial tether, which specifically rescues MERCs, fully rescues the lipid phenotype argues for a direct loss of MERCs tethering function when ESYT1 is missing.

      The changes observed are interesting but could be due to anything. Surprisingly, PCA analysis shows that the rescue of the knockout by the ESYT1 gene clusters with the rescue by the artificial tether, and not with the wildtype. This indicates that overexpressing either ESYT1 or a tether causes similar lipidomic changes. These could be due, for instance, to ER stress caused by protein overexpression, and not to a rescue.

      • In order to verify if the overexpression of ESYT1 or the artificial tether induces ER stress, we performed a WB analysis to compare markers of ER stress in control fibroblasts, KO ESYT1 fibroblasts, KO ESYT1 fibroblasts overexpressing ESYT1-Myc or the tether (Figure S4C). This showed no changes in the levels of several different markers of ER stress or cell death.

      Reviewer 2:

      1) the interaction between those proteins is direct,

      2) if SYNJ2BP is necessary and sufficient to localize E-Syt1 at MERC, and

      3) if MERCs extension induced by SYNJ2BP is dependent on E-Syt1.

      Those points are important to investigate because SYNJ2BP has already been shown to induce MERCs by interacting with the ER protein RRBP1. In addition, some experiments need to be better quantified.

      Major comments:

      E-syt1/SYNJ2BP in MERCs formation: the authors provide several convincing lines of evidence that both proteins are in the same complex (proximity labelling, localization in the same complex in BN-PAGE, localization in MAM) but it is not clear to what extent the direct interaction between both proteins regulates ER-mitochondria tethering.

      1- Pull down experiments or a BiFC strategy could be performed to show the direct interaction between both proteins.

      • We showed in Figure 4C that the presence of SYNJ2BP in a complex of a similar molecular weight to that of ESYT1 (410 kDa) is totally dependent on the presence of ESYT1, suggesting an interaction of the 2 proteins.
      • To confirm this interaction, in Figure 4A we analyzed by BN-PAGE cells overexpressing SYNJ2BP together with a 3xFlag-tagged version of ESYT1. As a result of the addition of the Flag tag, the complex positive for ESYT1 shifted to a higher molecular weight. Significantly, the complex positive for SYNJ2BP shifted to a similar molecular weight, demonstrating the interaction and dependence of the 2 protein partners.
      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein (Table S2). SYNJ2BP was found as the strongest prey, followed by ESYT2 and SEC22B, two described interactors of ESYT1, confirming the quality of the analysis (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      2- SYNJ2BP OE has already been demonstrated to increase MERCs, and this is dependent on the ER binding partner RRBP1 (10.7554/eLife.24463). Therefore, it would be of interest to perform OE of SYNJ2BP in KO Esyt1 to address the question of whether ESyt1 is also required to increase MERCs.

      • A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERCs formation is partially dependent on the presence of ESYT1 (Figure 3F).

      3- The authors show that Esyt1 puncta size increases when SYNJ2BP is OE (Fig3C), but this can be indirectly linked to the increase of MERCs in the OE line. Thus, it could be interesting to test if the number/shape of E-syt1 puncta located close to mitochondria decreases in KO SYNJ2BP. This could really show the dependence of SYNJ2BP for E-syt1 function at MERCs.

      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria colocalizing with ESYT1 (Figure 1E).

      • To demonstrate that ESYT1 partial colocalization with mitochondria is, at least partially, due to its interaction with SYNJ2BP, we performed a quantitative analysis using confocal microscopy. Human control fibroblasts, KO SYNJ2BP fibroblasts and SYNJ2BP overexpressing fibroblasts were labelled with ESYT1, TOMM40 for mitochondria and CANX for ER. We measured the % of ESYT1 signal colocalizing with mitochondria in each condition (Figure 3C).

      Lipid analyses: the results of MS on isolated mitochondria clearly show that mitochondrial lipid homeostasis is affected in KO-Syt1 and rescued by expression of Syt1-Myc and artificial mitochondria-ER tether. However, p.15, the authors wrote "The loss of ESYT1 resulted in a decrease of the three main mitochondrial lipid categories CL, PE and PI, which was accompanied by an increase in PC". As the results are expressed in mol%, this interpretation can be distorted by the fact that mathematically, if the content of one lipid decreases, the content of others will increase. I would suggest expressing the results in lipid quantity (nmol)/mg of mitochondrial proteins instead of mol%. This will clarify the role of E-Syt1 in mitochondrial lipid homeostasis and which lipids increase and decrease.
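
The closure effect raised in this comment can be illustrated with a short sketch. The lipid amounts below are hypothetical values chosen only for illustration (they are not measurements from the study), and `mol_percent` is an illustrative helper rather than part of any analysis pipeline. Lowering the absolute amount of a single lipid raises the mol% of every other lipid, even though their absolute amounts are unchanged.

```python
def mol_percent(amounts):
    """Convert absolute lipid amounts into mol% of the total."""
    total = sum(amounts.values())
    return {lipid: 100.0 * q / total for lipid, q in amounts.items()}

# Hypothetical absolute amounts (nmol lipid per mg mitochondrial protein).
# These numbers are invented for illustration only.
control = {"PC": 40.0, "PE": 30.0, "PI": 10.0, "CL": 20.0}
knockout = dict(control, PE=15.0)  # only PE is lowered; PC, PI and CL are untouched

for lipid in control:
    print(
        lipid,
        "absolute:", control[lipid], "->", knockout[lipid],
        "| mol%:", round(mol_percent(control)[lipid], 1),
        "->", round(mol_percent(knockout)[lipid], 1),
    )
```

In this toy case the mol% of PC rises although its absolute amount is identical in both conditions, which is why absolute quantities are needed to decide which lipids actually change.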

      • We changed the sentence in the text as suggested.

      Also it could be of high interest to have the lipid composition of the whole cells to reinforce the direct involvement of E-Syt1 in mitochondrial lipid homeostasis and verify that the disruption of mitochondrial lipid homeostasis is not linked to a general perturbation of lipid metabolism as this protein acts at different MCSs.

      • This is beyond the scope of the project and we would argue that the results of such an experiment would be difficult to interpret.

      To better understand the impact of Esyt1 on mitochondrial morphology, the authors could analyze the mitochondrial morphology (size, shape, cristae) on their EM images of crt, KO and OE lines. Indeed, on OE (Fig3A), the mitochondria look bigger and with a different shape compared to crt.

      • As we do not observe obvious differences in mitochondrial morphology between control, KO and OE fibroblasts, we do not think that quantitative analysis would add to the understanding of the effect of ESYT1 on mitochondrial function.

      Also, they performed a lot of BN-PAGE. Is it possible to check whether the mitochondrial respiratory chain super-complexes are affected in the Esyt1 KO line compared to crt?

      • We decided to remove the data on the metabolic consequences of ESYT1 loss since it was too preliminary and required deeper investigations, focusing instead on the effect of ESYT1 loss on calcium homeostasis.

      Quantifications: some western blots need to be quantified (Fig 5K, 6J, S3E);

      • We did not observe obvious differences in the protein levels so we think that quantitation would not add significantly to the understanding of the differences in calcium dynamics that we report.

      Fig1A: Can the authors provide a higher magnification of the triple labeling and perform quantification of the proportion of E-Syt1 puncta located close to mitochondria?

      • We added higher magnification of the same area in all channels and arrows that point to the foci of ESYT1 colocalizing with both ER and mitochondria (Figure 1D).

      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria colocalizing with ESYT1 (Figure 1E).

      Minor comments:

      • Fig1E + text: according to the legend, the BN-PAGE has been performed on Heavy membrane fraction. Why the authors speak about complexes at MAM in the text of the corresponding figure? Is-it the MAM or the heavy fraction (MAM + mito + ER...)? If BN have been performed from heavy membranes, it is not a real proof that E-syt1 is in MAMs.

      • Heavy membranes have been used in this experiment. The text and conclusions have been changed accordingly.

      • On fig3C (panel crt): it seems like SYNJ2BP dots are not co-localized with mito. Is this protein targeted to another organelle besides mitochondria?

      • It is not described that SYNJ2BP would be targeted to another organelle beside mitochondria. It is possible that those dots outside of mitochondria could be non-specific signals from the antibody we used.

      • Fig4A: can the author provide a control of protein loading (membrane staining as example) to confirm the decrease of E-Syt1 in siSYNJ2BP?

      • As we performed this experiment only once we have removed the statement suggesting a decrease in ESYT1 protein in response to the siSYNJ2BP.

      • Fig5E/F: it is not clear to me why the expression of E-Syt1 in the KO is not able to complement the KO phenotype for cytosolic Ca++. Can the authors comment this?

      • We performed further analysis using ATP to trigger calcium release from the ER (Figure 6 F to H). In those conditions, expression of ESYT1 in the KO is able to complement the KO phenotype for cytosolic Ca2+.

      Reviewer 3:

      Main points 1. Confirming the MERC localization of ESYT1 should include some more of tethering factors as demonstrated interactors (some are mentioned above) and should not be limited to lipid homeostasis.

      • As shown in Figure 1B, VAPB, PDZD8 and BCAP31 are found as preys in the ESYT1 bioID analysis. Those proteins have been described as MERC tethers, their loss leading to mitochondrial calcium defects. To support and confirm the specificity of ESYT1-SYNJ2BP complex at MERCs, we performed a supplementary BioID analysis using ER targeted BirA* and OMM targeted BirA* (Table S1, Figure S1 and Figure 1 A and B). These results confirmed the specificity of the interaction of the 2 partners. ESYT1 is not identified as a prey in OMM BioID and SYNJ2BP is not identified in ER BioID. Additional ER-mitochondria tether BirA* analyses showed that tether-BirA* identified both ESYT1 and SYNJ2BP as a prey at MERCs, confirming the localisation of this interaction. Interestingly, a large majority of the known MERCs tethers VAPB-PTPIP51, MFN2, ITPRs, BCAP31 are also found as preys in the tether-BirA* (Figure 1B), confirming the quality of these data.
      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein. SYNJ2BP is found as the strongest prey, followed by ESYT2 and SEC22B two described interactors of ESYT1, confirming the quality of the analysis (Table S2) (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      The fact that in ESYT1 KO cells both mitochondrial calcium transfer and cytosolic calcium accumulation are accompanied by decreased ER-cepia1ER signal decay upon histamine addition suggest that the main reason for ER-mitochondria calcium transfer defects are due to impaired SOCE. Calcium-free medium and histamine are used to show that ESYT1 does not affect ER calcium content. However, if it affects SOCE, then the absence of extracellular calcium would abolish such an effect; moreover, histamine does not test for leak effects. As additional information, the authors should investigate whether ER calcium content is affected by the presence of extracellular calcium in the ko scenario using thapsigargin. The authors should inhibit SOCE to test whether this mechanism is affected in ESYT1 KO and could account for observed signal differences. Excluding SOCE is critical, since any change in calcium entry from the outside would potentially negate a role of ESYT1 in mitochondrial calcium uptake.

      • Silencing ESYT1 impairs SOCE efficiency in Jurkat cells (Woo, Sun et al. 2020), but not in HeLa cells (Giordano, Saheki et al. 2013, Woo, Sun et al. 2020). Analysis of the role of ESYT1 in HeLa cells prevents confounding effects due to the loss of ESYT1 at ER-PM. In this model, knock-down of ESYT1 led to a decrease of mitochondrial Ca2+ uptake from the ER upon histamine stimulation, as monitored by a genetically encoded Ca2+ indicator targeted to the mitochondrial matrix (Figure 5A and B). ESYT1 silencing in HeLa cells did not impact the ER Ca2+ store measured by the ER-targeted R-GECO Ca2+ probe (Figure 5C and D). The expression of the artificial mitochondria-ER tether was able to rescue the mitochondrial Ca2+ defects observed in ESYT1-silenced cells (Figure 5B), confirming that the observed anomalies are specifically due to MERC defects.
      • In contrast, loss of ESYT1 impaired SOCE efficiency in fibroblasts (Figure 6 A and B). This phenotype was fully rescued by re-expression of ESYT1-Myc but not the artificial tether. We therefore investigated the influence of ESYT1 loss on cytosolic Ca2+ concentration following ATP (Figure 6F to H) or histamine stimulation (Figure S3 D to F), both of which showed a reduced cytosolic Ca2+ concentration and uptake in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Measurement of cytosolic Ca2+ in Ca2+-free medium after treatment with thapsigargin, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER, showed that ESYT1 KO does not influence the total ER Ca2+ pool (Figure 6K and L). However, ER-Ca2+ release capacity upon histamine stimulation (Figure 6I and J) is decreased in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Loss of ESYT1 decreased the Ca2+ uptake capacities of mitochondria after activation with histamine (Figure S3 A to C) or ATP (Figure 6 C to E). This phenotype was rescued by re-expression of ESYT1-Myc and also the engineered ER-mitochondria tether. Thus, despite the ER-Ca2+ release defect observed after ESYT1 loss, the artificial tether fully rescued the mitochondrial phenotype.
      • These results highlight the distinct and dual roles of ESYT1 in Ca2+ regulation at the ER-PM and at MERCs.

      The authors claim that ER-Geco measurements show that no change of ER calcium was observed. However, they use thapsigargin treatment and then get a peak, when the signal should show a decrease due to leak. This suggests they did not use ER-Geco in Figure S3C. What was measured and what does it mean?

      • We used R-GECO (not ER-GECO) which measures the cytosolic calcium.
      • We measured the total ER Ca2+ store using the cytosolic-targeted R-GECO Ca2+ probe upon treatment with thapsigargin, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER (Figure 5C and D), and observed no difference between our different conditions.

      The findings on growth in galactose medium are intriguing but are not accompanied by respirometry to confirm mitochondria are compromised upon ESYT1 KO.

      • We decided to remove the data on the metabolic consequences of ESYT1 loss since it was too preliminary and required deeper investigations, focusing instead on the effect of ESYT1 loss on calcium homeostasis.

      Minor points: 1. The authors mention they measure mitochondrial uptake of "exogenous" calcium by applying histamine. They should specify that these measures transferred calcium from the ER rather than uptake of calcium from the exterior (directly at the plasma membrane).

      • The text was clarified as suggested.

      • Expression levels of IP3Rs are not very indicative of any change of their activity. The authors should discuss how ESYT1 could affect their PTMs.

      • A large number of post-translational modifications are known to regulate IP3R activity (Hamada and Mikoshiba 2020), and it is possible that the loss of ESYT1 could interfere with these modifications, but an exploration of this issue is beyond the scope of this study. The text was clarified as suggested.

      Eisenberg-Bord, M., N. Shai, M. Schuldiner and M. Bohnert (2016). "A Tether Is a Tether Is a Tether: Tethering at Membrane Contact Sites." Dev Cell 39(4): 395-409.

      Gallo, A., L. Danglot, F. Giordano, B. Hewlett, T. Binz, C. Vannier and T. Galli (2020). "Role of the Sec22b-E-Syt complex in neurite growth and ramification." J Cell Sci 133(18).

      Giordano, F., Y. Saheki, O. Idevall-Hagren, S. F. Colombo, M. Pirruccello, I. Milosevic, E. O. Gracheva, S. N. Bagriantsev, N. Borgese and P. De Camilli (2013). "PI(4,5)P(2)-dependent and Ca(2+)-regulated ER-PM interactions mediated by the extended synaptotagmins." Cell 153(7): 1494-1509.

      Hamada, K. and K. Mikoshiba (2020). "IP(3) Receptor Plasticity Underlying Diverse Functions." Annu Rev Physiol 82: 151-176.

      Ilacqua, N., I. Anastasia, D. Aloshyn, R. Ghandehari-Alavijeh, E. A. Peluso, M. C. Brearley-Sholto, L. V. Pellegrini, A. Raimondi, T. Q. de Aguiar Vallim and L. Pellegrini (2022). "Expression of Synj2bp in mouse liver regulates the extent of wrappER-mitochondria contact to maintain hepatic lipid homeostasis." Biol Direct 17(1): 37.

      Pourshafie, N., E. Masati, A. Lopez, E. Bunker, A. Snyder, N. A. Edwards, A. M. Winkelsas, K. H. Fischbeck and C. Grunseich (2022). "Altered SYNJ2BP-mediated mitochondrial-ER contacts in motor neuron disease." Neurobiol Dis: 105832.

      Rios, K. E., M. Zhou, N. M. Lott, C. R. Beauregard, D. P. McDaniel, T. P. Conrads and B. C. Schaefer (2022). "CARD19 Interacts with Mitochondrial Contact Site and Cristae Organizing System Constituent Proteins and Regulates Cristae Morphology." Cells 11(7).

      Scorrano, L., M. A. De Matteis, S. Emr, F. Giordano, G. Hajnoczky, B. Kornmann, L. L. Lackner, T. P. Levine, L. Pellegrini, K. Reinisch, R. Rizzuto, T. Simmen, H. Stenmark, C. Ungermann and M. Schuldiner (2019). "Coming together to define membrane contact sites." Nat Commun 10(1): 1287.

      Woo, J. S., Z. Sun, S. Srikanth and Y. Gwack (2020). "The short isoform of extended synaptotagmin-2 controls Ca(2+) dynamics in T cells via interaction with STIM1." Sci Rep 10(1): 14433.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We are grateful to both reviewers for reviewing our manuscript, and for providing very helpful feedback as to how we can improve this work. We have now implemented nearly all of the changes as recommended, and provide responses to these points below.

      In terms of novelty, while recent pre-prints and publications have suggested that the application of multi-omics analysis improves GRN inference, there has yet to be a systematic comparison of linear and non-linear machine learning methods for GRN prediction from single cell multi-omic data. There are many computational and statistical challenges to such a study, and we therefore believe that others in the field will be especially interested in our systematic comparison of network inference methods, especially given the increased interest and utility of multi-omic data.

      In addition, we report the first comprehensive inference of GRNs in early human embryo development. This is a particularly challenging developmental context to study, given genetic variation, limitations of sample size due to the precious nature of the material, and regulatory constraints. We anticipate that the methodology we developed and datasets we generated will be informative for computational, developmental and stem cell biologists.

      We have uploaded all the network predictions on FigShare and these can be accessed using the following link: https://doi.org/10.6084/m9.figshare.21968813. In addition, we anticipate that the computational and statistical codes and pipelines we developed (available on https://github.com/galanisl/early_hs_embryo_GRNs) will be applied to other cellular and developmental contexts, especially in challenging contexts such as human development, non-typical model organisms and in clinically relevant samples.

      Reviewer 1

      Major comments

      - The proposed strategy (i.e. combining gene expression-based regulatory inference with cis-regulatory evidence) has been well developed (and implemented) by multiple published works like SCENIC and CellOracle, which is also properly acknowledged by the authors in the discussion section. This leads to a serious concern on the major methodological contribution of this work.

      We would like to note that our study is the first to comprehensively evaluate linear and non-linear machine learning strategies for gene regulatory network prediction from single-cell transcriptional datasets combined with available multi-omic data. We also apply these methods to the challenging context of early human embryogenesis. There are specific methodological challenges arising in this context that other published work has not yet addressed. In particular, the precious nature of the source material means that sample sizes are limited, unlike the contexts where SCENIC and CellOracle were applied. Notably, the number of cells available for downstream analysis is typically several orders of magnitude smaller than when scRNA-seq data are collected from adult human tissue or from cell culture. This restriction on sample sizes places corresponding restrictions on statistical power, and is therefore likely to mean that different statistical network inference methodologies are optimal in specific contexts. Furthermore, the inclusion of multi-omic data from complementary platforms (such as ATAC-seq data) becomes even more important in this context to mitigate the effect of reduced sample sizes. These issues are very important for the choice of gene regulatory network inference methodology in relation to studies of human embryo development, and ours is the first study to address these issues directly in any context. We have further clarified the novelty of our work in the abstract, introduction and discussion sections of the manuscript.

      - Most of the compared network reconstruction methods involve hyper-parameters setup (e.g., sparsity regularization weights of the regression methods). The authors did not discuss how these hyper-parameters were chosen.

      For sparse regression, the hyperparameter controlling sparsity was set by cross-validation (CV), using the internal CV function of the R package. All default settings for GENIE3 were used. This information has now been added to the manuscript (in the Methods section), along with a description of the implementation of the mutual information method we use.
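
As an illustration of what setting a sparsity hyperparameter by cross-validation involves, the following is a minimal sketch in plain Python. It is not the R package's internal CV function and not the code used in the study: the lasso solver (`lasso_cd`), the helper `choose_lambda_by_cv`, the penalty grid, the number of folds and the simulated data are all assumptions made for this example. Each candidate penalty is scored by its mean held-out squared error over k folds, and the penalty with the lowest error is kept.

```python
import random

def soft_threshold(z, t):
    """Shrink z towards zero by t (proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, with b initialised to zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def choose_lambda_by_cv(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the lowest mean held-out squared error, plus all errors."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = sum(bj * xj for bj, xj in zip(b, X[i]))
                sse += (y[i] - pred) ** 2
        errors.append(sse / n)
    best = min(range(len(lambdas)), key=lambda m: errors[m])
    return lambdas[best], errors

# Simulated toy data: the target depends on features 0 and 2 only.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0, 0.5) for row in X]
lambdas = [0.01, 0.05, 0.1, 0.5, 1.0]  # hypothetical penalty grid

best_lam, cv_errors = choose_lambda_by_cv(X, y, lambdas)
print("chosen penalty:", best_lam)
print("coefficients at chosen penalty:", lasso_cd(X, y, best_lam))
```

The sketch omits an intercept and feature standardisation for brevity, so it assumes roughly centred inputs.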

      - For the real-world blastocyst data, the network prediction methods were compared in terms of their reproducibility across validation folds (Fig. 3, Fig. S4-6). However, reproducibility does not necessarily imply accuracy. In fact, statistical learning methods are generally subject to the bias-variance tradeoff, where lower variance (i.e., higher reproducibility) could imply higher bias in model prediction. While there is a lack of gold-standard ground truth to evaluate network accuracy in real biological systems, silver-standards like the ranking of known regulatory interactions in the predictions could be employed as an indirect estimate.

      We thank the reviewer for the opportunity to clarify this point. We would like to avoid any misunderstanding of the reproducibility statistic R, as follows. A higher value of R indicates that the fitted model would generalise well to new data; i.e., R=1 indicates that the model is robust (stable) to perturbations of the data-set. We note that this is not the same as analysing the residual variance of the data after model fitting and related over-fitting (i.e., bias-variance trade-off). The variance that is referred to when discussing bias-variance trade-off is the mean-squared error (of data compared to model), which is not the same as what is assessed by reproducibility statistic R. Specifically, R is a Bayesian estimate of the posterior probability of observing a gene regulation given the data. R is calculated by taking a random sample of the data, doing the network inference again, checking if each gene regulation still appears in the GRN, and then recording (as the R statistic) the average fraction of inclusions over many repetitions. So when we have R close to 1, this indicates that our model predictions generalise well to new data, which is the opposite of what is suggested in this comment. In summary, the accuracy quantified by the reproducibility statistic R relates to the stability of the model predictions to perturbation of the data. We thank the reviewer for the helpful comment to draw our attention to this point, and have now clarified this point in the manuscript on page 6 line 252.
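
The resampling procedure described above (subsample the data, repeat the inference, record how often each regulation reappears) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: `infer_edges` is a toy stand-in that keeps the gene pairs with the largest absolute Pearson correlation (so edges here are undirected pairs), and the 80% subsample fraction, the number of repetitions, the gene names and the simulated data are all hypothetical choices.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def infer_edges(data, top_k):
    """Toy inference method: keep the top_k gene pairs ranked by |Pearson r|."""
    genes = sorted(data)
    scored = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            scored.append((abs(pearson(data[g], data[h])), (g, h)))
    scored.sort(reverse=True)
    return {edge for _, edge in scored[:top_k]}

def reproducibility(data, top_k, n_rep=100, frac=0.8, seed=0):
    """Fraction of subsampled re-inferences in which each full-data edge reappears."""
    rng = random.Random(seed)
    n_cells = len(next(iter(data.values())))
    full_network = infer_edges(data, top_k)
    counts = {edge: 0 for edge in full_network}
    for _ in range(n_rep):
        cells = rng.sample(range(n_cells), int(frac * n_cells))
        subsample = {g: [v[c] for c in cells] for g, v in data.items()}
        resampled_network = infer_edges(subsample, top_k)
        for edge in full_network:
            if edge in resampled_network:
                counts[edge] += 1
    return {edge: c / n_rep for edge, c in counts.items()}

# Simulated toy expression data: geneB tracks geneA, the other genes are noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(40)]
data = {
    "geneA": a,
    "geneB": [v + rng.gauss(0, 0.1) for v in a],
    "geneC": [rng.gauss(0, 1) for _ in range(40)],
    "geneD": [rng.gauss(0, 1) for _ in range(40)],
}
for edge, r in sorted(reproducibility(data, top_k=2).items()):
    print(edge, "R =", r)
```

An edge whose inclusion does not depend on which cells happen to be sampled scores close to 1, whereas an edge driven by a few cells scores lower.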

      - The gene set enrichment results were reported only on EPI and TE cell types (Fig. 4C and Fig. S12), due to the reason that CA data is only available for TE and ICM. However, many of the other results presented in Fig. 3-6 did include the PE cell type albeit using the same CA data. It is not particularly convincing why the cell type inclusion standard for gene set enrichment is different from the other results.

      We thank the reviewer for noting this and would like to clarify that we restricted the analysis to the EPI and TE, because similar lists of gene-sets were not available for primitive endoderm, where it is currently unclear which pathways are most relevant to this cell type. This has now been clarified in the manuscript on page 8, line 337.

      - The authors cited TF binding in cis-regulatory regions as supporting evidence of several MICA-inferred regulatory interactions (e.g., NANOG -> ZNF343). However, the same cis-regulatory evidence has already been used in the CA filtering step. All interactions passing CA filtering should in principle have TF-binding support. It would be more convincing if the authors provided other types of evidence as independent support, such as genetic associations like eQTL, experimental perturbations like gene knockdown/knockout, etc.

      We appreciate the reviewer’s point. We address this by describing published ChIP-seq validation in human pluripotent stem cells, which are widely used as a proxy for the study of the epiblast. We feel that the ChIP-seq validation in this context is an appropriate independent validation to support the MICA-inferred cis-regulatory interactions predicted from the human embryo datasets we analysed. Our inferences from ATAC-seq data cannot identify TF-DNA binding directly. ChIP-seq is a widely accepted independent method to support the interactions inferred from ATAC-seq data.

      We agree that knockdown/knockout would provide further evidence suggesting gene regulation, and indeed these are experiments we would like to conduct systematically in the future, but such perturbations are difficult to achieve at genome-wide scale, especially with very restricted quantities of human embryo material. Notably, these studies would not be evidence of direct regulation and the gold-standard in our opinion is to perturb the cis regulatory region to demonstrate its functional importance in gene regulation. These are important experiments to conduct systematically in the future. We also note that assessing quantitative trait loci in the context of human pre-implantation embryos is extremely challenging due to the restricted sample sizes and genetic variance in the samples collected.

      *- Many of the MICA-inferred regulatory interactions do not exhibit Spearman correlation (Fig. 5, Fig. S17), which could probably be explained by the ability of mutual information to capture complex non-monotonic dependencies. It would be interesting to provide further investigation on these "uncorrelated" edges, which may help demonstrate the superiority of mutual information over Spearman correlation. *

      An investigation of these uncorrelated edges has been added as a new Fig. S18.
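
The point about non-monotonic dependencies can be illustrated with a small synthetic example. This sketch is not the mutual information estimator used in the study: the equal-width binning, the number of bins and the quadratic toy relationship are assumptions made for illustration. For a symmetric, U-shaped dependence the Spearman correlation is zero, while a simple binned mutual information estimate is clearly positive.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def binned(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(a, b, n_bins=4):
    """Plug-in mutual information estimate (in nats) from equal-width bins."""
    xa, xb = binned(a, n_bins), binned(b, n_bins)
    n = len(a)
    pa, pb, pab = Counter(xa), Counter(xb), Counter(zip(xa, xb))
    return sum(c / n * math.log(c * n / (pa[i] * pb[j])) for (i, j), c in pab.items())

# Synthetic non-monotonic dependence: y is fully determined by x, but U-shaped.
x = list(range(-100, 101))
y = [v * v for v in x]

print("Spearman correlation:", spearman(x, y))
print("Binned mutual information (nats):", mutual_information(x, y))
```

Plug-in estimates of this kind depend on the bin count and sample size, so the absolute value should be read only qualitatively.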

      - The authors conducted immunostaining experiments to validate the MICA-inferred regulatory interaction between TFAP2C and JUND. While the identified protein co-localization is a step further than RNA co-expression, it is still correlation rather than causality. Additional evidence like the effect of knockout/knockdown perturbations would be more convincing.

      We agree with Reviewer 1 that experimental perturbations of TFAP2C and JUND to determine what consequence this has for interactions between these proteins would be informative. However, due to the complexity of such an investigation in human embryos, we feel that this is beyond the scope of the current study. One option is to conduct the perturbations in human pluripotent stem cells; however, it is unclear if the GRN in this context reflects the same interactions as in human embryos, and this is a distinct question to address in the future. Moreover, while knockdown/knockout studies would be suggestive of upstream regulation, they would not address the question of whether this is a direct or indirect effect without systematic further analysis including transcription factor-DNA binding (such as CUT&RUN, CUT&Tag or ChIP-seq) analysis as well as perturbations of the putative cis-regulatory regions. These are all exciting future experiments and our study provides us and others with hypotheses to functionally test. We have clarified these future directions in the discussion section on page 16, line 576.

      Minor comments

      • *The γ symbols in AP-2γ are not correctly rendered. *

      We note that this applies only to the way AP-2γ appears on the Review Commons website, and we are trying to fix this issue. We hope this transformation after the manuscript upload will not apply to a subsequent transfer to a journal.

      • The UMAP figures (Fig. 4A, Fig. S7) are of low resolution compared to other figures.

      We thank the reviewer for noting this. These figures have now been added as vector graphics files to overcome this issue.

      • As the authors are focused on studying the blastocyst regulatory network, the inferred regulatory interactions should be provided as supplementary data.

      We have included all of the inferred gene regulatory interactions as a supplementary folder for the MICA predictions using FigShare: doi.org/10.6084/m9.figshare.21968813. We have included code to reproduce the inferred gene regulatory interactions for the other methods which we compared to MICA. Because this includes 100,000 regulatory interactions per method, we feel that it would be impractical to include the alternative inferred interactions as supplementary data.

      Reviewer 2

      Minor comments

      *- In the abstract, it would be adequate to already mention which normalisation method works the best. *

      This has now been added to the abstract and we appreciate this suggestion.

      *- In Fig. 1: *

      * Describe what are squares and circles

      This information has been included in the figure 1 legend.

      * In the GRNs refined by keeping CA-predicted regulations only, mention that these are cis interactions

      We have modified the figure 1 legend and the text on page 5, line 224 to clarify that these are putative cis-regulatory interactions.

      * The ATAC seq shows KRT8, GATA3, RELB motifs, while the rest of the figure is very general. Maybe make the ATAC-seq peaks panel also as a sketch and relate it to the square/circles graphs on the right hand side to showcase how the filtering of the network is performed.

      We appreciate this suggestion and modified figure 1 accordingly.

      * The caption says five GRN inference approaches, while the abstract and text say 4. It is clear after reading that the 5th is a random approach. However, it was a surprise at first.

      We have modified the figure 1 legend to clarify that we also compared random prediction in addition to the 4 GRN inference approaches.

      *- How the Simulation study was performed is not understandable for non experts as it is described in the Methods section. This is an important approach in general, and I think the audience would benefit if the authors add a full section about it in their supplementary data. *

      Further details have now been added to the subsection ‘simulation study’ in the Methods section.

      *- Fig. 2: *

      ** As it is, it is hard to tell the difference between GRN inference methods for a given sample size and number of regulators. Could the authors add a comparative panel for this (maybe some scatter plots would be enough)? MI by itself looks worse here? *

      We thank the reviewer for this helpful suggestion. This comparative plot has now been included in figure 2 and indicates that MI is on par with the other GRN inference methods using simulated RNA-seq data.

      *- When mentioning "samples" (e.g. last paragraph of section 1 in results), do the authors refer to "cells"? *

      We appreciate the reviewer pointing this out and have amended the text throughout to state that these are cells.

      *- What about normalisation effects in the simulated data? *

      With regards to the simulated data, normalisation effects are not relevant as we are generating data that are idealised and therefore not subject to unwanted sources of variation such as read depth. However, in future work, this could be investigated with an expanded simulation study and we appreciate the reviewer’s suggestion.

      *- Figure S7 should be cited in the first paragraph of section 2 in results. *

      This has now been cited.

      *Could the authors add a panel to indicate whether the data is SMART-seq2 or 10X. *

      We thank the reviewer for the suggestion to clarify this, which we think is an important point. We have included a statement that all data used was generated using the SMART-seq2 sequencing technique in the figure legend. The choice of sequencing method/depth of sequencing will likely impact on the choice of GRN inference method and we have also clarified this in the discussion section on page 13, line 516.

      *- In the association of inferred GRNs to human blastocyst cell lineages, the authors find the GRN edges predicted that overlap between the 4 inference methods in each cell type. Do they, therefore, recommend to always use more than one GRN inference method? *

      Identifying overlapping inferences by comparing more than one GRN inference method may be a strategy to identify network edges with more confidence due to the agreement between several inference methodologies. However, this strategy may also miss some edges which can only be detected by one method and not another. We have included a statement in the discussion section to clarify this point on page 15, line 571.

      - If the CA data used was generated for the TE and ICM only, how do the authors use it to perform MICA on PE?

      We appreciate that this is confusing and have since revised the manuscript on page 5, line 223 to state that the inner cell mass (ICM) comprises EPI (epiblast) and PE (primitive endoderm) cells. It may be that we miss putative cis-regulatory interactions if the ICM CA data does not reflect developmentally progressed PE and EPI cells, and we have noted this caveat in the discussion section on page 15, line 561.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      She et al. studied the evolution of gene expression reaction norms when individuals colonise a new environment that exposes them to physiologically challenging conditions. Their objective was to test the "plasticity first" hypothesis, which suggests that traits that are already plastic (their value changes when facing a new environment compared to the original environment) facilitate the colonisation of novel environments, which, if true, would be predicted to result in the evolution of gene expression values that are similar in the population that colonised the new environment and evolved under these particular selection pressures. To test this prediction, they studied gene expression in cardiac and muscle tissues in individuals originating from three conditions: lowland individuals in their natural environment (ancestral state), lowland individuals exposed to hypoxia (the plastic response state), and a highland population facing hypoxia for several generations (the coloniser state). They classified gene expression patterns as maladaptive or adaptive in lowland individuals responding to short-term hypoxia by classifying gene expression patterns using genes that differed between the ancestral state (lowland) and colonised state (highland). Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state).
They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. The limitations arise partly because it is quite challenging to bring wild birds into captivity to conduct hypoxia acclimation experiments; keeping lowland sparrows under hypoxic conditions for a month required considerable effort. We recognized the same set of limitations that the reviewer pointed out and have discussed them in the study, i.e., considering the hypoxic condition alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals for each stage) and flight muscle from 12 individuals (four individuals for each stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2), nor whether they were sampled at the same time of year (pre reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect both gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, or environmental effects other than hypoxia that differ in these populations' environments or because of their sexes or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding) of the year. They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar) under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons and not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Q4. Impact of the work

      There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge showing that the short-term response to hypoxia and the long-term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and under additional selection pressures (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero et al. "Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans." Proceedings of the National Academy of Sciences 104, no. 45 (2007): 17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular biology and evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in the context of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work provides understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important works pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised concerns about a batch effect between birds collected in the previous study and in this study. There is an important detail we did not describe in the previous version. All tissues from hypoxia-acclimated birds and from highland and lowland birds were collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P > 0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods, and we also provided a new supplementary figure (Figure S5) to show the comparative results.
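
      The conserved-gene check described above can be sketched roughly as follows. This is an illustrative reading, not the authors' pipeline: it assumes the filter is applied to each gene's TPM values across samples (coefficient of variation at most 0.3, mean TPM at least 1), uses the population standard deviation, and compares group medians of log2(TPM + 1). The gene names, sample grouping and values in `toy_tpm` are hypothetical.

```python
import math
from statistics import mean, median, pstdev

def conserved_genes(tpm, max_cv=0.3, min_mean_tpm=1.0):
    """Return genes whose TPM is stable (low CV) and not negligible.

    tpm maps a gene name to its TPM values, one value per sample.
    """
    kept = []
    for gene, values in tpm.items():
        avg = mean(values)
        if avg < min_mean_tpm:
            continue  # too lowly expressed to be informative
        if pstdev(values) / avg <= max_cv:
            kept.append(gene)
    return kept

def group_median(tpm, genes, sample_indices):
    """Median log2(TPM + 1) over the given genes and samples."""
    return median(
        math.log2(tpm[g][i] + 1) for g in genes for i in sample_indices
    )

# Hypothetical data: six samples, two per group.
toy_tpm = {
    "geneA": [10, 11, 9, 10, 10, 11],
    "geneB": [0.2, 0.1, 0.3, 0.2, 0.1, 0.2],
    "geneC": [5, 50, 1, 40, 2, 60],
    "geneD": [100, 95, 105, 98, 102, 100],
}
groups = {"lowland": [0, 1], "hypoxia": [2, 3], "highland": [4, 5]}

stable = conserved_genes(toy_tpm)
medians = {name: group_median(toy_tpm, stable, idx) for name, idx in groups.items()}
print(stable, medians)
```

      Similar medians across groups for the stable genes would be consistent with little batch effect; a formal comparison would still need a statistical test such as the one the authors report.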

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible due to the logistical problem of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we have tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene-set and muscle-associated gene-set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. Two questions were raised, which we would like to address separately. The first concern centers on the issue of arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design, we wanted to explore the magnitudes of gene expression plasticity (i.e., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).
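
      As a rough illustration of how such thresholds might be applied, the sketch below classifies a gene by comparing its plastic change (plastic minus ancestral level) with the subsequent change (colonized minus plastic level), and requires both to exceed a fraction of the ancestral level. This formulation is an assumption made for illustration and may differ from the procedure used in the manuscript; the function name `classify_gene` and the example values are hypothetical.

```python
def classify_gene(ancestral, plastic, colonized, threshold=0.5):
    """Label a gene as 'reinforcement', 'reversion' or 'unclassified'.

    threshold is a fraction of the ancestral expression level
    (0.5 corresponds to a 50% change).
    """
    cutoff = threshold * ancestral
    plastic_change = plastic - ancestral   # short-term response
    genetic_change = colonized - plastic   # change after colonization
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return "unclassified"
    same_direction = (plastic_change > 0) == (genetic_change > 0)
    return "reinforcement" if same_direction else "reversion"

# Hypothetical expression levels: (ancestral, plastic, colonized).
toy_genes = {
    "gene1": (10, 20, 8),    # up with hypoxia, back down in highland
    "gene2": (10, 20, 40),   # up with hypoxia, further up in highland
    "gene3": (10, 12, 30),   # plastic change too small at 50%
}

for threshold in (0.5, 1.0, 1.5, 2.0):
    counts = {"reinforcement": 0, "reversion": 0, "unclassified": 0}
    for levels in toy_genes.values():
        counts[classify_gene(*levels, threshold=threshold)] += 1
    print(f"threshold {threshold:.0%}: {counts}")
```

      Running the classification over several thresholds shows how the counts in each class depend on the cutoff, which is why the robustness checks described in the response are useful.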

      As the reviewer pointed out, we now realize that this threshold selection is arbitrary. We have thus implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes resulting from genuine differences rather than random sampling errors. Bootstrap results suggested that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript, and provided results in Figure 2d and Figure 3d.

      In addition, we adopted a bin scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of the reinforcement/reversion plasticity). These analyses based on different categorization schemes revealed similar results, and suggested that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in Supplementary Figures S2 and S4.

      The second issue that the reviewer raised is that the plastic changes simply have more space to reverse direction rather than having any biological reason to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladaptive in novel environments that the species have recently colonized (e.g., lizard, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets that are associated with muscle phenotypes corroborated these previous studies and showed that initial gene expression plasticity may be nonadaptive in the novel environments (e.g., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence is an artifact of the definition of adaptive versus maladaptive changes, for example, in Figure 3d. We agree with the reviewer that the correlation analysis is circular because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matches or opposes that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and related text from the revised version of the manuscript. Meanwhile, we have changed Figure 3a to further clarify the schematic framework.

      Reviewer #1 (Recommendations For The Authors):

      Q1. Here are private recommendations that I think could help improve the manuscript. West-Eberhard was a pioneer back in 2003 in explicating the hypothesis of "plasticity first". I think it is important to cite their main work in the first paragraph of introduction and to use the term "plasticity-first", which is widely known among evolutionary biologists studying phenotypic plasticity, instead of "plasticity followed by genetic change", since the three papers cited in paragraph 1 call it « plasticity first ».

      West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution, Oxford University Press.

      Thank you for suggesting West-Eberhard (2003) and we have cited this important work. We have also changed “plasticity followed by genetic change” to “plasticity first”.

      Q2. Introduction. Line 5, Change for « On the one hand, if plasticity changes ... »

      We have modified as suggested.

      Q3. Line 52, Change for « ...same direction as adaptive evolution does ...»

      We have modified as suggested.

      Q4. Line 66. When presenting papers that address the plasticity and evolution of gene expression in response to environmental variables, the paper by Morris et al. is another example that could be useful to include (but this is only a suggestion in case the authors missed it).

      Thank you for suggesting this nice work. We have cited Morris et al. (2014).

      Q5. Line 94, Change for "We acclimated"

      We have modified as suggested.

      Q6. In Figure 3, the figure in panel A and B is labelled "normaxia", but I think that "normoxia" is usually the term used.

      Thank you for spotting the typo. We have modified Figure 3a and no longer use the term “normaxia”.

      Material and methods

      It would be important to merge supplementary table 1 and 2 and only present the individuals that were used with their respective cardiac and muscle libraries (if they come from the same individual?). Also, the origin of the individuals used in the hypoxia experiment should be explained at the beginning of the methods section and explicated in the supplementary table. Information on sex or stage of development (juvenile? Adult? Male? female?) and time of year (in breeding stage? Pre-migration (if any), etc) would allow the reader to see whether individuals from lowland differed only in their exposure to hypoxia or not, or if other variables may affect gene expression patterns. Similarly, if all individuals from the highland are males and the lowland hypoxia exposed individuals are females (or juveniles versus breeders, or different time of year, etc) this should be stated in the methods. Gene expression is labile so the reader should know if other variables influence the results presented or not.

      Thank you for the suggestion. We have added detailed information (i.e., age, collecting time and season) to Supplementary Table 1. We have also added this information to the Methods. Because the birds used in the transcriptomic analysis (Supplementary Table 1) were different individuals from those used in the sequence analyses (Supplementary Table 2), these two tables cannot be merged.

      References:

      Campbell-Staton SC, Velotta JP, Winchell KM. 2021. Selection on adaptive and maladaptive gene expression plasticity during thermal adaptation to urban heat islands. Nat. Commun. 12: 6195.

      Ghalambor CK, Hoke KL, Ruell EW, Fischer EK, Reznick DN, Hughes KA. 2015. Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature 525:372–375.

      Hao et al. 2023. Divergent contributions of coding and noncoding sequences to initial high-altitude adaptation in passerine birds endemic to the Qinghai–Tibet Plateau. Mol. Ecol. doi: 10.1111/mec.16942.

      Ho WC, Zhang J. 2018. Evolutionary adaptations to new environments generally reverse plastic phenotypic changes. Nat. Commun. 9: 350.

      Ho WC, Zhang J. 2019. Genetic gene expression changes during environmental adaptations tend to reverse plastic changes even after correction for statistical nonindependence. Mol. Biol. Evol. 36: 604–612.

      Ho WC, Li D, Zhu Q, Zhang J. 2020. Phenotypic plasticity as a long-term memory easing readaptations to ancestral environments. Sci. Adv. 6: eaba3388.

      Kuo KC, Yao CT, Liao BY, Weng MP, Dong F, Hsu YC, Hung CM. 2023. Weak gene-gene interaction facilitates the evolution of gene expression plasticity. BMC Biol. 21: 57.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity (Required):

      In this paper by Wideman et al, the authors seek to determine the role of cellular iron homeostasis in the pathogenesis of murine malaria.

      The authors attempt to disentangle the effects of anemia from that of cellular iron deficiency. The authors elegantly make use of a murine model of a rare human mutation in the transferrin receptor. This mutation leads to decreased receptor internalization and decreased cellular iron, but otherwise healthy mice. Using this model, the authors use a P. chabaudi infection model and show an increase in pathogen burden and a decrease in pathology. They show in some detail that the immune response to P. chabaudi infection is blunted, both T and B-cell responses are attenuated in the TfRY20H/Y20H model, and the block in proliferation can be rescued by exogenous iron supplementation. They also show that decreased cellular iron attenuates liver pathology through potentially multiple mechanisms.

      Minor comments:

      • The peak of parasitemia is relatively low (approx. 3%) compared to other published studies (e.g. PMID: 22100995, 16714546, 31110285) where the peak in C57BL/6 mice reached 25 - 40%. Can the authors account for this low parasitemia?

      Response: We thank the reviewer for their constructive comments and appreciate that they are highlighting this important point. It has previously been shown (PMID: 23217144, 23719378) that mosquito-transmission of P. chabaudi leads to significantly lower parasitaemia (“Recently mosquito-transmitted parasites were used to mimic a natural infection more closely, as vector transmission is known to regulate Plasmodium virulence and alter the host’s immune response (48-50). Consequently, parasitaemia is expected to be significantly lower upon infection with recently mosquito-transmitted parasites, compared to infection with serially blood-passaged parasites that are more virulent (48,49).”).

      • Figure 1K - At homeostasis, serum iron is low in TfR mice however increases to significantly higher than the WT mice at 8 days post infection. Do the authors have an explanation on why these dramatic changes in serum iron are seen?

      Response: During malaria infection, RBC lysis releases haem and iron into circulation, which leads to an increase in serum iron levels. This effect is observed in both wild-type and TfrcY20H/Y20H mice infected with P. chabaudi (Supplementary Figure 1F & Figure 1K). However, the significantly higher serum iron levels observed in infected TfrcY20H/Y20H mice can likely be explained by their decreased capacity for transferrin receptor-1 mediated iron uptake, leading to relatively slower uptake and storage of circulating transferrin-bound iron into tissues. This has been clarified in the manuscript (line 142-143):

      “The elevated serum iron observed in infected TfrcY20H/Y20H mice was consistent with their restricted capacity to take up transferrin-bound circulating iron into tissues.”

      • Figure S3 - Is it surprising that no effects on splenic neutrophils are seen? Were neutrophils quantified at any other point? These would also be expected to have a role in both the control of malaria infection and on any pathology.

      Response: We thank the reviewer for raising this interesting question. It is known that neutrophils can be sensitive to cellular iron deficiency (PMID: 36197985) and that neutrophils can play an important part in malaria infection (PMID: 31628160). However, the magnitude and significance of the neutrophil response to recently mosquito-transmitted P. chabaudi parasites has not been thoroughly investigated. A recent study demonstrated that monocytes and macrophages may be more important than granulocytes in the early response to recently mosquito-transmitted P. chabaudi infection (PMID: 34532703).

      Moreover, we performed neutrophil quantifications in our initial experiments and found that the splenic neutrophil response was not altered in TfrcY20H/Y20H mice eight days after infection. Additionally, no neutrophil infiltration was observed in the liver of either genotype upon P. chabaudi infection. In light of these findings, we did not characterise the neutrophil response further, as it appeared unlikely that neutrophils were the principal causal agent of either the altered immunity or pathology in this context. However, we agree with the reviewer that the larger question of whether neutrophil iron plays a role in the pathology of malaria is an interesting open one, which we hope future studies can elucidate.

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      Changes to the text:

      • Fig S1E and F - Please add to the figure legend that these were measured at homeostasis.

      Response: This clarification has been added to the legend of Supplementary Figure 1 (line 954-957).

      • Figure 3 - In the legend, H and I are the wrong way around.

      Response: The legend of Figure 3 has been corrected accordingly (line 888-890).

      • Figure 4 - please add the units of concentration of FeSO4 to all panels

      Response: The units of concentration for FeSO4 and AFeC have been added to all panels of Figures 4 and 6, respectively.

      • Line 246 - The authors state: "there was some evidence of decreased malaria-induced hepatomegaly" however there is no significant difference between WT and TfR mice and both show significant hepatomegaly. I feel that this line should be reworded.

      Response: The sentence (line 252-254) has been reworded as follows:

      “Furthermore, while both genotypes developed malaria-induced hepatomegaly, there was a trend toward less severe hepatomegaly in TfrcY20H/Y20H mice (Figure S5C).”

      Significance (Required):

      This work is one of the first to attempt to define the requirements for cellular iron in malaria infection. This is a difficult topic, as infection, the associated inflammation and the red blood cell destruction caused by malaria all have complex effects on iron within the body. This study fits well with previous observations showing that anemia can be protective, as it both prevents parasite growth and limits immunopathology. This work advances the field by demonstrating a cell-intrinsic role for iron in malaria infection. There is a broad possible audience for this work, including malaria researchers, immunologists and people interested in the role of iron, both at a cellular level and systemically.

      Reviewer #2

      Evidence, reproducibility and clarity (Required):

      In this manuscript, the authors have studied the role of iron deficiency in the host response to Plasmodium infection using a transgenic mouse model that carries a mutation in the transferrin receptor. They show that restricted cellular iron acquisition attenuated P. chabaudi infection-induced splenic and hepatic immune responses, which in turn mitigated the immunopathology, even though the peak parasitemia was significantly higher in the mutant mice. Interestingly, the course of parasite infection doesn't seem to be affected in the mutant mice compared to the wildtype mice despite the induction of poor immune responses. The authors show that the decreased cellular iron uptake broadly impacts both innate and adaptive components of the immune system. Conversely, free iron supplementation restored the immune cell functions.

      • The study is well performed, and the manuscript is well written. However, the authors should show how conserved the role of cellular iron is across other rodent malaria parasite species, at least with P. yoelii or P. berghei blood stage infection models. This question becomes critical to address in order to understand broad relevance to human malaria infections where both the host and parasites are genetically diverse.

      Response: We thank the reviewer for appreciating our study and for the thoughtful comments. We agree with the reviewer that the diverse genetic background of both parasites and hosts makes it difficult to draw broad conclusions about human malaria infection from animal studies performed in a laboratory setting. The recently mosquito-transmitted P. chabaudi chabaudi AS blood-stage infection model replicates many key features of mild to moderate malaria infection in humans, such as low parasitaemia, anaemia, cyto-adhesive sequestration in microvasculature, and self-resolving immunopathology. Importantly, the immune response elicited by recently mosquito-transmitted parasites also more closely mimics the immune response to a natural infection (PMID: 23719378). Therefore, we consider the recently mosquito-transmitted P. chabaudi chabaudi AS model as the most relevant to answer our particular research questions.

      Furthermore, specific pathogen-free parasitised erythrocyte stabilates made from recently mosquito-transmitted P. berghei or P. yoelii parasites are unfortunately not readily accessible (e.g. through the European Malaria Reagent Repository), in contrast to P. chabaudi. Consequently, preparing and characterising recently mosquito-transmitted strains to perform the experiments suggested by the reviewer would require a substantial amount of additional time and labour, which we deem out of scope for this study.

      In the design of our model we have also taken care to minimise the effects of anaemia, something which would be difficult or impossible to achieve using serially blood-passaged P. yoelii or P. berghei parasites. Both P. yoelii and P. berghei merozoites preferentially invade immature RBCs (PMID: 34322397), making readouts such as parasitaemia far more sensitive to small variations in erythropoietic output. In addition, the extensive RBC destruction caused by most serially blood-passaged murine Plasmodium strains would likely exaggerate any erythropoietic impairment caused by the TfrcY20H/Y20H mutation.

      Although we strongly believe that the chosen mouse model of malaria is the most appropriate for our study, ultimately, no mouse model can replicate all features of human malaria infection. Inevitably, the direct relevance of animal studies for human infection will always be somewhat opaque. Hence, we respectfully disagree with the reviewer that repeating the experiments with additional murine malaria parasite species would allow us to extrapolate conclusions about human malaria infection. Such experiments would also conflict with the 3Rs principles that govern work with animals in the UK (https://nc3rs.org.uk/), especially because most strains of P. yoelii and P. berghei cause severe or non-resolving infections and have a significant negative impact on animal welfare.

      In our opinion, the logical continuation of this study must be to utilise the insights from our research to inform future human studies on the relationships between iron deficiency and malaria-related immunopathology. However, we agree that this is an important topic and have added a section addressing the broad relevance of our findings to the discussion (line 393-396):

      “It remains to be seen what the broader importance of cellular iron is in human malaria infection, in particular within the diverse genetic context of both humans and parasites found in malaria endemic regions. Murine models of malaria are useful in providing hypothesis-generating results, but such findings ultimately ought to be confirmed and developed further through studies in human populations.”

      • Since restricted cellular iron uptake mitigates the immunopathology, the authors should explore whether this could also relieve the cerebral malaria condition that is caused by hyperinflammation in the brain. They should use the P. berghei ANKA parasite strain, which causes cerebral malaria in mice. I think this would increase the impact of the paper.

      Response: Although we agree that this would be an interesting line of inquiry, we think that it is outside of the scope of this study, which predominantly aims to characterise and study the effects of cellular iron deficiency in host cells, particularly immune cells, during mild to moderate malaria infection. The severe pathology underlying cerebral malaria differs greatly from that of a self-resolving blood-stage infection. Furthermore, the relevance to human cerebral malaria of the P. berghei ANKA model is controversial within the field (PMID: 21288352) and as a severe infection its use would again conflict with the 3Rs principles.

      Minor comments:

      • Line 222: repeating word, "iron iron-supplemented...."

      Response: The sentence has been corrected (line 228).

      • Figure 3C, S4C & S5F: Why was the Mann-Whitney test performed in these particular graphs, whereas the rest of the two-group comparisons were done using Welch's test? The authors should clearly mention this in the methods section.

      Response: We apologise if this was unclear in the manuscript. We routinely tested all our datasets for normality to identify the appropriate tests for each dataset. In case of the graphs shown in figure 3C, S4C and S5F, the dataset did not pass the D’Agostino-Pearson normality test and we therefore applied a non-parametric test (i.e. Mann-Whitney), in contrast to the other datasets that passed the test for normal or lognormal distribution. This has been further clarified in the method section (line 581-586):

      “The D’Agostino-Pearson omnibus normality test was used to determine normality/lognormality. Parametric statistical tests (e.g. Welch’s t-test) were used for normally distributed data. For lognormal distributions, the data was log-transformed prior to statistical analysis. Where data did not have a normal or lognormal distribution, or too few data points were available for normality testing, a nonparametric test (e.g. Mann-Whitney test) was applied.”
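
      The test-selection rule quoted above can be sketched in code. This is only an illustration under stated assumptions, not the authors' actual analysis pipeline: it uses SciPy, whose `normaltest` implements the D’Agostino-Pearson omnibus test, and the 0.05 threshold, the minimum sample size of 8, the requirement that both groups pass, and the function names are all choices made here for the example.

```python
import numpy as np
from scipy import stats

# Assumption: with fewer than 8 points we skip normality testing entirely
# (SciPy's skewness test, used inside normaltest, needs at least 8 observations).
MIN_N = 8


def looks_normal(x, alpha=0.05):
    """True if the D'Agostino-Pearson omnibus test does not reject normality."""
    return stats.normaltest(x).pvalue > alpha


def compare_two_groups(a, b, alpha=0.05):
    """Choose and run a two-group test; returns (test_name, p_value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    if min(a.size, b.size) >= MIN_N:
        # Normally distributed data: Welch's t-test on the raw values.
        if looks_normal(a, alpha) and looks_normal(b, alpha):
            return "Welch", stats.ttest_ind(a, b, equal_var=False).pvalue
        # Lognormal data: log-transform first, then Welch's t-test.
        if (a > 0).all() and (b > 0).all():
            log_a, log_b = np.log(a), np.log(b)
            if looks_normal(log_a, alpha) and looks_normal(log_b, alpha):
                p = stats.ttest_ind(log_a, log_b, equal_var=False).pvalue
                return "Welch (log-transformed)", p

    # Neither normal nor lognormal, or too few points: nonparametric test.
    return "Mann-Whitney", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
```

      Calling `compare_two_groups([1, 2, 3], [4, 5, 6])` falls through to the Mann-Whitney branch because the groups are too small for normality testing. SciPy may warn that the kurtosis component of the omnibus test is only reliable for larger samples (n of 20 or more), so results for small groups should be read with caution.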

      • Have the authors explored whether gamma-delta T cell responses are affected in the mutant mouse strain compared to wildtype mice, as they are among the early responders and the key cytokine-producing cells against Plasmodium blood stage infection?

      Response: We thank the reviewer for this valuable comment. We briefly explored the role of γδT cells, but did not observe a significant difference in splenic γδT cell numbers between wild-type and TfrcY20H/Y20H mice, eight days post-infection (Reviewer Figure 1). It is of course possible that γδT cell numbers were affected at an earlier stage, or that γδT cell function (e.g. cytokine production) was affected by cellular iron deficiency during P. chabaudi infection. However, γδT cells may also be less sensitive to cellular iron deficiency than conventional T cells, as has been previously demonstrated for developing T cells (PMID: 7957580).

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      Significance (Required):

      Overall, the study provides novel insights into the role of iron in the immune response to Plasmodium blood stage infection using a rodent malaria model and the interplay of infection, immunity and the development of pathology. As such it is an important study.

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      Herein Wideman et al. provide novel and important evidence on the role of iron availability for mounting an efficient immune response in a malaria infection model. They employed TfrcY20H/Y20H mice, which develop iron deficiency due to impaired cellular uptake of transferrin-bound iron. They found that those mice develop higher peak parasitemia after vector-borne exposure to P. chabaudi chabaudi, which was paralleled by an impaired immune response as reflected by altered CD4 cell activation, reduced IFN-γ formation or reduced B-cell responsiveness. Those deficiencies could be recovered upon ex vivo iron supplementation, pointing to the importance of iron availability for mounting CD4+ and B-cell specific anti-plasmodial immune responses at the initial phase of infection. However, Tfrc-mutated mice were able to clear infection over time in a comparable fashion to wt mice.

      This excellent study is important in convincingly showing (by employing high quality immunological analyses) the importance of cellular iron deficiency for immune responses in an infection model of general interest. It also indicates that the overwhelming immune response seen in wt mice is associated with organ damage over time.

      Minor comments:

      • The authors should discuss why and how TFRC mutated mice were able to control infection over time in a comparable fashion as wt mice although peak parasitemia was significantly higher?

      Response: We thank the reviewer for the helpful feedback on our study and for posing this interesting question. It does indeed appear as if the immune response, while significantly inhibited in the TfrcY20H/Y20H mice, is still sufficient to clear the infection. It is plausible that the early cell-mediated immune response is inhibited to the degree that parasite control is impaired, resulting in higher peak parasitaemia in TfrcY20H/Y20H mice. In contrast, parasite clearance is comparable and contemporaneous in both genotypes. Because parasite clearance occurs at a time when a substantial adaptive immune response is expected to emerge, we hypothesize that this response significantly contributes to pathogen clearance. Thus, it seems likely that the humoral response in TfrcY20H/Y20H mice, even if inhibited, may still be effective enough to clear the parasites and prevent recrudescence.

      As malaria infection progresses, RBC loss and increasing anaemia also contribute to limiting exponential parasite growth. This occurs more or less equally in both genotypes, but it could be particularly important for parasite control in the TfrcY20H/Y20H mice that have an inhibited immune response.

      We have added a section to the discussion to address this (line 380-386):

      “Despite the higher peak parasitaemia in TfrcY20H/Y20H mice, both genotypes were able to clear P. chabaudi parasites at a comparable rate and prevent recrudescence. It follows that even a weakened humoral immune response appears to be sufficient to control P. chabaudi infection. However, our study did not investigate the effects of immune cell iron deficiency on the formation of long-term immunity, which may have been more severely affected. The impaired GC response, in particular, suggests that iron deficiency could counteract the formation of efficient immune memory to subsequent malaria infections.”

      • The authors and others have previously shown (Frost J et al. Sci Adv 2022, Hoffmann et al. EBioMedicine 2021) that iron deficiency results in reduced neutrophil numbers in different infection models. This could also have contributed to the observed effect on initial infection control and may also have been linked to the altered histopathology seen in Figure 7. However, no mention of neutrophil numbers in this model is made. It would be important if the authors could provide information on neutrophil numbers (only if this analysis has already been performed) and discuss this issue in association with their observation.

      Response: We appreciate that the reviewer has brought attention to this important topic. As they mention, iron deficiency can have a negative impact on the neutrophil response (PMID: 36197985, 34488018) but it can also cause a maladaptive excessive neutrophil response due to failed adaptive immunity (PMID: 33665641). In this study, we show that there is no difference in splenic neutrophil numbers between wild-type and TfrcY20H/Y20H mice, eight days after P. chabaudi infection (Figure S3B). Moreover, the histopathologists detected no liver neutrophil infiltration in either genotype, but rather observed infiltration of mononuclear leukocytes upon P. chabaudi infection. Hence, it appears unlikely that neutrophils were a major contributor to differences in either immunity or pathology in this specific context. However, we cannot definitively rule out that neutrophil numbers were affected earlier in the infection or that neutrophil function was impaired due to cellular iron deficiency.

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      • In addition, alternative mechanism leading to immune tolerance and reduced tissue damage such as induction of heme oxygenase-1, which is also affected by systemic iron availability, should be discussed.

      Response: An addition was made to the results section and to Figure S5 to address this reviewer comment (line 269-274):

      “In addition, we measured the expression of two genes that are known to have a hepatoprotective effect in the context of iron loading in malaria: Hmox1 (encodes haemoxygenase-1) and Fth1 (encodes ferritin heavy chain). Liver gene expression of Hmox1 was higher in TfrcY20H/Y20H mice, while the expression of Fth1 did not differ between genotypes, eight days after infection (Figure S5H-I). Thus, the higher expression of Hmox1 may have contributed to the hepatoprotective effect in TfrcY20H/Y20H mice.”

      A relevant sentence was also added to the discussion (line 313-318):

      “For example, HO-1 plays an important role in detoxifying free haem that occurs as a result of haemolysis during malaria infection, thus preventing liver damage due to tissue iron overload, ROS and inflammation (62). Interestingly, infected TfrcY20H/Y20H mice had higher expression of Hmox1, but levels of liver iron and ROS comparable to those of wild-type mice. Consequently, this may be indicative of increased haem processing that could have a tissue-protective effect.”

      Significance (Required):

      Important and interesting study highlighting the central role of iron homeostasis for the immune response to infection. General interest because iron deficiency has high prevalence in areas with a high endemic burden of infection.

      Reviewer's expertise: infectious disease, immunity, iron homeostasis; both basic science and clinical expertise (more than 300 peer-reviewed publications on these topics)

    1. Author Response

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's area remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we only partially agree with some of them.

      First, we would like to rectify the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”. Our aim was to test this hypothesis, not to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity, only two specimens (one Homo and one Pan) were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011, Science; Gunz et al., 2020, Sci Adv). In the revised version, we modified the manuscript to explain our approach further (line 156): “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimen from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitutes the first step toward a better understanding of the evolutionary mechanisms. Our work indeed suggests a certain continuity between genera and raises questions about the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention and discuss at all the frontoorbital sulcus. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different than in humans who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011 Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen in order to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and investigate further this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive and constructive evaluations. Based upon the reviewers’ helpful comments, we have performed complementary experiments. In particular, we additionally show that:

      • a complete analysis of CXCR1/2 binding chemokines in the secretions of tissular CD8+ T cells reinforces the key role of CXCL8 in CD8+ T cell-induced fibrocyte chemotaxis (new panel D in Figure 2)

      • a direct contact between fibrocytes and CD8+ T cells triggers CD8+ T cell cytotoxicity against primary basal bronchial epithelial cells (new Figure 6)

      • the interaction between CD8+ T cells and fibrocytes is bidirectional, with CD8+ T cells triggering the development of fibrocyte immune properties (new Figure 7)

      • the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition was estimated to be about 2.5 years using the simulations. Interfering with chemotaxis and adhesion processes by inhibiting CXCR1/2 and CD54, respectively was not sufficient to reverse the COPD condition, as predicted by the mathematical model (new Figure 9)

      • the massive proliferation effect induced by fibrocytes is specific to CD8+ T cells and not CD4+ T cells (new Figure 3-figure supplement 2), and that fibrocytes moderately promote the death of unactivated CD8+ T cells in direct co-culture (new Figure 3-figure supplement 3)

      We have graphically summarized our findings (new Figure 10), suggesting the existence of a positive feedback loop playing a role in the vicious cycle that promotes COPD. A new table describing patient characteristics for basal bronchial epithelial cell purification has also been added (new Supplementary File 9), and Supplementary Files 7 and 8 have been updated to take into account the new experiments.

      The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD041402.  

      Reviewer #1 (Recommendations For The Authors):

      The experimental approaches are all rationally designed and the data clearly presented, with appropriate analyses and sample sizes. I could find no technical or interpretative concerns. The interrelationship between the observational data (histology) with the quantitative live cell imaging and the follow-on functional investigations is especially laudable. The data nicely unifies several years of accumulated data regarding the (separate) participation of CD8 T cells and fibrocytes in COPD.

      We thank the reviewer for his/her comments.

      I have only minor comments:

      1) Line 79: The observation that T cells may influence fibrocyte differentiation/function was initially made some years earlier by Abe et al (J Immunol 2001; 7556), and should be cited in addition to the follow-on work of Niedermeyer.

      This reference has been added to acknowledge this seminal work.

      2) Line 632: Corticosteroids originate from the cortex of the adrenal gland. Budesonide and fluticasone are glucocorticoids, not corticosteroids.

      This mistake has been corrected in the discussion of the revised manuscript (see line 802 in the revised manuscript).

      3) Given the state of T cell immunotherapies, cytokine/chemokine antagonists, and emerging fibrocyte-targeted drugs, can the authors possibly speculate as to desired pathways to target therapeutically?

      Chemokine receptor-based therapies, such as CXCR4 blockade, could be used to inhibit fibrocyte recruitment into the lungs. We have very recently shown that the CXCR4 antagonist plerixafor alleviates bronchial obstruction and reduces peri-bronchial fibrocyte density (Dupin et al., 2023). Because CXCR4 expression in human fibrocytes is dependent on mTOR signaling and is inhibited by rapamycin in vitro (Mehrad et al., 2009), alternative strategies consisting of targeting fibrocytes via mTOR have been proposed. This target has proven effective in bronchiolitis obliterans, idiopathic pulmonary fibrosis, and thyroid-associated ophthalmopathy, using rapamycin (Gillen et al., 2013; Mehrad et al., 2009), sirolimus (Manjarres et al., 2023) or an insulin-like growth factor-1 (IGF-I) receptor blocking antibody (Douglas et al., 2020; Smith et al., 2017). Inhibiting mTOR is also expected to have effects on CD8+ T cells, ranging from an immunostimulatory effect by activation of memory CD8+ T-cell formation, to an immunosuppressive effect by inhibition of T cell proliferation (Araki et al., 2010). Last, chemokine receptor-based therapies could also include strategies to inhibit CD8+ T cell-induced fibrocyte chemotaxis, such as dual CXCR1-CXCR2 blockade. We were able to test this latter strategy in our mathematical model (see response to point 6 of reviewer 2).

      Immunotherapies directly targeting the interaction between fibrocytes and CD8+ T cells could also be considered, such as CD86 or CD54 blockade. The use of abatacept and belatacept, which interfere with T cell co-stimulation, is effective in patients with rheumatoid arthritis (Pombo-Suarez & Gomez-Reino, 2019) and in kidney-transplant recipients (Vincenti et al., 2016), respectively. Targeting the IGF-I receptor with teprotumumab in the context of thyroid-associated ophthalmopathy also improved disease outcomes, possibly by altering fibrocyte-T cell interactions (Bucala, 2022; Fernando et al., 2021).

      We also tested this CD86 and CD54 blocking strategy for COPD treatment by simulations, see response to point 6 of reviewer 2.

      However, such therapies should be used with caution as they may favour adverse events such as infections, particularly in the COPD population (Rozelle & Genovese, 2007). Additionally, the fibrocyte-lymphocyte interaction has recently been shown to promote anti-tumoral immunity via the PD1-PDL1 immunological synapse (Afroj et al., 2021; Mitsuhashi et al., 2023). Therefore, care should be taken in the selection of patients to be treated and/or the timing of treatment administration, given the increased risk of lung cancer in COPD patients.

      The discussion section has been altered accordingly.

      4) The authors may want to consider mentioning (and citing) recent insight into the immune-mediated fibrosis in thyroid-associated ophthalmopathy

      These important publications are now cited in a dedicated paragraph about possible therapeutic interventions (see answer to point 3, and discussion in the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      1) The rationale for the selection of chemokines overexpressed by CD8+ T cells in COPD is based on literature data of n=2 patients per group. This is limited and risky. I am less concerned about false positives given the selection of chemokines and the available literature but am worried about the possibility that many chemokines may not have been selected based on insufficient power to do meaningful stats on this comparison. For example, many other CXCR1/2 binding CXCL chemokines exist and these could contribute to the migration effect in Fig 2C as well. Given the currently available single-cell resources it should be possible to extend these observations and to investigate CXCL chemokine expression in COPD CD8 T cells to the benefit of Fig 2A in full detail.

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1).

      Author response image 1.

      Expression of CXC chemokines in lung CD8+ CD103+ and CD8+ CD103- T cells from patients with COPD (n=18 independent samples) in comparison with healthy control subjects (n=29 independent samples) under resting conditions by Single-Cell RNA sequencing analysis (GEO accession GSE136831). The heatmaps show the normalized expression of genes (horizontal axes) encoding CXC chemokines. PF4=CXCL4, PPBP= CXCL7.

      The latter results are at odds with those of the transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, which show a significant level of chemokine expression (Hombrink et al., 2016) and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear. They could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ T cells) and/or methodological differences (differences in sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, which show an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs in comparison with that of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase in migration induced by the secretions of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not to other CXCR1/2 binding CXCL chemokines, and links the CXCL8 measurements to the functional experiments. This clarification has now been added to the results section of the revised version.

      2) Equally, it would strengthen the work if multiplex ELISA assays could be provided on the supernatants used in Fig 2D to provide a more comprehensive view of CXCR1/2 binding chemokines.

      In order to have a complete view of CXCR1/2 binding chemokines, we have now performed supplementary ELISA assays to measure the concentrations of CXCL1, 3, 5, 6 and 7, in addition to the measurements of CXCL2 and CXCL8 already presented in the previous version of the manuscript (Figure 2D). Results of these new assays are now presented in the revised version of Figure 2. Concentrations of CXCL1, 3, 5, 6 and 7 were unchanged between the control and COPD conditions.

      3) In the functional analyses, I missed information on the activation of the fibrocytes. Equally, the focus on CD8 T cells was mainly on proliferation in the functional work. RNAseq analyses on the cells, comparing CD8 T cells and fibrocytes, alone and in co-culture to each other would help to identify interaction patterns in comprehensive detail. Such an experiment would bolster the significance of the studies by providing impact analysis not only on the T cells beyond proliferation but by expanding on the effect of the interaction on the fibrocyte as well.

      Regarding the activation state of fibrocytes, we apologize if this was not clear: in our in vitro co-culture experiments, we chose not to activate the fibrocytes. This setting is in agreement with previous findings, demonstrating an antigen-independent T cell proliferation effect driven by fibrocytes (Nemzek et al., 2013), and it is now explicitly written in the results of the revised manuscript.

      Regarding the focus of the functional analyses:

      First, we have extended the analysis of the consequences of the interaction beyond CD8+ T cell proliferation. In particular, having shown that fibrocytes promote CD8+ T cell expression of cytotoxic molecules such as granzyme B, we decided to investigate the cytotoxic capacity of CD8+ T cells against primary basal bronchial epithelial cells (see new Supplementary File 9 in the revised manuscript for patient characteristics).

      Direct co-culture with fibrocytes increased total and membrane expression of the cytotoxic degranulation marker CD107a, which was only significant in non-activated CD8+ T cells (see new Figure 6A-E in the revised manuscript). A parallel increase of cytotoxicity against primary epithelial cells was observed in the same condition (see new Figure 6F-H in the revised manuscript). This demonstrates that following direct interaction with fibrocytes, CD8+ T cells have the ability to kill target cells such as bronchial epithelial cells. This is now included in the results section of the revised manuscript.

      Second, we have now performed proteomic analyses on fibrocytes, alone or in co-culture for 6 days with CD8+ T cells, either non-activated or activated (see new Figure 7A in the revised manuscript). Of the top ten pathways that were most significantly activated in co-cultured vs mono-cultured fibrocytes, the most strongly upregulated were the dendritic cell maturation box, the multiple sclerosis signaling pathway, the neuroinflammation signaling pathway and the macrophage classical signaling pathway, irrespective of the activation state of CD8+ T cells (see new Figure 7B in the revised manuscript). The changes were globally identical in the two conditions of CD8+ T cell activation, with some upregulation more pronounced in the activated condition. They were mostly driven by up-regulation of a core set of Major Histocompatibility Complex class I (HLA-B, C, F) and II (HLA-DMB, DPA1, DPB1, DRA, DRB1, DRB3) molecules, and of co-stimulatory and adhesion molecules (CD40, CD86 and CD54). Another notable proteomic signature was the increased expression of the IFN signaling mediators IKBE and STAT1, and of the IFN-responsive genes GBP2, GBP4 and RNF213. We also observed a strong downregulation of CD14, suggesting fibrocyte differentiation, and an upregulation of matrix metalloproteinase-9 (MMP9) in the non-activated condition only. Altogether, these changes suggest that the interaction between CD8+ T cells and fibrocytes promotes the development of fibrocyte immune properties, which could subsequently impact CD4+ T cell activation.

      The up-regulated pathways identified in the proteomic profile of fibrocytes co-cultured with CD8+ T cells are very consistent with a shift towards a proinflammatory phenotype rather than towards a reparative role. The activation of IFN-γ signaling could be triggered by CD8+ T cell secretion of IFN upon fibrocyte interaction, suggesting the existence of a positive feedback loop (see new Figure 10). Additionally, the priming of fibrocytes by CD8+ T cells could also induce CD4+ T cell activation.

      4) I suggest rewording the abstract to capture the main storyline and wording more. The abstract is good, but I see so many novelties in the paper that are not well sold in the abstract, particularly the modelling aspects.

      As suggested by the reviewer, we revised the abstract, as shown below and in the revised manuscript. The changes are indicated in red:

      Revised abstract:

      Bronchi of chronic obstructive pulmonary disease (COPD) are the site of extensive cell infiltration, allowing persistent contacts between resident cells and immune cells. Tissue fibrocytes interaction with CD8+ T cells and its consequences were investigated using a combination of in situ, in vitro experiments and mathematical modeling. We show that fibrocytes and CD8+ T cells are found in vicinity in distal airways and that potential interactions are more frequent in tissues from COPD patients compared to those of control subjects. Increased proximity and clusterization between CD8+ T cells and fibrocytes are associated with altered lung function. Tissular CD8+ T cells from COPD patients promote fibrocyte chemotaxis via the CXCL8-CXCR1/2 axis. Live imaging shows that CD8+ T cells establish short-term interactions with fibrocytes, that trigger CD8+ T cell proliferation in a CD54- and CD86-dependent manner, pro-inflammatory cytokines production, CD8+ T cell cytotoxic activity against bronchial epithelial cells and fibrocyte immunomodulatory properties. We defined a computational model describing these intercellular interactions and calibrated the parameters based on our experimental measurements. We show the model’s ability to reproduce histological ex vivo characteristics, and observe an important contribution of fibrocyte-mediated CD8+ T cell proliferation in COPD development. Using the model to test therapeutic scenarios, we predict a recovery time of several years, and the failure of targeting chemotaxis or interacting processes. Altogether, our study reveals that local interactions between fibrocytes and CD8+ T cells could jeopardize the balance between protective immunity and chronic inflammation in bronchi of COPD patients.

      5) The probabilistic model appears to suggest that reduced CD8 T cell death may also explain the increase in the pathology in COPD. Did the authors find that fibrocytes reduce cell death of the CD8 T cells?

      Taking advantage of the staining of CD8+ T cells with the death marker Zombie NIR™, we have quantified CD8+ T cell death in our co-culture assay. The presence of fibrocytes in the indirect co-culture assay did not affect CD8+ T cell death (see new Figure 3-figure supplement 3A-B in the revised manuscript). In direct co-culture, the death of CD8+ T cells was significantly increased in the non-activated condition but not in the activated condition (see new Figure 3-figure supplement 3C-D in the revised manuscript). Of note, these results are in agreement with a recent study showing the existence of CD8+ T cell-population-intrinsic mechanisms regulating cellular behavior, with induction of apoptosis to avoid an excessive increase in the T cell population (Zenke et al., 2020). This is taken into account in our mathematical model by an increased probability p_(dC+) of dying when a CD8+ T cell is surrounded by many other T cells in its neighborhood. It also suggests that the reduced CD8+ T cell death evidenced in tissues from patients with COPD (Siena et al., 2011) might not be due to the specific interplay between fibrocytes and CD8+ T cells, but rather to a global pro-survival environment in COPD lungs.

      These new data have been described in the results section.
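
      To make the density-dependent death rule mentioned above concrete, here is a minimal sketch of how a probability such as p_(dC+) could enter a grid-based simulation step. It is an illustration only, not the model of the manuscript: the function names, the square neighbourhood, the crowding threshold and all numerical values are assumptions chosen for the example.

```python
import random

def count_neighbors(cell, t_cells, radius=1):
    """Count other T cells within a square neighbourhood of `cell` (assumed geometry)."""
    x, y = cell
    return sum(
        1
        for (u, v) in t_cells
        if (u, v) != cell and abs(u - x) <= radius and abs(v - y) <= radius
    )

def death_probability(n_neighbors, p_dc=0.01, p_dc_plus=0.05, crowd_threshold=3):
    """Per-step death probability of one CD8+ T cell.

    p_dc is a hypothetical basal value, p_dc_plus a hypothetical increased value
    used when the cell is crowded; crowd_threshold is also an assumption.
    """
    return p_dc_plus if n_neighbors >= crowd_threshold else p_dc

def step_death(t_cells, rng, **params):
    """Apply one death step and return the set of surviving T cell positions."""
    survivors = set()
    for cell in t_cells:
        p = death_probability(count_neighbors(cell, t_cells), **params)
        # rng.random() is uniform on [0, 1): the cell dies with probability p.
        if rng.random() >= p:
            survivors.add(cell)
    return survivors

if __name__ == "__main__":
    cells = {(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)}
    print(len(step_death(cells, random.Random(0))))
```

      In this sketch, crowded cells are simply switched from the basal to the increased probability; a graded dependence on the number of neighbours would be an equally plausible design.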

      6) Following the modeling in Figure 6, curiosity came to mind, which is how long it would take for the pathology to disappear if a drug would be applied to the patient. How much should the interactions be reduced and how long would it take to reach clinical benefit? Could such predictions be made? I understand that this may be outside the main message of the manuscript but perhaps this could be included in the discussion.

      This is a very interesting question, which we have addressed by performing additional simulations to investigate the outcomes of possible therapeutic interventions. First, we applied COPD dynamics for 20 years to generate the COPD state, which provides the basis for treatment implementation. Then, we applied COPD dynamics for 7 years, which mimics the placebo condition (see new Figure 9A in the revised manuscript, and below), and compared it to control dynamics (“Total inhibition”), which mimics an ideal treatment able to restore all cellular processes. As expected, the populations of fibrocytes and CD8+ T cells, as well as the density of mixed clusters, decreased. These numbers reached levels similar to those of healthy subjects after approximately 2.5 years, and this time point can therefore be considered as the steady state (Figure 9B-E).

      Monitoring of the different processes revealed that these effects were mainly due to a reduction in fibrocyte-induced CD8+ T cell duplication, and a transient or more prolonged increase in basal fibrocyte and CD8+ T cell death (Figure 9C-D).
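
      As an illustration of how a characteristic time of this kind could be read off a simulated trajectory, the sketch below returns the first time point after which a cell density stays within a tolerance band around a reference (healthy) level. This is one possible criterion, given as an assumption; the function name, the 10% tolerance and the example numbers are hypothetical and are not taken from the manuscript.

```python
def time_to_steady_state(times, values, target, rel_tol=0.1):
    """Return the first time after which `values` stays within rel_tol of `target`.

    Returns None if the trajectory never settles inside the band.
    """
    band = rel_tol * abs(target)
    settled_at = None
    for t, v in zip(times, values):
        if abs(v - target) <= band:
            # Entering (or staying in) the band: remember when we entered.
            if settled_at is None:
                settled_at = t
        else:
            # Leaving the band resets the candidate settling time.
            settled_at = None
    return settled_at

if __name__ == "__main__":
    years = [0, 1, 2, 3, 4]
    density = [100, 60, 22, 20, 21]  # made-up trajectory
    print(time_to_steady_state(years, density, target=20))
```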

      Then, three possible realistic treatments were considered (Figure 9A). We tested the effect of directly inhibiting the interaction between fibrocytes and CD8+ T cells by blocking CD54. This was implemented in the model by altering the increased probability of a CD8+ T cell to divide when a fibrocyte is in its neighbourhood, as shown by the co-culture results (Figure 4). We also chose to reflect the effect of a dual CXCR1/2 inhibition by setting the displacement function of fibrocytes to be similar to that of control dynamics, in agreement with the in vitro experiments (Figure 2E). Blocking CD54 only slightly reduced the density of CD8+ T cells compared to the placebo condition, and had no effect on fibrocyte and mixed cluster densities (Figure 9B). CXCR1/2 inhibition was slightly more potent than CD54 inhibition in reducing CD8+ T cells, and it also significantly decreased the density of mixed clusters (Figure 9B). As expected, this occurred through a reduction of fibrocyte-induced duplication, which was affected more strongly by CXCR1/2 blockade than by CD54 blockade (Figure 9C-E). Combining both therapies (CD54 and CXCR1/2 inhibition) did not strongly enhance the effects (Figure 9B-E). In all the conditions tested, the size of the fibrocyte population remained unchanged, suggesting that other processes such as fibrocyte death or infiltration should be targeted to expect broader effects.

      The results section has been altered accordingly.
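
      The treatment scenarios described above can be thought of as parameter overrides applied to the COPD dynamics. The sketch below shows one way to organise this; it is a hedged illustration rather than the implementation used for Figure 9. All parameter names and values are placeholders, and the `cd54_efficacy` factor (how much of the fibrocyte-induced increase in division probability is removed) is an assumption introduced for the example.

```python
# Placeholder parameter sets; names and values are hypothetical.
CONTROL_PARAMS = {
    "p_divide_basal": 0.02,
    "p_divide_near_fibrocyte": 0.03,
    "fibrocyte_displacement": "control",
}
COPD_PARAMS = {
    "p_divide_basal": 0.02,
    "p_divide_near_fibrocyte": 0.10,
    "fibrocyte_displacement": "copd",
}

def build_scenario(name, cd54_efficacy=0.5):
    """Return a fresh parameter dict for one simulated treatment arm."""
    if name == "total_inhibition":
        # Ideal treatment: every process is restored to control dynamics.
        return dict(CONTROL_PARAMS)
    params = dict(COPD_PARAMS)  # placebo: COPD dynamics left unchanged
    if name == "placebo":
        return params
    if name not in ("cd54_blockade", "cxcr12_inhibition", "combined"):
        raise ValueError(f"unknown scenario: {name}")
    if name in ("cd54_blockade", "combined"):
        # Reduce only the fibrocyte-induced excess of the division probability.
        excess = params["p_divide_near_fibrocyte"] - params["p_divide_basal"]
        params["p_divide_near_fibrocyte"] -= cd54_efficacy * excess
    if name in ("cxcr12_inhibition", "combined"):
        # Fibrocyte displacement follows the control rule, as for CXCR1/2 blockade.
        params["fibrocyte_displacement"] = CONTROL_PARAMS["fibrocyte_displacement"]
    return params

if __name__ == "__main__":
    for arm in ("placebo", "cd54_blockade", "cxcr12_inhibition", "combined", "total_inhibition"):
        print(arm, build_scenario(arm))
```

      Keeping each arm as a small override of a shared baseline makes it easy to check that a given treatment changes only the process it is meant to target.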

      Using the simulations, we were also able to estimate the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition. This time of approximately 2.5 years could not have been predicted from in vitro experiments, and indicates that a treatment aiming at restoring these cellular processes should be continued for several years to obtain significant changes.

      We have also investigated the outcomes of more realistic treatments, modifying specifically processes such as chemotaxis or targeting directly the intercellular interactions. The modification of parameters controlling these processes only slightly affected the final state, suggesting that such treatments may be more effective when used in combination with other drugs e.g. those affecting fibrocyte infiltration and/or death.

      The discussion section has been altered accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1) Broader assessment of cell types in the lung: Staining for other cell types such as dendritic cells, CD4 cells, and interstitial macrophages, and comparing their proximity to fibrocytes with that of CD8 cells would better justify the CD8 focus.

      We agree with the reviewer that multiple stainings would have better justified the focus on CD8+ T cells. However, it is difficult to distinguish fibrocytes, dendritic cells and interstitial macrophages on the basis of immunohistochemistry, as we and others previously showed (Dupin et al., 2019; Mitsuhashi et al., 2015; Pilling et al., 2009). On the other hand, the study of Afroj et al. indicated a possible interaction between fibrocytes and CD8+ T cells in the context of cancer, with the induction of CD8+ T cell proliferation (Afroj et al., 2021). This costimulatory function of fibrocytes towards CD8+ T cells was further confirmed in a very recent study, together with the antitumor effects of PD-L1 and VEGF blockade (Mitsuhashi et al., 2023). These data, along with the specific implication of CD8+ T cells in COPD, which relies mainly on their abundance in COPD bronchi (O’Shaughnessy et al., 1997), their overactivation state (Roos-Engstrand et al., 2009), their cytotoxic phenotype (Freeman et al., 2010; Wang et al., 2020) and the protection against lung inflammation and emphysema induced by their depletion (Maeno et al., 2007), justified the CD8 focus.

      To further justify this focus, we have now performed co-culture between fibrocytes and CD4+ T cells, indicating that the massive fibrocyte-mediated proliferation was specific to CD8+ T cells (see answer to comment 3 below). This is in agreement with the results obtained with the simulations, showing that considering fibrocytes and CD8+ T cells only was sufficient to reproduce the spatial patterns in the bronchi of healthy subjects and COPD patients. Altogether, we think that focusing on the CD8+ T cell-fibrocyte interplay was pertinent in the context of COPD. This obviously does not exclude the possibility of other interactions, which could be the focus of other studies.

      2) Transcriptomic analysis: Using n=2 and only showing the chemokines as well as selected adhesion receptor data narrows the focus but does not provide broader insights into the interactions. Using a more robust sample size and performing a comprehensive pathway analysis would represent an unbiased analysis to determine the most dysregulated pathways. Importantly, the authors could use a single-cell RNA-seq dataset to broadly assess the transcriptomes of several cell types in the lung (such as the data from (Sauler et al, Characterization of the COPD alveolar niche using single-cell RNA sequencing).

      This very pertinent suggestion has also been raised by reviewer 2, see our answer to comment 1 of reviewer 2, and below:

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1, in the answer to comment 1 of reviewer 2).

      These latter results are at odds with those of the transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, which show a significant level of chemokine expression (Hombrink et al., 2016) and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear. They could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ T cells) and/or methodological differences (differences in sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, which show an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs in comparison with that of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase in migration induced by the secretions of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not to other CXCR1/2 binding CXCL chemokines, and links the CXCL8 measurements to the functional experiments. This clarification has now been added to the text of the revised version.

      3) Inclusion of control/comparison cell types in co-culture studies would help establish that CD8 cells are more relevant for interactions with fibrocytes than for example CD4 cells.

      We have now performed co-cultures between fibrocytes and CD4+ T cells, with the same settings as for CD8+ T cells. The results from these experiments show that fibrocytes did not have any significant effect on CD4+ T cell death, regardless of their activation state (see new Figure 3-figure supplement 2A-C in the revised manuscript, and below). Fibrocytes were able to promote CD4+ T cell proliferation in the activated condition but not in the non-activated condition (see new Figure 3-figure supplement 2A-D in the revised manuscript). Altogether, this indicates that although the fibrocyte-mediated effect on proliferation is not specific to CD8+ T cells, the amplitude of the effect is much larger on CD8+ T cells than on CD4+ T cells.

      These new data have been added in the results section.

      4) In vitro analysis of cells from non-COPD patients would also help assess whether the circulating cells from COPD patients have a level of baseline activation which promotes the vicious cycle but may not exist in healthy cells.

      Regarding circulating cells, the present study relies on the COBRA cohort (COhort of BRonchial obstruction and Asthma), which includes only asthma and COPD patients, and therefore does not grant access to healthy subjects’ blood samples (Pretolani et al., 2017). Unfortunately, we have no other ongoing study with healthy subjects that would allow us to retrieve blood for research, and fibrocytes can only be grown from freshly drawn blood samples. We agree with the reviewer that it is a limitation of our study, which is now acknowledged at the end of the discussion section.  

      References

      Afroj, T., Mitsuhashi, A., Ogino, H., Saijo, A., Otsuka, K., Yoneda, H., Tobiume, M., Nguyen, N. T., Goto, H., Koyama, K., Sugimoto, M., Kondoh, O., Nokihara, H., & Nishioka, Y. (2021). Blockade of PD-1/PD-L1 Pathway Enhances the Antigen-Presenting Capacity of Fibrocytes. The Journal of Immunology, 206(6), 1204‑1214. https://doi.org/10.4049/jimmunol.2000909

      Araki, K., Youngblood, B., & Ahmed, R. (2010). The role of mTOR in memory CD8+ T-cell differentiation. Immunological reviews, 235(1), 234‑243. https://doi.org/10.1111/j.0105-2896.2010.00898.x

      Bucala, R. J. (2022). Targeting fibrocytes in autoimmunity. Proceedings of the National Academy of Sciences, 119(5), e2121739119. https://doi.org/10.1073/pnas.2121739119

      Douglas, R. S., Kahaly, G. J., Patel, A., Sile, S., Thompson, E. H. Z., Perdok, R., Fleming, J. C., Fowler, B. T., Marcocci, C., Marinò, M., Antonelli, A., Dailey, R., Harris, G. J., Eckstein, A., Schiffman, J., Tang, R., Nelson, C., Salvi, M., Wester, S., … Smith, T. J. (2020). Teprotumumab for the Treatment of Active Thyroid Eye Disease. The New England Journal of Medicine, 382(4), 341‑352. https://doi.org/10.1056/NEJMoa1910434

      Dupin, I., Henrot, P., Maurat, E., Abohalaka, R., Chaigne, S., Hamrani, D. E., Eyraud, E., Prevel, R., Esteves, P., Campagnac, M., Dubreuil, M., Cardouat, G., Bouchet, C., Ousova, O., Dupuy, J.-W., Trian, T., Thumerel, M., Begueret, H., Girodet, P.-O., … Berger, P. (2023). CXCR4 blockade alleviates pulmonary and cardiac outcomes in early COPD (p. 2023.03.10.529743). bioRxiv. https://doi.org/10.1101/2023.03.10.529743

      Dupin, I., Thumerel, M., Maurat, E., Coste, F., Eyraud, E., Begueret, H., Trian, T., Montaudon, M., Marthan, R., Girodet, P.-O., & Berger, P. (2019). Fibrocyte accumulation in the airway walls of COPD patients. The European Respiratory Journal, 54(3), Article 3. https://doi.org/10.1183/13993003.02173-2018

      Fernando, R., Caldera, O., & Smith, T. J. (2021). Therapeutic IGF-I receptor inhibition alters fibrocyte immune phenotype in thyroid-associated ophthalmopathy. Proceedings of the National Academy of Sciences, 118(52), e2114244118. https://doi.org/10.1073/pnas.2114244118

      Freeman, C. M., Han, M. K., Martinez, F. J., Murray, S., Liu, L. X., Chensue, S. W., Polak, T. J., Sonstein, J., Todt, J. C., Ames, T. M., Arenberg, D. A., Meldrum, C. A., Getty, C., McCloskey, L., & Curtis, J. L. (2010). Cytotoxic potential of lung CD8+ T cells increases with COPD severity and with in vitro stimulation by IL-18 or IL-15. Journal of Immunology (Baltimore, Md.: 1950), 184(11), 6504‑6513. https://doi.org/10.4049/jimmunol.1000006

      Gillen, J. R., Zhao, Y., Harris, D. A., LaPar, D. J., Stone, M. L., Fernandez, L. G., Kron, I. L., & Lau, C. L. (2013). Rapamycin Blocks Fibrocyte Migration and Attenuates Bronchiolitis Obliterans in a Murine Model. The Annals of thoracic surgery, 95(5), 1768‑1775. https://doi.org/10.1016/j.athoracsur.2013.02.021

      Hombrink, P., Helbig, C., Backer, R. A., Piet, B., Oja, A. E., Stark, R., Brasser, G., Jongejan, A., Jonkers, R. E., Nota, B., Basak, O., Clevers, H. C., Moerland, P. D., Amsen, D., & van Lier, R. A. W. (2016). Programs for the persistence, vigilance and control of human CD8+ lung-resident memory T cells. Nature Immunology, 17(12), Article 12. https://doi.org/10.1038/ni.3589

      Maeno, T., Houghton, A. M., Quintero, P. A., Grumelli, S., Owen, C. A., & Shapiro, S. D. (2007). CD8+ T Cells are required for inflammation and destruction in cigarette smoke-induced emphysema in mice. Journal of Immunology (Baltimore, Md.: 1950), 178(12), 8090‑8096. https://doi.org/10.4049/jimmunol.178.12.8090

      Manjarres, D. C. G., Axell-House, D. B., Patel, D. C., Odackal, J., Yu, V., Burdick, M. D., & Mehrad, B. (2023). Sirolimus suppresses circulating fibrocytes in idiopathic pulmonary fibrosis in a randomized controlled crossover trial. JCI Insight. https://doi.org/10.1172/jci.insight.166901

      Mehrad, B., Burdick, M. D., & Strieter, R. M. (2009). Fibrocyte CXCR4 regulation as a therapeutic target in pulmonary fibrosis. The International Journal of Biochemistry & Cell Biology, 41(8‑9), 1708‑1718. https://doi.org/10.1016/j.biocel.2009.02.020

      Mitsuhashi, A., Goto, H., Saijo, A., Trung, V. T., Aono, Y., Ogino, H., Kuramoto, T., Tabata, S., Uehara, H., Izumi, K., Yoshida, M., Kobayashi, H., Takahashi, H., Gotoh, M., Kakiuchi, S., Hanibuchi, M., Yano, S., Yokomise, H., Sakiyama, S., & Nishioka, Y. (2015). Fibrocyte-like cells mediate acquired resistance to anti-angiogenic therapy with bevacizumab. Nature Communications, 6(1), Article 1. https://doi.org/10.1038/ncomms9792

      Mitsuhashi, A., Koyama, K., Ogino, H., Afroj, T., Nguyen, N. T., Yoneda, H., Otsuka, K., Sugimoto, M., Kondoh, O., Nokihara, H., Hanibuchi, M., Takizawa, H., Shinohara, T., & Nishioka, Y. (2023). Identification of fibrocyte cluster in tumors reveals the role in antitumor immunity by PD-L1 blockade. Cell Reports, 112162. https://doi.org/10.1016/j.celrep.2023.112162

      Nemzek, J. A., Fry, C., & Moore, B. B. (2013). Adoptive transfer of fibrocytes enhances splenic T-cell numbers and survival in septic peritonitis. Shock (Augusta, Ga.), 40(2), 106‑114. https://doi.org/10.1097/SHK.0b013e31829c3c68

      O’Shaughnessy, T. C., Ansari, T. W., Barnes, N. C., & Jeffery, P. K. (1997). Inflammation in bronchial biopsies of subjects with chronic bronchitis: Inverse relationship of CD8+ T lymphocytes with FEV1. American Journal of Respiratory and Critical Care Medicine, 155(3), 852‑857. https://doi.org/10.1164/ajrccm.155.3.9117016

      Pilling, D., Fan, T., Huang, D., Kaul, B., & Gomer, R. H. (2009). Identification of markers that distinguish monocyte-derived fibrocytes from monocytes, macrophages, and fibroblasts. PloS One, 4(10), e7475. https://doi.org/10.1371/journal.pone.0007475

      Pombo-Suarez, M., & Gomez-Reino, J. J. (2019). Abatacept for the treatment of rheumatoid arthritis. Expert Review of Clinical Immunology, 15(4), 319‑326. https://doi.org/10.1080/1744666X.2019.1579642

      Pretolani, M., Soussan, D., Poirier, I., Thabut, G., Aubier, M., COBRA Study Group, & COBRA cohort Study Group. (2017). Clinical and biological characteristics of the French COBRA cohort of adult subjects with asthma. The European Respiratory Journal, 50(2), 1700019. https://doi.org/10.1183/13993003.00019-2017

      Roos-Engstrand, E., Ekstrand-Hammarström, B., Pourazar, J., Behndig, A. F., Bucht, A., & Blomberg, A. (2009). Influence of smoking cessation on airway T lymphocyte subsets in COPD. COPD, 6(2), 112‑120. https://doi.org/10.1080/15412550902755358

      Rozelle, A. L., & Genovese, M. C. (2007). Efficacy results from pivotal clinical trials with abatacept. Clinical and Experimental Rheumatology, 25(5 Suppl 46), S30-34.

      Sauler, M., McDonough, J. E., Adams, T. S., Kothapalli, N., Barnthaler, T., Werder, R. B., Schupp, J. C., Nouws, J., Robertson, M. J., Coarfa, C., Yang, T., Chioccioli, M., Omote, N., Cosme, C., Poli, S., Ayaub, E. A., Chu, S. G., Jensen, K. H., Gomez, J. L., … Rosas, I. O. (2022). Characterization of the COPD alveolar niche using single-cell RNA sequencing. Nature Communications, 13(1), Article 1. https://doi.org/10.1038/s41467-022-28062-9

      Siena, L., Gjomarkaj, M., Elliot, J., Pace, E., Bruno, A., Baraldo, S., Saetta, M., Bonsignore, M. R., & James, A. (2011). Reduced apoptosis of CD8+ T-lymphocytes in the airways of smokers with mild/moderate COPD. Respiratory Medicine, 105(10), 1491‑1500. https://doi.org/10.1016/j.rmed.2011.04.014

      Smith, T. J., Kahaly, G. J., Ezra, D. G., Fleming, J. C., Dailey, R. A., Tang, R. A., Harris, G. J., Antonelli, A., Salvi, M., Goldberg, R. A., Gigantelli, J. W., Couch, S. M., Shriver, E. M., Hayek, B. R., Hink, E. M., Woodward, R. M., Gabriel, K., Magni, G., & Douglas, R. S. (2017). Teprotumumab for Thyroid-Associated Ophthalmopathy. The New England Journal of Medicine, 376(18), 1748‑1761. https://doi.org/10.1056/NEJMoa1614949

      Vincenti, F., Rostaing, L., Grinyo, J., Rice, K., Steinberg, S., Gaite, L., Moal, M.-C., Mondragon-Ramirez, G. A., Kothari, J., Polinsky, M. S., Meier-Kriesche, H.-U., Munier, S., & Larsen, C. P. (2016). Belatacept and Long-Term Outcomes in Kidney Transplantation. The New England Journal of Medicine, 374(4), 333‑343. https://doi.org/10.1056/NEJMoa1506027

      Wang, X., Zhang, D., Higham, A., Wolosianka, S., Gai, X., Zhou, L., Petersen, H., Pinto-Plata, V., Divo, M., Silverman, E. K., Celli, B., Singh, D., Sun, Y., & Owen, C. A. (2020). ADAM15 expression is increased in lung CD8+ T cells, macrophages, and bronchial epithelial cells in patients with COPD and is inversely related to airflow obstruction. Respiratory Research, 21(1), 188. https://doi.org/10.1186/s12931-020-01446-5

      Zenke, S., Palm, M. M., Braun, J., Gavrilov, A., Meiser, P., Böttcher, J. P., Beyersdorf, N., Ehl, S., Gerard, A., Lämmermann, T., Schumacher, T. N., Beltman, J. B., & Rohr, J. C. (2020). Quorum Regulation via Nested Antagonistic Feedback Circuits Mediated by the Receptors CD28 and CTLA-4 Confers Robustness to T Cell Population Dynamics. Immunity, 52(2), 313-327.e7. https://doi.org/10.1016/j.immuni.2020.01.018

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their comments and insights; we feel the manuscript is now greatly improved. Please find below our answers to the reviewers’ queries.

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Niccoli et al. describes the identification of a novel modifier of C9orf72-derived toxicity based on the manipulation of the brain metabolic pathways. The premise for this work is supported by strong literature describing the aberrant glucose metabolism in FTD, AD and other degenerative disorders. The idea tested here is whether increasing the import of pyruvate produced in glia into neurons can reduce C9orf72-derived toxicity. They test three different types of importers and find that one of them, Bumpel, the orthologue of human SLC5A12, suppresses toxicity and reduces the accumulation of arginine-containing repeats, GP and PR. The authors investigate several potential mechanisms mediating this reduction of toxic DPRs, but do not find strong evidence linking pyruvate import and increased autophagy or mitochondrial metabolism.

      Overall, this is an interesting discovery based on a candidate approach that shows the power of Drosophila to efficiently identify novel mediators of neurodegeneration. The article is well written, although more detailed explanations of some experiments would be helpful. The weaknesses of the manuscript are the lack of a clear mechanism mediating the protective activity of pyruvate, the incomplete experiments lacking relevant controls, and the presentation of western blots.

      Specific comments:

      1. The reduced levels of DPRs require that the expression of C9 mRNA or the GR and PR constructs is examined by qPCR. In figure 3E, GP is not even detectable.

      We agree with the reviewer: ideally we would have measured the RNA by qPCR. However, the C9 repeats and the DPR constructs are highly repetitive, so it is impossible to do a qPCR for them. The upstream and downstream sequences are identical for the C9 and the bumpel constructs, and there is, to our knowledge, no unique sequence we can use to measure levels of expression in the presence of bumpel.

      We did run a GFP control (Fig 2D) and did not see any difference and we have now carried out a qPCR for Gal4-GeneSwitch (Fig S3) to show that the levels of the driver do not change.

      1. I wonder if there are constructs available to silence Bumpel or overexpress the human orthologues of bumpel. These would be nice controls for the effects observed with the Bumpel overexpression

      This would be an extremely interesting experiment. However, bumpel is normally only expressed in glia, so we can’t down-regulate it in glia whilst upregulating 36R in neurons, as we are limited to one driver (since everything is driven by the Gal4/UAS system). Expression of C9 in glia does not have a clear phenotype (our observation), so we can’t drive both in glia. We tried over-expressing the human homologue SLC5A12, but it did not rescue the C9 phenotype (data not shown), possibly because it requires (like other human SLC5A-type transporters) PDZK1 as an extra co-factor (Srivastava S. et al, 2019), and this is not present in flies.

      1. The argument about bumpel modulating autophagy downstream of Atg1 is not supported by the experimental data

      We now have imaging data showing that bumpel modulates the formation of lysosomes, downstream of Atg1 (Fig 5). We also show that bumpel and Atg1 can act synergistically, leading to a much stronger rescue of C9 expression (see Fig 5I), which also suggests that the two are acting at different points in the same pathway. We also show that bumpel rescues the downregulation of TFEB targets (Fig 5J).

      1. Western blots throughout show no control lanes and on several occasions are created with cutout bands. The standard for this type of experiment should be more stringent, with entire gels showing all experimental conditions, which requires consistent methods and results vs selecting the best bands from different gels.

      We apologise if this was misunderstood. The lanes shown are all from the same blot, where other samples were also run; including those would be confusing for the reader. We have re-run samples where we had material remaining from our quantifications, so that the lanes are now contiguous, and we provide original blot images in the supplemental information for those we could not re-run. The control for all experiments is the C9-expressing line without bumpel, and this is always present. If the reviewer means that we are missing -RU controls, these do not produce any DPRs, so they are not included in western blot or ELISA quantifications as the signal is not above background.

      1. For figures 2B and 5C, please, show representative WBs

      These are ELISA quantifications, not western blots; we choose to run these when possible, as they are more quantitative.

      1. Figure 5D describes the survival curve as significantly rescued. Statistical tests can indicate differences, but that is in no way convincing. The test may show the curves are different, but the abeta Atg1 flies also seem to start falling early, so an argument could be made in both directions, as a suppressor or an enhancer.

      We agree the rescue is not strong enough; we have now removed this lifespan.

      1. It is unclear why several results are placed in the supplemental materials. In general, all this material seems highly relevant and related to what is shown in the main figures

      We are happy to include them in the main manuscript if this would help the reader, and we have now placed all mitochondrial data in Fig 4.

      Minor comments:

      Please, define several abbreviations throughout

      We apologise for this oversight; we have now done this.

      A couple of sections could be improved by carefully sequencing human vs Drosophila background to advance the argument rather than going in circles. There is also a section on mitophagy in between two sections related to autophagy that could be sequenced better.

      We have restructured the sections, which we think has improved the flow.

      There is a sentence at the end of page 6 that seems misplaced

      We apologise for the oversight, and we have removed this.

      Reviewer #1 (Significance):

      Overall, this is an interesting discovery based on a candidate approach that shows the power of Drosophila to efficiently identify novel mediators of neurodegeneration. The article is well written, although more detailed explanations of some experiments would be helpful. The weaknesses of the manuscript are the lack of a clear mechanism mediating the protective activity of pyruvate, the incomplete experiments lacking relevant controls, and the presentation of western blots.

      We thank the reviewer for the helpful comments. We have added some details in the methods section, and we apologise for not having made it clear that the westerns were all derived from the same blot (we have now placed the originals in the supplemental materials). Regarding mechanism, we now show that bumpel over-expression increases clearance of late-stage autolysosomes, possibly by increasing transcription of TFEB target lysosomal genes.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      Project investigates the role in dementias of glial glucose uptake, conversion to lactate and shuttling via transporters to neurons to produce pyruvate to fuel TCA cycle production of ATP. The experiments are conducted in Drosophila melanogaster, which has become a powerful model system for understanding neurodegeneration mechanisms associated with ALS/FTD-associated C9orf72 pathology. Bumpel misexpression is shown to rescue the early death phenotype in flies expressing a C9orf72 expansion and flies expressing arginine-containing di-peptide repeat proteins. The report describes novel insight into the function of bumpel, demonstrating that this conserved orthologue of human SLC14A functions as a sodium exchange transporter for the monocarboxylates pyruvate and lactate. These findings conclude that increased neuronal pyruvate, but not its metabolites, rescues C9orf72-associated pathology.

      The authors next set out to describe the mechanism by which increased pyruvate rescues survival in C9orf72-expressing flies. Levels of autolysosomes were increased in C9orf72-expressing flies, and stimulation of autophagy by overexpression of atg1 was shown to decrease levels of DPRs (though not to the same extent as bumpel expression). Expression of bumpel in C9orf72 flies led to a modest increase in LC3-II, indicating increased autophagy. Co-overexpression of bumpel and atg1 did not have an additive effect, suggesting bumpel activates autophagy downstream or independent of atg1 activity. Finally, the authors extend their findings to amyloid models, suggesting a common protective mechanism for elevating neuronal pyruvate levels in neurodegenerative disease.

      Major comments

      Prior data suggests that bumpel is expressed in glia (for example Yildirim et al 2022). In their study the authors do not present any data to demonstrate that the transporter is normally expressed in neurons in flies. This calls into question the physiological relevance of their findings, that neuronal upregulation of bumpel is protective against C9orf72-associated pathology in neurons, from which it is reasonable for a reader to conclude that bumpel may be a neuronal target for therapeutic intervention. However, the report demonstrates well that, regardless of whether the transporter is native to neurons, the increase in monocarboxylates it facilitates is protective against C9orf72 pathology, and thus the overall conclusion of the project is supported by experimental evidence. The point of upregulation of a natively expressed gene versus misexpression of a glial-enriched transporter should be considered in a bit more detail in the discussion text. The authors may consider speculating on the identity of members of the sodium-coupled monocarboxylate transporters that are enriched in neurons. Are any of the bumpel human orthologues expressed in neurons?

      We thank the reviewer for this comment and suggestion. The reviewer correctly points out that we do not show whether there is a defect in pyruvate import in C9-expressing flies. We could not identify a validated sodium-coupled pyruvate transporter in flies with strong neuronal expression, and we have added a comment in the discussion about this. There are a number of human homologues; some, such as SLC5A8, are expressed in neurons, thus providing a possible therapeutic target. We have added a sentence to this effect in the discussion.

      [OPTIONAL] cDNA overexpression of neuron-specific sodium-coupled monocarboxylate transporters in C9orf72 fly models would strengthen the conclusion regarding their physiological relevance for ALS/FTD. Fly lines for these are not available in repositories, but could be generated and tested at reasonable cost (<£700, ~3 month duration).

      This would be an ideal experiment, however, we could not find a neuronal sodium coupled transporter which is known to import monocarboxylates. There are a number of sodium coupled neuronal transporters, but they are mostly homologous to SLC5A6, which is a glucose coupled transporter. Going forward, we will screen a number of transporters to identify if there are any which import pyruvate.

      The role of bumpel expression in survival (Figure 1) could be a technical artifact due to dilution of Gal4 between C9orf72 and bumpel-ORF transgenes. No expression control is shown (for example GFP, LacZ etc). This theory is unlikely as no improvement in survival was seen for the SLC14A class of transporters which have a matching site-directed transgene insertion. For clarity this point relating to controls should be commented on in the text.

      The reviewer is correct, there could be a dilution of the Gal4. We don’t like using GFP as a control as we have often seen a worsening when expressing other highly stable proteins at high levels. We have generated an “empty” flyORF line (generated by injecting the empty plasmid into the identical attP site), and used it as a control to check for dilution effects. Bumpel still rescued relative to this control, and we now include this in the supplementary material (Fig S1B).

      Reduced Mito-GFP levels are used to support a role for bumpel in increasing mitophagy. As mito-GFP is a marker for mitochondria but not specifically mitophagy, an alternative explanation for decreased levels could be reduced mitochondrial biogenesis. The text should be amended to clarify this point.

      The role of Pink1 RNAi in modifying mitophagy is a bit overstated. Whilst Pink1 is involved in stress-associated mitophagy, its role in basal mitochondrial turnover is less well defined. Text should be adapted.

      We have added qualifying statements regarding the possibility of reduced mitochondrial biogenesis, and the fact that Pink1’s role in basal mitophagy is not very clear. The use of the mitophagy inducer drug, Kaempferol, however, suggests that mitophagy is unlikely to be a cause of the DPR reduction.

      Minor comments

      Introduction well describes current state of C9orf72 fly models. Introduction would benefit from a few comparable lines for AD models. The first paragraph of results may also be better placed in the introduction.

      We thank the reviewer for the suggestion, and have added a more in-depth introduction to Aβ and have moved the first paragraph of the results section to the introduction.

      Figure 1 presents survival for three SLC16A transporters and bumpel. The C9 control curve appears to be consistent between charts, likely indicating the same control used across experiments, rather than independent controls for each chart. The authors should consider showing either all SLC16A and bumpel data on a single chart, or clarify in the figure legend that a common control dataset is used. GFP control is used in later experiments (Figure 2).

      We have now indicated that the SLC16A transporters were run together in the figure legend.

      Choice of amyloid model needs a line of explanation, particularly with regard to extra/intracellular deposition of amyloid in this model.

      We have now added a few sentences describing this when the model is introduced

      Fruit Fly Injection method section needs a bit more detail to describe site of injection (head, body etc). This is not clear in the result section either.

      We have now added this; the injection was done in the abdomen.

      How were bumpel orthologues identified? What degree of conservation (sequence homology etc?)

      The bumpel orthologues are those identified as most similar by FlyBase. We have now added the degree of conservation in the text.

      The speculative mechanism for C9 pathology modification involves interaction of neurons and glia, monocarboxylate transporters and changes in autophagy activity. For clarity a diagram showing the model may be a helpful addition.

      We have now added a diagram explaining how we think the rescue is achieved

      Typos:

      Figure 1 Legend - "p values of ona way ANOVA "

      We apologise for the error, and have now corrected it

      Figure S2 Legend - Atg1 RNAi genotypes from S2 legend are mentioned erroneously

      We apologise for the error, and have now corrected it

      Repetition of text in results: "Bumpel, together with its paralogues kumpel and rumpel, is expressed in glia in flies, where it is thought to promote transport of substrates across the brain (31)."

      We apologise and have rectified this

      "Modulation of Atg1 when bumpel was co-overexpressed, however, did not affect GP levels (Fig 4E, F)" - Should be referring to (Fig 4D, E)

      We apologise and have rectified this

      Reviewer #2 (Significance):

      The study will be broadly of interest to researchers working in the fields of neurodegeneration and metabolism, providing evidence for a protective role of elevated pyruvate in neurons that provides new understanding relating to pathology in C9orf72-associated motor neuron disease and frontotemporal dementia.

      Strengths:

      The study presents novel data to demonstrate that overexpression of the fly monocarboxylate transporter bumpel rescues an early death phenotype associated with the ALS/FTD gene C9orf72. Any novel therapeutic strategies for ALS are of interest to the field, and the strategy demonstrated here may be readily translated to human cell culture systems for proof-of-principle translational studies in a more physiologically relevant system. This study further demonstrates the utility of invertebrate models to generate novel understanding of C9orf72 pathology.

      Limitations:

      The study speculates that there is a link between pyruvate levels and increased autophagy; however, the mechanism by which this occurs is not defined in the present study. This is a limitation of the experiment, though it opens up an interesting question for future studies.

      We thank the reviewer for their comments, and we have now added experiments characterising the role of bumpel in autophagy, particularly showing its rescue of a late autolysosomal block.

      Reviewer expertise: The reviewer researches ALS and dementia associated neurodegeneration, utilising Drosophila, rodent and stem cell derived model systems.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This is an interesting manuscript in which the authors provide evidence that elevated neuronal expression of the pyruvate transporter bumpel can partially rescue shortened lifespan in fly models of frontotemporal dementia and Alzheimer's disease. In addition, elevated neuronal bumpel expression can reduce accumulation of arginine containing FTD-linked dipeptide repeat proteins. Some evidence is presented that elevated neuronal bumpel expression may activate autophagy. These findings are novel and may have implications for therapeutic interventions based on pyruvate import/metabolism to treat neurodegenerative disorders. However, I have several concerns as follows:

      Major Comments:

      1. The authors provide no explanation as to why they targeted bumpel overexpression in neurons. Endogenous bumpel appears to be predominately expressed in glia cells so why not target these cells instead?

      We wanted to increase pyruvate import in neurons, so we over-expressed a number of pyruvate transporters that were available in the FlyORF stock centre (so that they would all be inserted into the same site and therefore directly comparable); we were mainly interested in cell-autonomous effects of importing glycolytic metabolites. Over-expressing bumpel in glia would indeed be an extremely interesting experiment. Unfortunately, we do not have the ability to express C9 in neurons while over-expressing bumpel in glia, as we only have one over-expression system that works. We are working towards generating a new C9 model so that we can then use the Gal4 system to over-express bumpel in glia, but this is not yet available. Over-expression of C9 in glia is not toxic and is not a good model of disease.

      1. Data is shown that overexpressed bumpel can suppress GR and PR dipeptide repeat toxicity when these peptides are translated using an ATG start codon (Fig 2D,E). Does bumpel mediated neuroprotection also correlate with a reduction in DPR levels driven with an ATG start codon?

      This would be a very interesting question. Unfortunately, whilst the Isaacs lab kindly made available the GR antibody for the initial ELISA experiment, we no longer have that antibody available and we do not have a working PR antibody. GR and PR westerns are not possible to carry out as the proteins are too positively charged to run. We do show that bumpel can down-regulate Aβ from a UAS promoter, so its effect is not specific to RAN translation.

      1. The authors provide some evidence suggesting that overexpression of bumpel increases autophagy in the fly brain. However, knockdown of Atg1 while co-expressing bumpel (Fig 4E) did not result in increased GP protein levels. In addition, Atg1 knockdown did not attenuate the protective effects of bumpel overexpression (Fig 4I), suggesting that bumpel is working through a pathway independent of autophagy to promote DPR clearance and protection against toxic peptide accumulation. The authors need to modify the interpretation of their data and temper their claim that autophagy contributes to bumpel-mediated protective effects in the CNS.

      We apologise that the data were not strong enough. We have now added evidence that bumpel acts downstream of Atg1, on late-stage autolysosomal clearance. We also show that bumpel and Atg1 can act synergistically to improve the C9 phenotype when over-expressed; this is now described in Fig 5.

      1. Although the authors present evidence that increased bumpel expression can activate autophagy, the data is not convincing that the neuroprotective effects associated with bumpel are mediated through autophagy. Pyruvate, in some circumstances, can non-enzymatically scavenge hydrogen peroxide or in other cases trigger oxidative stress resistance through hormetic ROS signaling. The authors should consider these alternative possibilities.

      These are indeed possibilities, and we have added a sentence to that effect in the discussion. We have now also shown that bumpel affects late clearance of autolysosomes and leads to an increase in TFEB targets.

      1. The authors rely on overexpressing bumpel to attenuate C9 toxicity in flies. They should perform the opposite experiment and knock down bumpel to demonstrate that reduced bumpel expression results in potentiation of C9 and amyloid beta neurotoxicity. In addition, they should show that knockdown of bumpel expression has some effect on autophagy.

      This would be a very interesting experiment. Unfortunately, bumpel is expressed only in a few glia subtypes in a wild-type fly, and we can’t downregulate it in glia while over-expressing toxic proteins in neurons: because of limitations of our expression system, both genes need to be expressed in the same cell type. We have tried downregulating bumpel in neurons and see no effect on phenotype or on DPR levels, but bumpel expression in neurons is extremely low. Moreover, bumpel has two paralogs, rumpel and kumpel (also only present in glia), and all three need to be knocked out for phenotypes to become visible in glia (Yildirim et al, 2022). These experiments would be interesting but are outside our scope.

      We are in the process of generating new C9 models to be able to do these experiments, but these are currently outside the scope of this work.

      Minor Comments:

      1. Neuronal overexpression of bumpel appears to shorten lifespan of wild type flies (Fig 2A). It is possible that neuronal import of pyruvate may drive mitochondrial oxidative phosphorylation and ROS formation. The authors should comment on this possibility in the discussion.

      This is a very good point; we have added a comment to that effect.

      1. In Fig 3 the authors used a mixture of sodium pyruvate and ethyl pyruvate to demonstrate the import properties of bumpel. The rationale for using ethyl pyruvate is unclear as this membrane-permeable metabolite can by-pass any transporters.

      The ethyl pyruvate was only used in the injection of flies, not for the FRET experiments looking at the import properties of bumpel. Since we were not over-expressing bumpel, we needed the pyruvate to bypass the requirement for a transporter. We were showing that delivery of pyruvate by another method (other than by a transporter) was able to phenocopy the over-expression of bumpel, thus showing that the effect is mediated by pyruvate entry into the cell.

      1. In the introduction several acronyms are used (i.e. GRN, MAPT, TREM2) that are not defined.

      We apologise and have now rectified this.

      Reviewer #3 (Significance):

      To my knowledge, this is the first study to identify that bumpel can permit the import of pyruvate and lactate into neurons when ectopically expressed in the fly brain. The fact that increased neuronal pyruvate import can partially protect against toxic peptide accumulation is unexpected and quite novel. Although some evidence is presented that bumpel can trigger autophagy, it is not clear if autophagy is mediating bumpel neuroprotective effects. Alternative mechanisms related to pyruvate effects on ROS and oxidative stress resistance should be considered.

      We thank the reviewer for their comments, and have added clarifying statements regarding the potential role of ROS.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank the reviewers for their comments and suggestions, which were very helpful to improve our manuscript. The revised manuscript notably includes the following improvements:

      • To evaluate the relevance of identified candidate target genes, we integrated an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood or brain cells. RNAseq data from irradiated hematopoietic stem cells or splenic cells were analyzed and included in the new Table S19, and RNAseq data from Zika virus-infected neural progenitors were analyzed and included in the new Table S28. In addition, we also verified that the expression of a subset of blood-related genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice, known to exhibit increased p53 activity and to phenocopy dyskeratosis congenita (new Figure S8).
      • Luciferase data were expanded to show that, for promoters exhibiting a significant p53-mediated repression in luciferase assays, the p53-dependent regulation was abrogated after mutation of the putative DREAM binding site (new Figures 2e and 2i).
      • We found putative DREAM binding sites for 151 targets, and the predicted binding sites were precisely mapped relative to the position of ChIP peaks of DREAM subunits (E2F4 and LIN9) and to transcription start sites of target genes. These additional analyses, shown in the new Figures 3a and 3b, further suggest the reliability of our predicted binding sites. Notably, hypergeometric tests of the distribution of DREAM binding sites relative to E2F4/LIN9 ChIP peaks reveal a significant >1300-fold enrichment of these sites at ChIP peaks.
      • We now present a detailed comparison of our results with those reported in other studies, notably the predicted E2F and CHR sites from the Target gene regulation database (new Figure S11), or the list of candidate DREAM targets suggested from Lin37 KO cells (new Figure S10 and new Table S35). This also leads us to discuss the different types of DREAM binding sites (bipartite sites (e.g. CDE/CHR or E2F/CLE) vs sites composed of a single E2F or a single CHR motif).
      • We integrated updates of the Human phenotype ontology website to include the latest lists of genes related to blood or brain ontology terms in our analysis. In the previous version of the manuscript we had analyzed a total of 811 genes downregulated ≥ 1.5 fold upon bone marrow cell differentiation. Our revised manuscript now includes the analysis of 883 genes.
      • Several improvements were made to present our results more clearly and in more detail: 1) additional evidence that the differentiation of Hoxa9ER cells correlates with p53 activation is now provided in the new Figure S1; 2) the precise values for gene expression after bone marrow cell differentiation, as well as p53 regulation scores from the Target gene regulation database, are included in the new Tables S1, S5, S8, S11, S14, S20 and S23; 3) a Venn-like diagram was included to summarize the different steps of our approach in the new Figure 3c, with detailed lists of genes selected at each step in new Tables S17 and S26; 4) for genes associated with blood or brain genetic disorders, bibliographic references describing gene mutations and clinical traits were included in a new Table S36; 5) Figure 4a and Table S37 were improved to include evidence that increased BRD8 in glioblastoma cells leads to a decreased expression of several genes transactivated by p53.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary<br /> In this paper the authors describe a data driven approach to identify and prioritise p53-DREAM targets whose repression might contribute to abnormal haematopoiesis and brain abnormalities observed in p53-CTD deleted mice. The premise is that in these mice, (where they have previously demonstrated p53 to be hyperactive in at least a subset of tissues), that the p53-p21-E2F/DREAM axis is at least in part responsible for observed phenotypes due to the repression of E2F and CDE/CHE element containing genes. Their approach to home in on relevant genes is based on transcriptomic gene ontology analysis of genes repressed in these disease settings where they primarily use publicly available data from HOXA9-ER regulated model of HSC expansion wherein they observe increases on p53-p21 expression upon differentiation where they demonstrate that p53-p21 DREAM target genes are suppressed as we would expect in this scenario where p53-p21 is activating withdrawal from cell cycle. They then spend a lot of effort analysing this datasets combining "gene-ontology", "disease phenotype" and "meta-ChIP-seq" analysis of public data to support the observation that mutations of genes suppressed in this manner are disproportionately linked to heritable haematopoetic and brain disorders. While these results are interesting in terms of framing a hypothesis about how mutations in p53-p21-DREAM regulated targets contribute to such conditions, they are to be expected given the now very well described impact of p53-p21 on both E2F4/DREAM targets.

      We agree with the referee that the impact of p53-p21 on both E2F4/DREAM targets is well described. However, discussions with many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases led us to realize that most were not familiar with the p53-DREAM pathway, so that a study that would bridge the gap between DREAM experts and bone marrow or microcephaly specialists would be particularly useful. In addition, we thought that strategies that would rely on disease-based ontology terms were likely to identify new targets, compared to previous studies that considered cell cycle regulation instead of disease phenotypes. Consistent with this, many genes we identified as candidate DREAM targets were not reported in previous studies. In addition, as detailed below, our positional frequency matrices led us to identify DREAM binding sites that had not been predicted by previous approaches.

      The natural progression of this work would be to go on to show this occurs in relevant cells or tissues derived from the p53-CTD mice as well as look at modulating target genes to understand underlying mechanisms and consequences.<br /> Rather than this, they focus on validating that a sub-set of these targets are indeed suppressed by specific p53 activation by MDM2 inhibitor Nutlin-3A in MEFs by qPCR and that mutation of predicted CDE CHR elements in luciferase constructs leads to increase luciferase activity. While these findings support their predictions, the results are entirely expected based on what is known about such targets and demonstrating that this occurs in MEFs does not closely relate to haematopoietic and brain cells they suggest this regulation is important. In fact, in the discussion, the authors comment on the importance of cell type context specificity in terms of discordance between predictions of TF binding sites and public datasets.

      We agree that additional data from relevant cells or tissues were required to strengthen our conclusions. In the revised manuscript, we evaluated the relevance of candidate target genes related to blood ontology terms by integrating an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood cells. We analyzed dataset GSE171697, with RNAseq data from hematopoietic stem cells of unirradiated p53 KO, or unirradiated or irradiated WT mice, as well as dataset GSE204924, with RNAseq data from splenic cells of irradiated p53Δ24/- or p53+/- mice. The latter dataset appeared interesting because p53Δ24 is a mouse model prone to bone marrow failure and the spleen is a hematopoietic organ in mice. The analysis of these datasets is included in the new Table S19. In the datasets, increased p53 activity correlated with the downregulation of most of the 269 candidate DREAM targets. However, 56 genes which appeared upregulated in cells with increased p53 activity were considered poor candidate p53-DREAM targets and removed from further analyses, leading to a list of 213 genes that appeared as better candidate p53-DREAM targets related to blood abnormalities. Furthermore, we also verified that the expression of a subset of blood-related candidate genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) compared to bone marrow cells from WT mice. This result is presented in the new Figure S8.
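
The screening step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used for Table S19: it assumes a single log2 fold change per gene (high versus low p53 activity), it keeps genes with no measurement, and the gene names and values are hypothetical.

```python
def filter_by_p53_response(candidates, log2_fold_changes):
    """Split candidate genes into (retained, removed).

    A gene is removed when its log2 fold change (high vs. low p53 activity)
    is positive, i.e. it appears upregulated when p53 is more active.
    Genes without a measurement are retained (an assumption of this sketch).
    """
    retained, removed = [], []
    for gene in candidates:
        lfc = log2_fold_changes.get(gene)
        if lfc is not None and lfc > 0:
            removed.append(gene)
        else:
            retained.append(gene)
    return retained, removed


# Hypothetical toy values, for illustration only.
toy_lfc = {"GeneA": -1.2, "GeneB": 0.4, "GeneC": -0.3}
retained, removed = filter_by_p53_response(
    ["GeneA", "GeneB", "GeneC", "GeneD"], toy_lfc
)
print("retained:", retained)
print("removed:", removed)
```

Applied to the counts reported above, the same bookkeeping takes 269 candidates, removes 56, and leaves 213.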

      As for genes related to brain development, we discussed in the previous version of the manuscript that most genes mutated in syndromes of microcephaly or cerebellar hypoplasia are involved in ubiquitous cellular functions (chromosome condensation, mitotic spindle activity, tRNA splicing…), which suggested that our analysis of transcriptomic changes associated with bone marrow cell differentiation might also be used to identify brain specific targets. However, we agree with the referee that confirmation of these brain specific targets in a more relevant cellular context was preferable. In the revised manuscript, we included the analysis of datasets GSE78711 and GSE80434, containing RNAseq data from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected, because ZIKV was shown to cause p53 activation in cortical neural progenitors and microcephaly. This analysis is detailed in the new supplementary Table S28. In both datasets, increased p53 activity correlated with the downregulation of most of the 226 candidate DREAM targets. Sixty-four genes which appeared more expressed in ZIKV-infected cells were considered poor candidate p53-DREAM targets and removed from further analyses, leading to a list of 162 candidate p53-DREAM targets related to brain abnormalities. We think this significantly increases the relevance of our analysis of brain-specific targets.

      Finally, they try and contextualise effects in glioblastoma data by correlating target gene expression with levels of BRD8 since it has recently been shown to attenuate p53 function in glioblastoma and show that some of the brain disease associated genes are expressed at higher levels in BRD8 high patient samples. It seems strange here that they do not also look at expression of p21 or other p53 targets that would help ascertain if p53 activity is indeed suppressed. Moreover, much more elegant methods for predicting transcription factor activity could be applied to this data.

      We agree with the referee. Indeed, when we performed the analysis of glioblastoma cells, we first verified that increased BRD8 levels correlated with decreased p21 levels in these cells. However, we had not included this verification in the previous version of the manuscript. In this revision, we improved Figure 4 (and Table S37), which report the analysis of glioblastoma cells, to address this point. In Figure 4a, we now show the variations in mRNA levels between BRD8Low and BRD8High tumors, for BRD8 itself, as well as 5 genes well-known to be transactivated by p53 (p21, MDM2, BAX, GADD45A and PLK3) and the 77 p53-DREAM targets associated with microcephaly or cerebellar hypoplasia. The data clearly show that tumors with high BRD8 exhibit a decrease in the expression of p53 transactivated targets, and an increase in p53-DREAM repressed targets.

      Major Comments<br /> The major result of this paper as it stands is the prioritisation of candidate genes in the p53-DREAM pathway involved in these conditions, and their refined approach used to identify and prioritise these genes and is such more of a starting point for further investigation. They fall short of demonstrating the relevance of their predictions physiologically in tissues from the mice and do not demonstrate functional importance of regulation of targets they put forward. Given that these genes will be co-ordinately regulated, without a mechanistic experiment in physiologically relevant model it is impossible to infer causality. For example, depleting individual targets in the HOXA9 model and evaluating impact on survival, proliferation and differentiation may be a (relatively) simple way to explore this, perhaps comparing to effects of p53 activating agents such as Nutlin-3A. Of note the authors (Jaber 2016 PMID: 27033104) and several other groups had (Fischer 2014 PMID: 25486564 McDade 2014 PMID: 24823795) previously demonstrated the link between p53-p21 and suppression of DNA-repair/Damage related genes (as is also observed here in particular FA-related genes that they discuss briefly here. I would have thought that this would be an obvious starting point for some mechanistic experiments and in fact I note this has been demonstrated before (Li et al 2018 PMID: 29307578)

      The starting point of our study is not the prioritization of DREAM target genes, but rather the detailed phenotyping of p53Δ31/Δ31 mice that we performed in previous publications (Simeonova et al. Cell Rep 2013, Toufektchan et al. Nat. Commun. 2016), in which we mentioned phenotypical traits typical of dyskeratosis congenita and Fanconi anemia, including notably bone marrow failure and cerebellar hypoplasia.

      We understand that depleting individual targets in the Hoxa9 system and evaluating impact on survival, proliferation and differentiation might seem appropriate to explore their potential causality. However, our previous work on Fanc genes leads us to think that this might not be informative. Regarding this, we now clearly discuss in the revised version of the manuscript: “Finding a functionally relevant [DREAM binding site] for Fanca, mutated in 60% of patients with Fanconi anemia [59,60], may help to understand how a germline increase in p53 activity can cause defects in DNA repair. Importantly however, we previously showed that p53Δ31/Δ31 cells exhibited defects in DNA interstrand cross-link repair, a typical property of Fanconi anemia cells, that correlated with a subtle but significant decrease in expression for several genes of the Fanconi anemia DNA repair pathway, rather than the complete repression of a single gene in this pathway [25]. Thus, the Fanconi-like phenotype of p53Δ31/Δ31 cells most likely results from a decreased expression of not only Fanca, but also of additional p53-DREAM targets mutated in Fanconi anemia such as Fancb, Fancd2, Fanci, Brip1, Rad51, Palb2, Ube2t or Xrcc2, for which functional or putative [DREAM binding sites] were also found with our systematic approach.” We further discuss in the manuscript how this may also apply to telomere-, ribosome-, or microcephaly-related genes.

      The analysis of brain specific targets and the link to BRD8 sits largely as an aside and the analysis of patient data from glioblastomas is underdeveloped as noted above.

      As we previously mentioned, the revised manuscript includes the analysis of RNAseq datasets from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected, which significantly increases the relevance of our analysis of brain-specific targets. Furthermore, we improved Figure 4 to present more clearly the impact of BRD8 levels on the expression of genes transactivated by p53 or repressed by p53-DREAM.

      The computational methods applied are robust, albeit predominantly coorelative, in terms of identifying regulation of potential causative target genes, validated across human and mouse cell lines, and this indicates a role of these genes in the relevant conditions. However, further validation through application in a bulk or single cell RNAseq patient cohort, or at least an in vivo model would strengthen these conclusions and complement the work presented here which is based on in vitro mouse and human cells. This is pertinent as this study improves upon previously published approaches by focusing on "clinically relevant target genes". Additionally, this would exhibit the potential applications of the findings presented.

      We thank the referee for this comment. As mentioned above, in the revised manuscript we analyzed RNAseq data from hematopoietic stem cells of unirradiated WT or p53 KO mice, or irradiated WT mice, and from splenic cells of irradiated p53Δ24/- or p53+/- mice, and quantified the expression of a subset of blood-related candidate genes in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) and WT mice (new Figure S8 and Table S19). For genes related to brain development, we included the analysis of RNAseq data from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected (Table S28). These RNAseq analyses were added as an additional screening criterion in our approach, which significantly increased the relevance of the target genes identified.

      In terms of statistical analysis, the hypergeometric test should be applied to assess significant enrichment of genes for example with CDE/CHR regions within the previously identified lists.

      In the revised manuscript, we precisely mapped the DREAM binding sites in 50 bp windows within regions bound by E2F4 and/or LIN9, an analysis included in new Figure 3a. We then compared the distribution of DREAM binding sites at ChIP peaks with their distribution over the entire genome and found a > 1300-fold enrichment of these sites at ChIP peaks. This significant enrichment (f = 3 × 10⁻²³⁹ in a hypergeometric test) is most likely underestimated because mouse-human DNA sequence conservations were not determined for putative DBS over the full genome. These new analyses clearly reinforce our previous conclusions.
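
For readers less familiar with this kind of test, a fold enrichment and an upper-tail hypergeometric probability can be computed as sketched below. This is an illustration under stated assumptions and not the analysis behind Figure 3a. It assumes the genome is divided into fixed windows, so that `population` is the total number of windows, `successes` the number of windows overlapping a ChIP peak, `draws` the number of windows containing a predicted site, and `k` the number of predicted sites that fall in peak windows. The numbers in the example are toy values.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Terms are summed in log space so that very small tail probabilities
    stay representable for as long as a double allows (about 1e-308).
    """
    lo = max(k, 0, draws - (population - successes))
    hi = min(successes, draws)
    if lo > hi:
        return 0.0
    log_den = log_comb(population, draws)
    log_terms = [
        log_comb(successes, i)
        + log_comb(population - successes, draws - i)
        - log_den
        for i in range(lo, hi + 1)
    ]
    m = max(log_terms)
    return exp(m) * sum(exp(t - m) for t in log_terms)


def fold_enrichment(k, population, successes, draws):
    """Observed fraction of draws that are successes over the expected fraction."""
    return (k / draws) / (successes / population)


# Toy numbers (hypothetical): 10 windows, 4 in peaks, 3 predicted sites, 2 in peaks.
p_value = hypergeom_sf(2, 10, 4, 3)
enrichment = fold_enrichment(2, 10, 4, 3)
print(p_value, enrichment)
```

With genome-scale counts the same two functions apply unchanged; only the four input counts differ.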

      Minor Comments
      References are required for the genes listed which play a role in the diseases of interest.

      In the revised manuscript, references are provided for genes which play a role in the diseases of interest. Due to the large number of added references, these were included in a new supplementary table, Table S36.

      This paper would benefit from the inclusion of summary schematics and tables throughout (rather than relying only on somewhat unwieldy heatmaps which show little other than all these genes are co-ordinately regulated), this could include summaries of the methods applied, gene or CDE/CHR inclusion criteria, and Venn diagrams indicating the subsets of final genes identified through this approach.

      We thank the referee for this suggestion. In the revised manuscript we provide a Venn-like diagram of the different steps of our approach (new Figure 3c), as well as tables listing the genes retained after each step of the selection (new Tables S17 and S26) and these additions improve the clarity of our manuscript.

      Reviewer #1 (Significance):

      In its current form this is a very limited study that would require significant additional work to move conclusions beyond correlation and hypothesis generation.
      Overall, while limited largely to target prioritisation, this research nicely exemplifies how genes affected by the p53-DREAM pathway can be robustly identified, providing a potential resource for individuals working on this pathway or on abnormal haematopoiesis and brain abnormalities. These results are complementary to work previously published by Fischer et al, which has been referenced throughout the analysis (highlighting Target Gene Regulation Database p53 and DREAM target genes) and discussion.

      This paper will be of interest to researchers of blood/neurological diseases who can assess if these genes are dysregulated in their datasets, or those investigating the p53-DREAM pathway. This work represents a useful resource detailing genes affected by this pathway in these disease settings, however researchers of the p53-DREAM pathway may find this paper useful when planning an approach to identify and prioritise genes of interest.

      We thank the reviewer for considering that our study represents a useful resource for researchers working on the p53-DREAM pathway, abnormal haematopoiesis and brain abnormalities, because it was exactly the purpose of our work. As mentioned above, we think that a study bridging the gap between DREAM experts and bone marrow or microcephaly specialists should be particularly useful.

      We also agree with the referee that our approach could be used to identify DREAM targets relevant to other disease settings, and we now mention this clearly in the revised manuscript.

      While our results are complementary to work previously published by Fischer et al and included in the Target gene regulation database, in the revised manuscript we discuss the novelty of our results in more detail, notably by performing additional analyses. For example, our method identified bipartite DREAM binding sites for 151 candidate DREAM targets (of which 56 genes were not previously mentioned by Fischer et al.). We now provide a detailed mapping (using 50 bp windows) of the bipartite DREAM binding sites we identified relative to ChIP peaks for DREAM subunits, and we performed a similar mapping of the E2F and CHR sites included in the Target gene regulation database. Our predicted DREAM binding sites coincided with ChIP peaks more frequently (Figure 3a) than the predicted E2F or CHR sites from the Target gene regulation database (Figure S11), which further indicates the usefulness of our study as a resource.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors used various systems including Hoxa9-inducible BMCs, human and mouse cells, WT and p53 knockout MEF, glioblastoma cells to screen p53-DREAM targets and observed distinct findings for each system. Since different cell types have various p53 activation and p53 target genes expression, the authors might want to select proper cell type(s) to screen p53-DREAM target genes and design experiments to confirm that these genes are really p53-DREAM target genes.

      We agree that additional data from relevant cells or tissues were required to strengthen our conclusions. As mentioned in response to referee #1, in the revised manuscript we evaluated the relevance of candidate target genes related to blood ontology terms by integrating an additional screening step in our method, corresponding to the analysis of RNAseq dataset GSE171697, with data from hematopoietic stem cells of unirradiated or irradiated WT mice and unirradiated p53 KO mice, as well as RNAseq dataset GSE204924, with data from splenic cells of irradiated p53Δ24/- or p53+/- mice. As for genes related to brain development, we included the analysis of RNAseq datasets GSE78711 and GSE80434 for validation, two datasets from human cortical neural progenitors infected by the Zika virus or mock-infected. Together, the 4 datasets provide evidence for a p53-dependent downregulation in blood- and brain-relevant settings (new Tables S19 and S28).

      Importantly, in the revision we also compared our list of 151 genes appearing as the best p53-DREAM candidates with the results of Magès et al., who analyzed, in murine cells with a CRISPR-mediated KO of Lin37 (a subunit of DREAM), the transcriptomic changes that follow a reintroduction of Lin37. This comparison is detailed in the discussion section, with the new Figure S10 and Table S35. We mention: “Our list of 151 genes overlaps only partially with the list of candidate DREAM targets obtained with this approach, with 51/151 genes reported to be downregulated in Lin37-rescued cells [17]. To better evaluate the reasons for this partial overlap, we extracted the RNAseq data from Lin37 KO and Lin37-rescued cells and focused on the 151 genes in our list. For the 51 genes that Mages et al. reported as downregulated in Lin37-rescued cells, an average downregulation of 14.8-fold was observed (Figure S10, Table S35). Furthermore, when each gene was tested individually, a downregulation was observed in all cases, statistically significant for 47 genes, and with a P value between 0.05 and 0.08 for the remnant 4 genes (Table S35). By contrast, for the 100 genes not previously reported to be downregulated in Lin37-rescued cells, an average downregulation of 4.7-fold was observed (Figure S10, Table S35), and each gene appeared downregulated, but this downregulation was statistically significant for only 35/100 genes, and P values between 0.05 and 0.08 were found for 23/100 other genes (Table S35). These comparisons suggest that, for the additional 100 genes, a more subtle decrease in expression, together with experimental variations, might have prevented the report of their DREAM-mediated regulation in Lin37-rescued cells.”

      This comparison provides additional evidence that the 151 candidate target genes we identified are bona fide DREAM targets.

      Specific comments:
      The authors need to describe and define HSC and Diff in Figure 1.

      This has been corrected in the revised manuscript. “HSC” was replaced by “Hematopoietic Stem / Progenitor cells (+OHT)” and “Diff” was replaced by “Differentiated cells (5 days – OHT)”.

      Are Figure 1B and 1D list genes p53 targets in bone marrow cells?

      In the revised manuscript, we analyzed RNAseq data to address this point. The question refers to lists of telomere-related genes (Figure 1b in both versions of the manuscript) and Fanconi-related genes (Figure 1d in the previous version, now Figure S2a), but could also apply to other lists of genes related to blood ontology terms (Figures S3-S5 in the revised manuscript). As mentioned in response to referee #1, in the revised manuscript we integrated an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood cells. We analyzed dataset GSE171697, with RNAseq data from hematopoietic stem cells of unirradiated WT or p53 KO mice, or irradiated WT mice, as well as dataset GSE204924, with RNAseq data from splenic cells of irradiated p53Δ24/- or p53+/- mice. The latter dataset appeared interesting because p53Δ24 is a mouse model prone to bone marrow failure and the spleen is a hematopoietic organ in mice. Furthermore, we also verified that the expression of a subset of blood-related candidate genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) compared to bone marrow cells from WT mice, a result presented in the new Figure S8.

      Where is the detailed information for mouse and human cells in Figure 1 and Figure 2?

      In the first draft of the manuscript, supplementary tables provided precise values for ChIP binding. In the revised manuscript, we also provide the precise values for gene expression after bone marrow cell differentiation, as well as p53 regulation scores from the Target gene regulation database. This additional information is included in the new Tables S1, S5, S8, S11, S14, S20 and S23.

      Are Figure 3B list genes also p53 target genes in other cell types such as bone marrow cells and glioblastoma?

      For genes in Figure 3B of the previous version of the manuscript (now Figure 2B in the revised version), we now provide evidence that the blood-related genes are less expressed in the bone marrow cells of p53Δ31/Δ31 mice (mice with increased p53 activity and prone to bone marrow failure) compared to bone marrow cells from WT mice. This result is presented in the new Figure S8. For the brain-related genes of the same Figure, evidence of their p53-mediated regulation is provided by the RNAseq datasets GSE78711 and GSE80434, from human cortical neural progenitors infected by the Zika virus or mock-infected (analyzed in the new Table S28). Evidence that a decreased p53 activity in glioblastomas correlates with increased expression of the brain-related genes of the same Figure is provided in supplementary Table S37.

      Does BRD8high has high p53 and p21?

      We now clearly show, in both Figure 4a and Table S37, that glioblastoma cells with high BRD8 exhibit a decreased expression of CDKN1A/p21 and other genes known to be transactivated by p53 (BAX, GADD45A, MDM2, PLK3), consistent with the fact that BRD8 attenuates p53 activity.

      Are genes listed in Figure 4B all p53 target genes? can some validation be done?

      For genes in Figure 4B, in the revision we focused on the genes that appeared more relevant, i.e. the 77 genes mutated in diseases with microcephaly or cerebellar hypoplasia. All the genes in Figure 4B are repressed in neural progenitors upon infection by the Zika virus, a virus known to cause p53 activation in those cells. This is reported in the new Table S28.

      Reviewer #2 (Significance):

      This is a potentially interesting study. The major limitation is the absence of validation from the screening. This study would definitely benefit the research community as long as some of the key findings are validated.

      We thank the referee for this comment. We hope the new evidence in this revision provides the validation requested by the referee.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their work submitted to Review Commons, Rakotopare et al. aim to identify p53-DREAM target genes associated with blood or brain abnormalities. To this end, they utilize published data generated with a cellular model that results in cell-cycle exit and differentiation of murine bone marrow progenitor cells upon inducible expression of Hoxa9. By analyzing this gene expression data set published by Muntean et al., they find that multiple of the 3631 genes which are downregulated more than 1.5-fold in differentiated BMCs are also mutated in several disorders connected to proliferation and differentiation defects during hematopoiesis and brain development. By screening ChIP-seq data sets available at ChIP-Atlas, they find that the promoters of many of these genes are bound by DREAM complex components, and most of them were identified as genes indirectly repressed by p53 before (Fischer et al. 2016, targetgenereg.org). They then use a computational approach to identify putative CDE/CHR DREAM-binding sites in the promoters of 372 genes associated with blood/brain abnormalities which are downregulated in differentiated BMCs and bound by DREAM components. Out of the 173 candidate genes, they select twelve to analyze whether mutation of the putative DREAM binding sites results in increased activity of the promoters in luciferase reporter assays. The authors conclude that their findings suggest a general role for the p53-DREAM pathway in regulating hematopoiesis and brain development.<br /> While the study supports a large body of publications proving that repression of cell cycle genes by the DREAM complex is crucial for cell cycle arrest and exit, it is noted that none of the main conclusions here are unexpected or particularly exciting. All the analyses are based on data sets that compare gene expression in highly proliferative cells with cells that underwent terminal cell cycle exit. 
Thus, a large portion of the genes that are downregulated in differentiated BMCs are cell cycle genes and well-established targets of DREAM and E2F:RB complexes. Furthermore, it is not surprising that some of these pro-proliferative genes are mutated in diseases connected to proliferation defects like anemias or microcephaly.

      We agree with the referee that the DREAM complex is well known to regulate cell cycle genes – in fact, this is what we mention in the first sentence of our introduction in both versions of our manuscript. However, as we already pointed out in response to Referee #1, many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases are not familiar with the p53-DREAM pathway, and we think our study will be particularly useful to them. Furthermore, our strategy relying on disease-based ontology terms rather than cell cycle regulation led us to identify many DREAM targets that were not reported in previous studies, and our positional frequency matrices led us to identify DREAM binding sites not predicted by previous approaches. As discussed below, our revised manuscript provides a more detailed comparison of our findings with those from previous studies.

      Additionally, I am not very enthusiastic about this manuscript because of several major concerns:

      1. The authors draw conclusions about the p53-DREAM pathway based on data that was generated in a cellular differentiation model without convincingly showing that p53 plays a central role in gene repression in this experimental setup.<br /> (A) Rakotopare et al. define p53-DREAM target genes based on RNA expression data from proliferating precursor cells and non-proliferating, differentiated BMCs (Muntean et al., 2010). This paper has not studied whether p53 gets activated in the particular experimental setup during Hox9a-induced BMC differentiation. On page 4 of their manuscript, the authors state: "Consistent with the fact that BMC differentiation strongly correlates with p53 activation..." without citing any literature or explaining why this is supposed to be a fact. Furthermore, they imply that cell cycle gene repression in this model system depends on p53 because mRNA expression of the p53 targets p21 and Mdm2 was found to be increased in the differentiated cells (Fig. 1A, 5-fold and 2-fold, respectively). However, defining a large set of "p53-DREAM target genes" based on the moderate increase in mRNA levels of two genes that are known to be activated by p53 without showing any evidence that p53 is even involved in this effect during BMC differentiation is not appropriate.

      We agree that Muntean et al. did not study whether p53 gets activated when BMCs differentiate in the Hoxa9-ER system. We previously mentioned: “We observed that p53 activation correlated with cell differentiation in this system, because genes known to be transactivated by p53 (e.g. Cdkn1a, Mdm2) were induced, whereas genes repressed by p53 (e.g. Rtel1, Fancd2) were downregulated after tamoxifen withdrawal (Figure 1a)”. We had provided 2 transactivated genes and 2 repressed genes, and clearly stated that they were given as examples. In the revised manuscript, we provide additional evidence with a new supplementary Figure that includes changes in expression for 15 additional genes known to be transactivated by p53, and 5 additional genes known to be repressed by p53 (Figure S1). In total, we now correlate HSC differentiation with p53 activation based on the expression of 24 well-known p53-regulated genes, which we hope is more convincing.

      In addition, we changed our phrasing and mention “Consistent with the notion that BMC differentiation strongly correlates with p53 activation in this system, 72 of these 76 genes have negative score(s) in the Target gene regulation (TGR) database”.

      (B) Interestingly, p53 is among the genes that get repressed on mRNA level in differentiated BMCs (Fig. 1B; Trp53), and the authors also identify the DREAM components E2F4 and LIN9 as bound to the p53 promoter by screening ChIP-Atlas data (Fig. 1C). Given that p53 has never been described as a DREAM target, I find this rather surprising and it makes me wonder whether appropriate parameters were selected for analyzing the ChIP data, particularly since the authors do not provide binding data for sets of non-cell cycle genes as a negative control.

      We retrieved ChIP data from the ChIP Atlas database without any specific parameters, thus in a completely unbiased manner. Importantly however, for reasons detailed in the manuscript, we clearly mentioned that total ChIP scores <979/4000 were considered too low to reflect significant DREAM binding. The ChIP score for Trp53 was 630, which rapidly led us to eliminate this gene from our screen.

      This ChIP score criterion was already mentioned in the previous version of our manuscript, but we think the addition of a Venn-like diagram (Figure 3c) and summary tables (S17 and S26) in the revised manuscript will probably make it easier to understand.
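      The cut-off described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the threshold (979 out of 4000) and the Trp53 score (630) come from the response, while the function name and the second gene entry are hypothetical and are not part of the authors' pipeline.

```python
# Threshold quoted in the response: total ChIP scores below 979 (out of 4000)
# were considered too low to reflect significant DREAM binding.
MIN_TOTAL_CHIP_SCORE = 979


def passes_chip_filter(total_score: int, threshold: int = MIN_TOTAL_CHIP_SCORE) -> bool:
    """Return True when the total ChIP score reaches the cut-off."""
    return total_score >= threshold


# The Trp53 score (630) is taken from the response; the second entry is a
# hypothetical gene added only to show a score that passes the filter.
scores = {"Trp53": 630, "HypotheticalGene": 1500}
retained = [gene for gene, score in scores.items() if passes_chip_filter(score)]
print(retained)
```

      Under this rule, Trp53 is dropped from the screen despite showing some E2F4/LIN9 signal in ChIP-Atlas.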

      (C) Finally, the authors utilize the targetgenereg.org database to show that many of the genes they describe as p53-repressed were already identified as p53 targets. This database (Fischer et al. 2016) was created by performing a meta-analysis integrating a plethora of RNA-seq and ChIP-seq datasets with the aim to identify whether a particular gene gets up- or downregulated by p53, shows cell-cycle-dependent expression, is a DREAM/MuvB or E2F:RB target, etc. For example, 57 datasets analyzing p53-dependent RNA expression in human and 15 datasets generated with mouse cells were included, and a positive or negative score shows in how many of these experiments the gene was found to be up (positive score) or downregulated (negative score). Combining a large number of datasets in such a study is very helpful to get an idea if a gene is indeed generally regulated by a transcription factor, or if it just showed up in a few experiments - either as a false positive or because the regulation depends on a particular biological setting. The authors find most of the genes they identify as repressed in differentiated BMCs also as downregulated by p53 in targetgenereg.org, however, it remains unclear what parameters they used to define a gene as p53-repressed. For example, in the caption of Fig. 1C, they state: "According to the Target gene regulation database, 72/76 genes are downregulated upon mouse and/or human p53 activation." The four exemptions are SLX1B (human score: 0, mouse score : na), PML (+41, +9), RAD50 (0, na), and TNKS2 (+17, +4). However, there are several other genes that do not appear to be generally repressed by p53, e.g. HMBOX1 (+4, -2); UPF1 (+1, -2), SMG6 (+18, -2), CTC1 (-5, +11), etc. Thus, without providing details regarding the parameters they use to define p53-target genes, such statements are rather misleading. An easy way to solve this problem would be to show the p53 scores in the tables together with the E2F4/LIN9 ChIP data.

      All the genes mentioned as downregulated by p53 had a negative TGR score in human and/or mouse cells. In the revised manuscript, we mention clearly what a negative TGR score means, by stating: “Consistent with the notion that BMC differentiation strongly correlates with p53 activation in this system, 72 of these 76 genes have negative p53 expression score(s) in the Target gene regulation (TGR) database [23], which indicates that they were downregulated upon p53 activation in most experiments carried out in mouse and/or human cells (Figure 1b, Table S1).” We agree with the referee that adding precise TGR scores is informative. In the revised manuscript, we provide the TGR scores for all the genes analyzed, as part of the new supplementary Tables S1, S5, S8, S11, S14, S20 and S23, together with their expression levels in undifferentiated or differentiated cells (as requested by Referee #2). The ChIP data are provided in separate tables (Tables S2, S3, S6, S7, S9, S10, S12, S13, S15, S16, S21, S22, S24 and S25).
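      The criterion stated here (a negative TGR score in human and/or mouse cells) can be written out explicitly, which also shows why the referee's counter-examples still count as "downregulated" under it. The sketch below uses only the scores quoted in the referee's comment, with "na" represented as None; the function name is hypothetical.

```python
from typing import Optional


def negative_in_any_species(human: Optional[int], mouse: Optional[int]) -> bool:
    """True when at least one available TGR p53 score is negative."""
    return any(score is not None and score < 0 for score in (human, mouse))


# (human score, mouse score) as quoted in the referee's comment; None stands for "na".
tgr_scores = {
    "SLX1B": (0, None),
    "PML": (41, 9),
    "RAD50": (0, None),
    "TNKS2": (17, 4),
    "HMBOX1": (4, -2),
    "UPF1": (1, -2),
    "SMG6": (18, -2),
    "CTC1": (-5, 11),
}

classified = {gene: negative_in_any_species(h, m) for gene, (h, m) in tgr_scores.items()}
for gene, is_negative in classified.items():
    print(gene, is_negative)
```

      With this permissive rule, the four exceptions (SLX1B, PML, RAD50, TNKS2) are the only genes in the list without a negative score, whereas genes such as SMG6 (+18, -2) qualify on the basis of a single weakly negative score, which is the point the referee raised.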

      2. The authors define a large set of genes containing "CDE-CHR" promoter elements and thereby ignore how these elements are defined and what properties they have.
      (A) At the beginning of the introduction, the authors state: "The DREAM complex typically represses the transcription of genes whose promoter contain a bipartite CDE/CHR binding site, with a cell cycle-dependent element (CDE) bound by E2F4 or E2F5, and a cell cycle gene homology region (CHR) bound by LIN54, the DNA binding subunit of MuvB (Zwicker et al., 1995; Müller and Engeland, 2010)."
      This statement is incorrect. The authors ignore that the CDE/CHR tandem site is just one of four promoter elements that have been shown to recruit DREAM for the transcriptional repression of several hundred genes. It has been studied in detail that DREAM can bind to the following promoter sites:
      (I) CHR elements - bound by DREAM via LIN54; also bound by the activator MuvB complexes B-MYB-MuvB and FOXM1-MuvB which results in maximum gene expression in G2/M
      (II) CDE-CHR tandem elements - like (I) but binding of DREAM can be stabilized via E2F4/DP interacting with a truncated E2F binding site. Since CDE elements do not represent functional E2F sites, E2F:RB complexes do not bind.
      (III) E2F binding sites - bound by DREAM via E2F4/DP; also bound by E2F:RB complexes and activator E2Fs which results in maximum gene expression in G1/S
      (IV) E2F-CLE tandem elements - like (III) but binding of DREAM can be stabilized via LIN54 interacting with a non-canonical CHR-like element. Since CLE elements do not represent functional CHR sites, B-MYB-MuvB and FOXM1-MuvB do not bind.
      Thus, these promoter sites have different functions and can be clearly distinguished from each other based on their properties - a fact that is completely ignored by the authors.
      Since the authors do not differentiate between G1/S and G2/M expressed genes and (CDE)-CHR and E2F-(CLE) sites, they identify CDE-CHR elements in G1/S genes that are functional E2F-(CLE) sites. A good example of this is the Rad51ap1 gene (and also the Rad51 gene that the Toledo lab described before as a CDE-CHR gene (Jaber et al. 2016)): these genes get expressed in G1/S and the promoters contain highly conserved E2F sites (parts of which the authors define as CDEs), and CLEs (which the authors define as CHRs). Furthermore, E2F:RB complexes bind to the promoters. Again: even though (CDE)-CHR and E2F-(CLE) sites both bind DREAM, they are otherwise functionally different in their ability to recruit non-DREAM complexes.

      We agree that in the previous version of our manuscript we should have presented the different types of DREAM binding sites in more detail, and we have corrected this in the revised manuscript. We now mention in the introduction that “The DREAM complex was initially reported to repress the transcription of genes whose promoter sequences contain a bipartite binding motif called CDE/CHR [19,20] (or E2F/CHR [21]), with a GC-rich cell cycle dependent element (CDE) that may be bound by E2F4 or E2F5, and an AT-rich cell cycle gene homology region (CHR) that may be bound by LIN54, the DNA-binding subunit of MuvB [19,20]. Later studies indicated that DREAM may also bind promoters with a single E2F binding site, a single CHR element, or a bipartite E2F/CHR-like element (CLE), and concluded that E2F and CHR elements are required for the regulation of G1/S and G2/M cell cycle genes, respectively [14,22].”

      We hope that the referee will agree with this complete yet concise way of presenting DREAM binding sites. Importantly, we agree that CDE/CHR and E2F/CLE are sites bound by different non-DREAM complexes, but both sites are bound by DREAM, so it makes perfect sense to use them together to define positional frequency matrices for DREAM binding predictions. We would also like to point out that terms used to define DREAM binding sites may vary in the literature. For example, to our knowledge Müller et al. were the first to propose a clear distinction between “CDE/CHR” and “E2F/CLE” sites (Müller et al. (2017) Oncotarget 8, 97737-97748), yet Müller recently co-authored a review in which these two distinct terms were not used, but were replaced by a single, apparently more generic term of “E2F/CHR” (Fischer et al., (2022) Trends Biochem. Sci. 47, 1009-1022). In the revised manuscript we now clearly mention that we designed our positional frequency matrices to search for “bipartite DREAM binding sites”, i.e. sites that might be referred to as CDE/CHR, E2F/CLE or E2F/CHR sites in various publications.
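      For readers unfamiliar with positional frequency matrices, the general idea can be sketched as follows. This is a generic illustration, not the authors' implementation: the aligned sites are invented for the example, and the additive frequency score is a simplifying assumption rather than the scoring scheme used in the manuscript.

```python
from collections import Counter

BASES = "ACGT"


def build_pfm(sites):
    """Build a positional frequency matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for column in zip(*sites):
        counts = Counter(column)
        pfm.append({base: counts.get(base, 0) / len(sites) for base in BASES})
    return pfm


def score(pfm, candidate):
    """Sum, over positions, the frequency of the candidate's base at that position."""
    return sum(column[base] for column, base in zip(pfm, candidate))


# Hypothetical aligned sites (GC-rich half followed by AT-rich half),
# invented for illustration only; they are not validated DREAM binding sites.
sites = ["GGCGGTTTGAA", "GGCGCTTTGAA", "TGCGGTTTAAA", "GGCGGTTTGAA"]
pfm = build_pfm(sites)

print(score(pfm, "GGCGGTTTGAA"))  # close match to the training sites
print(score(pfm, "AAAAAAAAAAA"))  # poor match
```

      Because the matrix is built from whatever sites are supplied, pooling CDE/CHR and E2F/CLE sites yields a matrix that scores both classes, which is the design choice defended above.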

      (B) The authors identified putative CDE-CHR in the promoters of genes by building two position weight matrices (PWMs) based on 10 or 22 "validated CDE-CHR elements". However, since they include several genes that are clearly expressed in G1/S and contain E2F-(CLE) sites (e.g. Mybl2/B-myb, Rad51, Fanca, Fen1), it is not surprising that they identify a lot of putative CDE-CHR sites in genes that do not contain such elements.

      As discussed above, both CDE/CHR and E2F/CLE are bipartite DREAM binding sites, and we now clearly state that we used bipartite DREAM binding sites to generate our positional frequency matrices and predict DREAM binding.

      (C) Finally, in the discussion, the authors state: "A recent update (2.0) of the Target gene regulation database of p53 and cell cycle genes (www.targetgenereg.org) was recently reported to include putative DREAM binding sites for human genes (Fischer et al., 2022). However, this update only suggests potential E2F or CHR binding sites independently, a feature of little help to identify CDE/CHR elements. For example, targetgenereg 2.0 suggests several potential E2F sites, but no CHR site close to the transcription start site of FANCD2, despite the fact that we previously identified a functionally CDE/CHR element near the transcription start site of this gene (Jaber et al., 2016)." This statement highlights again that the authors don't seem to be aware of what specific properties distinct DREAM binding sites have, and that analyzing promoters for CHR and E2F sites separately generates much more meaningful results than the approach they chose. Also, the FANCD2 promoter binds DREAM as well as E2F:RB complexes and contains a highly conserved E2F binding site - which Jaber et al. mutated together with a potential downstream CLE element and named it "CDE/CHR".

      In the revised manuscript, we provide a more detailed comparison between the bipartite DREAM binding sites predicted with our positional frequency matrices for 151 genes and the separate E2F and CHR predicted sites reported in the Target gene regulation database for the same set of genes. We now mention: “The Target gene regulation (TGR) database of p53 and cell-cycle genes was reported to include putative DREAM binding sites for human genes, based on separate genome-wide searches for 7 bp-long E2F or 5 bp-long CHR motifs [23]. We analyzed the predictions of the TGR database for the 151 genes for which we had found putative bipartite DBS. A total of 342 E2F binding sites were reported at the promoters of these genes, but only 64 CHR motifs. The similarities between the predicted E2F or CHR sites from the TGR database and our predicted bipartite DBS appeared rather limited: only 14/342 E2F sites overlapped at least partially with the GC-rich motif of our bipartite DBS, while 27/64 CHR motifs from the TGR database exhibited a partial overlap with the AT-rich motif. Importantly, most E2F and CHR sites from the TGR database mapped close to E2F4 and LIN9 ChIP peaks, but only 16% of E2Fs (54/342), and 33% of CHRs (21/64) mapped precisely at the level of these peaks (Figure S11), compared to 55% (83/151) of our bipartite DBS (Figure 3a). Thus, at least for genes with bipartite DREAM binding sites, our method relying on PFM22 appeared to provide more reliable predictions of DREAM binding than the E2F and CHR sites reported separately in the TGR database. Importantly however, predictions of the TGR database may include genes regulated by a single E2F or a single CHR that would most likely remain undetected with PFM22, suggesting that both approaches provide complementary results.”
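      The percentages quoted in this passage follow directly from the reported counts, as the short check below shows. The counts are taken from the quoted text; the dictionary labels are ours.

```python
# (sites mapping precisely at E2F4/LIN9 ChIP peaks, total sites), from the quoted text.
counts = {
    "TGR E2F sites": (54, 342),
    "TGR CHR sites": (21, 64),
    "bipartite DBS (PFM22)": (83, 151),
}

percentages = {label: round(100 * hits / total) for label, (hits, total) in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```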

      3. The experimental approach chosen to validate CDE-CHR elements in a set of twelve promoters by luciferase reporter assays is not adequate.
      (A) Since the authors introduce point mutations in putative CDE and CHR elements in parallel, it is impossible to identify functional CDE elements. As explained above, a functional CDE is not required for binding of MuvB complexes and gene repression, and mutating the CHR alone would already lead to a loss of DREAM binding and to de-repression of a promoter. Thus, without mutating both sites of CDE-CHR elements separately, it is impossible to provide evidence that a putative CDE is functional.
      (B) As the putative CDE-CHR elements identified by the authors with a computational approach can overlap with functional E2F-(CLE) elements, the authors inactivate such sites by introducing mutations which leads to loss of DREAM binding and upregulation of the promoters, however, because of the problems described above, this experimental approach in the best case identifies DREAM binding sites, but does not differentiate between (CDE)-CHR and E2F-(CLE) elements.

      Yes, we agree with this comment. As discussed above, our goal was to identify DREAM-binding sites, not to differentiate between CDE/CHR and E2F/CLE elements. In other words, we wanted to identify genes regulated by p53 and DREAM, but not distinguish between genes regulated by p53, DREAM and E2F/Rb versus those regulated by p53, DREAM and BMyb-MuvB or FoxM1-MuvB.

      (C) The authors analyze the activities of wild-type and mutant promoters in proliferating NIH3T3 cells. Since the mutated promoters showed increased activity (about 2-3 fold), which would be expected when binding of DREAM gets abolished, they conclude: "...these experiments indicated that we could identify functional CDE/CHRs for 12/12 tested genes." In addition to the problems described above, a slight upregulation of promoter activities caused by the introduction of multiple point mutations close to the TSS is not sufficient to verify these elements. The increase in activity could occur independent of DREAM-binding by unrelated mechanisms. The authors should at least analyze the activities of the promoters with and without induction of p53. A loss of p53-dependent repression of the mutated promoters would prove that the elements are essential for p53-dependent repression. Furthermore, there are several experimental approaches to analyze whether DREAM binds to the putative promoter element and whether the introduced mutations disrupt binding (ChIP, DNA affinity purification, etc.).

      In the revised manuscript, we show that the promoters of 7 of the tested genes, when cloned in luciferase reporter plasmids and transfected into NIH3T3 cells, exhibited a significant (> 1.4 fold) repression upon p53 activation by cell treatment with Nutlin, the Mdm2 antagonist. For these promoters, we showed that the p53-dependent repression was abrogated by mutating the identified DREAM binding site, which provided direct evidence that our positional frequency matrices can identify functionally relevant DREAM binding sites essential for p53-mediated repression. These experiments were added in Figures 2e and 2i.

      Furthermore, as previously mentioned in response to referee #1, in the revised manuscript we precisely mapped the predicted DREAM binding sites for 151 genes in 50 bp windows within regions bound by E2F4 and/or LIN9, an analysis included in new Figure 3a. The distribution of these peaks clearly indicates that most predicted DREAM binding sites map precisely within a 50 bp window encompassing the ChIP peaks, which represents an enrichment of at least 1300-fold compared to the rest of the genome. This mapping strongly suggests that our predicted DREAM binding sites are functionally relevant.

      Importantly, as shown in the new Figure S11, we carried out a similar mapping of the predicted E2F and CHR sites reported in the Target gene regulation (TGR) database and found that our predicted DREAM binding sites co-mapped with E2F4/LIN9 ChIP peaks more frequently than the E2F and CHR sites of the TGR database, which supports the conclusion that our positional frequency matrices bring new and improved predictions for DREAM binding.

      4. Taken together, while over-simplifying mechanisms of cell cycle gene regulation, the authors largely ignore recent findings and publications regarding gene regulation by p53, E2F:RB, and DREAM/MuvB complexes:
      (A) Publications that show how DREAM binds to (CDE)-CHR sites and that experimentally defined a consensus motif for CHR elements (e.g. PMID: 27465258, PMID: 25106871).
      (B) Publications that identify p53-DREAM target genes by activating p53 in cells with or without functional DREAM complex (e.g. PMID: 31667499, PMID: 31400114).
      (C) Identification and comparison of (CDE)-CHR and E2F-(CLE) DREAM binding sites that have distinct functions in the activation of cell-cycle expression in G1/S and G2/M (e.g. PMID: 29228647, PMID: 25106871).
      These findings have been summarized in several review articles (e.g. PMID: 29125603, PMID: 28799433, PMID: 35835684). All of them describe the mechanisms I have mentioned above in detail, and since Rakotopare et al. cite one of the papers (Engeland 2018), I wonder even more why they did not design their experiments based on current knowledge.

      The points (A) and (C) of this comment were largely discussed in our response to points 2 and 3 of the same referee. Briefly, in the revised manuscript we clearly mention CDE/CHR, E2F/CLE and E2F/CHR sites, as well as the functional differences between E2F and CHR sites with regards to cell cycle regulation, but all these sites were considered together in our positional frequency matrices because our goal was to identify genes regulated by p53 and DREAM, not to distinguish between genes regulated by p53, DREAM and E2F/Rb versus those regulated by p53, DREAM and BMyb-MuvB or FoxM1-MuvB.

      Regarding point (B) of this comment, in the revised manuscript we performed a detailed comparison of our results with those of Mages et al., who analyzed, in murine cells with a CRISPR-mediated KO of Lin37 (a subunit of DREAM), the transcriptomic changes that follow a reintroduction of Lin37 (Mages et al. (2017) eLife 6, e26876). This comparison is detailed in the discussion section, with new Figure S10 and Table S35. As mentioned in response to referee #2, this comparison is perfectly consistent with DREAM regulating the 151 genes for which we identified DREAM binding sites.

      Minor concerns:

      1. The authors state: "Importantly however, the relative importance of the p53-p21-DREAM pathway (called below p53-DREAM) remains controversial, because multiple mechanisms were proposed to account for p53-mediated gene repression (Peuget and Selivanova, 2021)." Even though Peuget & Selivanova do not agree that genes get repressed in response to p53 activation exclusively by the p21-DREAM pathway, they do not question that this mechanism is essential for the p53-dependent repression of a core set of cell cycle genes. Since I am also not aware of any publications that challenge the importance of the p53-p21-DREAM pathway, I do not agree with this statement.

      As the referee pointed out, in the first version of the manuscript we wrote that “the relative importance of the p53-p21-DREAM pathway (called below p53-DREAM) remains controversial, because multiple mechanisms were proposed to account for p53-mediated gene repression (Peuget and Selivanova, 2021)”. The term “relative” was crucial in this sentence, because we wanted to say that the relative proportion of genes regulated by DREAM remained controversial. It seems to us that the title of the review by Peuget & Selivanova (“p53-dependent repression: DREAM or reality?”) emphasizes this controversy. Nevertheless, in the revised manuscript, we now mention: “The relative importance of this pathway remains to be fully appreciated, because multiple mechanisms were proposed to account for p53-mediated gene repression [18]”. We hope the referee will find this phrasing more acceptable.

      2. Some parts of the manuscript are tiring to read - for example, pages 6, 7, and 8 which contain long listings and numbers of genes that are downregulated in differentiated BMC, found to be mutated in various disorders, bind DREAM components, were identified as downregulated by p53, etc. The authors may consider combining central parts of these data in a table that they show in the main manuscript which would make it easier to digest the information and at the same time significantly shorten the manuscript.

      We apologize if some parts of the article were tiring to read. We hope that the addition of Tables S17 and S26, as well as the Venn-like diagram in Figure 3c, will improve the reading of the manuscript.

      3. The supplementary tables (S1-S26) are combined in one Excel file with multiple tabs. The authors should label the tabs accordingly to make it easier for the reader to find a particular table.

      We labelled the Excel tabs in the revised manuscript, as suggested.

      4. At the end of page 6, the authors show that 17 genes found to be downregulated in differentiated BMCs are mutated in multiple bone marrow disorders, however, since they don't include references, it remains unclear where these mutations were originally described.

      In the revised manuscript, we included a supplementary table (Table S36) with appropriate references for blood and/or brain related phenotypes for the 106 genes associated with blood or brain abnormalities.

      5. On page 9, the authors state: "As a prerequisite to luciferase assays, we first verified that the expression of these genes, as well as their p53-mediated repression, can be observed in mouse embryonic fibroblasts (MEFs), because luciferase assays rely on transfections into MEFs (Figure 3b)." The authors don't explain why luciferase assays rely on transfections into MEFs and based on the caption of Fig. 3C, the luciferase assays were not performed in MEFs, but in NIH3T3 cells: "WT or mutant luciferase reporter plasmids were transfected into NIH3T3 cells..."

      According to the American Type Culture Collection (ATCC), the NIH3T3 cell line is a mouse embryonic fibroblastic (MEF) cell line, which explains why we had tested the expressions of candidate target genes in MEFs. However, as we now clearly mention in the manuscript, this cell line exhibits an attenuated p53 pathway, which improves cell survival after transfection but leads to decreased p53-mediated repression. These points are now clearly mentioned in the text and in a new supplemental Figure (Figure S9).

      Reviewer #3 (Significance):

      While the study supports a large body of publications proving that repression of cell cycle genes by the DREAM complex is crucial for cell cycle arrest and exit, it is noted that none of the main conclusions here are unexpected or particularly exciting. All the analyses are based on data sets that compare gene expression in highly proliferative cells with cells that underwent terminal cell cycle exit. Thus, a large portion of the genes that are downregulated in differentiated BMCs are cell cycle genes and well-established targets of DREAM and E2F:RB complexes. Furthermore, it is not surprising that some of these pro-proliferative genes are mutated in diseases connected to proliferation defects like anemias or microcephaly.

      Again, we agree with the referee that the DREAM complex is well known to regulate cell cycle genes, but many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases are not familiar with the p53-DREAM pathway, and we think our study will be particularly useful to them. As for DREAM specialists, our strategy relying on disease-based ontology terms rather than cell cycle regulation led to the identification of many DREAM targets that were not reported in previous studies, and our positional frequency matrices led to the identification of DREAM binding sites not predicted by previous approaches. We hope that, by considering all these points together, the referee will acknowledge that our study provides a valuable resource for different types of readerships.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      1) It is interesting MxDnaK1 seems to prefer cytosolic proteins while Mx-DnaK2 prefers inner membrane proteins. The domain-swapping experiments seem to suggest that the NBD is important for this difference. How NBD is important is not addressed. Is it due to ATP hydrolysis, NBD-SBD interaction, or co-chaperone interactions?

      Answer: Thanks for your comments. We speculate that the co-chaperone interaction might be the key factor contributing to substrate differences. According to the working principle of Hsp70, its functional diversity is largely determined by substrate differences. Co-chaperones, such as JDPs, play a crucial role in this process as they possess the ability to bind substrates and facilitate their targeted delivery. Therefore, much of the functional diversity of the Hsp70s is driven by a diverse class of JDPs [1,2]. We found that the NBD played important roles in the co-chaperone recognition of MxDnaKs. Additionally, it is generally accepted that the efficiency of ATP hydrolysis does not significantly impact the substrate recognition of Hsp70. Furthermore, if the NBD-SBD interaction were crucial, the substitution of either the NBD or the SBDβ domain would be expected to result in similar cell phenotypes, as both alterations disrupt the original NBD-SBDβ interaction. We believe that the DnaK proteins and their co-chaperones together determine the substrate spectra. We made corresponding modifications in the revised manuscript. (Page 22; Lines 488-494 in the marked-up manuscript)

      2) About the interactome analysis, since apyrase was added to remove ATP, it's surprising multiple Hsp40s were found in their analysis. Hsp70-Hsp40 interaction is known to require ATP. This may suggest some of the proteins found in their interactome analysis are artifacts. The authors should perform negative controls for their interactome analysis, such as using a control antibody for their CO-IP and analyze any non-specific binding to their resin.

      In addition, since JDPs were pulled down, is it possible some of the substrates identified are actually substrates for JDPs, not binding directly to DnaKs?

      Answer: This is an interesting question. As you correctly noted, the interaction between Hsp70 and Hsp40 requires ATP. In our experiment, we used apyrase to remove ATP in order to promote tight binding of substrate by DnaK. This methodology was initially described by Calloni, G. et al. in 2012 [3]; these authors also identified the co-chaperone protein DnaJ, at an abundance higher than that of 77% of the interactors. In our opinion, the incomplete removal of ATP could be the underlying cause of this phenomenon.

      We apologize for the insufficiently detailed description in the Methods. We did implement negative controls for each MxDnaK in order to eliminate potential non-specific interactions with Protein A/G beads or antibodies. Specifically, we conducted a CO-IP experiment without antibodies to assess any non-specific binding to the Protein A/G beads. To further investigate non-specific binding to the antibodies against MxDnaK2 and MxDnaK1, we used the mxdnak2-deleted mutant (strain YL2216) and the MxDnaK1 swapping strain carrying the MxDnaK2 SBDα (strain YL2204), respectively. Because the SBDα of MxDnaK1 was used as the antigen to generate antibodies, YL2204 cannot be recognized by anti-MxDnaK1 (Figure S5). We believe these controls allowed us to evaluate and exclude the non-specific interactions in our CO-IP. We have improved our description in the Methods. (Page 27; Lines 596-607)

      While one of the main functions of JDPs is to interact with unfolded substrates and facilitate their delivery to Hsp70, there may still be substrates that do not directly bind to Hsp70. It’s thus possible that some of the substrates identified only bind to JDPs. We made corresponding modifications in the revised manuscript. (Page 14; Line 290-292)

      3) For Figure S7, the pull-down assay used His6-tagged JDPs. Ni resin is known to bind Hsp70s non-specifically. It's not surprising DnaK showed up in all the pull-down lanes, especially considering how much DnaK was over-expressed. For some pull-down lanes, the amount of DnaK is much more than that of JDPs, further indicating artifact. The author should include negative controls such as JDPs without His6-tag or any irrelevant protein with His6 tag.

      Answer: Thanks for your suggestion. As you and another reviewer pointed out, there were some flaws in the experimental design of the pulldown assay. These include the non-specific binding of Hsp70 proteins to nickel resin, the absence of a negative control without a tag, and the inappropriate selection of the MBP tag. Thus, we employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between DnaK and JDP (Figure S9). While our manuscript employed nLuc to confirm protein dimerization, it is worth noting that the nLuc assay was originally devised for investigating protein interactions [4].

      4) For the proposed dimer formation in Fig. 4C, there are multiple bands above the monomer bands. What are these forms? It seems the majority of the Cys residues that could form disulfide bonds are in the NBD of MxDnaK2 since constructs with MxDnaK2-NBD form some sort of high-MW bands above the monomer. Does MxDnaK1-NBD also contain Cys at the analogous positions? The fact that MxDnaK1 didn't show disulfide-bonded bands doesn't mean it doesn't form dimer. It depends on where the Cys residues are.

      It's nice the authors did Fig. 4D. However, the authors should include a positive control to show how strong the signal is for a true interaction before interpreting their results.

      Answer: Thank you very much for your comments. In at least three independent experiments, we consistently observed two unidentified bands within the molecular weight range of 70-100 kDa during the purification process of His6-MxDnaK2. These bands appeared to be intermediate in size between the monomeric and dimeric forms of His6-MxDnaK2, and disappeared upon DTT treatment. The compositions of these unidentified bands have been determined by LC/MS. The upper band included MxDnaK2 (65.3 kDa) and the anti-FlhDC factor of E. coli (WP_001300634.1, 27 kDa). In the lower band, we detected the presence of MxDnaK2 and the 50S ribosomal protein L28 of E. coli (WP_000091955.1, 9 kDa). Based on these findings, we conclude that these two additional bands are the result of the interaction between His6-MxDnaK2 and these two E. coli proteins. The related explanations have been added in the legend of Figure 5. (Page 42; Lines 938-942)

      We analyzed the presence of Cys residues in MxDnaK1 and MxDnaK2. The NBD of MxDnaK2 contains two Cys residues, located at positions 15 and 319. The MxDnaK1 NBD contains a Cys at position 316, which is analogous to Cys319 of MxDnaK2. The position analogous to Cys15 of MxDnaK2 is a Val in MxDnaK1, which might be an important factor contributing to the inability of MxDnaK1 to form oligomers.

      Thanks for your suggestion to add the positive control. We repeated the nLuc assays with a positive control (αSyn). According to the working principle of the nLuc assay, the amount of fluorescent substrate is limited. Therefore, even for proteins that interact with each other, the fluorescence value gradually decreases and reaches a plateau, similar to the negative control. This gradual decline in fluorescence is a significant indicator of protein interaction. In Figure 4D (Figure 5D in the revised version), we only presented the results of the first 20 minutes of detection. The complete two-hour detection results have been added as a supplementary figure (Figure S14).

      5) line 48: "human HSC70 and HSP70 are 85% identical, and the phenotypes of their knockout mutants are different, which is consistent with their largely nonoverlapping substrates" The authors completely ignored that the promoters of HSC70 and HSP70 are very different.

      Answer: We apologize for this oversight. Yes, HSC70 and HSP70 exhibit distinct expression patterns, which play important roles in their functional diversity. We modified the sentence in the new version. (Page 5; Line 58)

      6) Line 69: "The two PRK00290 proteins, not the other Myxococcus Hsp70s, could alternatively compensate the functions of EcDnaK (DnaK of E. coli) for growth." Please add references for this statement.

      Answer: Added, thanks.

      7) line 191: What's the mechanism for DnaK's role in oxidative stress? Is the disulfide bond formation in Fig. 4 related? Does disulfide-bond change the activity of DnaK?

      Answer: Thanks for your pertinent comments. Honestly, we do not know the mechanism underlying MxDnaK2's role in oxidative stress. In our previous studies, we determined that the deletion of mxdnaK2 resulted in a longer lag phase after H2O2 treatment. Here, our aim was to investigate the impact of region swapping on the cellular function of MxDnaK2. In other bacteria, the critical role that DnaK plays in resistance to oxidative stress stems from the pleiotropic functions of this chaperone. By shortening the time that proteins spend in the unfolded state, the DnaK/DnaJ chaperone system minimizes the risk of metal-catalyzed carbonylation of the side chains of proline, lysine, arginine, and threonine residues, but none of these functions has been linked to the dimerization of DnaK 5-7.

      8) Fig. S9 seems redundant.

      Answer: Deleted, thanks.

      9) line 263, "but the NBD exchange was almost equal to the deletion of the gene with respect to phenotypes." But, the mutant has >50% activity in Fig. 3F.

      Answer: We apologize for the confusing description. Here, “phenotypes” refers to “cell phenotypes”. What we tried to say with this sentence is that the NBD-swapped strain of either MxDnaK1 or MxDnaK2 presented cell phenotypes identical to those of the gene-deleted strain. As we have already described this result in detail earlier, we now consider this sentence redundant and have therefore deleted it. (Page 17; Line 355-356)

      10) line 221-226: the logic is not quite clear.

      Answer: We apologize for the confusing description. In M. xanthus DK1622, MxDnaK1 is essential for cell survival, and insertion of a second copy of mxdnaK1 in the genome is required for deletion of the in situ gene. Thus, to verify whether the NBD region is required for the essentiality of MxDnaK1, we performed region swapping of the in situ MxDnaK1 gene in the att::mxdnaK1 mutant (a DK1622 mutant containing a second copy of mxdnaK1 at the attB site), and successfully obtained the MxDnaK1 mutant swapped with the MxDnaK2 NBD region. The experiment indicated that the NBD of MxDnaK1 is essential for the cellular functions of the chaperone. We have added the information and modified the sentences in the manuscript. (Page 15; Line 308-319)

      Minor concerns:

      Please check spelling. There are some typos such as "HPPES" in the Methods.

      Answer: Corrected. Many thanks.

      My areas of expertise are protein biochemistry, genetics, and structural biology on heat shock proteins.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      The manuscript is very nice and interesting, although some of the authors' conclusions are perhaps not well supported by their data. For example:

      1) In the pulldown experiments, the lack of interaction between 2747-MxDnaK2, 3015-MxDnaK2 and 1145-MxDnaK1 should be shown in order to support the conclusion made in line 197-198.

      Answer: We apologize for this oversight. As you and another reviewer pointed out, there are some flaws in the experimental design of the pulldown assay. These include the non-specific binding of Hsp70 proteins to nickel resin, the absence of a negative control without a tag, and the inappropriate selection of the MBP tag. Thus, we employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between DnaK and JDP (including the 2747-MxDnaK2, 3015-MxDnaK2 and 1145-MxDnaK1 interactions) (Figure S9). While our manuscript employed nLuc to confirm protein dimerization, it is worth noting that the nLuc assay was originally devised for investigating protein interactions 4.

      2) The only evidence that the NBD of MxDnaK1 is essential for bacterial growth is that this mutation couldn't be obtained in M. xanthus. However, it could be purified in E. coli. Could the authors do some experiments with the M. xanthus strain without the chromosomal MxDnaK1 and then introduce a plasmid with the mutated gene?

      Answer: We apologize for the confusing description. Our conclusion that the NBD is essential does not rest only on the fact that the mutation could not be obtained. In M. xanthus DK1622, MxDnaK1 is essential for cell survival, and in situ deletion of the gene could be obtained after insertion of a second copy of mxdnaK1 in the genome at the attB site. To verify whether the NBD region is required for the essentiality of MxDnaK1, we performed region swapping of the in situ MxDnaK1 gene in the att::mxdnaK1 mutant (a DK1622 mutant containing a second copy of mxdnaK1), and successfully obtained the MxDnaK1 mutant swapped with the MxDnaK2 NBD region. The experiment indicated that the NBD of MxDnaK1 is essential for the cellular functions of the chaperone. We have added the information and modified the sentences in the manuscript. (Page 15; Line 308-319)

      3) All the experiments with purified proteins were done with MxDnaKs bearing His-tags. It doesn't say explicitly its position, but as they employed a pET28A it is likely that the tag is at the N-terminus, which is close to the linker region. As this tag might interfere, it should be removed for the experiments, or at least a control done with the tag removed.

      Answer: We apologize for the lack of detailed description. As you pointed out, the His-tags are located at the N-terminus of the DnaKs. The full lengths of MxDnaK1 and MxDnaK2 are 638 and 607 amino acids, respectively. The linker regions are located at amino acid positions 381-386 for MxDnaK1 and 387-392 for MxDnaK2. Therefore, we believe that the His-tag is not close to the linker regions. We have included this information in the revised manuscript. (Page 24; Line 544-546)

      The purified His6-DnaK proteins were employed for holdase activity assays and in vitro dimerization assays. Several previous studies have used the same holdase activity assay with His-tagged DnaK 8,9. We therefore suggest that the His-tag did not interfere with the holdase activity of DnaK. To exclude an influence of the His-tag on oligomerization, we conducted a control with the tag removed in the in vitro dimerization assay, and the results showed no difference (Figure S13).

      4) The authors state that MxDnaK dimerized in vitro with the NBD, and to disrupt the dimer they used 100 mM DTT, which is a very high concentration. As the protein has the His-tag, it should be removed to corroborate that it is not interfering with the dimerization.

      Answer: Thanks for your suggestion. As mentioned above, to exclude an influence of the His-tag on oligomerization, we conducted a control with the tag removed in the in vitro dimerization assay, and the results showed no difference (Figure S13).

      5) Why were the pulldown experiments done with MBP-MxDnaKs? Can you show a negative control between the MBP and the JDPs to rule out this interaction? It would be more suitable to do the pulldown assays with the purified MxDnaKs without the His-tags (and the His-tagged JDPs that were employed).

      Answer: Thanks for your suggestion. As mentioned above, there are some flaws in the experimental design of the pulldown assay. Thus, we employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between MxDnaKs and JDPs (Figure S9).

      Minor comments:

      • E. coli's DnaK is only essential in heat shock conditions and for the lambda phage cycle. If MxDnaK1 is similar to this Hsp70, why would substituting its NBD with the NBD of MxDnaK2 be lethal for bacterial growth?

      Answer: Thanks for the comments. As you correctly point out, DnaK is nonessential in E. coli. But in some other bacteria, DnaK plays an essential role in cell growth for different reasons 10-12. In our previous studies, we determined that MxDnaK1 is essential in M. xanthus DK1622, whereas MxDnaK2 is nonessential. In this study, we performed region swapping and found that only the NBD of MxDnaK1 was irreplaceable. In our opinion, this result indicates that the NBD plays an important role in the functional diversity between MxDnaK1 and MxDnaK2.

      • I think that the writing should be revised and in the supporting information the captions of the figures should include more information.

      Answer: Thanks a lot for the suggestion. We revised the manuscript and added more information in the legends of supplementary figures.

      Reviewer #2 (Significance):

      -General assessment: This is a nice piece of work which would benefit from revision to address the comments above. The authors showed the roles and differences between two DnaKs in the same organism. They track these differences to the subdomains of the MxDnaKs and co-chaperones. It will be interesting for future works to explore more deeply the co-chaperones and their interactions.

      -Advance: I think that this manuscript fills a gap regarding the role of DnaK duplicated in bacterial strains.

      -Audience: I would say that the audience is broad and includes scientists interested in protein folding and chaperones, as well as myxobacteria.

      1. Rosenzweig, R., Nillegoda, N. B., Mayer, M. P. & Bukau, B. The Hsp70 chaperone network. Nat Rev Mol Cell Biol 20, 665-680, doi:10.1038/s41580-019-0133-3 (2019).
      2. Kampinga, H. H. & Craig, E. A. The HSP70 chaperone machinery: J proteins as drivers of functional specificity. Nat Rev Mol Cell Biol 11, 579-592, doi:10.1038/nrm2941 (2010).
      3. Calloni, G. et al. DnaK functions as a central hub in the E. coli chaperone network. Cell Rep 1, 251-264, doi:10.1016/j.celrep.2011.12.007 (2012).
      4. Dixon, A. S. et al. NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells. ACS Chem Biol 11, 400-408, doi:10.1021/acschembio.5b00753 (2016).
      5. Fredriksson, A., Ballesteros, M., Dukan, S. & Nystrom, T. Defense against protein carbonylation by DnaK/DnaJ and proteases of the heat shock regulon. J Bacteriol 187, 4207-4213, doi:10.1128/JB.187.12.4207-4213.2005 (2005).
      6. Santra, M., Dill, K. A. & de Graff, A. M. R. How Do Chaperones Protect a Cell's Proteins from Oxidative Damage? Cell Syst 6, 743-751 e743, doi:10.1016/j.cels.2018.05.001 (2018).
      7. Fredriksson, A., Ballesteros, M., Dukan, S. & Nystrom, T. Induction of the heat shock regulon in response to increased mistranslation requires oxidative modification of the malformed proteins. Mol Microbiol 59, 350-359, doi:10.1111/j.1365-2958.2005.04947.x (2006).
      8. Chang, L., Thompson, A. D., Ung, P., Carlson, H. A. & Gestwicki, J. E. Mutagenesis reveals the complex relationships between ATPase rate and the chaperone activities of Escherichia coli heat shock protein 70 (Hsp70/DnaK). J Biol Chem 285, 21282-21291, doi:10.1074/jbc.M110.124149 (2010).
      9. Thompson, A. D., Bernard, S. M., Skiniotis, G. & Gestwicki, J. E. Visualization and functional analysis of the oligomeric states of Escherichia coli heat shock protein 70 (Hsp70/DnaK). Cell Stress Chaperones 17, 313-327, doi:10.1007/s12192-011-0307-1 (2012).
      10. Shonhai, A., Boshoff, A. & Blatch, G. L. The structural and functional diversity of Hsp70 proteins from Plasmodium falciparum. Protein Sci 16, 1803-1818, doi:10.1110/ps.072918107 (2007).
      11. Vermeersch, L. et al. On the duration of the microbial lag phase. Curr Genet 65, 721-727, doi:10.1007/s00294-019-00938-2 (2019).
      12. Burkholder, W. F. et al. Mutations in the C-terminal fragment of DnaK affecting peptide binding. Proc Natl Acad Sci U S A 93, 10632-10637, doi:10.1073/pnas.93.20.10632 (1996).
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Kellner and Berlin present their research findings pertaining to the effect of GRIN2B variants that modify NMDA receptor function and pharmacology. While these mutations were published previously, the new manuscript provides a more thorough investigation into the effects that these variants pose when incorporated into heteromeric complexes with either wildtype GluN2B or GluN2A - NMDA receptors containing only a single mutated GluN2B subunit are more relevant to the disease cases because the associated patients are heterozygous for the variant. The authors achieved selective expression of receptor heteromeric complexes by utilising an established trafficking control system. The authors found that while a single variant subunit in the receptor complex is largely dominant in its effect on reducing glutamate potency of the NMDA receptor, its effect on receptor pharmacology varied. Unlike diheteromeric receptors containing mutated subunits, polyamine spermine potentiated GluN1/2B (but not GluN1/2A/2B) receptors that contained a single mutated GluN2B. In contrast, the neurosteroid, pregnenolone-sulfate (PS), was effective at potentiating the NMDA receptor currents (to varying degrees) regardless of the subunit composition. The potentiation of NMDA receptor currents by PS was also observed in neurons overexpressing the variants.

      The techniques used in this study were appropriate to address the objectives and the overall effects are large, and generally convincing. I like the way the results are presented, although have a few (easily addressable) comments.

      We thank the reviewer for the positive remarks on our manuscript.

      Major comments:

      #1 When incrementally adding drugs (e.g. traces in figures 5 and 6), it doesn't always appear like the response has plateaued before changing the solutions/drugs. Therefore, I am curious to what extent the effects observed are underestimated.

      The reviewer is correct to note that some responses do not necessarily reach a plateau, despite our efforts to reach steady-state (as shown in most figures, e.g., Figs. 1-4, 6b, etc.), in particular when applying pregnenolone-sulfate (PS) (Fig. 5a, all traces in middle and bottom rows). In several instances, steady-state was unobtainable due to the very slow effect of the neurosteroid (its mode of action is from within the membrane) and the very large size of the cell (~1 mm). These experiments therefore required very long exposures (~minutes) of oocytes to glutamate and PS (see scale bar, 20 s) to try to reach steady-state; however, this also caused some cells to deteriorate (these did not return to baseline and were therefore discarded). Thus, we eventually converged on settings whereby we did not expose oocytes to the drug for more than 4 minutes. Nevertheless, to estimate the extent of the underestimation (if any), we fitted the currents with a standard mono-exponential function, as previously reported1–3 (Suppl. 5a). We found that our application times of PS were, on average, three times the response’s time constant (tau) (Suppl. 5b), and we found a very weak relationship (R2 = 0.09) between the response to PS and the time of its application (Suppl. 5c). These are now explicitly mentioned in the text (line #203) and in the legend of Suppl. 5. They suggest that the response reached approximately 95% (1 - 1/e^3) of the steady-state value, and we are therefore confident that we underestimate the extent of PS potentiation very little, if at all.
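
      The 95% figure quoted above follows directly from a mono-exponential approach to steady state. The short sketch below only illustrates that arithmetic; the function name and the example values are hypothetical and are not taken from the authors' analysis or data.

```python
import math


def fraction_of_steady_state(application_time: float, tau: float) -> float:
    """Fraction of the steady-state response reached after `application_time`,
    assuming a mono-exponential approach with time constant `tau`
    (both in the same time unit)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return 1.0 - math.exp(-application_time / tau)


# Illustrative case: application time equal to three time constants.
print(round(fraction_of_steady_state(3.0, 1.0), 3))
```

      Under this assumption, an application lasting three time constants leaves roughly 5% of the final response unobserved, which is the basis for the statement that any underestimation is small.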

      #2 Also, in relation to figure 6, to what extent does agonist application cause desensitization here? Looking at traces in Figure 6b it appears that there is some desensitization and it isn’t clear to what extent this persists during the solution changes.

      Agonist-induced desensitization of NMDAR currents is a well-known phenomenon, but it is well established that it is not always observed in cells, including neurons (e.g., 4–7). In general, we did not observe frequent desensitization (we now provide a larger variety of traces of desensitizing and non-desensitizing currents; Fig. 6b, Suppl. 7e and Suppl. 8a). Nevertheless, we explicitly note that in neurons, currents that didn’t reach steady-state after application of 100 mM NMDA were excluded from analysis (Methods - Patch clamping of cultured neurons, line #474), and in most cases desensitization was minor (or absent) following application of 100 mM NMDA and 100 mM PS (Fig. 6b).

      #3 Could the authors conduct/show the controls with NMDA alone (for 50-60 s), or NMDA followed by PE-S (without ifenprodil)?

      These recordings are now shown in Fig. 6b and Suppl. 8a (as opposed to Suppl. 7e).

      #4 Finally, figure 5 shows the effect of the neurosteroid (and ifenprodil) on NMDA-evoked currents in neurons overexpressing the GluN2B variants. However, these currents probably reflect a mixture of extrasynaptic and synaptic receptors. To what extent are synaptic NMDA receptors affected by the variants?

      To show the extent of the effect of the variants on synaptic receptors, we recorded miniature NMDA-dependent EPSCs (mEPSCNMDA), as described in our previous report8. We find that the variants completely eliminate the appearance of mEPSCs (Suppl. 7a, b). The change in minis’ frequency is not the result of a presynaptic change or a change in synapse number9, as we have shown that AMPAR-mEPSC frequency was unaffected by the variants (i.e., synapse number and probability of presynaptic release are unchanged by the variants).

      To further address this, we also explored the relative synaptic vs. extrasynaptic distribution of the variants by using the established MK-801 protocol (to block all synaptic receptors during spontaneous activity, leaving extrasynaptic receptors unblocked)10,11. In neurons overexpressing the GluN2B-wt subunit, we obtained an extrasynaptic fraction of 38%, highly consistent with previous reports12,13. Overexpression of the variants, however, yielded a significantly higher fraction (~50%) of remaining current, supposedly suggesting more variant receptors at extrasynaptic loci (Suppl. 8b, c). However, due to the experimental settings we have chosen, the results from this experiment represent quite the inverse when involving extreme LoF variants. Firstly, 100 mM NMDA does not saturate variant receptors (whether pure, mixed di- or tri-heteromers, see Table 1). Secondly, normal neurotransmission does not open synaptic receptors containing mutant GluN2B-subunits, as attested by the complete absence of mEPSCs (see Suppl. 7a, b and 8,9). Thus, during the 10-minute exposure to MK-801, only (mostly) purely wt receptors are blocked by spontaneous synaptic activity, and thus the second bout of 100 mM NMDA solely exposes the remaining wt-receptors. An increase in this number reflects more wt-receptors at the extrasynapse than at the synapse. Thus, the observed increase in the fraction of extrasynaptic receptors in neurons overexpressing the variants implies that the number of wt-receptors necessarily decreases at the synapse and increases at the extrasynapse. We deem this to ensue from the incorporation of the variants at the synapse. This increase cannot be explained by an overall increase in membrane expression of wt-receptors in neurons overexpressing the variants, as these cells show a strong reduction in Imax (see Fig. 6c and Suppl. 7e). This is now detailed in the text (lines #270-290).
      

      Minor comments:

      #5 Looking at the fits in the graph of Figure 2b it appears that the slope on the concentration response curves is less steep for the mixed 2B-diheteromeric NMDA receptors. How much are the Hill coefficients changing and can this be interpreted to provide more mechanistic insight? Wouldn't it make sense to include the Hill coefficients in Table 1?

      We agree with the reviewer’s observation. Actually, the mixed di-heteromers have a similar Hill coefficient (nH) to the purely di-heteromeric GluN2Bwt receptors (see Table 1), and these show the typical nH of ~1 (e.g., 14–16). The only diverging groups are the purely di-heteromeric variant-containing channels (receptors containing only G689C/S; nH ~2). Although this may suggest positive cooperativity between the subunits, we are less inclined to draw insights from it because we limited our examination to 10 mM glutamate (we limit exposure of oocytes to 10 mM glutamate due to artifacts arising past this concentration, as discussed in Kellner et al.8: Fig. 2—figure supplement 1). This description is now included in lines #149, 318, 319.
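
      To make the discussion of slope concrete, the sketch below evaluates the standard Hill equation for nH = 1 and nH = 2. It is an illustration only: the function name and the EC50 value are hypothetical placeholders, not the fitted parameters reported in Table 1.

```python
def hill_response(concentration: float, ec50: float, n_h: float) -> float:
    """Normalized response (0 to 1) predicted by the Hill equation:
    response = c^nH / (EC50^nH + c^nH)."""
    if concentration < 0 or ec50 <= 0:
        raise ValueError("concentration must be >= 0 and ec50 must be > 0")
    return concentration ** n_h / (ec50 ** n_h + concentration ** n_h)


# Hypothetical EC50 (arbitrary units); compare a tenfold-lower concentration.
ec50 = 1.0
for n_h in (1.0, 2.0):
    print(n_h, round(hill_response(0.1 * ec50, ec50, n_h), 4))
```

      At the EC50 both curves pass through half-maximal response, but below it the nH = 2 curve lies well under the nH = 1 curve, which is why a larger Hill coefficient appears as a steeper concentration-response relationship.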

      #6 The authors illustrate the changes in potency by the shift in the concentration response curves, but is there any change in efficacy? A simple way to illustrate this would be to also present a simple graph showing the maximum current amplitudes (i.e. to 10 mM glutamate) for each of the receptor complexes.

      We now provide these data (Suppl. 2a, b). We would like to note, however, that the tailed receptors (i.e., subunits with carboxy-termini tagged with C1/C2 tails, see Fig. 1a) generally express less well than the native subunits (Suppl. 2c). This is detailed in lines #162-166.

      #7 The authors characterize the 'apparent' affinity (or potency) of the receptor using concentration-response curves, but numerous points in the manuscript refer to changes in affinity. None of the experiments shown directly measure affinity (which would require ligand-binding assays) and so the use of the word affinity is inaccurate/misleading. I suggest the authors replace the instances of the word 'affinity' with 'potency'.

      We apologize for the confusion surrounding our use of the term affinity. In fact, we do initially define this term in the introduction (page #4): “apparent glutamate affinity (EC50)”, to differentiate it from affinity (KD). Regardless, and to avoid confusion, we replaced all instances of “affinity” with “potency”, as suggested by the reviewer.

      #8 In the third line of the abstract, the authors wrote, 'for which there are no treatments' in relation to GRINopathies. My understanding is that there are symptomatic treatments but that there are no disease-modifying treatments.

      Indeed, all current treatments are supportive, rather than curative or disease-modifying. These are now better defined in the abstract.

      #9 The authors have interchangeably used the terms NMDAR or GluNRs throughout the manuscript. I suggest sticking to one of these terms. I would suggest NMDARs since this is less likely to be misread as a specific NMDA receptor subunit.

      Agreed and corrected throughout manuscript.

      #10 Typos:
      1) Results paragraph 2 sentence one: 'We thereby produced GluN2B-wt, GluN2B-G689C and GluN2B-G689S subunits tagged with C1 or C2, co-expressed these along with the GluN1a-wt subunits in...'
      2) Results paragraph 2: '...but these were mainly noticeable when oocytes are were exposed to high (saturating) glutamate concentrations...'
      3) Last sentence in the second to last paragraph of the results section entitled 'Mixed di-and tri-heteromeric channels...': 'This , PS may serve to rescue...'
      4) Last sentence in last paragraph of the results section entitled 'Mixed di-and tri-heteromeric channels...': 'Despite the latter, we found no evidence for any direct effect of three different physiologically relevant concentrations of the drug on di- or tri-heteromeric receptors'

      All typos corrected.

      #11 Figures 1e, 2b, 3b: it would be helpful to add a legend to the graph so that the curves can be interpreted without having to read through the figure legend.

      Corrected.

      #12 The bar graphs in Figure 6 show individual data points but those in figures 4b and 5b don't. Can the authors please add the data points to these graphs?

      Individual data points have been added.

      #13 It would be helpful to reviewers that future manuscripts by the authors include page numbers and line numbers.

      Included.

      **Referees cross-commenting**

      #14 Reviewers 2 and 3 highlight an important issue concerning figure 6 and the extent to which the overexpressed variant subunits can compete and assemble with endogenous NMDA receptors (unlike the system where the surface expression of specific receptor complexes is controlled). Indeed in the recent paper by the same authors, the two variants differed in their surface expression (in HEK cells), with G689C expressing particularly poorly. With reference to the second minor comment of Reviewer 1, the maximum current amplitudes would of course need to be normalized to cell surface expression of the receptor to gain any insight into efficacy.

      We provide maximal current amplitudes (Imax) as a proxy for expression level, as is typically done (e.g.,8,17). These are now shown in Suppl. 2a, b (and see our response to comment #6, above). We would like to emphasize that we find it challenging to gain insights about the efficacy of the variants in neuronal synapses, as we purposefully express non-C1/C2-tagged subunits in neurons (because we want the variants to assemble with endogenous subunits). Moreover, the C1/C2-tagged subunits (whether wt or variants) express less well than their non-tagged NMDAR counterparts. For instance, tagged GluN2B-wt subunits express at ~50% compared to non-tagged GluN2B-wt subunits (Suppl. 2c). Thus, we find that the efficacy of the C1/C2-tagged subunits is less relatable to the non-tagged subunits (which are used in neurons and are likely more relevant to the disease).

      Nevertheless, we believe that we have specifically addressed this issue by measuring miniature EPSCs (mEPSCs) (see our reply to comment #4, Suppl. 7a, b). Briefly, even though the non-tagged G689C expresses at ~40% compared to other subunits (in oocytes and mammalian cells8), in neurons it engenders a robust (and highly significant) negative effect over synaptic currents (mEPSCs), as strong as that of the G689S variant, which expresses much more robustly (non-tagged G689S expresses to the same extent as wt subunits). This demonstrates that the reduced efficacy of the tagged subunits is less relatable to the non-tagged subunits and, importantly, that it does not hinder the variants’ ability to incorporate within the synapse and affect function (i.e., exert a dominant negative effect). Here, we extend these observations towards the major postnatal channel subtype, namely tri-heteromers (2A/2B*), and therefore demonstrate that the robust dominant negative effect of the G689C and G689S variants is likely due to their ability to incorporate within the predominant receptor subtype at the synapse (Suppl. 8).

      Reviewer #1 (Significance (Required)):

      This study emphasizes the complex pattern of effects that variants can have on glutamate receptor function and pharmacology, especially considering the context of receptor subunit composition. The authors have followed up their previous findings on the same mutants (Kellner et al, 2021, Elife), but used a trafficking control system here to characterize properties of mutated receptor complexes that are most likely to exist in neurons. The authors show that the defective currents mediated by NMDA receptors containing a loss-of-function GluN2B variant can be enhanced by neurosteroids (and in the case of GluN1/2B receptors, polyamines also). Development and approval of neurosteroids for the clinic would be required for the findings to translate to a therapy for patients. Readers should also be aware that neurosteroids act on other receptors too (e.g. GABA receptors), which could complicate the outcome. The expertise of the reviewer is in glutamate receptors and synaptic transmission.

      We agree with the reviewer’s comment pertaining to challenges in translating PS to the clinic. Indeed, we explicitly mentioned its inhibitory effect on GABAA receptors (see line #366-367 and reference 18), as well as note its potential negative effect over GluN2C/D-containing receptors (line #365 and reference 19). We further describe alternative neurosteroids and means to bypass the limitations of PS, for instance by use of 24(S)-hydroxycholesterol6,18 or synthetic analogues (SGE-201, SGE-301)6. Lastly, we also propose a novel therapeutic approach, for which we did not find any mentions in the literature with regard to GRINopathies, consisting of the use of the FDA-approved Efavirenz (anti-retroviral compound20) to promote activity of cytochrome P450 46A1 (CYP46A1) to increase amounts of 24-S in the brain (discussion, lines #370-383).


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      #1 The objective of this paper is to assess whether a single mutated subunit of GRIN can affect the function of various forms of NMDA receptors. In particular, this study investigates the functional consequences of a GRIN variant when assembled within tri-heteromers, containing 2 GluN1, 1 GluN2A and 1 GluN2B subunits, the major postnatal receptor type. For this purpose, the authors artificially forced the subunits to associate in predefined complexes, using chimeras of GRIN subunits fused to GABAb receptor retention control sites at the endoplasmic reticulum. This trick allows control of the stoichiometry of the channels at the membrane and thus a focus on the function of a single type of NMDA receptor. The take-home message of the paper is that a single GluN2B‐variant, whether assembled with a GluN2B‐wt subunit to form a mixed di‐heteromer or with a GluN2A‐wt‐subunit (tri‐heteromer), strongly impairs the receptor functioning, as reported by a decrease of the apparent glutamate affinity of the receptor.

      Altogether, this is a straightforward study of great interest for the GRIN community.

      We greatly appreciate the reviewer’s comment about the relevance of our work towards the GRIN-community.

      #2 However, the way the background and purpose of the study (title and abstract) are presented is a bit confusing for non-specialists and could be easily improved. Technical information, which is crucial to validate the conclusions drawn from data analysis, should be added to the article. Some additional experiments are suggested to consolidate the work. Finally, additional discussion points are strongly encouraged.

      We apologize for not making the paper more accessible to a broader readership. We did so for the sake of brevity. Nevertheless, we have re-written major parts of the manuscript to address this issue and retitled the report: “Rescuing Tri-Heteromeric NMDA Receptor Function: The potential of Pregnenolone-Sulfate in Loss-of-Function GRIN2B Mutations”.

      Specific comments

      Abstract / Title:

      #3 This work shows that a single GRIN variant impairs the function of various forms of NMDA receptors. Several sentences in the title and the abstract are confusing for a non-specialized audience. "Two extreme Loss‐of‐Function GRIN2B‐mutations are detrimental to triheteromeric NMDAR‐function, but rescued by pregnanolone‐sulfate." "Here, we have systematically examined how two de novo GRIN2B variants (G689C and G689S) affect the function of di‐ and tri‐heteromers." The number of variants tested is not of capital importance in the title, especially because one could believe that both are tested at the same time; similarly, when variants are named in the abstract, the fact that only 1 variant is studied at a time should be clarified (G689C OR G689S). Indeed, the problem is obvious to those familiar with GRIN disorders, but if this paper is to be published in a journal reaching a large audience rather than a specialized audience, the title of the paper should be modified.

      As noted in our reply to comment #1 of this reviewer, we apologize for not making the paper more accessible and have therefore changed the title and re-written major parts of the manuscript to address this issue. We would like to note that we appreciate the reviewer’s comment and intent to increase the readership of our manuscript.

      #4 "We find that the inclusion of a single GluN2B‐variant within mixed di‐ or tri‐heteromeric channels is sufficient to prompt a strong reduction in the receptors' glutamate affinity, but these reductions are not as drastic as in purely di‐heteromeric receptors containing two copies of the variants. This observation is supported by the ability of a GluN2B‐selective potentiator (spermine) to potentiate mixed diheteromeric channels." Please clarify the link between the two sentences. How does spermine potentiation of mixed diheteromeric channels support the observation that the inclusion of a single GluN2B‐variant has less effect than the inclusion of two variants?

      Our intention was to highlight that mixed di-heteromeric channels (2B/2B*) are less “damaged” (this is the link) than purely di-heteromeric channels (2B*/2B*). Explicitly, mixed di-heteromers show a smaller reduction in glutamate potency AND are also spermine-responsive, whereas purely mutant di-heteromers (2B*/2B*) show reduced potency and do not respond to spermine at all. We have rephrased the sentences in our current manuscript to be clearer:

      For instance: The positive responses of mixed di-heteromers, compared to the null effect over pure di-heteromers, is likely the result of the restored pH-sensitivity of mixed di-heteromers (Suppl. 3). This was surprising as the minimal and essential rules of engagement for potentiation by spermine are not well established, particularly in the case of tri-heteromers21,22 (see discussion, lines #341-353).

      Methods

      #5 All this study is based on the use of a unique ER‐retention technique to limit expression of a desired receptor‐population at the membrane of cells. According to the ER system retention of GABAb receptor, used in this study, while C1/C1-fused subunits are retained in the ER, C2/C2 reach the cell surface and the association of C1/C2 in the ER enables cell-surface targeting of the heterodimer. However, GB2 does not contain any retention signal and can reach the cell surface in the absence of GB1, as a functionally inactive homo-dimer (doi: 10.1042/BJ20041435). If there is an experimental trick that prevents the addressing of C2/C2 to the cell membrane, it should be specified and explained. This is critically important for understanding which receptor populations the data are derived from: receptors containing C1/C2 fused subunits only as stated by the authors, or C1/C2 and C2/C2 fused subunits?

      We base our experiments on two seminal reports (refs. 23,24) that developed this unique method (which we also refer to in the text, lines #112-116). Briefly, the method employs the binding motifs of GABAB1 (GB1) and GB2 subunits and ER-retention motifs (these are now better detailed in the Methods section, line #448). Previous reports explicitly demonstrate that C1/C1- OR C2/C2-containing receptors do not reach the plasma membrane (or do so only minimally), and we have reproduced these data with our variants (C1/C1: Suppl. 1a-d; C2/C2: Fig. 1a-c).

      Figures

      #6 NMDA-receptor current amplitude should be normalized by the membrane expression of the receptors. A preliminary experiment should measure the effective cell surface expression of each of the subunits in the different transfection conditions.

      To address the effective cell surface expression, we employed Imax as a proxy for functional expression (e.g.,8,17). These are now shown in Suppl. 2a, b (and see our response to reviewer 1, comments #6 and 14). As expected, we find significantly reduced efficacy of the variants compared to wt-receptors, and the purely mutant di-heteromeric receptors exhibit the weakest efficacy. We have also addressed this issue by measuring miniature EPSCs (mEPSCs) (see our reply to reviewer 1, comment #4). We find that the variants abolish mEPSCNMDA frequency (Suppl. 7a, b). This shows that the variants’ reduced efficacy translates to elimination of synaptic activity (dominant negative effect) (also seen in Suppl. 8).

      #7 Fig.1a

      The scheme should include C2-C2 complexes and mention whether these complexes are expressed at the cell surface (see previous and following comments).

      As noted in our reply to comment #5 of this reviewer (above), C2/C2-containing receptors do not reach the plasma membrane (Fig. 1a-c). To avoid confusion, we have now added this scheme to the cartoon presented in Fig. 1a and have provided a more detailed description of the method and clones produced in the Methods section (line # 448).

      #8 Fig.1b and c

      Current from cells transfected with GluN2B‐wt‐C1 and GluN2B‐wt‐C2 should be compared to current expressed in cells expressing untagged receptor subunits: GluN2B‐wt Current from cells transfected with GluN2B‐wt‐C1 alone should be shown as well (although expected to be retained in the ER) (as performed for GluN2A‐wt‐C1 GluN2B‐wt‐C1 in suppl Fig. 1a)

      Current comparisons of oocytes expressing tagged GluN2B‐wt‐C1 and GluN2B‐wt‐C2 and non-tagged GluN2B‐wt are now demonstrated in Suppl. 2c. The results indicate that the “tags” (C1 and C2) affect the expression of the subunits. We have also added a sample trace of current from a cell expressing the GluN2B‐wt‐C1 alone (Fig. 1b).

      #9 How could you explain the null current from cells transfected with GluN2B‐wt‐C2 alone (Fig.1b middle, and 1c)? Since GB2 does not contain any retention signal and can reach the cell surface in the absence of GB1, GluN2B‐wt‐C2 is supposed to reach the cell surface. This is a very important point to clarify (I am probably missing a technical detail) because if the subunit tagged with C2 does reach the cell surface, then all the results and conclusions drawn from the C1-C2 conditions are wrong and could be attributed to a mix of complexes containing either C1-C2 or C2-C2.

      We now realize that the reviewer was missing a crucial technical detail, namely how the clones are designed. Briefly, all clones have ER retention motifs and cannot reach the plasma membrane unless they assemble as C1/C2 (refs. 23,24). Also, please see our replies to comments #5 and 7 of this reviewer (and Methods section, line #448).

      My following comments are based on the assumption that only receptors containing C1-C2 tagged subunits reach the membrane (as assumed by the authors and suggested in Figure 1b middle), but explanations should absolutely be provided to convince the reader.

      See our replies above to comments #5, 7 and 9, and references 23,24.

      Fig. 4a and 5a

      #10 Please keep the current scale constant between all current illustrations within the same figure (4a and 5a). Indeed, not only is the spermine- or SP-induced potentiation important (it is presently quantified in the histograms of fig. 4b and 5b), but knowing whether the amount of current recorded in cells expressing one mutant subunit in the presence of SP (for example GluN2A‐wt‐C1 GluN2B‐G689S‐C2) is comparable to the current recorded in wt receptor-expressing cells (GluN2A‐wt‐C1 GluN2B‐wt‐C2) in the absence of SP would add excellent value to the paper. A dedicated figure could quantify this rescue effect of SP, measuring and comparing the mean currents recorded in these conditions (one current illustration is not sufficient given variations between similar samples). By the way, 5 mM glutamate might be an excessive concentration. At 1 mM, the expected synaptic concentration of glutamate following an action potential, according to figures 3 and Suppl1 the response of the mutated receptor is much lower than that of the WT, which is already almost maximal. In these conditions, SP-induced potentiation by a factor of two of the GluN2A‐wt‐C1 GluN2B‐G689S‐C2 current could be equivalent to control currents recorded in GluN2A‐wt‐C1 GluN2B‐wt‐C2 cells.

      We have rescaled all current amplitudes in Figs. 4 and 5 to be identical in size for easier comparison.

      We have added all current amplitudes to examine the rescue effect of the two drugs in cells overexpressing a specific channel subtype, as requested (Suppl. 4). We find that, indeed, the potentiated currents of the mutant receptors reach (or even surpass) the basal Imax (i.e., current before potentiation) of the non-mutated receptors (Suppl. 4, dashed statistics bar).

      In neurons, we address this in two ways. First, we show that the total NMDA-current is reduced by expression of the variants, and this current is “normalized” by PS (Fig. 6a-c). Similar reductions in Imax (by the variants) are shown in Suppl. 7e (to provide more examples). Second, neurotransmission (i.e., 1 mM glutamate25,26) is not sufficient for activating mutant receptors, certainly not pure di-heteromers (see Table 1, EC50, and Suppl. 7a, b: no mEPSCs)27–29. Therefore, 5 mM was required. Together, these strongly suggest that PS may normalize the currents of different receptors that respond to PS (under physiological settings and not 1 or 5 mM NMDA). As suggested by the reviewer, there are many subtypes, and some may be activated by ambient glutamate (as suggested by application of PS onto neurons without opening the receptors by NMDA; see Suppl. 7c, d).

      #11 Fig. 6

      Figure 6 is not convincing because cultured hippocampal neurons do express endogenous NMDA receptors. To what extent the recording currents are affected by endogenous, non-mutated GluN2B subunits? Western Blots showing an extinction of endogenous subunits expression when transfected tagged subunits are competitively expressed would be required.

      We have previously shown that the two variants incorporate very efficiently within synapses, causing a very robust elimination of synaptic currents (by measuring miniature NMDA-dependent EPSCs; minis) (see Fig. 8 in Kellner et al., eLife, 2021 (ref. 27), and see the review by Sabo et al.9). A change in minis’ frequency can be interpreted as either a presynaptic change or a change in synapse number; however, we observed that AMPAR-mEPSC frequency was unaffected by these variants. This implies that synapse number and probability of release are unchanged by the variants. As the experiments are performed in wild-type neurons (which express wild-type GluN2A and -2B), the dramatic effects we observed on minis suggest a dominant-negative effect of these disease-associated GluN2B variants. These are consistent with our observations that mutant subunits can co-assemble with wild-type GluN2B and/or GluN2A subunits. We have now reproduced this experiment (in fact, we employ this strategy prior to each experiment to ensure expression of the variants) (Suppl. 7a, b). This thereby shows that there are no available wt-receptors at the synapse.

      As there are various pools of NMDARs at synaptic and extrasynaptic sites, we did not think that a western blot would sufficiently differentiate between the latter, and thereby would not provide insight about extinction of wt-receptors (which could be simply pushed to other sites compared to synapse). Moreover, the intracellular pool of receptors is much larger than the amount of NMDARs that can be detected at the membrane (e.g., 30,31), and therefore electrophysiology seemed to be a better means to monitor membrane receptors only:

      Thus, to examine the distribution of the variants between synaptic and extrasynaptic loci, we employed a standard procedure consisting of the use of the activity-dependent blocker MK-801 (Methods). Briefly, neurons were persistently bathed in TTX during which they were probed for Imax using 100 mM NMDA (to refrain from activating other GluRs), followed by application of MK-801 for 10 minutes to exclusively block synaptic receptors (that open following action-potential independent miniature neurotransmission). This thereby spares all extrasynaptic receptors from being blocked by MK-801, which are subsequently revealed by a second application of 100 mM NMDA (Suppl. 8a, inset)12. In neurons overexpressing the GluN2B-wt subunit, we obtained an extrasynaptic fraction of 38%, highly consistent with previous reports12,13. Overexpression of the variants, however, yielded a significantly higher fraction (~50%) of remaining current (Suppl. 8b, c), but instead of reflecting a larger pool of extrasynaptic receptors, the experiment represents quite the inverse when involving LoF variants. Firstly, 100 mM NMDA does not saturate variant receptors (whether pure, mixed di- or tri-heteromers, see Table 1). Secondly, normal neurotransmission does not open synaptic receptors containing mutant GluN2B-subunits, attested by the complete absence of mEPSCs (see Suppl. 7). Thus, during the 10-minute exposure to MK-801, only wt receptors are blocked by spontaneous synaptic activity, and thus the second bout of 100 mM NMDA solely exposes the remaining wt-receptors at the extrasynapse. Thus, the observed increase in the fraction of extrasynaptic receptors, in neurons overexpressing the variants, implies that the number of wt-receptors necessarily decreases at the synapse and increases at the extrasynapse, most likely due to the incorporation of the variants at the synapse. 
This increase cannot be explained by an overall increase in membrane expression of wt-receptors in neurons overexpressing the variants, as these cells show, yet again, a strong reduction in Imax as seen above (see Fig. 6c and Suppl. 7e) (lines #270-291). These thereby suggest that purely wt-receptors are not necessarily eliminated from the membrane (extinct), but rather pushed outside of the synapse.
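
      As a minimal sketch of how the extrasynaptic fraction described above can be computed, assuming it is simply the ratio of the second (post-MK-801) NMDA response to the first: the function name and the current amplitudes below are hypothetical and are not taken from the recordings.

```python
def extrasynaptic_fraction(i_before_mk801, i_after_mk801):
    """Fraction of the NMDA-evoked current that survives the MK-801 block.

    Assumption: the first NMDA application reports the total (synaptic plus
    extrasynaptic) current and the second application, after MK-801, reports
    the extrasynaptic current only.
    """
    if i_before_mk801 <= 0:
        raise ValueError("the first NMDA response must be positive")
    return i_after_mk801 / i_before_mk801


# Hypothetical amplitudes in pA, chosen only to illustrate the calculation.
frac = extrasynaptic_fraction(i_before_mk801=1000.0, i_after_mk801=380.0)
print(f"extrasynaptic fraction: {frac:.0%}")
```

      With LoF variants this ratio needs the caveat given above: the remaining current reflects wt-receptors only, so a larger ratio does not by itself indicate a larger extrasynaptic pool.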

      #12 Fig. 6b: “PE-S” on the graph should be replaced by “PS”

      Typo corrected.

      Discussion

      #13 The authors are surprised by the fact (Fig.2) that 1 variant reduces the apparent glutamate affinity of the receptor, but not as much as 2 variants, despite the fact that "NMDARs opening requires all four subunits to be liganded (i.e., occupied by a ligand) which implies that the least affine subunit should have dominated the final affinity of the receptor". I agree that the difference is noticeable; however, the glutamate affinity for receptors containing 1 variant is much closer to that of receptors containing 2 variants than to that of wild-type receptors. Hence, the results obtained do not seem so surprising and could result, as rightly explained by the authors, from a possible cooperativity between the subunits.

      We agree with the reviewer that glutamate potency of receptors containing 1 variant subunit is much closer to that of receptors containing 2 variant subunits. However, we maintain our surprise because we expected it to equal (not merely approach) the potency of the least affine subunit (the limiting factor). This is based on the notion that all four subunits need to be liganded for channel opening4,32–34. We gently raise the possibility of potential cooperativity (Table 1, see Hill-coefficient and 33,35,36), as well as mention that this may also stem from the variants’ lower proton sensitivity (Suppl. 3), which has also been shown to promote motions of the ATD (amino terminal domain) and increase open probability (positive cooperativity)36. Nevertheless, we are very careful with interpreting the Hill coefficient, as we limited exposure of oocytes to 10 mM glutamate due to artifacts arising past this concentration (see Kellner et al.8: Fig. 2—figure supplement 1). Thus, even the slightest underestimation of the maximal response would surely affect the slope. This description is now mentioned in lines #149, 318, 319.

      #14 On the other hand, the data in Figure 6 are much more difficult to interpret and reconcile with the nature of the expressed receptor subunits (which this time is not controlled) nor their association within the same receptor. However, this aspect, which is essential to the understanding of the consequences of 1 variant on neuronal signalling, is not discussed: Whatever the stoichiometry of the complexes in the heterozygous disease, the mutated and wild type GluN2B subunits coexist in the same cell: Either within the same di-heteromeric complexes GluN2B-wt + GluN2B-mutant, or in separate complexes but nevertheless expressed in the same cell, in di heteromeric (GluN2B-wt + GluN2B-wt and GluN2B-mutant + GluN2B-mutant); or tri-heteromeric (GluN2A-wt + GluN2B-wt and GluN2A-wt + GluN2B-mutant) complexes. Assuming that half of the complexes remain wild-type, e.g. (GluN2A-wt + GluN2B-wt and GluN2A-wt + GluN2B-mutant) we would expect (Fig. 6) a small decrease in NMDA current (carried only by the half that expresses the mutated subunit, and whose function is not zero but only decreased by about 20% in response to 5 mM Glutamate, Fig. 3b). The same reasoning applies to the di-heteromeric conditions (GluN2B-wt + GluN2B-wt; GluN2B-mutant + GluN2B-mutant), here again the decrease observed Fig. 6b is difficult to reconcile with the responses measured Fig. 2b.

      In other words, how to explain a 50% decrease of the currents, instead of the 10% expected by the previous reasoning. In this experiment we do not know which subunits are expressed, their proportions, nor how they are associated in functional complexes, which makes the interpretation of the data impossible. The only explanation, far-fetched, for 50 % decrease would be that the complexes were to contain all (or the vast majority) 1 wild-type subunits associated with 1 variant, then a homogeneous 50% reduction in current could be expected. But this extreme condition could only be possible in the case of di-heteromers, which is unlikely the case in Fig.6 as GluN2A currents are measured in presence of Ifenprodil. To conclude

      1) the comparison of the currents in transfected and non-transfected neurons does not make sense in figure 6b which is not convincing because we do not know the nature of the currents actually measured. A comparison in controlled condition would make more sense (as I suggested in the criticism of figures 4, 5).

      2) The reality of the combinations of expression and association between subunits within different complexes expressed in the same cell must be considered and taken into account in the interpretation of the data. Undoubtedly, the means of restoring the NMDA current will be different depending on the presence of mutated subunits in all functional channels or not.
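
      The reviewer's estimate of a ~10% decrease can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the currents of the wild-type and mutant-containing receptor pools simply add and that the total number of surface receptors is unchanged; the function name is hypothetical.

```python
def expected_relative_current(fraction_mutant, relative_function_mutant):
    """Whole-cell current relative to a cell expressing only wt receptors.

    Assumptions: the two receptor pools contribute additively and the total
    number of surface receptors is the same as in a wild-type cell.
    """
    fraction_wt = 1.0 - fraction_mutant
    return fraction_wt * 1.0 + fraction_mutant * relative_function_mutant


# Reviewer's scenario: half of the receptors carry one mutant subunit and
# retain ~80% of the wild-type response at 5 mM glutamate (Fig. 3b).
rel = expected_relative_current(0.5, 0.8)
print(f"expected current: {rel:.0%} of wild type ({1 - rel:.0%} decrease)")
```

      Under these assumptions a ~50% decrease would require the mutant-containing half to carry essentially no current, which is the discrepancy the reviewer points to.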

      Indeed, neurons express a variety of different combinations of channel stoichiometry, including following transfection with the variants. We do find that the effect on whole-cell current is indeed ~50% (Fig. 6b, c), so it is safe to assume that 50% remain “wt”, but we do not know how they distribute between synaptic and extrasynaptic loci. Our results, however, argue against 50% of the receptors remaining at the synapse. First, mEPSCNMDA disappear (Suppl. 7a, b and see reply to comment #11 of this reviewer), but wt-receptors are still at the membrane, and they seem to be moving out of the synapse (Suppl. 8). Thus, we can only state with higher certainty that the variant subunits are very efficient in incorporating within mixed or pure receptors, especially at the synapse.

      We also consider that the reduction in the whole-cell current observed in Fig. 6b, c is not due to the remaining 50% GluN2B-wt-containing receptors, rather likely due to other variants, notably GluN2A, which are more prominent at postnatal stages37, such as in our case. In support, we see a large remaining current after saturating ifenprodil application (Suppl. 7e, f)38. Thus, the variants incorporate within all 2A/2B membrane receptors, at the synapse and outside it (i.e., extrasynaptic) (see Suppl. 8c).
      

      Referees cross-commenting

      The referees' comments are highly relevant. In particular, referee 3's comment 1 seems very interesting because it may help to better understand the discrepancy in the results observed in neurons, i.e. a 50% decrease in the currents induced by the expression of the mutant and wild type subunits in the same cells, whereas theoretically one would expect a 10% decrease of this current (cf. referee 2's 2nd comment in the discussion section). This comment 1 of referee 3 indeed stresses the fact that the control (non-transfected neurons) to which the heterozygous condition is compared is not the correct control, which should rather be neurons transfected with wild type receptor subunits. More generally, this comment underlines the importance of monitoring the effective membrane expression of the different subunits in each of the experimental conditions in order to be able to compare conditions and draw conclusions.

      We initially did not perform this control as the literature paints a clear picture whereby expression of the GluN2B-subunit (without adding excess of the GluN1 subunit) does not instigate a robust increase in surface expression of NMDARs (and thus the current remains approximately the same) 4,39–43; also see our reply to comment #14 (above), and reviewer 3 comment 1 (below). Nevertheless, we have now performed this test by overexpressing GluN2B-wt. In support of previous reports, we do not find any statistical difference in current size between non-transfected neurons and neurons solely overexpressing the GluN2B-wt subunit (Fig. 6a, b). Furthermore, application of PS onto naïve or GluN2B-wt-expressing neurons yields identical currents (Imax) and potentiation (Fig. 6c, d). These argue that we did not obtain “overexpression”.

      We suggest that the 50% reduction in current size between neurons expressing the mutant and wt expressing neurons stems from the integration of mutant subunits and their dominant negative effect. Evidence for this incorporation is provided by the very strong reduction in synaptic currents (suppl 7a, b), and the supposedly higher abundance of wt-containing receptors in extrasynaptic regions (see reviewer 1 comment 4 and suppl 8). This is

      Reviewer #2 (Significance required):

      The novelty of the study, is to evaluate the consequences of a single mutated subunit within NMDA receptors affected by GRIN variant, to mimic the heterozygous condition of GRIN encephalopathies, this is of potential value for the field and the interest could also be extended to other genetic diseases (at least the experimental way to study the functioning of only one desired stoichiometric configuration). The strength of this paper is precisely to isolate technically and to study the functioning of a desired stoichiometric configuration only. The main limitation of the paper is the interpretation that the authors make of their data in a physio-pathological context. This work could be of interest for general audience, providing the title and summary are slightly modified. My area of expertise could not be closer to the topic of the article: Glutamate receptors; GRIN; molecular tinkering, cell culture, electrophysiology, receptor stoichiometry...

      We thank the reviewer for noting the value in our work and its potential contribution and interest to the field and other diseases. Per reviewer’s suggestion, we have modified the title and text to suit a larger audience.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This paper is a follow-up of an earlier paper published by the group (Kellner et al., eLife 2021), which aimed at characterizing the functional properties of two de novo GluN2B mutations in patients suffering from severe pediatric diseases, GluN2B-G689C and -G689S. NMDA receptors (NMDARs) are tetramers composed of two GluN1 and two GluN2 subunits. A single receptor can incorporate either two identical GluN2 subunits (di-heteromers) or two different GluN2 subunits (tri-heteromers), leading to a large diversity of NMDAR subtypes. The main NMDAR subtypes in the adult forebrain are GluN1/GluN2A and GluN1/GluN2B di-heteromers, as well as GluN1/GluN2A/GluN2B tri-heteromers. While the exact proportions of these three subtypes are still contentious, there is evidence that N1/2A/2B tri-heteromers form the major population of synaptic NMDARs in the adult forebrain. In addition, patients bearing pathogenic mutations are often heterozygous for the mutation, giving rise to mixed NMDARs incorporating one mutated and one intact GluN2 subunit. In their previous paper, Kellner et al. had shown that purely di-heteromeric GluN1/GluN2B-G689C and -G689S mutants display a drastic (> 1,000-fold) decrease of glutamate sensitivity and a decrease of surface expression. In the current paper, the authors characterize the effects of the -G689C and -G689S mutations on N1/2A/2B tri-heteromeric receptors, as well as on mixed di-heteromeric GluN1/GluN2B receptors containing one copy of the wild-type GluN2B subunit and one copy of the mutated GluN2B subunit. They show that one copy of the mutant subunit, either within mixed diheteromeric or tri-heteromeric receptors, is sufficient to decrease drastically glutamate sensitivity, although the shift in glutamate EC50 is not as strong as in pure di-heteromeric receptors (≈ 500-fold). 
They furthermore explore strategies to counteract the hypofunction induced by these mutations by testing the effect of positive allosteric modulators (PAMs). They show that spermine, a GluN2B-specific PAM, can potentiate the activity of mixed diheteromeric N1/2B but not N1/2A/2B tri-heteromers. However, pregnenolone sulfate (a 2A/2B-specific PAM) can potentiate the activity of both mixed diheteromeric and tri-heteromeric NMDAR populations, either in oocytes or cultured neurons. I have very few major comments to make. The experiments are straightforward and the adequate controls have been made. Here are my only two major comments:

      We thank the reviewer for the very detailed overview of our work and for appreciating our controls and methods.

      #1 About the experiment on cultured neurons. The authors compare the currents of cultured neurons transfected with GluN2B-G689C and -G689S to non transfected neurons. The adequate control is rather neurons transfected with the wild-type GluN2B subunit to even out any phenomenon linked to transfection of the neuron. Given the overexpression that can occur after transfection, the effect of the mutations on the size of NMDAR currents might be even stronger than what the authors show. However in that case PS might not completely rescue mutant NMDAR currents to wild-type levels.

      We initially did not perform this control as the literature paints a clear picture whereby expression of the GluN2B-subunit (without adding excess of the GluN1 subunit) does not instigate a robust increase in surface expression of NMDARs (and thus the current remains approximately the same) 4,39–43; also see our reply to reviewer 2, comment #14 (above). Nevertheless, we have now performed this test by overexpressing GluN2B-wt. In support of previous reports, we do not find any statistical difference in current size between non-transfected neurons and neurons solely overexpressing the GluN2B-wt subunit (Fig. 6a, b). Furthermore, application of PS onto naïve or GluN2B-wt-expressing neurons yields identical currents (Imax) and potentiation (Fig. 6c, d). These argue that we did not obtain “overexpression”. Thus, our results and interpretations hold true and do not underestimate the effects of PS in neurons.

      #2 How come high concentrations of glutamate (>100 µM) produce additional current on wt GluN1/GluN2B (with retention signals) compared to 100 µM glutamate, which is supposed to be saturating? It does not seem to stem from an osmotic effect since 10 mM glutamate does not produce any current on uninjected oocytes. Knowing that this "artefactual" effect might also occur in the mutant receptors, how do you take this effect into account when calculating the glutamate EC50s of the mutants? Given the drastic shift in EC50 produced by the mutant, taking into account this artefact is not going to change the conclusion, but the actual EC50s will be affected.

      GluN1/GluN2B-wt receptors (with or without retention signals) are indeed saturated at 100 µM glutamate. However, excessively large concentrations of glutamate (>100 µM) may yield artefacts even in non-injected oocytes (in 10 mM, this occurs in ~20% of the cells, see Kellner et al., 2021 (ref. 8), Fig. 2 and Suppl. 1c, d) as well as in GluN2B-wt injected oocytes (supplementary Table 1 in 44). This is not due to osmolarity, as rightly mentioned by the reviewer (and below), but rather possibly to endogenous glutamate receptors and transporters that do not readily contribute to current amplitude (these are extremely small currents), but can cause deterioration of the cell (and enhance ‘leak’) when activated for prolonged times by very large concentrations (e.g.,45). In fact, we explicitly report these to highlight potential artefacts, as these are often overlooked in the field. Regardless, most reports do not go past 100 µM glutamate, not even when describing GRIN2 mutations, since most mutations do not cause such drastic shifts in potency as we observed (to the best of our knowledge only one report describes such an extreme LoF mutation for a GluN2A variant46). Of note, these effects are not seen when glycine is applied at high concentrations (supporting lack of effect by osmolarity)47. Thus, we refrained from testing concentrations past 10 mM, aware that it may yield a slight underestimation of glutamate potency (and perhaps the reason for the larger Hill coefficient, nH; see our reply to reviewer #1, comment #5). Importantly, despite the potential underestimation of the EC50, it does not change our conclusions as all groups are measured side-by-side (thus, the same underestimation equally applies to all other groups as well). We now mention this in more detail in the Methods under the section “Two Electrode Voltage Clamp recordings in Xenopus laevis oocytes”.
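
      To illustrate why capping the tested concentration can bias the EC50, the following minimal sketch uses the standard Hill equation and assumes that responses are normalized to the highest tested concentration rather than to the true maximum. The parameter values are hypothetical and are not taken from Table 1; the function names are illustrative.

```python
def hill(conc, ec50, n_h, i_max=1.0):
    """Hill equation: response at agonist concentration `conc`."""
    return i_max * conc**n_h / (ec50**n_h + conc**n_h)


def apparent_ec50(ec50, n_h, top_conc):
    """Concentration giving half of the response measured at `top_conc`.

    This is the EC50 one would read off a curve normalized to the highest
    tested concentration instead of the true maximal response.
    Derived by solving hill(c) = 0.5 * hill(top_conc) for c.
    """
    k = ec50**n_h
    m = top_conc**n_h
    return (m * k / (2 * k + m)) ** (1 / n_h)


# Hypothetical values: true EC50 of 3 mM, Hill coefficient of 1.5,
# highest tested concentration of 10 mM.
true_ec50, n_h, top = 3.0, 1.5, 10.0
app = apparent_ec50(true_ec50, n_h, top)
print(f"fraction of true Imax reached at {top} mM: {hill(top, true_ec50, n_h):.2f}")
print(f"apparent EC50: {app:.2f} mM (true EC50: {true_ec50} mM)")
```

      In this sketch the apparent EC50 is always below the true value when the top concentration does not saturate the receptor, and the bias shrinks as the top concentration increases relative to the EC50.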

      Minor comments:

      #3 In the first paragraph of the "Results" section, when describing the design of the constructs used to force a heteromeric stoichiometry in recombinant systems, the authors write as if they had designed the constructs themselves: "Briefly, we tagged...are retained in the ER (Fig. 1a)". Please rewrite this paragraph to show that you used constructs that had been previously designed by another group (Hansen et al., 2014).

      We apologize. We did not mean to express that we have developed the method and indeed refer readers to the seminal works of those who did (Stroebel et al., 2014 and Hansen et al. 2014, lines #109-116). We did not go into details for the sake of brevity. We have rewritten this part to give proper acknowledgement to the method’s developers (also see Methods, line# 448).

      4 I do not see any evidence of "positive cooperativity" between subunits in ref. 32. Ref. 32, to the best of my knowledge, states that in N1/2A/2B tri-heteromers, the 2A subunit sets the biophysical properties of the tri-heteromer. But there is no account of mixed di-heteromers. In addition, the cooperativity between the glutamate and glycine binding sites is negative.

      The reviewer is correct, and we apologize for the mis-citation. Indeed, the cooperativity between glutamate- and glycine-binding is typically reported as negative48,49, and our intention was to highlight the strong cooperativity (whether positive or negative) observed between NMDAR-subunits and meant to cite the works of: 33,35,50 (lines . We have now rephrased the sentence: The divergence from this scenario suggests that the slight amelioration in potency could stem from positive cooperativity between the subunits50 (but see Hill coefficients in Table 1). Indeed, mixed receptors show restored proton sensitivity (Suppl. 3), which has been suggested to be coupled to other receptor features, notably increase in open probability.

      5 Interpretation of spermine action within the Results section: it is striking indeed to observe that the mutations in the context of a mixed di-heteromer still allow spermine potentiation, while they abolish this potentiation in pure di-heteromers. As rightly said in the discussion, the regain of spermine potentiation in the mixed compared to the pure diheteromers is likely due to a more favorable transduction of spermine signaling to the pore, likely via a higher pH sensitivity of mixed di-heteromers compared to di-heteromers. I would thus avoid the terms of "one single intact interface" for the mixed di-heteromer, since both spermine binding sites are likely intact in this NMDAR configuration. How is pH sensitivity affected in the mixed di-heteromers?

      We have performed detailed pH dose-response experiments for the various channel types (Suppl. 3). We find that GluN2B mixed di-heteromers exhibit a similar IC50 to pure GluN2B-wt di-heteromers, thus explaining their ability to undergo potentiation by spermine via alleviation of proton inhibition. We therefore further suggest that mixed di-heteromers have higher pH sensitivity than pure mutant di-heteromers, and that this may also contribute to their higher spermine sensitivity. Lastly, we observed that all GluN2A-wt-containing tri-heteromeric receptors were non-responsive to spermine (Fig. 4a). In fact, under our experimental conditions, tri-heteromers underwent slight inhibition by spermine, regardless of the identity of the GluN2B subunit (whether wt or variant) (Fig. 4b). Thus, as the tri-heteromers used here exhibit the same pH sensitivity as 2B-di-heteromers, the only diverging aspect is the missing interface between the GluN1a and GluN2B subunits, demonstrating that potentiation by spermine requires at least one GluN2B subunit with intact proton sensitivity, and mandates two intact interfaces between GluN1-wt and GluN2B-wt subunits (Table 1)21.

      6 In the methods section, the oocyte recording solution (likely Ringer and not Barth) does not contain any potassium. This is probably a typo. Could you correct the composition of your Ringer?

      Corrected. We record NMDARs currents by use of a Barth solution containing (in mM): 100 NaCl, 0.3 BaCl2, 5 HEPES, pH 7.3 (adjusted with KOH, at ~2.5 mM) (as in 4,51).

      7 There are several typos, especially in the Discussion.

      We have corrected the typos throughout the publication.

      **Referees cross-commenting**

      I overall agree with the comments of reviewers 1 and 2. In particular, I agree that it is pointless to compare the absolute currents of non transfected neurons vs mutant-transfected neurons without an idea of receptor cell-surface expression.

      We have performed this experiment (Fig. 6); please see our reply to this reviewer’s comment #1.

      I would, however, like to add some clarifications regarding some comments of Reviewer 2. About the ER retention technique to express tri-heteromers: I didn't know that the C2 signal could be addressed to the membrane on its own. The lack of leak current stemming from C1-C1 or C2-C2 combinations has been demonstrated in the paper establishing the technique (Hansen et al., 2014), as well as in another paper that developed an analogous technique based on GABAB retention signals (Stroebel et al., J Neurosci 2014). So it is fair to consider that the authors were not surprised by the lack of current when co-expressing two GluN2B subunits carrying the C2 signal.

      We thank you for this addition and support for our observations.

      About the comparison about absolute currents wt vs mutants, +/- spermine (Fig. 4a and 5a). I agree with reviewer 2 that being able to compare absolute currents of wt without spermine to mutant + spermine would be very interesting to see if spermine can actually rescue mutant hypofunction. However, to the defense of the authors, comparing absolute current values of recordings from Xenopus oocytes is meaningless. Indeed the variability of currents for the same construct and same day of experiment is too high (there can be up to a ten-fold difference between the lowest and the highest current of oocytes expressing the same construct the same experimental day). A way to investigate this aspect would be to estimate the open probability of the different constructs with or without spermine via the inhibition kinetics of an open channel blocker (e.g. MK801) and measure surface expression by Western blot, but I am not sure these experiments are worth it for the spermine experiment.

      We agree with this reviewer about current size. It is quite variable among cells, and comparing it would therefore introduce an additional variable and additional variability: the expression of these modified (C1/C2-tagged) subunits is affected both by the mutation itself (Kellner et al. 2021) and by the introduction of the tag (which strongly hampers their trafficking to the membrane, Suppl. 2c), with an unknown contribution from each. We therefore do not think these comparisons add value to our conclusions; yet, to grant reviewer #2's request, we have added Suppl. 4, which shows the rescue effect of the different drugs.

      Reviewer #3 (Significance (Required)):

      This paper is not of high significance since most of the characterization of the 2B-G689C and -G689S de novo mutants found in patients has already been published (Kellner et al., eLife 2021). However, this paper is worth publishing since it brings new data on the effect of the mutations on tri-heteromeric and mixed di-heteromeric NMDAR populations, which are likely the most abundant NMDAR populations in the patient's brain at adult stage. Tri-heteromeric and mixed NMDAR populations have often been overlooked when studying pathogenic NMDAR mutations due to the difficulty to express them specifically in recombinant systems. This paper (in addition to other papers in the field, see for instance Elmasri et al., Brain Sci. 2022; Li et al., Hum. Mutat. 2019) shows that the effect of the mutations on the receptor biophysical and pharmacological properties (but also on trafficking) differ whether the receptor contains one or two copies of the mutant subunit. This paper is of interest to scientists interested in NMDA receptor structure-function and pharmacology, as well as clinicians interested in GRINopathies (pathologies linked to NMDAR mutations).

      I, the reviewer, am an expert in NMDAR structure-function and pharmacology. I believe I have sufficient expertise to evaluate the entirety of the paper.

      We thank the reviewer for appreciating and acknowledging the merits of our work for publication.

      References:

      1. Berlin, S. et al. Gαi and Gβγ Jointly Regulate the Conformations of a Gβγ Effector, the Neuronal G Protein-activated K+ Channel (GIRK). J. Biol. Chem. 285, 6179–6185 (2010).
      2. Kahanovitch, U., Berlin, S. & Dascal, N. Collision coupling in the GABAB receptor–G protein–GIRK signaling cascade. FEBS Lett. 591, 2816–2825 (2017).
      3. Berlin, S. et al. A Collision Coupling Model Governs the Activation of Neuronal GIRK1/2 Channels by Muscarinic-2 Receptors. Front. Pharmacol. 11, (2020).
      4. Berlin, S. et al. A family of photoswitchable NMDA receptors. eLife 5, e12040 (2016).
      5. Reyes-Guzman, E. A., Vega-Castro, N., Reyes-Montaño, E. A. & Recio-Pinto, E. Antagonistic action on NMDA/GluN2B mediated currents of two peptides that were conantokin-G structure-based designed. BMC Neurosci. 18, 44 (2017).
      6. Paul, S. M. et al. The Major Brain Cholesterol Metabolite 24(S)-Hydroxycholesterol Is a Potent Allosteric Modulator of N-Methyl-D-Aspartate Receptors. J. Neurosci. 33, 17290–17300 (2013).
      7. Yakovlev, A. V., Kurmasheva, E. D., Ishchenko, Y., Giniatullin, R. & Sitdikova, G. F. Age-Dependent, Subunit Specific Action of Hydrogen Sulfide on GluN1/2A and GluN1/2B NMDA Receptors. Front. Cell. Neurosci. 11, 375 (2017).
      8. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
      9. Sabo, S. L., Lahr, J. M., Offer, M., Weekes, A. L. & Sceniak, M. P. GRIN2B-related neurodevelopmental disorder: current understanding of pathophysiological mechanisms. Front. Synaptic Neurosci. 14, (2023).
      10. Martel, M.-A. et al. The Subtype of GluN2 C-terminal Domain Determines the Response to Excitotoxic Insults. Neuron 74, 543–556 (2012).
      11. Papouin, T. et al. Synaptic and Extrasynaptic NMDA Receptors Are Gated by Different Endogenous Coagonists. Cell 150, 633–646 (2012).
      12. Harris, A. Z. & Pettit, D. L. Extrasynaptic and synaptic NMDA receptors form stable and uniform pools in rat hippocampal slices. J. Physiol. 584, 509–519 (2007).
      13. Moldavski, A., Behr, J., Bading, H. & Bengtson, C. P. A novel method using ambient glutamate for the electrophysiological quantification of extrasynaptic NMDA receptor function in acute brain slices. J. Physiol. 598, 633–650 (2020).
      14. Curras, M. C. & Dingledine, R. Selectivity of amino acid transmitters acting at N-methyl-D-aspartate and amino-3-hydroxy-5-methyl-4-isoxazolepropionate receptors. Mol. Pharmacol. 41, 520–526 (1992).
      15. Laube, B., Hirai, H., Sturgess, M., Betz, H. & Kuhse, J. Molecular Determinants of Agonist Discrimination by NMDA Receptor Subunits: Analysis of the Glutamate Binding Site on the NR2B Subunit. Neuron 18, 493–503 (1997).
      16. Esmenjaud, J. et al. An inter‐dimer allosteric switch controls NMDA receptor activity. EMBO J. 38, (2019).
      17. Liu, S. et al. A Rare Variant Identified Within the GluN2B C-Terminus in a Patient with Autism Affects NMDA Receptor Surface Expression and Spine Density. J. Neurosci. 37, 4093–4102 (2017).
      18. Geoffroy, C., Paoletti, P. & Mony, L. Positive allosteric modulation of NMDA receptors: mechanisms, physiological impact and therapeutic potential. J. Physiol. 600, 233–259 (2022).
      19. Malayev, A., Gibbs, T. T. & Farb, D. H. Inhibition of the NMDA response by pregnenolone sulphate reveals subtype selective modulation of NMDA receptors by sulphated steroids. Br. J. Pharmacol. 135, 901–909 (2002).
      20. Petrov, A. M. et al. CYP46A1 Activation by Efavirenz Leads to Behavioral Improvement without Significant Changes in Amyloid Plaque Load in the Brain of 5XFAD Mice. Neurotherapeutics 16, 710–724 (2019).
      21. Mony, L., Zhu, S., Carvalho, S. & Paoletti, P. Molecular basis of positive allosteric modulation of GluN2B NMDA receptors by polyamines. EMBO J. 30, 3134–3146 (2011).
      22. Stroebel, D., Casado, M. & Paoletti, P. Triheteromeric NMDA receptors: from structure to synaptic physiology. Curr. Opin. Physiol. 2, 1–12 (2018).
      23. Hansen, K. B., Ogden, K. K., Yuan, H. & Traynelis, S. F. Distinct Functional and Pharmacological Properties of Triheteromeric GluN1/GluN2A/GluN2B NMDA Receptors. Neuron 81, 1084–1096 (2014).
      24. Stroebel, D., Carvalho, S., Grand, T., Zhu, S. & Paoletti, P. Controlling NMDA Receptor Subunit Composition Using Ectopic Retention Signals. J. Neurosci. 34, 16630–16636 (2014).
      25. Clements, J. D., Lester, R. A. J., Tong, G., Jahr, C. E. & Westbrook, G. L. The Time Course of Glutamate in the Synaptic Cleft. Science 258, 1498–1501 (1992).
      26. Budisantoso, T. et al. Evaluation of glutamate concentration transient in the synaptic cleft of the rat calyx of Held: Glutamate concentration in synapse. J. Physiol. 591, 219–239 (2013).
      27. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
      28. McAllister, A. K. & Stevens, C. F. Nonsaturation of AMPA and NMDA receptors at hippocampal synapses. Proc. Natl. Acad. Sci. 97, 6173–6178 (2000).
      29. Ishikawa, T., Sahara, Y. & Takahashi, T. A Single Packet of Transmitter Does Not Saturate Postsynaptic Glutamate Receptors. Neuron 34, 613–621 (2002).
      30. Washbourne, P., Liu, X.-B., Jones, E. G. & McAllister, A. K. Cycling of NMDA Receptors during Trafficking in Neurons before Synapse Formation. J. Neurosci. 24, 8253–8264 (2004).
      31. Yan, Y.-G. et al. Clustering of surface NMDA receptors is mainly mediated by the C-terminus of GluN2A in cultured rat hippocampal neurons. Neurosci. Bull. 30, 655–666 (2014).
      32. Kussius, C. L. & Popescu, G. K. Kinetic basis of partial agonism at NMDA receptors. Nat. Neurosci. 12, 1114–1120 (2009).
      33. Sun, W., Hansen, K. B. & Jahr, C. E. Allosteric interactions between NMDA receptor subunits shape the developmental shift in channel properties. Neuron 94, 58-64.e3 (2017).
      34. Benveniste, M. & Mayer, M. L. Kinetic analysis of antagonist action at N-methyl-D-aspartic acid receptors. Two binding sites each for glutamate and glycine. Biophys. J. 59, 560–573 (1991).
      35. Lü, W., Du, J., Goehring, A. & Gouaux, E. Cryo-EM structures of the triheteromeric NMDA receptor and its allosteric modulation. Science 355, eaal3729 (2017).
      36. Vyklicky, V., Stanley, C., Habrian, C. & Isacoff, E. Y. Conformational rearrangement of the NMDA receptor amino-terminal domain during activation and allosteric modulation. Nat. Commun. 12, 2694 (2021).
      37. Stroebel, D., Casado, M. & Paoletti, P. Triheteromeric NMDA receptors: from structure to synaptic physiology. Curr. Opin. Physiol. 2, 1–12 (2018).
      38. Borza, I. & Domany, G. NR2B Selective NMDA Antagonists: The Evolution of the Ifenprodil-Type Pharmacophore. Curr. Top. Med. Chem. 6, 687–695 (2006).
      39. Tang, Y. P. et al. Genetic enhancement of learning and memory in mice. Nature 401, 63–69 (1999).
      40. Gonda, S. et al. GluN2B but Not GluN2A for Basal Dendritic Growth of Cortical Pyramidal Neurons. Front. Neuroanat. 14, (2020).
      41. Sceniak, M. P. et al. A GluN2B mutation identified in Autism prevents NMDA receptor trafficking and interferes with dendrite growth. J. Cell Sci. jcs.232892 (2019) doi:10.1242/jcs.232892.
      42. Philpot, B. D. et al. Effect of transgenic overexpression of NR2B on NMDA receptor function and synaptic plasticity in visual cortex. Neuropharmacology 41, 762–770 (2001).
      43. Barria, A. & Malinow, R. Subunit-Specific NMDA Receptor Trafficking to Synapses. Neuron 35, 345–353 (2002).
      44. Platzer, K. et al. GRIN2B encephalopathy: novel findings on phenotype, variant clustering, functional consequences and treatment aspects. J. Med. Genet. 54, 460–470 (2017).
      45. Green, T., Rogers, C. A., Contractor, A. & Heinemann, S. F. NMDA Receptors Formed by NR1 in Xenopus laevis Oocytes Do Not Contain the Endogenous Subunit XenU1. Mol. Pharmacol. 61, 326–333 (2002).
      46. Swanger, S. A. et al. Mechanistic Insight into NMDA Receptor Dysregulation by Rare Variants in the GluN2A and GluN2B Agonist Binding Domains. Am. J. Hum. Genet. 99, 1261–1280 (2016).
      47. Madry, C., Betz, H., Geiger, J. R. P. & Laube, B. Supralinear potentiation of NR1/NR3A excitatory glycine receptors by Zn2+ and NR1 antagonist. Proc. Natl. Acad. Sci. 105, 12563–12568 (2008).
      48. Regalado, M. P., Villarroel, A. & Lerma, J. Intersubunit Cooperativity in the NMDA Receptor. Neuron 32, 1085–1096 (2001).
      49. Durham, R. J. et al. Conformational spread and dynamics in allostery of NMDA receptors. Proc. Natl. Acad. Sci. 117, 3839–3847 (2020).
      50. Vyklicky, V., Stanley, C., Habrian, C. & Isacoff, E. Y. Conformational rearrangement of the NMDA receptor amino-terminal domain during activation and allosteric modulation. Nat. Commun. 12, 2694 (2021).
      51. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
    1. Now, when data is transformed into evidence, when we isolate or distill the features of a data set, or when we generate a visualization or present the results of a statistical procedure, we are not presenting the artifact. These are abstractions. The data itself has an artifactual quality to it. What one researcher considers noise, or something to be discounted in a dataset, may provide essential evidence for another.

      When it comes to data analysis, I usually think of data as a source of information rather than as a research object in itself. The term “raw data” has been used in all my classes, from accounting to introduction to digital culture and information. Yes, we’ve talked about biases that come up in different data sets, but usually this conversation is related to the so-called “post-production” of data: either we, the students, were using it, or someone else was and we reverse-engineered where it came from. So, reading about an approach to data, even ‘raw’ data, as a constructed artifact is very refreshing. It’s extremely important to look at how the raw data was collected and what the collectors initially left out in order to have a full picture of what’s going on.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Major:

      - The statement (line 149: 'Together, our data suggest that systemic ecdysone levels are unlikely to be involved in modulating tumour-induced muscle detachment or to mediate the role of fatbody Insulin signalling in regulating muscle detachment.') is derived from an experiment with sterol free diet (in which 20HE is genetically addressed) and a pleiotropic experiment (PG>RasG12V). In neither paper nor the current manuscript have 20HE levels been directly addressed.

      Therefore, this statement needs further experimental support and discussion. Ecdysone is a critical hormone during development and especially growth-related effects central to this study. The authors should consider doing pharmacology or augment their claims here with genetic manipulation experiments of 20HE related genes in larvae (Leopold, Rewitz, Rideout, Drummond-Barbosa, Schuldiner labs) and adult animals using genetics, pharmacology or direct assessment of 20HE levels (RIPA, Edgar and Reiff labs).

      The main point we were trying to convey is that we do not think global ecdysone levels play a role in modulating fatbody insulin or TGF-β signalling, which in turn affects muscle detachment. We are not claiming that ecdysone levels do not change in control vs. tumour-bearing animals. In fact, we predict that 20HE levels will differ between tumour-bearing and control animals (as tumour-bearing animals undergo developmental delay), but this is not the main point of our conclusions. We believe that our conclusions are supported by the experiment demonstrating that global ecdysone alterations (via feeding sterol-free food) did not affect how fatbody Akt activation altered TGF-β signalling and enhanced muscle integrity (Figure S1). Therefore, we don’t think measuring 20HE would help to support our conclusions. Pharmacological inhibition via feeding ecdysone inhibitors would effectively demonstrate the same point as feeding sterol-free food, which we have already done. We are happy to try direct manipulation of 20HE-related genes (eip75B-RNAi) in the fatbody to see if this affects muscle detachment or pAkt and pMad levels in tumour-bearing animals.

      - In Fig. 7 the authors used a sog-LacZ stock to show transcriptional activation in fatbody cells. This stock is based on a P-element insertion in the corresponding regulatory regions and is supposed to express lacZ with an nls. I can clearly see lacZ in nuclei in Fig. 7H, whereas this is very hard to see in nuclei in Fig. 7I in the tumour model. In addition, lacZ is known for its high stability and is not the best option. As this finding is vital for central claims of this study, it should be complemented by either qPCR for sog on fat body cells or using another readout by converting one of the two Mimic lines (BL42189/44958) into GFP sensors for sog.

      We will add a counterstain to these images. We will also perform qPCR in the fatbody of control and cachectic animals to assess whether Sog transcription is altered. We agree converting one of the Mimic lines to a GFP sensor would be a good option, but this experiment would require getting new fly lines into Australia, which takes at least 2 months because of quarantine laws. We don’t believe this experiment would change the general conclusions of the paper, therefore would prefer not to do this experiment.

      - I have similar problems with Fig. 7B-F, as phosphorylated Mad should be translocated to the nucleus. In 7F the authors measure pMad over DAPI, which is the right way, but it is hard to see pMad in the nucleus apart from Fig. 7B, whereas in D and E, where the authors measure higher levels, I cannot identify clear pMad in nuclei. These images either need to show the DAPI channel, or more representative images should be chosen, as in Fig. 4, with arrows pointing to measured nuclei. In Fig. 7C, something went wrong with the compression of this image.

      We will show more representative examples and fix Fig 7C.

      - The proper function of RNAi stocks targeting genes like sog, mad, etc. is vital for this study as these lines are used throughout the study. Functional evidence of specific knockdown efficiency should be provided or references given in which these stocks were shown to provide functional knockdown on transcript or protein level.

      We agree with the reviewer that this is an important point. We will demonstrate the knockdown of sog and mad (and the other RNAi lines) used in the study, either by referring to published data or by demonstrating knockdown ourselves.

      - Fig.S7 discusses appearance of gbb/Bmp7 and Sog/CHRD in human patients. The analysis the authors performed shows a correlation between both factors, but is hampered by the fact that datasets for peripheral tissues of cachexia patients are unavailable. The authors may consider sorting these after tumor entities in which cachexia occurs frequently vs. low occurrence and then check for both genes.

      We will try this analysis.

      - Fig. 5 M-P: pMad is not indicated in the panels, only in the legend.

      We will fix this error.

      - Please follow FlyBase nomenclature, e.g. dlg1 for discs large 1 and unify in the whole manuscript and figure for all genes.

      We will fix this error.

      - For endogenous fusion proteins like Viking-GFP (e.g. vkg::GFP) choose a format to clearly decipher them from transcriptional readout stocks like sog-lacZ.

      We will fix this error.

      - The quantifications in most figures are quite small with tiny lettering and XY axis are difficult to read in letter/A4 size.

      We will enlarge font size.

      Minor:

      1. Adjust in-figure caption alignments

      2. Line 104: add comma RasV12, dlgRNAi

      3. Line 114: replace 'little' with 'not significant (n.s.)'

      4. Line 334: 'sogRNAi overexpression' to my knowledge, RNAi are expressed, not overexpressed.

      5. Line 454: italicize r4>

      6. Fig S4E: remove frame

      7. Figure 6: It would be better to number and explain the pathway presented in the figure in the text and figure legend.

      8. Just a personal preference. Lettering of images in images is commonly done horizontally, here it appears like a mix between vertical and horizontal.

      We will fix these minor errors.

      Reviewer #2

      Major comment

      Their genetic experiments clearly showed that the reduction of insulin signaling activity in the fatbody induces upregulation of TGF-β signaling and Collagen accumulation. Then, how does TGF-β signaling induce Collagen accumulation?

      From the experiments we have carried out, we do not have insights into how TGF-β signalling induces Collagen accumulation.

      They showed that Rab10 knockdown and SPARC overexpression reduced the accumulation of fatbody ECM. Are Rab10 and SPARC expression regulated by TGF-β signaling?

      We can address this point by assessing if Rab10 and SPARC expression is altered in cachectic fatbody.

      Minor comments

      Line 90: "Disc Large (Dlg) RNAi in the eye" must be "Discs Large (Dlg1) RNAi in the eye imaginal discs".

      We will fix this error.

      Figures 1D and 1L are from the same image. Also, Figures 1C and 1M are from the same image. Are both of them necessary to be shown in the different panels?

      The duplication of 1C and 1M was an error; we thank the reviewer for picking this up. We will fix this error. We will use different images for 1D and 1L.

      Why are the staining patterns of anti-pAkt shown in Figures 1L and 1U so different? pAkt is not detected in the nuclei in Fig. 1L but its nuclear signal is clear in Fig. 1U.

      We will show more representative images of these stainings.

      Figure 1: Images of counter staining for nuclei like DAPI should be also included for all these fatbody images.

      We will show counter staining for DAPI.

      Line 101: "Tumour specific ImpL2 inhibition was sufficient to reduce fatbody pAkt levels." Is this correct? ImpL2 inhibition in tumors should elevate the pAKT level in fatbody.

      This was a mistake; we will fix this error.

      Figure S1~S4: These figures and their legends do not correspond to each other.

      We thank the reviewer for picking up this error; there was an error in inserting the images into the text. S2 and S3 were swapped.

      We will fix this error.

      Line 189: The pAkt level in the muscle of tumour-bearing animals should be examined to confirm the activity of the insulin signaling is downregulated.

      We will include this data.

      Line 189: If the authors conclude that muscle insulin signaling predominantly regulates translation and atrophy, OPP assay for the muscle cells should be examined in the same experimental settings.

      We will carry out an OPP assay upon Akt overexpression in the muscle.

      Line 247: The expression level of Rab10 and SPARC should be examined in the fatbody of tumour-bearing animals to see whether Rab10 is upregulated and SPARC is downregulated.

      Line 247: If Rab10 upregulation and SPARC downregulation are the causes of the accumulation of ECM proteins in the fatbody of tumour-bearing animals, how the overexpressed Collagen proteins can be secreted from the fatbody cells?

      We are not sure, but Collagen proteins are overexpressed at an extremely high level; it is therefore possible that some of the protein is processed and secreted despite Rab10 upregulation and SPARC downregulation. We have also carried out an experiment in which Collagen proteins were overexpressed in the muscle; this manipulation did not rescue the phenotype. This indicates that processing of Collagen in the fatbody is important; however, we do not know how the processing is regulated.

      Line 347: Sog is a secreted BMP antagonist. Thus, it can be expected that the Sog overexpression downregulates TGF-β signaling in fatbody and muscle tissues. If the rescued phenotypes with Sog overexpression can be explained by this logic, pMad level should be examined in these experiments.

      We have shown this data in Figure R-T. We will refer back to this data in Line 347.

      Reviewer #3

      Major comments:

      - Are the key conclusions convincing?

      Most of the conclusions are convincing. It is not clear however whether the ECM accumulation in the fat body of tumor animals is fibrotic and whether it is extracellular or in the cell cortex.

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      -The authors state in line 71 'This deposition of disorganized ECM leads to fibrotic ECM accumulation.' The authors haven't really provided evidence for the ECM being fibrotic. The authors could either rephrase this or provide additional experimental evidence of fibrosis in the fat body.

      We will tone down the claim that the ECM accumulation is fibrotic.

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      -The authors state in line 147" Finally, in tumor-bearing animals fed a sterol-free diet, that underwent a prolonged 3rd instar stage due to reduced ecdysone levels (Parkin and Burnet, 1986), we activated insulin signalling in the fatbody via Akt overexpression (QRasV12, scribRNAi). We found that this manipulation caused a significant decrease in pMad levels in the fatbody and a rescue of muscle detachment (Figure S1 D-I), similar to animals fed a standard diet (Figure 1 O-Q, Figure 2 F-H)." Since it's not already known what the extent of muscle integrity defect there is in tumors with additional sterol free diet, it would be important to show a non-tumor control for comparison in FigS1F. This would also then make it clear to what extent the defect is rescued by Akt overexpression.

      We will include a non-tumour control for Fig S1F.

      -The authors state in line 158 'Upon the knockdown of Impl2, we found that tumor gbb was not significantly altered (Figure S3A).' Even though this shows an indication that Gbb levels are not reduced, the n number is too low to state that it is non-significant. The authors should increase the n number here.

      n=3 is generally enough to see a difference. We will include data obtained in parallel, which show that Impl2 RNAi is sufficient to induce a reduction in Impl2 RNA levels. This will demonstrate that n=3 is sufficient to detect a reduction in transcript levels if there is one.
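
      Whether three replicates per group are enough depends on how large the true difference is relative to the replicate-to-replicate variability. The following is a minimal simulation sketch of that dependence. It assumes normally distributed measurements, equal variances and a pooled two-sample t-test; the effect sizes are hypothetical and are not taken from the manuscript.

```python
import random
import statistics

N_PER_GROUP = 3     # replicates per group, as in the comment above
T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t, 4 d.o.f.


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for equal-sized groups."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
    return (statistics.mean(a) - statistics.mean(b)) / (pooled_var * 2.0 / n) ** 0.5


def simulated_power(effect_sd, trials=2000, seed=0):
    """Fraction of simulated experiments with |t| above the 5% threshold.

    effect_sd is the assumed true reduction, expressed in units of the
    within-group standard deviation (a hypothetical quantity).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        knockdown = [rng.gauss(-effect_sd, 1.0) for _ in range(N_PER_GROUP)]
        if abs(t_statistic(control, knockdown)) > T_CRIT_DF4:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for effect in (0.0, 1.0, 2.0, 5.0):
        print(effect, simulated_power(effect))
```

      In this sketch, a reduction that is large compared with the variability is detected in most simulated experiments, whereas a modest one is usually missed. A non-significant result at n=3 is therefore informative mainly about large effects.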

      -The authors state in line 171 'Conversely, knockdown of gbb alone or knockdown of gbb together with ImpL2 significantly rescued the Nidogen overaccumulation defects observed at the plasma membrane of fatbody from tumor-bearing animals, while ImpL2RNAi alone did not (Figure S2 Q-U).' This is a somewhat misleading representation, since again no non-tumor control was used, so the extent of the rescue by gbb knockdown is not obvious. In FigS2P Nidogen levels in the tumor seem ~100% higher than in control. But in FigS2U, in which no control was included, the tumor+gbb knockdown seems ~20% lower than tumor. So it is probably a more moderate rescue, but that's only possible to assess by including a non-tumor control in FigS2U. Also the images in FigS2Q-T don't seem representative since they appear to show a much bigger difference in fluorescence intensity than ~20%. Please show more representative images.

      We will include a non-tumour control for S2Q-T and show more representative pictures.
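
      The reviewer's arithmetic above can be made explicit by expressing rescue as the fraction of the tumour-induced change that the knockdown reverses. The following is a minimal sketch using the reviewer's approximate percentages. It assumes the two panels are on a comparable scale, which is exactly what cannot be verified without a non-tumour control in the same experiment; the function name is hypothetical.

```python
def rescue_fraction(control, disease, treated):
    """Fraction of the disease-induced change that the treatment reverses.

    0 means no rescue; 1 means a full return to the control level.
    """
    defect = disease - control
    if defect == 0:
        raise ValueError("no defect to rescue: disease equals control")
    return (disease - treated) / defect


if __name__ == "__main__":
    # Reviewer's approximate numbers, arbitrary scale with control = 1.0.
    control = 1.0
    tumour = 2.0                  # ~100% higher than control
    tumour_gbb_kd = 0.8 * tumour  # ~20% lower than tumour
    print(f"{rescue_fraction(control, tumour, tumour_gbb_kd):.0%}")
```

      With these rough figures the knockdown reverses about 40% of the tumour-induced increase, which corresponds to a partial rescue.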

      -The authors state in line 174 'Finally, co-knockdown of gbb and ImpL2 in the tumor significantly rescued the reduction in OPP and Nidogen levels observed in the muscles of tumor-bearing animals (Figure S3 B-I).'

      Again, the single knockdowns and the non-tumor control are not shown in FigS3E and I and should be included for comparison and to see the contribution of each knockdown and to be able to judge the extent of the rescue.

      We will include the single knockdowns and a wildtype control.

      -Regarding Fig3O: Is there a significant tumor muscle attachment defect here? In this graph the tumor only looks about 10% lower than the WT (rather than 40% in Fig2E). The other issue is the extremely low n number for WT. I would recommend increasing the n number for WT here and to indicate in the graph whether the tumor is significantly different to WT (or non-significant, in which case RabRNAi wouldn't actually 'rescue' the defect). In the present form, this graph is not very convincing.

      We will increase the n number for WT for this experiment. The reduction is 10% rather than 40% here because this experiment was done at day 6, which we will indicate in the figure legend. The 40% reduction in Fig2E arises because those samples were processed at day 7. The Rab10RNAi experiment was carried out at day 6 because, by day 7, the Rab10RNAi rescue is so effective that most of the tumour-bearing animals have pupated; the experiment could therefore only be carried out at day 6.

      - Regarding Fig3W: A non-tumor control would be important to include to be able to judge the extent of muscle attachment defects and the extent of the rescue for UAS-Sparc. This will allow to assess the severity of muscle integrity defect in this particular experiment (since it appears to vary in different experiments e.g. muscle defect in tumor 40% in Fig2E and ~10% in Fig3O) and to assess the extent of rescue for the various genotypes.

      We will include a non-tumour control for 3W.

      -The authors show an accumulation of ECM in the fat body of tumors. It is not clear, whether this ECM accumulates intracellularly near the cell surface or extracellularly. The authors should assess this, maybe by doing electron microscopy.

      We do not have an EM facility that can accommodate this experiment, so EM is not an option for us. However, we can address whether the accumulation of ECM is intracellular or extracellular by performing antibody staining against Viking-GFP without permeabilizing the cells. If Viking is detected without permeabilization, it would indicate that the accumulations are extracellular. This approach has previously been used to address this question in Zang et al., eLife, 2015.

      - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      -These suggested experiments should be quite straightforward since they are mostly just repeating previous experiments with the appropriate controls and n numbers. I would think that they can be done within a few months. The electron microscopy should not take more than a few weeks and not be costly.

      - Are the data and the methods presented in such a way that they can be reproduced?

      -The details on how old the animals used in each experiment were are not easy to find and not written very clearly. They should be included in each figure legend rather than summarised in the methods.

      We will add the number of days in the figure legend.

      -Also, in line 788 in the methods, several stocks are indicated as coming from particular labs (e.g. UAS-FOXO (Kieran Harvey), UAS-GFP (Kieran Harvey), UAS-lacZRNAi (Kieran Harvey), UAS-RasV12 (Helena Richardson), UAS-cg25C;UAS-Vkg (Brian Stramer)).

      However, it is not clear whether these labs actually made these stocks and if so whether it has already been described in their papers how the lines were made. If the lines are unpublished, the detailed information should be given on how the lines were made. Or if the lines are published, the authors should provide the reference.

      We will fix these references.

      - Are the experiments adequately replicated and statistical analysis adequate?

      In general, the n number is rather low in several experiments, especially n of 3 for many controls. And as I mentioned before, rescues of tumor phenotypes are often shown without including a non-tumor control, making it hard to judge the extent of the rescue. Sometimes this information can be found in other figures, but the reader should not have to search for it. And also the severity of the phenotype can vary from experiment to experiment.

      We will include a non-tumour control when appropriate to address this.

      Minor comments:

      - Specific experimental issues that are easily addressable.

      - Are prior studies referenced appropriately?

      Yes, as far as I can tell.

      - Are the text and figures clear and accurate?

      -In the literature, people usually call it 'fat body' rather than 'fatbody'.

      We will fix this error.

      -The authors state in line 265 "Vkg accumulated in the membranes of fatbody where p60 was overexpressed using r4-GAL4 (Figure 5 A-C)."

      This must be a typo. I think it is shown in Fig5E-G. Unless it's labelled wrongly in the figure and B, C and D show p60 rather than TorDN.

      We will fix this error.

      -The authors state in line 188 'This manipulation significantly rescued muscle integrity (Figure S4 A-C) and muscle atrophy (Figure S4 D-F), without affecting muscle ECM levels (Figure S4 G-H).' According to the graph in FigS4H this does actually 'affect muscle ECM levels' significantly, as in that it reduced Nidogen levels further. The authors could rephrase this.

      We will reword this statement.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important work reports the identification of a list of proteins that may participate in the clearance of paternal mitochondria during fertilization, which is known as essential for normal fertilization and embryonic and fetal development. While the main method used is state of the art and the supporting data are solid, the vigor of the biochemical assays and function validation is inadequate. This work will be of interest to developmental and reproductive biologists working on fertilization. Key revisions (for the authors) include 1) Use a mitochondria-enriched fraction instead of whole sperm for the assays, and add more control samples to monitor what got lost during sperm and oocyte treatments before the coincubation step. 2) Functional validation of the key proteins identified.

      We thank Editors of eLife, as well as Special Issue Guest-Editors and Reviewers for a favorable assessment and helpful recommendations for key revisions. Provisional revisions included in our revised article are detailed below. We agree with Editors’ comment about the use of mitochondrion enriched fractions and additional functional validation of key proteins. In fact, we are developing experimental protocols for oocyte extract coincubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths, to separate the ooplasmic sperm nucleus remodeling factors from the mitophagic ones. Such experiments, as well as functional validations using porcine zygotes are contingent upon anticipated post-pandemic rebound in the availability of porcine oocytes, obtained from ovaries harvested on slaughterhouse floors, requiring currently unavailable workforce which has hampered our access to this necessary resource.

      Reviewer #1 (Peer Review):

      Could the authors make clear how much the presented pictures reflect the described localisation? There is no information on the number of spermatozoa and embryos observed nor the fraction of these embryos showing the presented pattern of localisation. This must be included.

      Two hundred spermatozoa were counted per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each phase/picture, which were examined to establish the typical localization patterns. The displayed patterns were observed in 65 to 88% of examined spermatozoa/zygotes, varying with protein, replicate, and phase of immunolabelling. In all cases, the signal displayed is the typical pattern that was seen in most cells. This information has been added to the Materials and Methods section for clarification.
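
      As a rough illustration of what these percentages mean in absolute counts, the short sketch below converts the reported range into cells per replicate. It assumes, which the response does not state explicitly, that the 65 to 88% range applies to each replicate and to both sample types; the function name `count_range` is hypothetical.

```python
def count_range(n_examined, low_pct, high_pct):
    """Convert a percentage range into a range of cell counts.

    Assumption: the reported 65-88% range applies per replicate
    to both spermatozoa and zygotes.
    """
    return (n_examined * low_pct / 100, n_examined * high_pct / 100)

# 200 spermatozoa and 20 zygotes were examined per replicate (from the response)
sperm_low, sperm_high = count_range(200, 65, 88)
zygote_low, zygote_high = count_range(20, 65, 88)

print(f"spermatozoa showing the pattern: {sperm_low:.0f} to {sperm_high:.0f} of 200")
print(f"zygotes showing the pattern: {zygote_low:.1f} to {zygote_high:.1f} of 20")
```

      With only 20 zygotes per replicate, a single zygote corresponds to 5 percentage points, so the zygote percentages are necessarily coarser than the sperm percentages.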

      It is not clear if the authors also examined the localization of other proteins and obtained a different pattern than anticipated from the proteomic approach or if they only tested these 6 proteins and got a 100% of correlation.

      These are the 6 proteins which were selected based on extensive literature review into known functions of all identified proteins, as well as extensive research into available and reliable antibodies to detect such proteins within our porcine systems. Even so, no particular localization patterns were anticipated; instead, we presented the patterns actually observed and even some patterns which defied our expectations (i.e., the localization of BAG5 in the sperm acrosome).

      The authors use "MS" in the text to indicate "mitochondrial Sheath" and "Mass spectrometry". this is confusing.

      The authors agree and the usage of MS as an acronym for either has been removed entirely to avoid confusion.

      In the introduction the author refers to Ankel-Simons and Cummins, 1996 as a reference for the number of sperm mitochondria in mammalian species, this is incorrect since the quoted paper is about the number of mtDNA molecules and mentioned an earlier publication.

      This has been revised and the appropriate citation has been used.

      Reviewer #2 (Peer Review):

      Major:

      1) It has been proved from the earlier studies from this group that the porcine cell-free system is useful to observe spermatozoa interacting with ooplasmic proteins in a single trial and could recapitulate fertilization sperm mitophagy events that take place in a zygote without affecting later cell-division process. However, the post-fertilization sperm mitophagy process is a complex time-associated event that many processes that occur sequentially and interactively, which means ooplasmic proteins might be involved in this process but may not directly interact with sperm or may associate with sperm-ooplasmic protein complex at different time points. It is certainly a great advance already in knowledge to identify "the candidate players" from the list of 185 proteins; however, with the time-resolution (4 and 24hr) in the current study and without functional validation experiments at this stage, it is still difficult to postulate the importance of these identified proteins. The functional validation experimental designs, in my opinion, is critically important for better interpretation of the data.

      The authors agree with this reviewer’s sentiments and do plan to conduct further functional analysis. This project was able to generate a list of candidate sperm-mitophagy promoting proteins, and we were further able to show that many of these proteins were detectable both via mass spectrometry and via immunocytochemistry in spermatozoa exposed to our cell-free system. Furthermore, similar localization patterns were found in spermatozoa that were detected within newly fertilized zygotes. These results boost our confidence in our cell-free system and show that our list of candidate proteins is truly a useful list for future localization and functional analyses. We are certainly aware that we have not captured every protein that may play a role in post-fertilization sperm mitophagy and that the proteins captured are just candidates until proven otherwise. Likewise, we have almost certainly captured some candidate proteins that will not turn out to play a role in post-fertilization sperm mitophagy. It is plausible that at least some of these candidates do play a role in mitophagy, and that others participate in other fertilization events (perhaps in roles yet to be described), which would interest us greatly as well.

      2) As shown in Figure 1, whole sperm was used in the co-incubation and the later MS analysis; thus, proteins identified in the current study might be relevant in fertilization processes other than postfertilization sperm mitophagy, as proteins identified in the current study may be associated with other parts of the sperm (e.g. sticky sperm head, e.g. PSMG2 associated with sperm midpieces, tail at 4hr coincubation, but then only associate with sperm head at 24hr co-incubation) rather than sperm midpiece, despite the fact that authors applied immunohistochemistry to show the localization of this protein, but the evidence is indirect, so how authors functionally differentiate these 6 identified proteins from sperm mitophagy process with other processes and to confirm (or to associate) the relevance of these proteins with sperm mitophagy process?

      The authors agree that the 6 proteins which were further studied by using immunocytochemistry may be playing roles in other processes such as pronuclear formation. We discussed some potential roles, including and beyond post-fertilization mitophagy, in the Supplemental Discussion. After reviewer comments, we moved the Supplemental Discussion back into the main Discussion section. Thus, this section now considers additional putative pathways in which the said 6 proteins could participate, though we concede that thorough functional studies must still be performed.

      3) Class 3 proteins were present in both the gametes or only the primed control spermatozoa, but are decreased in the spermatozoa after co-incubation, which authors interpreted as sperm-borne mitophagy determinants and/or sperm-borne proteolytic substrates of the oocyte autophagic system, this data categorization may need to be revised as sperm-borne proteolytic substrates of the oocyte autophagic system only, not for sperm borne mitophagy determinants. The argument for this disagreement is due to the fact that if the protein is a sperm-borne mitophagy determinant, after coincubation, to execute the mitophagy process, this protein should still be associated with the sperm at least at the early stage (of 4hr) (constant under MS detection when comparing control with 4hr treated) rather than being released from the sperm. Or alternatively, they could result in class 3 proteins (but not all those 6 were in class 3). Nevertheless, if these proteins serve as substrates, they can be used (consumed) and show decreased under MS detection.

      This argument for redefining the Class 3 proteins more accurately is understood and we agree. The definition is revised in the paper.

      4) Of particular interest among the 6 proteins that were further investigated. Unlike other proteins, MVP was highly significant (p<0.001) after 4hr incubation, but the significance became less after 24hr (p=0.19). Interpretation of this dynamic change in the relevance of the mitophagy process would facilitate the readers to understand the relevance and the role of MVP.

      The differences in significance are likely influenced by the abundance of MVP detectable by mass spectrometry. As the time of cell-free system incubation increases, the variability between replicates also seemed to increase, likely due to the sustained proteolytic activity taking place in our system. This work was based on three replicates of mass spectrometry for each time point; additional replicates likely would have reduced the p-value for the 24hr cell-free data set, for MVP and potentially other proteins also. At both time points, MVP was only detectable in spermatozoa after they had been exposed to the cell-free system treatment. This criterion interested us more than the actual differences in content between the time points, and it is why MVP was added to our list of candidate proteins.

      5) In figure 3, the association of ooplasmic MVP to sperm midpiece is not convincing enough as sperm midpiece and tail often show some levels of non-specific signals under fluorescent microscopy. And the dynamic association of ooplasmic MVP to sperm midpiece in Fig. 3F-G is difficult to reach a conclusion solely based on data presented in the manuscript. Additional negative control of sperm MVP staining from the primed and treated sperm would be helpful. Additionally, a quantitative comparison (15 vs 25hr) of sperm-associated MVP signals from the fertilized embryo or a stack image from different angles would clarify the doubts raised here.

      For all images and all replicates, serum controls were also generated. These controls were then viewed under a fluorescent microscope, and light intensities and exposure thresholds for each fluorescent light channel were set based on the background intensity that came from these nonimmune serum-treated control samples. We set our light intensity/acquisition time below the threshold at which the non-specific signal began to appear. All the presented patterns are based on setting this peak intensity threshold, and as such the signal we see should be the true signal. Furthermore, 200 spermatozoa were counted per treatment per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each protein and data point, which were used to represent the typical localization patterns that were observed. The displayed patterns were observed in 65-88% of examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern that was seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      6) Same concerns for the other 5 proteins (PSMG2, PSMA3, FUNDC2, SAMM50, BAG5) as indicated above.

      See response to Question 5.

      7) The patterns of these 6 proteins under the immunofluorescent study are confusing as the pattern varies after co-incubation (treated), and mostly, the signal of these proteins observed from the fertilized embryos is not really associated with sperm midpieces. Therefore, the evidence of these proteins being involved in post-fertilization sperm mitophagy is, at this moment, weak based on the data presented. But the relevance of these proteins in events post-fertilization or early embryo development is certain (the evidence is not strong enough to support "sperm mitophagy," in my opinion).

      The authors agree that some of these proteins seem to be playing roles beyond postfertilization sperm mitophagy and that there is a need for true functional studies before the authors can state with certainty that these proteins play a role in any of the discussed fertilization events. We state this in the discussion: “Considering the dynamic proteomic remodeling of both the oocyte and spermatozoa which takes place during early fertilization, these 185 proteins which have been identified likely play roles in processes beyond sperm mitophagy.” It should be noted that the authors went into greater detail about potential alternative protein functions based on the present data and literature review in the Supplemental Discussion. Based on this comment and other reviewer comments we have now included the Supplemental Discussion as part of the main Discussion section, and this will hopefully help clarify some of the authors’ thoughts about the 6 candidate proteins which were further analyzed during this study.

      Minor:

      1) To my understanding, statistical significance (relevance) is normally set at a p-value of either <0.1 or 0.05. The reason for loosening the p-value to 0.2 in the current study needs to be justified as this is not a common statistical criterion, and the interpretation of the candidates from this loosened criterion should also be careful.

      The loosening of statistical relevance in this study to 0.2 only applied to our Class 1 proteins. This is because, to fall into Class 1, a protein had to be present only in samples after they were exposed to the cell-free system. In the case of these Class 1 proteins, this happened for all 3 replicates at each stated timepoint. We found this pattern of detection to be important whether the p-value fell under 0.1 or 0.2. As such, we loosened our statistical threshold for our Class 1 proteins. Any proteins added to our candidate list will be subject to further investigation before definitive conclusions can be drawn, and as such we think that capturing more proteins was more important for the goals of this study than limiting the number of proteins captured, especially for those Class 1 proteins. An explanation of this has been added to the Materials & Methods section Mass Spectrometry Data Statistical Analysis.

      2) First cell cleavage of porcine embryo normally occurs within 48hr post-insemination or activation; therefore, the 4 and the 24hr time points used in the current study require justification included in the discussion or methods and material section.

      First cleavage of porcine embryos normally occurs around 24-28 hours post-insemination. Thus, for both the cell-free system and the embryo studies, we were capturing an advanced 1-cell-stage zygote/zygote-like system with our 24-hour and 25-hour time points.

      3) In figure 2, colors used in different time points and in two different classes represent (sometimes) different protein categories, would be easier for the readers for quick comparisons if the same color could be used to represent the same protein category throughout the graph. (E.g, proteins for early zygote development are shown in red in "A", but blue in "B")

      This has been corrected and the color scheme for Figure 2 has been revised for easier comparisons.

      Reviewer #3 (Peer Review):

      I am not used to seeing a supplementary discussion in a manuscript. I also believe it should be incorporated into normal discussion.

      The Supplemental Discussion has been incorporated into the main Discussion now.

      It would be very helpful to make an additional figure in which the proposed interactome of identified factors with the sperm mitochondria before and after incubation are drawn schematically and also which factors are not IDed in both cases (when comparing to somatic mito- or autophagy). This eases to get through the discussion and will beautifully summarize and illustrate the importance and progress that the authors have made with this assay.

      We made a diagram that depicts the changes in protein localization patterns over time within our cell-free system. This diagram has been added to the manuscript as Figure 9.

      Reviewer #1 (Public Review):

      In this manuscript, the authors used an unbiased method to identify proteins from porcine oocyte extracts associated with permeabilised boar spermatozoa in vitro. The identification of the proteins is done by mass spectrometry. A previous publication of this lab validated the cell-free extract purification methods as recapitulating early events after sperm entry in the oocyte. This novel method with mammalian gametes has the advantage that it can be done with many spermatozoa at a time and allows the identification of proteins associated with many permeabilised boar spermatozoa at once. This allowed the authors to establish a list of proteins either enriched or depleted after incubation with the oocyte extract or even only associated with spermatozoa after incubation for 4h or 24h. The total number of proteins identified in their test is around two hundred, with very few present in the sample only when spermatozoa were incubated with the extracts. The list of proteins identified using this approach and these criteria provides a list of proteins likely associated with spermatozoa remnants after their entry and either removed or recruited for the transformation of spermatozoa-derived structures. Using WB and histochemistry labelling of spermatozoa and early embryos using specific antibodies, the authors confirmed the association/dissociation of 6 proteins suspected to be involved in autophagy.

      While this unique approach provides a list of potential proteins involved in sperm mitochondria clearance it's (only) a starting point for many future studies and does not provide the demonstration that any of these proteins has indeed a role in the processes leading to sperm mitochondria clearance since the protein identified may also be involved in other processes going-on in the oocyte at this time of early development.

      We thank reviewer 1 for the positive comments. We added a sentence to the Discussion addressing the obvious shortcoming of the present study, as further functional validations of candidate mitophagy factors are planned.

      Concerning the localisation of the 6 proteins further analysed, the authors must add how much the presented picture represents the observed patterns. They must include the details on the fraction of spermatozoa and embryos displaying the presented pattern.

      We now specify that the patterns depicted in the manuscript are typical and representative of data from at least three replicates of immunolabeling in spermatozoa and zygotes. For each of these replicates, 200 spermatozoa were examined per replicate of the cell-free system co-incubation or 20 zygotes per replicate. The displayed patterns were observed in 65-88% of examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern that was seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      Reviewer #2 (Public Review):

      Mitochondria are essential cellular organelles that generate ATP as the energy source for maintaining regular cellular functions. However, the degradation of sperm-borne mitochondria after fertilization is a conserved event known as mitophagy to ensure the exclusively maternal inheritance of the mitochondrial DNA genome. Defects in post-fertilization sperm mitophagy will lead to fatal consequences in patients. Therefore, understanding the cellular and molecular regulation of the post-fertilization sperm mitophagy process is critically important. In this study, Zuidema et al. applied mass spectrometry in conjunction with a porcine cell-free system to identify potential autophagic cofactors involved in post-fertilization sperm mitophagy. They identified a list of 185 proteins that might be candidates for mitophagy determinants (or their co-factors). Despite the fact that 6 (out of 185) proteins were further studied, based on their known functions, using a porcine cell-free system in conjunction with immunocytochemistry and Western blotting, to characterize the localization and modification changes of these proteins, no further functional validation experiments were performed. Nevertheless, the data presented in the current study is of great interest and could be important for future studies in this field.

      We thank reviewer 2 for the positive comments. As we explain in our response to the Editors and Reviewer 1, further validation studies will be resumed once the availability of slaughterhouse ovaries for such studies improves. Examples of such functional validation of the pro-mitophagic proteins SQSTM1 and VCP are included in our previous studies (DOI: 10.1073/pnas.1605844113 and DOI: 10.3390/cells10092450) that led to the development of the cell-free system reported here, and are cited in the present study.

      Reviewer #3 (Public Review):

      In this manuscript, a cytosolic extract of porcine oocytes is prepared. To this end, the authors aspirated follicles from ovaries obtained from the slaughterhouse, first maturing the oocytes to the metaphase stage of meiosis II (one polar body). Cumulus cells (hyaluronidase treatment) and the zona pellucida (pronase treatment) were removed, and the resulting naked mature oocytes (1000 per portion) were extracted in a buffer containing divalent cation chelator, beta-mercaptoethanol, protease inhibitors, and a creatine kinase phosphocreatine cocktail for energy regeneration, which was subsequently triple frozen/thawed in liquid nitrogen and crushed by 16 kG centrifugation. The supernatant (1.5 mL) was harvested and 10 microliters of it was used for interaction with 10,000 permeabilized boar sperm (10 microliters of extract thus represents the cytosol fraction of 6.67 oocytes). The sperm were in this assay treated with DTT and lysoPC to prime the sperm's mitochondrial sheath. After incubation and washing these preps were used for Western blot (see point 2), for fluorescence microscopy, and for proteomic identification of proteins.
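
      The figure of 6.67 oocyte equivalents per assay follows from simple proportion, as the minimal sketch below shows. It assumes that the 1.5 mL supernatant is a homogeneous extract of all 1000 oocytes with no volume lost during processing; the function and variable names are hypothetical and chosen only for illustration.

```python
def oocyte_equivalents(n_oocytes, total_volume_ul, aliquot_ul):
    """Number of oocytes whose cytosol is represented in one aliquot.

    Assumption: the supernatant is a uniform extract of all oocytes.
    """
    return n_oocytes * aliquot_ul / total_volume_ul

# Values quoted in the reviewer's summary
n_oocytes = 1000          # naked mature oocytes per portion
supernatant_ul = 1500.0   # 1.5 mL of harvested supernatant
aliquot_ul = 10.0         # extract used per assay
sperm_per_assay = 10_000  # permeabilized boar sperm per assay

equiv = oocyte_equivalents(n_oocytes, supernatant_ul, aliquot_ul)
sperm_per_oocyte_equiv = sperm_per_assay / equiv

print(f"oocyte equivalents per assay: {equiv:.2f}")
print(f"sperm per oocyte equivalent: {sperm_per_oocyte_equiv:.0f}")
```

      Under these assumptions each assay exposes roughly 1500 sperm to the cytosol of a single oocyte, which is relevant to the reviewer's concern below about the amount of protein available for Western blotting.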

      Points for consideration:

      1) The treatment of sperm cells with DTT and lysoPC will permeabilize sperm cells but will also cause the liberation of soluble proteins as well as proteins that may interact with sperm structures via oxidized cysteine groups (disulfide bridges between proteins that will be reduced by DTT).

      This is certainly a possibility. The lysoPC and DTT permeabilization steps were designed to mimic the natural processing (plasma membrane removal and sperm protein disulfide bond reduction) that the spermatozoa would undergo during fertilization. We do realize that this processing is chemically induced and thus is not a perfect recapitulation of fertilization processes. However, in this study and in previous studies with this system, we were able to show alignment between proteomic interactions taking place in the cell-free system and within the zygotes.

      2) Figure 3: Did the authors really make Western blots with the stated amounts of sperm cells and oocyte extracts? The description in the figures is not clear. This point relates to point 1. The proteins should also be detected in the following preparations: (1) the oocyte extract only (done); (2) unextracted nude oocytes, to see what is lost by the extraction procedure in proteins that may be relevant (not done); (3) the permeabilized (LPC and DTT treated and washed) sperm only (not done); (4) sperm that were intact (done); (5) the material recovered after the assay, in which 10,000 permeabilized sperm and the equivalent of 6.67 oocyte extracts were incubated and washed 3 times (or higher amounts after this incubation; not done). Note that the amount of sperm from one assay (10,000) likely will give insufficient protein for proper Western blotting and/or Coomassie staining. In the materials and methods, I cannot find how, after incubation, the permeabilized sperm were subjected to Western blotting. I only see how 50 oocyte extracts and 100 million sperm were processed separately for Western blot.

      The authors did make Western blots with the number of spermatozoa and oocytes stated in the materials and methods, a total protein equivalent of 10 to 20 million spermatozoa (equivalent to ~20-40 µg of total protein load) and 100 MII oocytes (equivalent to ~20 µg of total protein load). These numbers have been corrected in the Materials & Methods. Also, we did find in the Materials & Methods section that the Co-Incubation of Permeabilized Mammalian Spermatozoa with Porcine Oocyte Extracts section refers to using cell-free exposed spermatozoa for electrophoresis; however, for none of the presented Western blot work was this true. Rather, all of the presented Western blots as per their descriptions are utilizing ejaculated or capacitated sperm or oocytes. This line has been removed from the Materials & Methods to reduce confusion.

      Regarding preparation (2), we have previously assessed the difference between oocyte extract and intact oocytes in this manner internally and we are certainly losing proteins due to the oocyte extraction process. We make caveats in this vein throughout the article such as: “Furthermore, this cell-free system while useful does not perfectly capture all the events which take place during in vivo fertilization. The cell-free system is intended to mimic early fertilization events but is presumably not the exact same as in vitro fertilization.”

      3) Figures 4, 5, 6, 7, and 8 see point 2. I do miss beyond these conditions also condition 1 despite the fact that the imaged ooplasm does show positive staining.

      For all the presented Western blots, the tissue type is stated in the image description and the protocol which was used to prepare these samples is stated in the Materials & Methods.

      4) These points 1-3 are all required for understanding what is lost in the sperm and oocyte treatments prior to the incubation step as well as the putative origin of proteins that were shown to interact with the mitochondrial sheath of the oocyte extract incubated permeabilized sperm cells after triple washing. Is the origin from sperm only (Figs 5-8) or also from the oocyte? Is the sperm treatment prior to incubation losing factors of interest (denaturation by DTT or dissolving of interacting proteins preincubation Figs 3-8)?

      The authors understand that there are proteins and interactions lost on both sides of the cell-free system equation and we have added a sentence to the Discussion to caveat this limitation in the system.

      5) Mass spectrometry of the permeabilized sperm incubated with oocyte extracts and subsequent washing has been chosen to identify proteins involved in the autophagy (or cofactors thereof). The interaction of a number of such factors with the mitochondrial sheath of sperm has been shown in some cases from sperm and others for an oocyte origin. Therefore, it is surprising that the authors have not sub-fractionated the sperm after this incubation to work with a mitochondrial-enriched subfraction. I am very positive about the porcine cell-free assay approach and the results presented here. However, I feel that the shortcomings of the assay are not well discussed (see points 1-5) and some of these points could easily be experimentally implemented in a revised version of this manuscript while others should at least be discussed.

      We agree that the use of a mitochondrial-enriched subfraction for further analysis would be interesting and useful. We are actively developing experimental protocols for oocyte extract coincubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths. However, such experiments are contingent upon our access to porcine oocytes, which has continued to be a struggle since the COVID-19 pandemic compromised our ability to obtain oocytes in large, cheap, and reliable quantities. This was a continuous problem with preparing materials for this very paper and has continued to be an issue for our laboratory as well as many others at our university and across the country. We continue to make the most of oocytes every time we can get access to them, but the unfortunate reality is that this access has become sparse and unreliable over the past three years.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The expression and localization of Foxc2 strongly suggest that its role is mainly confined to As undifferentiated spermatogonia (uSPGs). Lineage tracing demonstrated that all germ cells were derived from the FOXC2+ uSPGs. Specific ablation of the FOXC2+ uSPGs led to the depletion of all uSPG populations. Full spermatogenesis can be achieved through the transplantation of Foxc2+ uSPGs. Male germ cell-specific ablation of Foxc2 caused Sertoli-only testes in mice. CUT&Tag sequencing revealed that FOXC2 regulates the factors that inhibit the mitotic cell cycle, consistent with its potential role in maintaining a quiescent state in As spermatogonia. These data made the authors conclude that the FOXC2+ uSPG may be the true SSCs, essential for maintaining spermatogenesis. The conclusion is largely supported by the data presented, but two concerns should be addressed: 1) terminology used is confusing: primitive SSCs, primitive uSPGs, transit amplifying SSCs... 2) the GFP+ cells used for germ cell transplantation should be better controlled using THY1+ cells.

      Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:

      1> Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript. In general, ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript.

      2> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      Reviewer #2 (Public Review):

      The authors found FOXC2 is mainly expressed in As of mouse undifferentiated spermatogonia (uSPG). About 60% of As uSPG were FOXC2+ MKI67-, indicating that FOXC2 uSPG were quiescent. Similar spermatogonia (ZBTB16+ FOXC2+ MKI67-) were also found in human testis.

      The lineage tracing experiment using Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice demonstrated that all germ cells were derived from the FOXC2+ uSPG. Furthermore, specific ablation of the FOXC2+ uSPGs using Foxc2iCreERT2/+;Rosa26LSL-DTA/+ mice resulted in the depletion of all uSPG population. In the regenerative condition created by busulfan injection, all FOXC2+ uSPG survived and began to proliferate at around 30 days after busulfan injection. The survived FOXC2+ uSPGs generated all germ cells eventually. To examine the role of FOXC2 in the adult testis, spermatogenesis of Foxc2f/-;Ddx4Cre/+ mice was analyzed. From a 2-month-old, the degenerative seminiferous tubules were increased and became Sertoli cell-only seminiferous tubules, indicating FOXC2 is required to maintain normal spermatogenesis in adult testes. To get insight into the role of FOXC2 in the uSPG, CUT&Tag sequencing was performed in sorted FOXC2+ uSPG from Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice 3 days after TAM diet feeding. The results showed some unique biological processes, including negative regulation of the mitotic cell cycle, were enriched, suggesting the FOXC2 maintains a quiescent state in spermatogonia.

      The lineage tracing experiments using transgenic mice with the TAM-inducible system were well designed and demonstrated interesting results. Based on all data presented, the authors concluded that the FOXC2+ uSPG are primitive SSCs, an indispensable subpopulation to maintain adult spermatogenesis.

      The conclusion of the mouse study is mostly supported by the data presented, but accepting some of the authors' claims requires additional information and explanation. Several terms used in the paper to define cell populations may mislead readers.

      1) "primitive spermatogonial stem cell (SSC)" is confusing. SSCs are considered the most immature subpopulation of uSPG. Thus, primitive uSPGs are likely SSCs. The naming, primitive SSCs, and transit-amplifying SSCs (Figure 7K) are weird. In general, the transit-amplifying cell is progenitor, not stem cell. In human and even mouse, there are several models for the classification of uSPG and SSCs, such as reserved stem cells and active stem cells. The area is highly controversial. The authors' definition of stem cells and progenitor cells should be clarified rigorously and should compare to existing models.

      Thanks for your good comments. Considering that our results showed that FOXC2+ SSCs are in a quiescent state and that, mechanistically, FOXC2 maintained the quiescent state of SSCs by promoting the expression of negative regulators of the cell cycle, we have replaced ‘primitive SSCs’ with ‘quiescent SSCs’ in the revised manuscript. We agree with the reviewer that ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript. Further, from our point of view, the FOXC2+Ki67+ SSCs could be regarded as active stem cells, and the FOXC2+Ki67- SSCs could be regarded as reserved stem cells, although further research evidence is still needed to confirm this.

      2) scRNA seq data analysis and an image of FOXC2+ ZBTB16+ MKI67- cells by fluorescent immunohistochemistry are not sufficient to conclude that they are human primitive SSCs as described in the Abstract. The identity of human SSCs is controversial. Although Adark spermatogonia are a candidate population of human SSCs, the molecular profile of the Adark spermatogonia seems to be heterogeneous. None of the molecular profiles was defined by a specific cell cycle phase. Thus, more rigorous analysis is required to demonstrate the identity of FOXC2+ ZBTB16+ MKI67- cells and Adark spermatogonia.

      We agree with the reviewer that the identity of human SSCs remains elusive even though the Adark population demonstrates certain characteristics of SSCs. To acknowledge this, we have revised our conclusion so that it only suggests that FOXC2+ZBTB16+MKI67- cells represent a quiescent state of human SSCs.

      3) FACS-sorted GFP+ cells and MACS-THY1 cells were used for functional transplantation assay to evaluate SSC activity. In general, the purity of MACS is significantly lower than that of FACS. Therefore, FACS-sorted THY1 cells must be used for the comparative analysis. As uSPGs in adult testes express THY1, the percentage of GFP+ cells in THY1+ cells determined by flow cytometry is important information to support the transplantation data.

      Thanks for your good comments. According to your suggestions, we have addressed your concerns as follows:

      1> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      2> We performed FACS analysis to determine the proportion of GFP+ cells in FACS-sorted THY1+ cells from Rosa26LSL-T/G/LSL-T/G or Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice at day 3 post TAM induction, and the result showed that GFP+ cells account for approximately 20.9±0.21% of THY1+ cells. See Author response image 1.

      Author response image 1.

      4) The lineage tracing experiments of FOXC2+-SSCs in Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G showed ~95% of spermatogenic cells and 100% progeny were derived from the FOXC2+ (GFP+) spermatogonia (Figure 2I, J) at month 4 post-TAM induction, although FOXC2+ uSPG were quiescent and a very small subpopulation (~ 60% of As, ~0.03% in all cells). This means that 40% of As spermatogonia and most of Apr/Aal spermatogonia, which were FOXC2 negative, did not contribute to spermatogenesis at all eventually. This is a striking result. There is a possibility that FOXC2CRE expresses more widely in the uSPG population although immunohistochemistry could not detect them.

      Thanks for your good comments. From our lineage tracing results, over 95% of the spermatogenic cells are derived from the FOXC2+ SSCs in the testes of 4-month-old mice, which means that FOXC2+ SSCs maintain a long-term stable spermatogenesis. In addition, previous studies have shown that only a portion of As spermatogonia belong to SSCs with complete self-renewal ability (PMID: 28087628, PMID: 25133429), which is consistent with our findings. Therefore, we speculate that 40% of As spermatogonia and most of Apr/Aal spermatogonia, which were FOXC2 negative, did contribute to spermatogenesis but cannot maintain a long-term spermatogenesis due to limited self-renewal ability.

      5) The CUT&Tag_FOXC2 analysis on the FACS-sorted FOXC2+ showed functional enrichment in biological processes such as DNA repair and mitotic cell cycle regulation (Figure 7D). The cells sorted were induced Cre recombinase expression by TAM diet and cut the tdTomato cassette out. DNA repair process and negative regulation of the mitotic cell cycle could be induced by the Cre/lox recombination process. The cells analyzed were not FOXC2+ uSPG in a normal physiological state.

      We do appreciate the reviewer’s concern that the functions enriched in the analysis might be derived from Cre/lox recombination. However, we think it is unlikely that the Cre/lox recombination process, which is supposed to be rather local and specific, can trigger such a systemic and robust response by the DNA damage and cell cycle regulatory pathways. The reasons are as follows: First, as far as we are aware, there have been insufficient data to support this suggested scenario. Second, we did not observe any alteration in either the SSC behaviors or spermatogenesis in general upon the TAM-induced genomic changes, suggesting the impact from the Cre/lox recombination on DNA damage or cell cycle was not significant. Third, no factors associated with the DNA repair process were revealed in the differential analysis of single-cell transcriptomes of FOXC2-WT and FOXC2-KO.

      6) Wei et al (Stem Cells Dev 27, 624-636) have published that FOXC2 is expressed predominately in As and Apr spermatogonia and requires self-renewal of mouse SSCs; however, the authors did not mention this study in Introduction, but referred shortly this at the end of Discussion. Their finding should be referred to and evaluated in advance in the Introduction.

      Thanks for your good comments. According to your suggestion, we have revised the introduction to refer to this latest parallel work on FOXC2. We are happy to see that our discoveries converge on the important role of FOXC2 in regulating SSCs in adult mammals.

      Reviewer #3 (Public Review):

      By popular single-cell RNA-seq, the authors identified FOXC2 as an undifferentiated spermatogonia-specific expressed gene. The FOXC2+-SSCs can sufficiently initiate and sustain spermatogenesis, the ablation of this subgroup results in the depletion of the uSPG pool. The authors provide further evidence to show that this gene is essential for SSCs maintenance by negatively regulating the cell cycle in adult mice, thus well-established FOXC2 as a key regulator of SSCs quiescent state.

      The experiments are well-designed and conducted, the overall conclusions are convincing. This work will be of interest to stem cell and reproductive biologists.

      Thanks for the positive feedback.  

      Reviewer #1 (Recommendations for the Authors):

      The authors should address the following concerns:

      1) The most primitive uSPGs should be the true SSCs. The term "primitive SSCs" is very confusing.

      2) In addition to FACS-sorted GFP+ cells, FACS-sorted THY1+ cells should also be used for transplantation.

      Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:

      1) Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript.

      2) The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq using the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      Reviewer #3 (Recommendations for the Authors):

      The experiments are well-designed and conducted, and the overall conclusions are convincing. The only concern is the writing, especially the introduction, which was not well-rationalized. The three subtypes and three models for SSC self-renewal seem irrelevant to the major points of this manuscript. I don't think you need to talk too much about the markers of SSCs. Instead, I suggest you provide more background about the quiescent or activation states of the SSCs. In addition, as FOXC2 is a nuclear-localized protein, it cannot be used for flow cytometric sorting, so I don't think it should be emphasized as a marker. You identified a key transcription factor for maintaining the quiescent state of the primitive SSCs, that's quite important!

      Appreciate the positive feedback and constructive suggestions on the writing. We have substantially revised our manuscript to include the relevant advances and understanding from the field as well as highlight the importance of FOXC2 in regulating the quiescent state of SSCs.

    1. Costanza-Chock explains that we should be designing algorithms that are just.45 This means shifting from the ahistorical notion of fairness to a model of equity.

      This reminds me of a metaphor my high school used to properly explain the difference between equality and equity. Let's say there's a fence, and on the other side is a baseball game that you and your friend are trying to peek over and watch. You each get a box to stand on, and now you can see over the fence! Your friend, however, is shorter than you and still can't reach. Although you may have the same box to stand on (equality), in order to get the same opportunity to watch the game you have to put effort into making sure that everyone actually receives that truly equal opportunity, e.g. another box for your friend.

      Costanza-Chock's example of college admissions to explain equality vs. equity also makes me think about what kinds of digital barriers are in place to prevent restorative justice. Issues such as accessibility, class, and status keep coming up for me and now I'm wondering: How does class background influence the attempts made by digital humanities scholars who try to perform this restoration?

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Recommendations For The Authors):

      1) The strikingly different conclusion from the previous Bourane study seems to stem from the experimental approaches. Rather than using genetic crosses that target all neurons from the hindbrain and spinal cord that express Npy at any point in development, Boyle et al target their manipulations specifically to the lumbar region of the superficial dorsal horn in adult mice using direct viral injections. Thus, Boyle is almost certainly manipulating far fewer neurons than the original study. How then are their behavioral effects so much greater? At the minimum, the authors need to discuss this discrepancy head on. Better would be a direct molecular/anatomical comparison of the neurons targeted by each approach. This could be done using Npy-Cre mice crossed to a Rosa-LSL-reporter strain and quantifying the overlap with the same markers used here. Perhaps the intersectional approach with Lbx1 resulted in labeling of a different population of neurons than the adult AAV injections? Although likely outside the scope, given this work directly questions the main conclusion of the Bourane paper, it will be important to see a replication of the original finding of selectivity to mechanical itch.

      We agree that our approach should be manipulating a smaller population of neurons, and that it is therefore surprising that we see greater behavioural effects. Please see our response to "Weakness 1" of Reviewer 2 for consideration of this point. We have already provided a direct molecular comparison as requested by the reviewer, and this appears in Figure 1 supplement 1. Here we used tissue from NPY::Cre mice that had been crossed with Ai9 mice (i.e. a Rosa-LSL-reporter) and had received intraspinal injections of AAV.flex.GFP. We then characterised the neurochemistry of tdTomato+ cells that were GFP+ or GFP-negative.

      2) The authors state that, "91.6% ± 0.3% of cells classed as Cre-positive cells were also Npy-positive, and these accounted for 62.1% ± 0.6% of Npy-positive cells" If I am reading this correctly, does that mean that 40% of the Npy+ cells are Cre negative? If so, how is this possible?

      This interpretation is correct. For quantification of RNAscope data we used a cut-off level of 4 transcripts, and cells with fewer than 4 transcripts were classed as negative. It is likely that some of the NPY cells classified as negative for Cre would have had some Cre mRNA (sufficient to cause recombination), but at a level below this threshold. It is also possible that some NPY+ cells would fail to express Cre, since this is a BAC transgenic mouse, rather than a knock-in.
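
      The thresholding described in this response can be sketched in a few lines. The example below is a minimal illustration, not the authors' analysis pipeline: the per-cell transcript counts are invented toy data, and the function names are hypothetical. Only the cut-off of 4 transcripts comes from the response itself.

```python
from typing import List, Tuple

THRESHOLD = 4  # from the response: cells with fewer than 4 transcripts are classed as negative


def is_positive(transcripts: int, threshold: int = THRESHOLD) -> bool:
    """Class a cell as positive for a probe if it reaches the transcript cut-off."""
    return transcripts >= threshold


def overlap_percentages(cells: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Given (cre_transcripts, npy_transcripts) per cell, return
    (% of Cre-positive cells that are Npy-positive,
     % of Npy-positive cells that are Cre-positive)."""
    cre_pos = sum(1 for cre, _ in cells if is_positive(cre))
    npy_pos = sum(1 for _, npy in cells if is_positive(npy))
    both = sum(1 for cre, npy in cells if is_positive(cre) and is_positive(npy))
    pct_cre_that_are_npy = 100.0 * both / cre_pos if cre_pos else 0.0
    pct_npy_that_are_cre = 100.0 * both / npy_pos if npy_pos else 0.0
    return pct_cre_that_are_npy, pct_npy_that_are_cre


# Hypothetical toy data. The cell (3, 9) carries some Cre mRNA but falls below
# the cut-off, so it counts as an Npy-positive, Cre-negative cell.
toy_cells = [(10, 12), (3, 9), (5, 20), (0, 7), (6, 1)]
print(overlap_percentages(toy_cells))
```

      In the toy data, the cell with 3 Cre transcripts shows how a strict cut-off can place a cell in the Cre-negative class even though it contains some Cre mRNA, which is the situation the response describes.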

      3) Similarly, the authors state that "great majority of FP-expressing neurons in laminae I-III were immunoreactive (IR) for NPY (78.5% ± 3.6%), and these accounted for 74.6% ± 1.9% of the NPY-IR neurons in this area". So does this mean 20% of the recombination is non-specific/in other cell types that could be involved in pain/itch sensation?

      Our finding that 91.6% of cells with Cre mRNA were also positive for Npy mRNA (see above) indicates that Cre expression was largely restricted to NPY cells. The failure to detect NPY peptide in some of these cells probably results from the relatively low level of peptide seen in the cell bodies of peptidergic neurons, which results from the rapid transport of peptides into their axons.

      4) Comparing Fig 3B and Fig4B it seems the control baseline von Frey responses are different. In fact, baseline response in Fig4b is quite like the CNO effect in Fig 3B. Unless I'm misunderstanding something, this seems quite odd?

      We agree that there is a difference between the baseline responses. We are not aware of any particular reason for this, and we think that it reflects a degree of variability that is seen with the von Frey test. Interestingly, the baseline values for the SNI cohort (Fig 4E) lie between the values in Fig 3B and Fig 4B.

      5) In Fig 4E, the behavior of the CNO treated mice is quite variable. Can the authors comment as to how this might be happening? Does the effect correlate with viral transduction?

      We did not see any obvious correlation between the extent of viral transduction and the behaviour of individual mice.

      6) Fig6, the PDyn-Cre experiment, is a bit of a non sequitur?

      Please see our response to "Weakness 2" of Reviewer 2 for consideration of this point.

      7) The conclusion is unusually long. I recommend trimming it to make it more concise.

      We presume that this refers to the Discussion. However, this was ~1550 words, and we do not feel that that is unusually long.

      Reviewer 2 (Public Review):

      Weaknesses

      1) There is inadequate discussion about previous studies of NPY interneurons. Specifically, the authors should address why a more restricted subset of these neurons (this study) have broader effects than seen previously.

      We have expanded the discussion on the discrepancies between our findings and those reported previously. We state at the outset that we are targeting a more restricted population (lines 509-10), and we now go into more detail concerning both similarities and differences between our findings and the reasons that we think may underlie any discrepancies (various changes between lines 522-575).

      2) I cannot see the reason for including results from manipulation of Dyn+ interneurons in this paper. First, the title does not reflect roles of spinal Dyn+ population. In addition, without further experiments characterizing relationships between NPY and Dyn interneurons in modulating itch and/or nociception, Dyn datasets seem to deviate from the main theme.

      We had previously shown that activating Dyn-INs suppressed pruritogen-evoked itch (Huang et al 2018), but it was important to test whether silencing these cells would have the opposite effect. Our finding of overlap in function (i.e. both NPY-INs and Dyn-INs suppress itch, and that both innervate GRPR cells) provides strong evidence against the idea that neurochemically-defined interneuron populations have highly specific functions, and we now state this in the Discussion. The anatomical experiments (which follow on from the functional studies) provide important new information concerning synaptic circuitry of the dorsal horn, by showing that NPY-INs preferentially innervate GRPR cells, and provide around twice as many synapses on these cells, compared to the Dyn-INs. Interestingly, this correlates with the relatively large optogenetically-evoked IPSCs that we saw when NPY-INs were activated, compared to those reported by Liu et al (2019) when galanin-expressing (which largely correspond to Dyn-INs) were activated. By including these findings in the paper, we are able to make comparisons between these two populations.

      3) While the authors provided convincing evidence that GRPR+ neurons serve as a downstream effector of NPY+ neuron evoked itch, the relationship between GRPR and NPY neurons in modulating pain is not examined. Therefore, Fig. 7B is pure speculation and should be removed.

      We feel that our recent findings that GRPR neurons correspond to vertical cells, that they respond to noxious stimuli, and that activating them results in pain-related behaviours, makes it reasonable to speculate that the NPY/GRPR circuit may also be involved in the anti-nociceptive action of NPY cells. The legend for Fig 7B already refers to this as a "potential circuit", and we have toned down the corresponding part of the discussion to say that our findings "raise the possibility" that this is the case (lines 605-7). We feel that this part of the figure is important, as otherwise our summary diagram ignores some of the main findings of the paper, and we hope that this is now acceptable.

      Recommendations For The Authors

      1) Fig. 1G: the "misexpression" of tdTomato neurons was much more prominent in deep dorsal horn laminae but not in the superficial ones. Was this representative? Can the authors perform a laminae specific characterization?

      We did test for this possibility in 2 NPY::Cre;Ai9 mice that had received intraspinal injections of AAV.flex.GFP, and found that there was a modest difference - 62% of tdTomato+ cells in laminae I-II, but only 39% of those in lamina III, were GFP+. This suggests that "misexpression" may have differed slightly between these regions. However, since the difference was quite modest, and we were only able to analyse tissue from two mice in this way, we did not include these findings in the paper.

      2) I have a lot of problems interpreting the c-Fos data in Fig. 2 E and F. For the mCherry- population, how was the quantification performed? From the image, it does not look like 20-30% of cells express c-Fos; at a minimum a clear stain of neurons would be needed. Similarly, the identification of NPY cells is not particularly convincing (e.g., middle arrowhead lower 2 panels in C).

      We have provided further details on how the analysis was performed (changes made to lines 1016-29). NeuN staining was used to reveal all neurons, and a modified optical disector method was performed from somatotopically appropriate regions of the dorsal horn. As noted by the Reviewer, NeuN staining was required to allow identification of mCherry-negative cells. However, we have not included the NeuN immunoreactivity in the image, as this would add considerably to the complexity. These images are from single optical sections, and therefore the overall numbers of cells are low (in comparison to what would be seen in a projected image). The intensity of mCherry staining varied between cells. However, for all mCherry-positive cells (including the example referred to by the Reviewer), there was clear staining in the membrane, which could be followed in serial sections.

      3) Please add individual data points for all quantifications.

      These have been added.

      Reviewer 3 Recommendations For The Authors:

      1) It is somewhat surprising that there is no effect on CPP after activating spinal NPY neurons in neuropathic mice, given the almost complete rescue of hypersensitivity to baseline values in the nociceptive tests. Based on the methods, it appears that conditioning was carried out already 5 min after CNO injection. Yet, suppression of c-fos activity in excitatory spinal dh neurons was observed 30min after CNO injection. Also, it is not clear to me when CNO was injected prior to the nociceptive or CQ testing?

      Have the authors considered that conditioning from 5-35 min after CNO injection might be too short after CNO injection to achieve a profound analgetic effect?

      In a previous study (Polgár et al 2023), we had observed the timecourse of CNO-evoked itch and pain behaviours in mice in which GRPR cells expressed hM3Dq. We found that these started within 5 minutes of i.p. CNO injection (e.g. Fig S2 in that paper). In addition, the timecourse of action of gabapentin and CNO (both given i.p.) are likely to be similar, and there was a preference for the chamber paired with gabapentin. We are therefore confident that the conditioning period with CNO was adequate. We now explain this in the Methods section (lines 846-52). The timing of CNO injections for the nociceptive and CQ tests is now described (lines 749-55).

      2) The authors claim that tonic pain was not affected based on the conditioned place preference test. Efficacy in withdrawal response tests and in the CPP differ by more than duration of the stimulus. I'd suggest using more cautious wording here.

      We agree that caution is needed in interpreting the results of the CPP experiments. We have therefore replaced "does" with "may" in the Results section (line 336) and "did" with "may" in the Discussion (line 620).

      3) On page 9 the authors state "...suggesting that they suppress the transmission of pain- and itch-related information in the dorsal horn." However, pain is not affected in the loss of function experiments suggesting some qualitative differences in the role of the NPY neurons in itch and pain. This should also be reflected more clearly in this statement and in the discussion e.g. "suppress itch" and "can suppress pain".

      We accept the point made by the Reviewer. We have slightly altered the wording in lines 249-51 and 610 to reflect this.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the four reviewers for their generally positive feedback on the manuscript. Below, we provide a point-by-point response to each reviewer.

      We are performing new FCS and gradient measurements as suggested by the reviewers. We are confident we can have these completed within three months (accounting for the summer break).


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *This manuscript reports a very thorough and careful study of the mobility of Bicoid in the early embryo, explored with single-point fluorescence correlation spectroscopy. Although previous groups have looked into this question in the past, the work presented here is novel and interesting because of the different Bicoid mutants and constructs the authors have examined, in particular with the goal of understanding the role of the protein DNA-binding homeodomain. The authors convincingly show that there is a significant increase in Bicoid dynamics from the anterior to the posterior region of the embryo, and that the homeodomain plays an important role in regulating the protein's dynamics. Their experiments are very well designed and carefully analyzed. The authors also modelled gradient formation to see whether this change in dynamics might play a role in setting the shape of the gradient. I am not sure I fully agree with their conclusion that it does, as mentioned in my comment below. However, it is an interesting discussion to have, and I think this paper makes a significant advance in our understanding of Bicoid's behavior in the early embryo. *

      We thank the Reviewer for their positive comments and their suggestions for improving the manuscript. We will address the concerns raised by the Reviewer in the revision. We will also add further comment in the Discussion regarding the interpretation of our results.

      *Major comments: *

      • 1) Gradient profile quantification: Some of the conclusions made by the authors rely on the comparison between their model of gradient formation (as captured in the equations in lines 232 and 233) and the Bcd intensity profile measured in the embryos. Since the differences in gradient shape predicted by the different models are very small (see Fig. 3B, which is on a log scale and therefore emphasize small differences, and Fig. 3C), it is very important to understand how reliable the experimental concentration profiles are.*

      This is a fair comment. It is worth noting that the key differences between the 1- and 2-component models are only apparent at large distances (and hence low concentrations) from the source.

      We performed the quantification of the gradients in a manner similar to the Gregor lab, whereby the midsagittal plane is analysed. We used 488nm illumination (rather than 2-photon, as the Gregor lab does) so our measurements are likely noisier. However, we are not investigating the variability in the gradient here, but the mean extent. We currently correct background with a uniform subtraction, but we appreciate that is not the optimal method.

      In the revised manuscript, we will repeat the above experiments using a 2-photon microscope. Further, we will image lines expressing His::mcherry without eGFP under the same imaging conditions to more accurately estimate the background signal. While we expect this to improve the data quality, we do not envisage significant change to the observed profiles based on prior experience.

      At the moment, I do not find the evidence that [Bcd] concentration profile is more consistent with a 2-component diffusion model than a 1-component model very strong. A few comments related to this:

      1a. Line 249, it is mentioned that: "observations ... incompatible with the SDD model". Which observations exactly are incompatible with the SDD model?

      The key points are in the preceding paragraph. We will improve the model presentation in the Results and also include further contextualisation in the Discussion.

      1b. In Fig. 3D, only the prediction of the 2-component model is shown. What would the simple 1-component diffusion model look like? Is it really incompatible with the data?

      We agree with this comment and will provide the 1-component fit to the gradient profiles. We expect it to fit well for the anterior half of the embryo but fail at larger distances (as has been previously shown).

      Regarding the FCS data, we also show one and two component fits. We will show the alternative fits – a 2 particle fit is clearly an improvement (see also related response to reviewer 2).

      1c. Line 243: "The increased fraction in the fast form ... consistent with experimental observation of Bcd in the most posterior" (Mir et al.)". I am not sure how this is significant, since the simple model also predicts there will be Bcd in the posterior - the only difference is how much is there (as shown in Fig. 3C), and it's a very small difference.

      The absolute differences are not large between the two models, but due to the observed clustering (Mir et al. 2018), even small differences can have very large effects. In the revision we will provide estimates of the actual concentration differences.

      We are performing new experiments with the Fritzsche lab at Oxford to estimate if there is clustering of Bcd. We will also repeat our FCS experiments to validate our key conclusion of AP differences in diffusion of Bcd. These should be completed by the end of the summer.

      1d. Since the difference between models is in the posterior region where Bcd concentration is very low, when comparing the models to the data the question of background subtraction is essential. How was the subtracted background (mentioned line 612) estimated?

      See above response to the first comment.

      1e. Along the same line, were the detectors on the Zeiss LSM analog or photon counting detectors, and how confident can we be that signal is exactly proportional to concentration?

      We used PMTs and did not directly do photon counting. But the intensity is still proportional to the concentration. It is possible to estimate the absolute concentration value, e.g., Zhang et al., 2021 (https://doi.org/10.1016/j.bpj.2021.06.035). However, our main conclusions – especially regarding the spatially varying Bcd dynamics – are not dependent on this.

      1f. Can the gradients created by the two Bcd mutants (FIg. 4B) be quantified as well, and are they any different from the original Bcd gradient?

      We agree this would be useful. We will provide the gradient quantifications of the bcd mutants in the revision.

      1e. What is the pink line in Figure 5C (I am assuming the green one is the same as in Fig. 3D)? It could be better to not use normalization here, or normalize everything respective to the eGFP::Bcd data to make comparison in relative concentrations in the posterior for different constructs more evident (also maybe different colors for the three different data sets would help clarity).

      This is a fair comment, and we will create graphs with new data for better visualisation.

      1f. Discussion, lines 402-403: Does the detailed shape of the Bcd in the posterior region matter at all, since the posterior is not a region where Bicoid is active, as far as we know? Could a varying Bcd dynamics have other consequences that would be more biologically relevant?

      Bcd is now known to act at 70% EL (Singh et al., Cell Reports 2022). So, the gradient is relevant for a large extent of the embryo length, though it is not known if there is any effect in the most posterior region.

      2) Model for gradient formation (lines 231-238):

      2a. Whether the molecules of Bcd can change from their fast to slow form is never questioned. How do we know (or why might we suspect) they do exchange?

      This is a good point. Within the nucleus, and based on our mutant data, we suspect the fast/slow forms correspond to unbound/bound DNA states.

      In the cytoplasm, the dynamics are less clear. Bcd can bind to cytoskeletal elements (Cai et al., PLoS One 2017) as well as to Caudal mRNA. Therefore, it seems reasonable to have different effective dynamic modes – yet, how such switching occurs remains unclear.

      Ultimately, our model approximates multiple dynamic modes that are integrated to drive Bcd motion. Including switching between states is a reasonable assumption based on what is known about cytoskeletal and protein dynamics, but we do not have a specific mechanism.

      It is challenging to estimate a specific kon / koff rate, as the dynamic changes also depend on the diffusion – which itself is changing. For now, we believe our level of abstraction is appropriate given what is known about the system. It will be very interesting to explore the specific interactions underlying such behaviour in the future, but that is beyond this current manuscript.

      2b. The values used in the model for alpha, beta_0 and rho_0 should be mentioned. Maybe having a table with all the parameters in the method section, or even in the supplementary section, would help. The exact values of alpha and beta matter, because if they are large (fast exchange) a single exponential gradient is to be expected, if they are 0 (no exchange) a double exponential gradient is to be expected, with intermediate behavior in between. Which case are we in here?

      We agree and will add a more complete table in the revision.

      3) Discussion about anomalous diffusion (lines 386-388): The 2-component model used by the authors to interpret their FCS data seems very well justified here (excellent fits with very small residuals). I agree with the authors' conclusion that "the dynamics of Bcd within the nucleus are more complicated than a simple model of bound versus unbound Bcd", but I don't see how that can lead to a diagnostic of anomalous diffusion instead. Maybe it is just a matter of exactly explaining what is meant by anomalous diffusion here (since this term is often used to mean different things). A more likely scenario I think, is that there are more than just two Bcd components in the system.

      This is a good point; we cannot easily differentiate two- or multi-component fits from anomalous diffusion fits, which is a known problem. However, we have recently shown in a collaboration with the Laurent Heliot lab (Furlan et al, Biophys J 2019) that anomalous diffusion is a good, stable indicator of changes, even if it might not be the right model. We use anomalous diffusion because it stably predicts changes; we do not claim that diffusion is anomalous. We will improve the discussion of these points in the revised manuscript.

      4) Line 440 and after: What is the evidence that the transition between the two forms might vary non-linearly with Bcd concentration? How would that help adapt to different embryo sizes? It would be good to be more explicit here instead of just referring to another paper.

      We will improve this discussion. The central point is that the action of Bicoid is unlikely to simply depend linearly on concentration as in that case the ratio of fast to slow forms would be constant across the embryo. Related to the above comment, it is important to emphasise that we are using a phenomenological model, not one based on a specific mechanism.

      5) Since an important aspect of this work is the study of different Bcd constructs in vivo, it is important that these constructs are very clearly described, so the section on the generation of the fly lines (Methods) should be expanded. In particular:

      5a. It seems that the eGFP:: NLS control used here was different from that first described in Ref. 64 (and used for FCS experiments in Ref. 30 and 36)? If so, what NLS sequence was used here, and precisely what type of eGFP was used (in particular, was the A206K mutation that prevents dimerization present in the eGFP used)? If it is the same construct as in Ref. 64, it should be mentioned explicitly.

      5b. Were the mutant N51A and R54A lines gifts as well, or have they been described before? If so, previous publications should be referenced. If not, how the plasmid was introduced in the embryo should be briefly explained.

      We agree and will expand on the fly lines in the revision.

      6) Concentration calibration measurements (Methods Fig. 2, line 568 and on). It is well known that background noise is going to interfere with the measurement of N when the signal becomes equivalent to the background noise (Koppel 197, Phys Rev A 10:1938-1945, and for a recent discussion of this effect for morphogens in fly embryos: Zhang et al., 2021, Biophysical Journal 120,4230-4241). It is almost certain that in the low signal regions of the embryo (e.g. posterior cytoplasm) this is affecting the reported concentration, and should be at least acknowledged.

      We agree with the reviewer. We will provide the signal-to-background ratio (SBR). We will also correct the N values following the method in Zhang et al., 2021, Biophysical Journal 120, 4230-4241.
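
      As an illustration only, the widely used correction for uncorrelated background in FCS can be sketched as follows. This is not necessarily the exact procedure of Zhang et al. or of the manuscript; it assumes the background is uncorrelated and that the apparent particle number comes from the fit amplitude, N = 1/G(0). The function names and the count rates in the usage note are hypothetical.

```python
def signal_to_background(total_signal, background):
    """Signal-to-background ratio (SBR) from mean count rates.

    total_signal: mean count rate recorded in the measurement (signal + background)
    background:   mean count rate of the background alone
    """
    return (total_signal - background) / background


def correct_particle_number(n_apparent, total_signal, background):
    """Correct an apparent FCS particle number for uncorrelated background.

    Uncorrelated background lowers the correlation amplitude by the factor
    (S / (S + B))**2, so the apparent N = 1/G(0) overestimates the true N.
    Multiplying by the same factor recovers the corrected value.
    """
    if total_signal <= background:
        raise ValueError("total signal must exceed the background")
    signal_fraction = (total_signal - background) / total_signal
    return n_apparent * signal_fraction ** 2
```

      For example, with a hypothetical total count rate of 20 kHz and a background of 10 kHz (SBR = 1), an apparent N of 10 corresponds to a corrected N of 2.5, which shows how strongly low-signal regions such as the posterior cytoplasm can be affected.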

      *7) Reference 3 is mis-characterized in two different ways in the manuscript:*

      *7a. Line 50: The conclusion in Ref. 3 was not that the gradient was due to a diffusive process, on the contrary Gregor et al. argued that Bcd was too slow to form such a long-range gradient by diffusion. Studies that do present data consistent with a morphogen gradient formation mechanism driven by diffusion are reference 5, reference 30, Zhou et al., Curr. Biol. 2012;22(8):668-75 and Müller et al., Science 336 (2012) 721-724.*

      Gregor et al. do not argue against a diffusion process; indeed, they utilise an SDD model in their paper. However, they do extensively discuss how the predicted dynamics from the SDD model are not compatible with gradient formation as observed after n.c. 13. This problem was resolved to some degree by FCS measurements of Bcd (e.g., Dostatni lab, Development 2011) and the use of a Bcd tandem reporter which showed that production and degradation change during n.c. 14 (Durrieu et al., MSB 2018). We will improve the framing of these results in the revision.

      7b. The diffusion coefficient estimated from FRAP measurements and reported in Ref. 3 (D = 0.4 micron^2/s) is mentioned a couple of times in the manuscript (line 66, line 395, line 411). However, this number is simply incorrect. When fast components (such as the ones clearly detected here by FCS) are present, they diffuse out of the photobleached area during the photobleaching step. If that is not corrected for during the analysis (and it wasn't in Ref. 3), then the recovery time measured is just equal to the photobleaching time, and has nothing to do with either the fast or slow fraction of the studied molecule - it has no other meaning than to give a lower bound on the value of the actual effective diffusion coefficient of the molecule. This effect (called the halo effect) is well known in the FRAP community (see e.g. Weiss 2004, Traffic 5:662-671), it has been experimental demonstrated to occur for Bcd-eGFP in the conditions used in Ref. 3 (Reference 30), and the actual diffusion coefficient that should have been extracted from the data presented in Ref. 3 has been recalculated by another group to be instead D = 0.9 micron^2/s (Castle et al., 2011, Cell. Mol. Bioeng. 4:116-121). It would therefore be better to report the corrected value from Castle et al. to help the field converge towards an accurate description of Bcd mobility.

      We fully agree and will use the improved FRAP estimated value for Bcd.

      *Minor comments and suggestions: *

      • 8) Figure 1: From panel A, it seems that what is called "Anterior" and "Posterior" is about 150 micron away from the embryo mid-section, i.e. about 100 micron from either the anterior pole or the posterior pole (so not the tip of the embryo, but somewhere in the anterior half or posterior half). Maybe this should be made clear in the text. *

      We have changed Figure 1A to indicate the region within which the FCS measurements are carried out. We have added the relevant details to the legend of Figure 1 (lines 137-138).

      *9) Fig. 2A; It might be good to put this graph on a log scale, so that cytoplasmic values are seen more clearly. Also, what about reporting on nuclear to cytoplasmic ratios? *

      We will rework this graph and make the necessary changes.

      *10) Fig. 2: It could be interesting to plot D_effective as a function of the measured concentration of Bicoid in different locations, since the (interesting) suggestion is made several time that [Bcd] could the a determinant of the protein mobility. *

      Our work provides an indication that Bcd concentration is connected to its diffusion. We did this by measuring at two locations. Extending this to a rigorous model would require substantial new measurements along the whole length of the embryo. While interesting, this represents a very large investment of time and lies beyond the current manuscript.

      *11) Figure 3B&C: Is the curve for 2-component diffusion (without concentration dependence) for steady-state missing? *

      We will clarify in the revision.

      *12) Lines 78 and 471: What do the authors mean by "new reagents"? The word reagent evokes a chemical reaction, but there are none here. Do the authors mean new constructs? or new mutants? *

      We have changed lines 78 and 479 from “new reagents” to “new Bcd mutant eGFP lines”.

      *13) Lines 57-59: Another good reference for FCS measurements performed to study the dynamics of a morphogen (in this case Dpp) is Zhou et al., Curr. Biol. 2012;22(8):668-75 *

      We have added this as reference no. 70.

      *14) Lines 109-111: A word must be missing. Precisely determined what? *

      We meant precise measurement within the cytoplasmic and nuclear compartments during the interphase stages. We have changed the text to “precisely measure in the cytoplasmic and nuclear regions during the interphase stages of nuclear cycles (n.c.) 12-14” in lines 111-112.

      *15) Line 278: The increase in the slow mode is expected. Maybe explicitly mention why. *

      In line 286, we have added “due to the loss of Bcd binding to the DNA”.

      *16) Line 282: "with the fast component increasing", maybe replace with "with the diffusion coefficient of the fast component increasing" or "with the fraction of the fast component increasing". *

      We have changed line 289 to “with the diffusion coefficient of the fast component increasing towards the posterior”.

      *17) Line 517: Is there a reason why the dorsal surface is always placed in the coverslip? *

      We have added these details in lines 528-529 of the Methods.

      *18) Line 524 and on: FCS measurements: What was the duration of each individual FCS measurement? It is great that the exact number of measurements are reported in the supplementary! *

      Thank you for the compliment. Typically, cytoplasmic measurements last 60 s and nuclear measurements 20-40 s. We have added this in lines 528-529. We have also added a column to the supplementary tables indicating the duration of each measurement.

      *19) An Airy unit of 120 um seems large in combination with an objective with a NA of 1.2, is there a reason for that? What was the radius of the resulting detection volume? *

      Olympus confocal microscopes have an additional 3x magnification stage, which accounts for the larger Airy unit. Without it, the value would be 40 μm.

      *20) Thank you for detailing the reasons behind the choice of excitation power, an important and often omitted details. Where in the excitation path were the values of the laser power measured (before or after the objective?)? *

      Thank you for the compliment. The laser power was measured before the objective: we removed the objective and measured the laser power in the objective path.

      *21) Line 585: "since the brightness of eGFP::Bcd..." do the authors mean the molecular brightness of a single eGFP::Bcd molecule, or the total fluorescence signal? *

      It is the total fluorescence signal. We have edited line no.592.

      *22) It would be good for reference to mention the approximate value of the molecular brightness recorded for these eGFP constructs at the laser power used. *

      We will measure and tabulate in the revised manuscript.

      *23) Reference 766: The year (and maybe other things) is missing. *

      We have corrected this reference.

      24) Figure 2 (Methods): The concentrations shown on the figure should be in nM not uM.

      Thanks for noticing; we have changed this.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      MAJOR POINTS

      • 1) FCS measurements and fits *
      • a) Please state the duration of each individual FCS measurement. *

      In the cytoplasm, the measurements were carried out for 60 s, and in nuclei for 20-40 s. We could not measure for 60 s in the nuclei because the nuclei drift from their initial position. We will add another column to the supplementary tables indicating the duration of the FCS measurements.

      b) The authors acknowledge potential issues with fluorophore photophysics and use different lag time ranges for the calibration dye Atto-488 (0.001 ms in Method Fig. 2) and eGFP (0.1 ms in the main figures). Given the strong influence of different parameters on data interpretation and conclusions, Method Fig. 2 should be repeated with purified eGFP. This is particularly relevant for the noisy FCS measurements in posterior regions.

      Performing the experiment with purified eGFP would amount to a volume calibration. We routinely performed such a calibration before each imaging session, and it should be fluorophore independent. As noted by Reviewer 1, it is also important to be clear about background correction. We will provide brightness data for eGFP and background values in the revised manuscript. We can then use these to estimate the corrected concentrations.

      We start the fits at 0.1 ms because by that point any contribution from the photophysics should have decayed (0.1 ms is about 3-5 times the decay time of the photophysical process; Sun et al., Analytical Chem 2015).
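
      A rough way to see what "3-5 times" implies is to assume, purely for illustration, that the photophysical term decays as a single exponential. The functional form and the helper name below are assumptions, not taken from the manuscript.

```python
import math


def residual_fraction(lag_time, decay_time):
    """Fraction of a single-exponential term remaining at a given lag time.

    Both arguments must be in the same time unit.
    """
    return math.exp(-lag_time / decay_time)


# Start of the fit window quoted in the response, in ms.
FIT_START_MS = 0.1

for multiple in (3, 4, 5):
    # Decay time for which 0.1 ms equals `multiple` decay times.
    decay_time_ms = FIT_START_MS / multiple
    remaining = residual_fraction(FIT_START_MS, decay_time_ms)
    print(f"{multiple} decay times: {remaining:.1%} of the term remains")
```

      Under this assumption, roughly 5% of the photophysical term remains after 3 decay times and under 1% after 5, so how negligible the contribution is at 0.1 ms depends on where in that range the decay time lies.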

      c) Please explain why no data is shown for "AN" around 0.1 ms lag time in Fig. 1B in contrast to all other figures.

      We will add the data for AN from 0.01 ms in the revised figures.

      d) Please state what the estimated diffusion coefficients with one-component model fits are. Please also explain why the fits in Fig. S1E do not reach a value of 1 and why they plateau higher than the experimental data at long lag times. Please constrain the fits to G=1 at 0.1 ms tau and G=0 at 1 s tau to make a fair comparison.

      The experimental ACF curves reach 0 at long lag times as would be expected. The one-component fits, however, don’t describe the data well and as a result they do not reach 1 and 0 at short and long lag times, respectively. The fitting is done using a mean-squared estimation of the best approximation of the particular model function to the data. Fixing the parameters can be done, but it will further reduce fit accuracy and deviations will be larger. We will perform this analysis and tabulate the one component fits in supplementary 1 with necessary corrections.
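
      For reference, the standard one- and two-component 3D free-diffusion autocorrelation models can be sketched as below. This is a minimal illustration, not the fitting code used for the manuscript: it omits any triplet term, assumes equal molecular brightness for both components, and uses a hypothetical structure parameter `kappa` of 5.

```python
import math


def acf_one_component(tau, n, tau_d, kappa=5.0):
    """One-component 3D diffusion ACF with G(infinity) = 0.

    tau:   lag time
    n:     mean number of particles in the observation volume
    tau_d: diffusion time (same unit as tau)
    kappa: axial-to-lateral ratio of the observation volume (assumed value)
    """
    lateral = 1.0 + tau / tau_d
    axial = math.sqrt(1.0 + tau / (kappa ** 2 * tau_d))
    return (1.0 / n) / (lateral * axial)


def acf_two_component(tau, n, frac_fast, tau_fast, tau_slow, kappa=5.0):
    """Two-component ACF as a fraction-weighted sum of two diffusing species."""
    fast = acf_one_component(tau, 1.0, tau_fast, kappa)
    slow = acf_one_component(tau, 1.0, tau_slow, kappa)
    return (frac_fast * fast + (1.0 - frac_fast) * slow) / n
```

      Both models equal 1/N at zero lag and tend to 0 at long lags. When the amplitude and diffusion time are free parameters, a least-squares fit of the one-component form to two-component data trades error at short lags against error at long lags, which is consistent with normalised fits that reach neither limit within a finite fit window.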

      e) Please assess the validity of all multi-component fits by comparing the relative quality of the models to the number of estimated parameters using the Akaike information criterion or similar approaches.

      We will provide values denoting the quality of the fits in the revision. Specifically, we will provide the 3D 1 particle fit, the 3D 1 particle fit with triplet, the 3D 2 particle fit and the 3D 2 particle fit with triplet, together with appropriate measures of fit quality.
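
      One common way to compare such nested fits is the Akaike information criterion in its least-squares form. The sketch below assumes independent Gaussian errors of equal variance; the residuals and parameter counts in the usage note are hypothetical placeholders rather than values from the manuscript.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2k.

    residuals: list of (data - fit) values
    n_params:  number of free parameters in the model
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss <= 0.0:
        raise ValueError("residual sum of squares must be positive")
    return n * math.log(rss / n) + 2 * n_params


def rank_models(fits):
    """Rank models by AIC.

    fits: dict mapping model name -> (residuals, n_params)
    Returns (name, delta_AIC) pairs sorted from best (delta = 0) to worst.
    """
    scores = {name: aic_least_squares(res, k) for name, (res, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, score - best) for name, score in scores.items()),
                  key=lambda item: item[1])
```

      For instance, `rank_models({"3D 1 particle": (res_1p, 2), "3D 2 particle": (res_2p, 4)})` returns the models ordered by increasing AIC, so the two extra parameters of the 2 particle model are only favoured if they reduce the residuals enough to offset the penalty.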

      f) Please also present the Bcd-GFP fits with 0.001 ms that are mentioned in line 590, and present the results for the data that did not give comparable tau_D1 and tau_D2 values mentioned in line 593.

      We will provide all the curves from 0.001ms in the supplementary. We did not provide these details as we have followed the methods from Abu Arish et al., 2010. As our cytoplasmic and nuclear TauD values match with Abu Arish et al., 2010 and Porcher et al., 2010, we thought the excess data would be redundant.

      3) Bicoid gradient and modeling

      a) Little et al. 2011 observed that the Bcd gradient decreases around n.c. 13. Can the authors of the present work observe a similar concentration decrease using FCS? This is important to i) validate the FCS concentration measurements, and ii) to resolve the controversy regarding "previous claims based on imaging the Bcd profile within nuclei, which predicted decrease in Bcd diffusion in later stages".

      This is a good point regarding conclusions from the previous literature. The Little et al. paper inferred that diffusion had to decrease from fitting to the gradient profiles. However, subsequent analyses from our lab (Durrieu et al., MSB 2018 [which uses a different method involving a tandem reporter for Bicoid] and this manuscript) strongly suggest that Bicoid remains dynamic, at least through n.c. 13 and early n.c. 14. One way to test this is to use SPIM-FCS, where longer time courses can be taken (though with slower time resolution in the FCS). We have performed preliminary experiments with SPIM-FCS and we will revisit these data to see if we can find evidence for changes in the diffusion.

      We will also extend the Discussion to make the results clearer in terms of previous models and literature.

      b) Please explain why the experimental Bcd-GFP gradient data does not reach a value of 1 (e.g. in Fig. 3D) despite normalization. Please also explain why the fits become flatter in Fig. 5B compared to the steep fit in Fig. 3D.

      Both lines were measured under identical conditions. Therefore, we normalised to the maximum value of both experiments. We will redo, normalising to each individual experiment. Regarding Fig. 5C, the Bcd::eGFP curve is identical to Fig. 3D. The flatter curve is the line with eGFP tagged to a NLS alone.

      c) For modeling, please take into account observations that the Bcd source is graded with a wide distribution (30-40% EL, see Spirov et al. 2009, Little et al. 2011, Cai et al. 2017 etc.). The extent of the source used in the present work (x_s=20 um, line 620) is at least five times too small.

      Care must be taken in defining the source extent. The most careful measurements are reported in Little et al., PLoS Biology 2011, who performed single molecule FISH. They conclude “We demonstrate that all but a few mRNA particles are confined to the anterior 20% of the egg”. Further, the peak in the particle density is around 20-30 um from the anterior (Figure 3, Little et al., PLoS Biology 2011), with the vast majority of counts being within 10% of the anterior pole. In addition, Durrieu et al., MSB 2018 showed using a Bcd tandem reporter that there was unlikely to be an extended gradient of bcd mRNA (maximum extent of around 50 um). Here, we used a simple source domain, which was arguably a little narrow, but not significantly so. We will increase the value in the revision, but the claim that there is an extended bcd mRNA gradient (Spirov et al., Development 2009) has not been substantiated by later experiments.

      • d) Please discuss in the paper how well the simulations in Fig. 3B agree with the experimental data.*

      We will provide these details in the revision.

      • e) Please provide a precise estimate for the statement "Even with an effective diffusion coefficient of 7 μm2s-1, few molecules would be expected at the posterior given the estimated Bcd lifetime (30-50 minutes)" to turn this into a quantitative argument. How many molecules are expected to reach posterior in which model, and how does it compare to experimental observations?*

      This can be estimated based on the root-mean-square distance for diffusive processes. We will provide this in the revision.
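
      As a back-of-the-envelope illustration of that estimate, the sketch below uses the values quoted by the reviewer (an effective diffusion coefficient of 7 μm2s-1 and a lifetime of 30-50 minutes). It assumes one-dimensional diffusion, treats the quoted lifetime as the mean lifetime in a single-component SDD model, and takes an embryo length of 500 μm; these assumptions and the function names are illustrative and do not come from the manuscript.

```python
import math


def rms_displacement(diffusion_coeff, time_s, dimensions=1):
    """Root-mean-square displacement sqrt(2 * d * D * t) for free diffusion."""
    return math.sqrt(2 * dimensions * diffusion_coeff * time_s)


def sdd_decay_length(diffusion_coeff, lifetime_s):
    """Steady-state SDD decay length, lambda = sqrt(D * tau)."""
    return math.sqrt(diffusion_coeff * lifetime_s)


D_EFF = 7.0                # um^2/s, value quoted in the reviewer's comment
EMBRYO_LENGTH_UM = 500.0   # assumed embryo length

for lifetime_min in (30, 50):
    t = lifetime_min * 60.0
    rms = rms_displacement(D_EFF, t)
    lam = sdd_decay_length(D_EFF, t)
    # Concentration at the posterior pole relative to the anterior source,
    # for a single-exponential steady-state gradient.
    relative_posterior = math.exp(-EMBRYO_LENGTH_UM / lam)
    print(f"lifetime {lifetime_min} min: RMS {rms:.0f} um, "
          f"lambda {lam:.0f} um, posterior/anterior {relative_posterior:.3f}")
```

      Under these assumptions the RMS displacement over one lifetime is roughly 160-205 μm, well short of the embryo length, and a single-exponential gradient leaves only a few percent of the anterior concentration at the posterior pole.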

      • f) The sentence "we find that a model of Bcd dynamics that explicitly incorporates fast and slow forms of Bcd (rather than a single "effective" dynamic mode) is consistent with a range of observations that are otherwise incompatible with the standard SDD model" needs to be toned down and corrected since a simple SDD appears to be sufficient to account for the observed gradients. If the authors disagree, please specifically point out in the paragraph around line 249 what observations exactly are incompatible with a standard SDD model.*

      This is similar to the point raised by Reviewer 1. While the standard SDD model can explain the overall gradient shape, it is not compatible with the observed time scales and Bcd puncta tracked in the posterior pole. We will improve the Discussion around this point to make the distinctions between the models clearer.

      • 5) Data presentation *
      • a) In line 27 and 122 it would be better to rephrase the wording "find/found" and give credit to previous papers that first made these observations. *

      We will edit in the revision.

      • b) For the statement "This suggests that the dynamics of the fast fraction were not captured by previous FRAP measurements", please explain why this should not be the case even though the fast fraction is shown to be larger than the slow fraction in the current work.*

      We will edit in the revision.

      • c) Similarly, the sentence "The dynamics of the slower mode correspond closely to measured Bcd dynamics from FRAP" likely needs to be corrected since it neglects the contribution of the faster mode, which is fluorescent as well and should also contribute to the dynamics from FRAP.*

      This is similar to the point raised by Reviewer 1 and we will edit in the revision.

      d) In the absence of further evidence (see above), the sentences "We establish that such spatially varying differences in the Bcd dynamics are sufficient to explain how Bcd can have a steep exponential gradient in the anterior half of the embryo and yet still have an observable fraction of Bcd near the posterior pole" and "These results explain how a long- ranged gradient can form while retaining a steep profile through much of its range" in the abstract need to be toned down.

      We are not sure here what needs to be toned down. Our results show that there are (at least) two dynamic forms of Bcd and, combined, they are capable of forming a long-ranged gradient while also ensuring the gradient remains steep in the anterior (because the diffusion coefficient itself varies across the embryo). We will go through these statements and make sure the meaning is clear.

      e) The authors state that "However, we show that eGFP::Bcd in its fastest form can move quickly (~18 μm2s-1), and the fraction of eGFP::Bcd in this form increases at lower concentrations", but this has not been directly shown. Please tone down this statement or directly test the prediction that Bcd has a higher fraction of the fast form in earlier nuclear cycles when Bcd concentration is smaller.

      This is a good suggestion, and we will test whether early nuclear cycles of the anterior domain show faster dynamics.

      MINOR POINTS

      1) Introduction

      a) Please explain explicitly what exactly the contention in Bcd, Nodal and Wingless dynamics is in the cited references.

      We will add in the revision.

      b) In line 95, it would be better to state that this is a variation of the SDD model rather than "a new model".

      We changed from “a new model” to “an improved version of SDD model” in the current version of the manuscript.

      2) Methods

      a) The authors state that "The same software was also used to calculate the cross-correlation function", but I couldn't find any cross-correlation analyses. Please clarify.

      This is in line 538. There is no cross-correlation analysis; we have changed this to the autocorrelation function.

      b) Please correct the "uM" typo to "nM" in the legend of Method Fig. 2A.

      We have changed this in the current version.

      *c) In the sentence "Further, since the brightness eGFP:Bcd in the anterior and posterior cytoplasm is lower compared to the nuclei", "brightness" probably needs to be changed to "concentration" since the molecular brightness is unlikely to change.*

      We have edited line 591.

      *d) Please explain the background-correction method mentioned in line 612. Please also state at what temperature the experiments were performed.*

      We will implement a better background correction in the revision. Currently, the signal measured outside the embryo is taken as the background. The measurements were carried out at 25 °C.

      *3) Results*

      *a) Please provide labels for anterior, posterior, dorsal and ventral in Fig. 1A.*

      *b) Please explain the colors in Fig. 5C.*

      *c) Please explain the dashed lines in Fig. 3C.*

      We have edited Figure 1A and Figure 5C. We will edit Figure 3C in a further revision.

      *OPTIONAL*

      *1) If possible, it would be helpful to mention whether the transgenic animals have any abnormal phenotypes or whether they can rescue the bcd mutant.*

      We will add this information in the revision.

      *2) To validate the concentration measurements, it would be ideal if the authors could determine the Bcd concentration gradient using FCS along the anterior-posterior axis. This would also address whether there are further unexpected changes in diffusivity in medial regions and along the anterior-posterior axis that would have to be considered for modeling.*

      Measuring the Bcd concentration using FCS along the whole axis would be a very challenging undertaking. Obtaining the data for the two positions analysed already represents a significant amount of work. We have done SPIM-FCS measurements, and we will be repeating our FCS measurements in the Fritzsche lab at Oxford. Combined, we believe this provides sufficient corroboration of our results.

      *3) Local photoconversion experiments, e.g. in Bcd-Dendra2 embryos if available, would provide compelling support for the relevance of the measurements in the current work.*

      This is a nice idea, but this would represent a substantial project in its own right and lies beyond the current work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *In my estimation the experimental work is rigorous and the results fully support the conclusions of the authors. I was surprised, however, that the HD-only form localizes via very different and simpler dynamics than does full-length Bcd, but nevertheless forms at least a qualitatively similar gradient. That leads to the question as to whether the existence of the fast and slow forms and their different ratios in different parts of the embryo actually are physiologically relevant. I don't see a straightforward way to test this experimentally, because the mutations that affect Bcd gradient formation also affect essential functions of the protein that if abrogated produce severe downstream effects on embryonic development and lethality. However I would like to see this point at least addressed in the discussion. The data and the methods are presented in such a manner that they can be reproduced, and the number of replicates and statistical analysis is overall robust.*

      We thank the Reviewer for the positive and constructive review. They, like both previous reviewers, raise the issue of the model and how it fits with the data. As outlined above, we will improve this part of the data presentation and also the Discussion to make sure the main results are clear.

      We agree that the underlying importance of the different dynamic forms of Bicoid – and why they change across the embryo – remains unknown. We believe that our careful characterisation of such behaviour is important nonetheless, as it reveals that: (1) morphogen dynamics are more complicated than typically modelled, and this may be just as relevant for ligands moving through extracellular space; and (2) dynamics can vary in space/time, providing an additional possible mechanism of control for regulating morphogen gradient profiles.

      Of course, we would like to explore potential physiological relevance. Further exploration of the homeodomain and its role in regulating dynamics is a potential route, but that belongs in future work.

      *Minor comments: *

      *The presentation of the graphical data measuring Bcd levels along the a-p axis (Fig 1C, 1D, 4C-F and others) needs to be improved, because the grey lines that represent ACF curves are essentially invisible. This is partly because there is usually extensive overlap between the grey lines and other lines. This may be solved by using a more vivid colour than grey for the ACF curves, or perhaps the ACF lines could be made thicker but with some transparency so that overlapping data can be seen. In any event this aspect of the presentation needs to be improved.*

      We have made the ACF lines thicker to distinguish them from the model fit.

      *In Figs 2D and 2I measurements of statistical significance between the proportion of protein in fast and slow modes need to be added.*

      We will add these in the revision.

      *Relevant to line 174 and Fig 2, NLS should be defined when first used, the source of the NLS should be given (is it from Bcd?) and the rationale for looking at eGFP::NLS should be made explicit. *

      We have added details on how the eGFP::NLS is generated in the methods.

      *In Fig 3D the dashed lines need to be defined. I assume these are experimental error bars but this is not stated. *

      We now state this in the legends.

      *On lines 344-5, shouldn't this conclusion concern the HD rather than the NLS?*

      Yes, thank you for pointing this out; the statement relates only to NLS, not to NLSHD. We have removed this statement from line 351.

      *On line 432, CAP is not an acronym, the correct term is 5' 'cap' or 'cap structure'. Also Cho et al. PMID 15882623 should be added to the references here.*

      We have changed the corresponding section and added the references.

      *On lines 446, 456, 469, and throughout: replace 'blastocyst' with 'blastoderm'. The former term is generally used for embryos that undergo full cellular divisions and cleavage in early embryogenesis, not for syncytial embryos such as Drosophila.*

      We have changed blastocyst to blastoderm throughout the manuscript.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Major comments: The averaged autocorrelation curves were fitted to models of diffusion with one and two components. The one-component model was insufficient to reproduce the data and the two-component model seems to fit the data. Have the authors tested models with more than two components? Could it be possible to distinguish more Bcd populations?

      While it is possible to fit with further components, doing so rarely provides useful further insight. In particular, the error in measuring three tau_D values is typically very large. In addition, the improvement in the fit will be marginal, and thus the extra components cannot be justified statistically. Of course, we cannot exclude a third (or further) dynamic mode, but within the resolution of our FCS measurements, two components with a triplet term are in general the maximum that can be accommodated without overfitting. We will provide evidence for this claim in the supplement of the revised manuscript.
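
One common way to make a "cannot be justified statistically" argument concrete is an information criterion that penalises every added fit parameter. The following is only a minimal sketch of that idea using the least-squares form of the Akaike information criterion (AIC); it is not the analysis performed in the manuscript, and the residual sums of squares, parameter counts and number of lag points are hypothetical values chosen for illustration.

```python
import math


def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2k (lower is better)."""
    if rss <= 0 or n_points <= 0:
        raise ValueError("rss and n_points must be positive")
    return n_points * math.log(rss / n_points) + 2 * n_params


# Hypothetical fit results: (residual sum of squares, number of free parameters).
# The third component barely lowers the residuals but costs two extra parameters.
fits = {
    "one component": (4.0e-3, 3),
    "two components": (1.0e-3, 5),
    "three components": (0.99e-3, 7),
}
n_lag_points = 200  # hypothetical number of points in the autocorrelation curve

scores = {name: aic_least_squares(rss, n_lag_points, k)
          for name, (rss, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("preferred model:", best)
```

With these invented numbers the marginal drop in residuals from a third component does not offset its parameter penalty, which is the pattern the response describes.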

      In Figure 2E, the same concentration of eGFP::NLS is estimated to exist in the cytoplasm and nucleus. Since the NLS should target eGFP to the nucleus, what is the explanation for this observation? Is it possible that the method used to estimate the concentration of molecules is underestimating the concentration in the nucleus or the opposite in the cytoplasm?

      This is a good observation. There are two possible explanations. First, the regular division cycles “reset” the nuclear levels. Therefore, differences may not be so large. Second, FCS measurements of concentration can be noisy, as they depend on the very short time scales in the measurement. We will double check our measurements and clarify this in our revision.

      *In the simulation of the SDD model (Figure 3B), simulations at 10 min, 25 min and 120 min are shown. Assuming that 120 min corresponds to early nc14, are simulations at earlier timepoints corresponding to nc12 and nc13 indistinguishable from the profile at 120 min? This demonstration would further support the option to merge the data from all nuclear cycles. *

      This is a good point. Here, we were primarily focused on showing the time evolution of the model, rather than directly mapping onto experiment. We will clarify in the revision.

      *The results obtained with the BcdN51A mutant show an increase in diffusion speed, while retaining similar proportions of fast and slow populations. In the slow fraction, a new population is found. Assuming that the BcdN51A molecules cannot bind specifically to DNA due to the mutation, what would this newly found population correspond to? Could the authors explore the possibility of nonspecific binding to DNA? The article would also win by discussing more on this aspect or other options. *

      This is an interesting question. Dslow for anterior nuclei of N51A mutants increases (from ~0.2 µm²/s to ~1.5 µm²/s), and the proportion is similar to the slow fraction of WT Bcd in the anterior nuclei (F = 50%). The Dslow values of bcdWT suggest that 0.2 µm²/s is a result of DNA binding. For bcdN51A, a Dslow of 1.5 µm²/s is suggestive of nonspecific interaction of bcdN51A with DNA. Such a nonspecific interaction is also seen in the case of NLS::eGFP, where we see a significant amount (Dslow ~1-1.5 µm²/s, F = 20%) of the slow form in the anterior nuclei, likely due to non-specific interaction with DNA.

      It is worth noting that the inactive homeodomain of the transcription factor Sex combs reduced (Scr) also interacts non-specifically with DNA at high concentration (Vukojevic et al., PNAS 2010). Non-specific interaction of the eGFP fluorophore is also reported to be higher in the nuclei of AT-1 cells, which suggests that the “obstacle-free accessible space” in the nuclei is low (Wachsmuth et al., JMB 2000). Therefore, although we do not understand the specific mechanism, our results for the N51A mutants are aligned with previous observations of intranuclear dynamics.

      The experimental rationale behind the BcdMM reporter needs to be better explained as it is not clear. It was previously shown that the N51A mutation disturbs zygotic hb activation and Caudal gradient formation (see Figure 3 in Niessing et al., 2000). Since N51A already causes a strong phenotype by disturbing hb expression and Cad gradient formation, what is the reasoning behind adding extra mutations to this background? Since the mutations in the PEST domain and YIRPYL motif are involved in cad translational repression, would it be more interesting to add them to the R54A mutation and further study the repression of cad? It would also shed light on the unexpected lack of difference, or even decrease, in diffusion in the cytoplasm of the R54A mutant, which should increase if indeed the cad mRNA binding is being repressed.

      Our rationale was to remove more elements of Bcd to see if there was some degree of redundancy – at least in terms of the dynamics.

      The Bicoid homeodomain N51A mutation is known to cause de-repression of caudal and to inhibit hunchback expression. Mechanistically, nuclear Bcd activates hb transcription. In the cytoplasm, however, Bcd interacts with other proteins and forms a complex that represses caudal translation. Bcd binds to caudal mRNA through its HD at one end of the complex, while at the other end, other proteins in the complex are bound to the 5′ cap region of the caudal mRNA. Our rationale for generating the MM mutation was that the N51A mutation may not be sufficient for Bcd to be released from the protein complex. Therefore, mutations additional to N51A may release Bcd from interactions with either DNA or with other proteins through the PEST domain and YIRPYL motif.

      *Have the authors confirmed that their BcdR54A indeed inhibits cad translation? *

      We have not tested whether eGFP:bcdR54A inhibits cad translation. We will add the data in the revision.

      *How many embryos of BcdMM were analysed? The authors should also provide a table with all the values in SI as they have done for all the other reporters. *

      We will add this data with the revision.

      *The claims with eGFP::NLSBcdHD need to be supported by data from multiple embryos. Even if multiple ACF curves are obtained from one embryo, analysing only one embryo is not sufficient. This would clarify the fact that this reporter seems to be able to reproduce the mobility of Bcd in the nucleus. *

      We agree and we are arranging to collect more data. This should be completed by the end of the summer.

      *According to the methods, all reporters were expressed in a bcd null background, made with the bcd1 allele. This allele is also known as bcd085 and according to Driever and Nusslein-Volhard, 1988 (PMID: 3383244), this allele only causes an intermediate phenotype. This indicates that a truncated version of the protein probably still exists on the embryo. Do the conclusions obtained here still hold if a truncated version of the Bcd protein exists in addition to their reporters? *

      We used the bcdE1 mutant, a null mutant of bcd. This was used by Gregor et al., Cell 2007 in their generation of the original Bcd::eGFP. We have also recently generated a more complete bcdKO mutation (Huang et al., eLife 2017). Our embryos do not have a clear phenotype that we can relate to the specific bcd- background used. Nonetheless, we agree it is an important point to be clear about the genetic background and we will clarify in the revised manuscript.

      Minor comments:

      *In line 45: "Morphogens are signalling molecules", the authors should consider removing the word "signalling" since not all morphogens are, especially the one being studied, Bicoid.*

      *In lines 80-81 (and also throughout the text): "We measure the Bcd dynamics at multiple locations along the embryo AP-axis", should be more accurate and changed to anterior and posterior of the embryo. Using "multiple locations along the AP axis" is ambiguous and not exact for what was done.*

      Yes, this is a fair comment. We have edited these sections in the current manuscript.

      *Throughout the article, the authors refer multiple times to "modes for/of Bcd transport". Since they or others have not proven that Bcd is being transported, which would involve at least another factor, the authors should replace transport by movement, diffusion or a similar word with which they are comfortable. *

      We have changed transport to movement wherever relevant in the text.

      *Suggestion: The authors claim that the Bcd gradient is exponential up to 60% of embryo length. Would this information allow a more precise calculation of the gradient decay length in the exponential region than the 80-100µm stated on line 202? *

      This is an interesting point, but our results suggest that the idea of a single decay length is not so applicable in the posterior region. There, the Bcd dynamics are generally quicker, thereby increasing the decay length λ. Of course, we cannot discount possible spatial variation in degradation. However, in previous work, our Bcd tandem reporter (which is sensitive to changes in degradation) did not reveal spatial variation in degradation.
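
The link between quicker dynamics and a longer decay length can be illustrated with the textbook steady-state result for the classic SDD model, in which the decay length is sqrt(D / k) for diffusion coefficient D and first-order degradation rate k. The sketch below is only an illustration of that relation, not the model used in the manuscript; the degradation rate and the list of diffusion coefficients are hypothetical values.

```python
import math


def sdd_decay_length(diffusion_um2_per_s, degradation_per_s):
    """Steady-state decay length (in µm) of the classic SDD model: sqrt(D / k)."""
    if diffusion_um2_per_s <= 0 or degradation_per_s <= 0:
        raise ValueError("D and k must be positive")
    return math.sqrt(diffusion_um2_per_s / degradation_per_s)


# Hypothetical degradation rate, held fixed to isolate the effect of D.
k_deg = 1.0e-3  # 1/s

# Hypothetical effective diffusion coefficients (µm²/s).
for d_eff in (1.0, 4.0, 9.0):
    print(f"D = {d_eff} µm²/s -> decay length = {sdd_decay_length(d_eff, k_deg):.1f} µm")
```

Because the decay length scales with the square root of D, a fourfold increase in the effective diffusion coefficient only doubles it when degradation is held constant.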

      In lines 258-259, the sentence "Further, Bcd binds to caudal mRNA, repressing its expression in the cytoplasm" should be improved to clarify the role of Bcd in caudal mRNA translation repression and references should be added. This should also be corrected in the following paragraph.

      We will add the necessary corrections in the revision.

      *In line 262, "mutations" should be singular since it corresponds to only one amino acid mutation. *

      We have corrected this.

      *Figure 4J needs to be corrected as the fractions of the slow and fast populations do not correspond to what is shown in Table 3. For example, Fslow fraction of AC is ~45% in the figure while it is 36% in Table 3. The problem occurs in all fractions. *

      We are sorry there is a mislabelling in the corresponding figure. AN is in the place of AC. We have edited figure 4J and removed the mislabelling.

      *In the discussion, in lines 379-380, "Given the changing fractions of the fast and slow populations in space, the interactions between the populations are likely non-linear". What is the reasoning for non-linearity and not interchangeability? *

      If the interactions between the two populations were linear, then the fraction in each form would be constant across the embryo. Some degree of nonlinearity is required in order to have spatially varying relative populations.
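
This argument can be sketched numerically. The example below compares a linear (first-order) interconversion between a fast and a slow form, whose steady-state slow fraction does not depend on total concentration, with one hypothetical nonlinear alternative in which conversion to the slow form is second order in the fast form. It assumes spatially uniform rate constants and local equilibrium between the forms; the rate constants and the second-order mechanism are illustrative assumptions, not the mechanism proposed in the manuscript.

```python
import math


def slow_fraction_linear(k_fast_to_slow, k_slow_to_fast):
    """Steady-state slow fraction for first-order fast <-> slow exchange.

    The result does not depend on the total concentration.
    """
    return k_fast_to_slow / (k_fast_to_slow + k_slow_to_fast)


def slow_fraction_nonlinear(total_conc, a):
    """Steady-state slow fraction when slow = a * fast**2 (hypothetical mechanism).

    Solves fast + a * fast**2 = total_conc for the positive root;
    a is the ratio of association to dissociation rate constants.
    """
    fast = (-1.0 + math.sqrt(1.0 + 4.0 * a * total_conc)) / (2.0 * a)
    return 1.0 - fast / total_conc


# Hypothetical concentrations in arbitrary units.
for conc in (1.0, 10.0, 100.0):
    linear = slow_fraction_linear(0.5, 0.5)
    nonlinear = slow_fraction_nonlinear(conc, 0.1)
    print(f"C = {conc}: linear slow fraction = {linear:.2f}, "
          f"nonlinear slow fraction = {nonlinear:.2f}")
```

In the linear case the slow fraction is the same at every concentration, whereas in the nonlinear case the fast fraction grows as the concentration falls.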

      *In line 432 caudal should be italicized. *

      We have edited this.

      *In the discussion, the authors conclude that "In the nucleus, the two populations can be largely (though not completely) explained by Bcd binding to DNA". The discussion would win by explaining all the possible options.*

      We will make the necessary changes in the discussion. This is also related to the reviewer comments above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Weaknesses

      Showing that A-2 and especially A-3 are outliers in the PCA analysis is useful, but it may be hiding other interesting signals in the data. The other strains are remarkably colinear on these plots, hinting that if the outliers were removed, one main component would emerge along which they are situated. It also seems possible that this additional analysis step would allow the second dimension to better differentiate them in a way that is interesting with respect to their mutator status or mutations in key metabolic or regulatory genes.

      We thank the reviewer for their positive comments and their constructive feedback on the manuscript. Following the reviewer’s recommendation, we performed the PCA analysis on the metabolism data after removing the A-2 and A-3 data. We have detailed those results below. Consistent with a similar analysis performed on RNA-seq datasets in our previous publication, we find that removing these outliers has only a modest effect on separating mutators from non-mutators. We find that, while the new PC2 separates most mutators from the non-mutators, the separation is rather weak. Moreover, we do not see a similar distinction when looking at metabolic data in stationary phase. In the interest of improving the readability of the manuscript, we recommend not including these analyses in the final manuscript. We have presented the data for the reviewer’s benefit in Author response images 1, 2 and 3.

      Author response image 1.

      Author response image 2.

      Author response image 3.

      There is a missed opportunity to connect some key results to what is known about LTEE mutations that reduce the activity of pykF (pyruvate kinase I). This gene is mutated in all 12 LTEE populations, and often these mutations are frameshifts or transposon insertions that should completely knock out its activity. At first glance, inactivating an enzyme for a step in glycolysis does not make sense when the nutrient source in the growth medium is glucose, even though PykF is only one of two isozymes E. coli encodes for this reaction. There has been speculation that inactivating pykF increases the concentration of phosphoenolpyruvate (PEP) in cells and that this can lead to increased rates of glucose import because PEP is used by the phosphotransferase system of E. coli to import glucose (see https://doi.org/10.1002/bies.20629). The current study has confirmed the higher PEP levels, which is consistent with this model.

      We thank the reviewer for pointing out this missed opportunity. We have expanded the discussion around the role of pykF mutations and the elevated concentrations of PEP observed in our data in section 3.4.

      In the introduction, the papers cited to show the importance of changes in metabolism for adaptation do not seem to fit the focus of this study very well. They stress production of toxins and secondary metabolites, which do not seem to be mechanisms that are at work in the LTEE. I can think of two areas of background that would be more relevant: (1) studies of how bacterial metabolism evolves in adaptive laboratory evolution (ALE) experiments to optimize metabolic fluxes toward biomass production (for example, https://doi.org/10.1038/nature01149), and (2) discussions of how cross-feeding, metabolic niche specialization, and metabolic interdependence evolve in microbial communities, including in other evolution experiments (for example, https://doi.org/10.1073/pnas.0708504105 and https://doi.org/10.1128/mBio.00036-12).

      We thank the reviewer for pointing out missed citations in our introduction. We agree that these papers are relevant to the topic and have added their citations. Additionally, following the suggestion of another reviewer, we have reorganized the introduction so that the concept of the role of metabolism in evolution is presented first and the LTEE second.

      Reviewer #2 (Public Review):

      [...] Overall, this is a significant and well-executed research study. It offers new insights into the complex relationship between genetic changes and observable traits in evolving populations and utilizes metabolomics in the LTEE, a novel approach in combination with RNA-seq and mutation datasets.

      However, the paper's overall clarity is lacking. It is spread too thin and covers many topics without a clear focus. I strongly recommend a substantial rewrite of the manuscript, emphasizing structure and readability. The science is well executed, but the current writing does not do it justice.

      We thank the reviewer for their positive comments and their constructive feedback on the lack of clarity in the writing. Following the reviewer’s suggestions, we have rewritten parts of the manuscript and reorganized a few sections to improve readability. We hope the revised manuscript is significantly improved.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1) Title and Abstract: Add the study organism to the abstract, and probably also the title. Currently, E. coli is not mentioned in either! I'm also not sure that the LTEE is a sufficiently well-known acronym to abbreviate this in the title.

      We have revised the title of the manuscript and now spell out LTEE and included E. coli in the title and the abstract.

      2) Abstract: I would switch the usage of metabolome to metabolism in a few more places. For example, "changes in its metabolism", "networked and convoluted nature of metabolism". The metabolome, the concentrations of all metabolites, is what is being measured, but I think of this as a phenotypic readout of how metabolism is evolving.

      We have changed “metabolome” to “metabolism” in cases where we refer to what is evolving and use “metabolome” when we refer to what is being measured.

      3) Line 16: Technically, the 12 LTEE populations were not initially identical. The Ara- differed from the Ara+ ancestors by one intentional mutation and one unintentional mutation that was not discovered until whole genomes were sequenced. I would rephrase this to "where 12 replicate populations of E. coli are propagated" or something similar so that it can be correct without needing to describe this unnecessary detail.

      The line has been rephrased as suggested.

      4) General Note: The text refers to populations as Ara-3 but the figures use A-3. I'd suggest going with A-3 and similar throughout for consistency.

      Instances of "Ara" have been changed to A+/-, and a sentence noting this convention has been added to the introduction.

      5) Lines 43-44, 97-98. My understanding is that both S and L ecotypes in A-2 can use both glucose and acetate, but that the differentiation is related to their specialization that leads to each one being better on one or the other nutrient. The descriptions make it sound like each grows at a different time. Also, by definition, cells are not growing during "stationary phase". The change from glucose utilization (and acetate secretion) to acetate utilization during one cycle of growth is better described as a diauxic shift.

      We have reworded this part to remove mention of “growth” during stationary phase and changed the wording such that it no longer sounds like they grow at different times.

      6) Line 54: The statement "provide the ability to test hypotheses from previous data" is vague. Either provide an example or delete.

      We have removed this sentence as suggested.

      7) Lines 71-72: The terms "interphase" and "intraphase" sound too much like parts of the cell cycle. I'd suggest describing the comparisons as between and within growth phases.

      The use of intra and interphase have been changed as suggested.

      8) Line 79: The citrate is presumably still a chelating agent, so change phrasing to "Citrate is present in the medium because it was originally added as a chelating agent" or something similar.

      This sentence has been rewritten as suggested.

      9) Line 83: Write out "mutation accumulations" so it is easier to understand as "the number of mutations that have accumulated".

      The phrase has been changed as suggested.

      10) Line 116: It's unclear whether the abundances of metabolites are "strategies of survival" in stationary phase. An equally valid explanation is that there is less selection on the metabolome to have a specific composition during stationary phase to have high fitness.

      We have added a line about the possibility for alternative hypotheses.

      11) Figure 1: There seems to be some information missing from the legend. What are R06 and R07 in Panels A and B? Is panel D exponential phase and panel E stationary phase?

      This information was inadvertently missing from the caption and has been added.

      12) Figures 2 and 3: Gene names should be in italics. To me, the gray for deleted genes is hard to tell apart from the blue/red. Perhaps you could put a little X in these boxes instead? I think that having a little triangle pointing from each gene or metabolite name to its corresponding abundance panel would help the reader track which information goes with which features. In Fig. 3 the placement of L-aspartate is a bit awkward. I'd suggest moving it down so the dashed line does not have to go through the abundance panel.

      These figures have been edited to include small triangles that link a gene or metabolite and its heatmap. Additionally, an X has been added where genes have suffered inactivating mutations and the placement of some elements has been moved to improve overall clarity.

      13) Lines 183-185: It would be easier to see and judge the consistency of these argR related relationships if a correlation graph of some kind was shown, probably as a supplemental figure. This plot could, for example, have genes/metabolites across the x-axis and fold-change on the y-axis with lines connecting points corresponding to each of the twelve populations across these categories (like Fig S8 but with lines added). Alternatively, it could be a heat map with the populations across one axis and the genes/metabolites across the other axis (like Fig S3).

      We have added a supplementary figure consisting of heatmaps showing the consistency of these changes within an evolved line. It is now figure S9.

      14) Line 195: I think adding a sentence elaborating on what exactly mutation accumulation means in this context would be helpful to readers.

      We have attempted to clarify the meaning of this by specifically stating that it is due to the accumulation of deleterious mutations.

      15) Line 293: Is standard LTEE medium DM25? These omics experiments with the LTEE sometimes use similar media with different glucose concentrations, and this is a very important detail to precisely specify.

      We reference “standard” LTEE medium in the methods section and have additionally specified the amount of sugar to make it clear that we are not supplementing the media with additional sugar.

      16) Figure S8B. Is "cystine" used instead of "cysteine" on purpose here since the compound is oxidized in the metabolomics treatment?

      The use of cystine is intentional; we detect the oxidized compound.

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The abbreviation "LTEE" should not be in the title. Most readers will not recognize what it means. Instead, either the full name of the experiment, "Long-Term Evolution Experiment with E. coli," should be used, or the title should be rephrased to "Linking genotypic and phenotypic changes during a long-term evolution experiment using metabolomics."

      We have spelled out LTEE and included E. coli in the title.

      Abstract:

      Sentence 1: Consider softening the statement: "Do changes in an organism's environment, genome, or gene expression patterns often lead to changes in its metabolome?"

      We have rephrased this sentence to “Changes in an organism's environment, genome, or gene expression patterns can lead to changes in its metabolism”.

      Sentence 4: Use a hyphen for "Long-Term."

      This addition has been made.

      Sentence 4: Replace "transduce" with a more appropriate term: "...how the effects of mutations can be distributed through a cellular network to eventually affect metabolism and fitness."

      We have rewritten this sentence as “to understand how mutations can eventually affect metabolism and perhaps fitness”.

      Sentence 5: Clarify the use of "both" to refer to the ancestor of the LTEE and its descendant populations as two classes.

      We have reworded this sentence so it is clear that the ancestors and evolved lines are two separate classes: “We used mass-spectrometry to broadly survey the metabolomes of the ancestral strains and all 12 evolved lines…”.

      Sentence 6: Reverse the order for better emphasis: "Our work provides a better understanding of how mutations might affect fitness through the metabolome in the LTEE, and thus provides a major step in developing a complete genotype-phenotype map for this experimental system."

      We have rearranged this sentence per the reviewer’s suggestion.

      Introduction:

      Revise the introduction for clarity, readability, and logical narrative progression. Start with the second paragraph to set up the basic scientific principles being studied and then transition to describing the LTEE as a model system to examine those principles.

      The introduction has been rearranged and reworded in parts to increase clarity.

      Sentence 1: Revise for clarity: "The Long-Term Evolution Experiment (LTEE) has studied 12 initially identical populations of Escherichia coli as they have evolved in a carbon-limited, minimal glucose medium under a daily serial transfer regime."

      Sentence 2: Suggestion: "Begun in 1988, the LTEE populations have evolved for more than 75,000 generations, making it the longest-running experiment of its kind."

      Paragraph 2, sentence 2: Italicize "Drosophila."

      Paragraph 3, sentence 2: Make an important distinction: "Ara-3 is unique in that it evolved the ability to grow aerobically on citrate."

      Paragraph 3, sentence 4: Introduce the IS-mediated loss of the rbs operon in the LTEE as if it has not been described elsewhere.

      These suggestions have been incorporated into the manuscript.

      Results:

      Section 3.1: The use of samples from hours 2 and 24 to represent exponential and stationary phase may present some issues. For instance, capturing Ara-3 during its exponential growth on glucose, but not citrate, at hour 2. Furthermore, except for Ara-3, the LTEE populations reach stationary phase after approximately 4 hours, and there could be significant differences between early, mid, and late stationary phase. This possibility should be acknowledged, and future follow-up work should consider exploring these differences.

      We have added sentences to the first paragraph of the results section to include these details. We have also added a short paragraph to the conclusions suggesting additional studies of stationary phase, citing work on the evolution of E. coli during long-term stationary phase.

      Paragraph 3: While Turner et al. 2017 is an essential reference regarding resource use differences between Ara-3 and other LTEE populations, it would be more suitable to reference Blount et al. 2012 for the mutations that enabled access to citrate. Also, it is important to note that the difference lies in the ability to grow aerobically on citrate, rather than the ability to metabolize it.

      This citation has been added.

      Paragraph 4: As mentioned elsewhere, most LTEE populations exhibit balanced polymorphisms. Therefore, it is more appropriate to state that Ara-2 is the best-understood example of long-term diversity. It is likely that there are important metabolic differences between co-existing lineages in other LTEE populations.

      We now refer to Ara-2 as the best-understood example of long-term diversity.

      Paragraph 5: The first sentence of this paragraph should likely end with "levels."

      The word “levels” was added to the end of this sentence.

      Figure 3: It is preferable to refer to the "Superpathway of arginine and polyamine biosynthesis," citing EcoCyc as a reference, rather than a descriptor.

      This has been changed to a reference.

      Section 3.3, Paragraph 3: While higher intracellular amino acid abundances may facilitate higher translation rates and faster growth, the higher abundances themselves do not evaluate the hypothesis. To evaluate the hypothesis, it is necessary to demonstrate that higher abundances are associated with higher translation or growth rates. Therefore, the final sentence of this paragraph is not meaningful.

      We have reworded this sentence to say that it’s not possible to tell what the additional amino acids are being used for given only this data and that additional experiments are needed to confirm this hypothesis.

      Section 3.4: The first paragraph of this section misstates how evolution works. The low level of glucose in the LTEE does not drive innovation; instead, innovation occurs at random through the introduction of variation by mutation. Although the existence of the citrate resource acts as a reward that selects for variation that provides access to it, it is essential to remember that evolution is blind to such a reward. Moreover, regarding the evolution of the Cit+ trait, it is incorrect to assert that low glucose contributed to its evolution. As shown by Quandt et al. (2015), it seems probable that Cit+ evolution was potentiated by adaptation to specialization on acetate, which is produced by overflow metabolism resulting from rapid growth on glucose. This rapid growth only occurs when glucose is relatively abundant. The level of glucose seems low to us because it is low relative to traditional levels in bacteriological media, but not to the bacteria.

      We agree that this is a semantic but important distinction. We have reworded this part so as not to suggest that evolution has any forward-thinking properties; it is indeed blind to any rewards that might result from adaptation.

      In general, all instances of "utilize" and its cognates should be replaced with "use" and its cognates.

      Instances of “utilize” and its cognates have been changed to “use” and its cognates.

      There is some uncertainty about the expectation of ramping up the TCA cycle in the LTEE. Overflow metabolism and acetate production appear to be prevalent in the LTEE, suggesting that many lineages only partially oxidize carbon derived from glucose, thereby bypassing the TCA cycle. While it is possible that this interpretation is incorrect, it would be helpful to see it addressed in the manuscript.

      We agree that this is a plausible hypothesis. We have added a paragraph at the end of this section that discusses the implications of overflow metabolism as an alternative hypothesis.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Although I found the introduction well written, I think it lacks some information or needs to develop more on some ideas (e.g., differences between the cerebellum and cerebral cortex, and folding patterns of both structures). For example, after stating that "Many aspects of the organization of the cerebellum and cerebrum are, however, very different" (1st paragraph), I think the authors need to develop more on what these differences are. Perhaps just rearranging some of the text/paragraphs will help make it better for a broad audience (e.g., authors could move the next paragraph up, i.e., "While the cx is unique to mammals (...)").

      We have added additional context to the introduction and developed the differences between cerebral and cerebellar cortex, also re-arranging the text as suggested.

      2) Given that the authors compare the folding patterns between the cerebrum and cerebellum, another point that could be mentioned in the introduction is the fact that the cerebellum is convoluted in every mammalian species (and non-mammalian spp as well) while the cerebrum tends to be convoluted in species with larger brains. Why is that so? Do we know about it (check Van Essen et al., 2018)? I think this is an important point to raise in the introduction and to bring it back into the discussion with the results.

      We now mention in the introduction the fact that the cerebellum is folded in mammals, birds and some fishes, and provide references to the relevant literature. We have also expanded our discussion about the reasons for cortical folding in the discussion, which now contains a subsection addressing the subject (this includes references to the work of Van Essen).

      3) In the results, first paragraph, what do the authors mean by the volume of the medial cerebellum? This needs clarification.

      We have modified the relevant section in the results and made the definition of the medial cerebellum clearer, indicating that we refer to the vermal region of the cerebellum.

      4) In the results: When the authors mention 'frequency of cerebellar folding', do they mean the degree of folding in the cerebellum? At least in non-mammalian species, many studies have tried to compare the 'degree or frequency of folding' in the cerebellum by different proxies/measurements (see Iwaniuk et al., 2006; Yopak et al., 2007; Lisney et al., 2007; Yopak et al., 2016; Cunha et al., 2022). Perhaps change the phrase in the second paragraph of the result to: "There are no comparative analyses of the frequency of cerebellar folding in mammals, to our knowledge".

      We have modified the subsection in the methods referring to the measurement of folial width and folial perimeter to make the difference more clear. The folding indices that have been used previously (which we cite) are based on Zilles’s gyrification index. This index provides only a global idea of degree of folding, but it’s unable to distinguish a cortex with profuse shallow folds from one with a few deep ones. An example of this is now illustrated in Fig. 3d, where we also show how that problem is solved by the use of our two measurements (folial width and perimeter). The problem is also discussed in the section about the measurement of folding in the discussion section:

      “Previous studies of cerebellar folding have relied either on a qualitative visual score (Yopak et al. 2007, Lisney et al. 2008) or a “gyrification index” based on the method introduced by Zilles et al. (1988, 1989) for the study of cerebral folding (Iwaniuk et al. 2006, Cunha et al. 2020, 2021). Zilles’s gyrification index is the ratio between the length of the outer contour of the cortex and the length of an idealised envelope meant to reflect the length of the cortex if it were not folded. For instance, a completely lissencephalic cortex would have a gyrification index close to 1, while a human cerebral cortex typically has a gyrification index of ~2.5 (Zilles et al. 1988). This method has certain limitations, as highlighted by various researchers (Germanaud et al. 2012, 2014, Rabiei et al. 2018, Schaer et al. 2008, Toro et al. 2008, Heuer et al. 2019). One important drawback is that the gyrification index produces the same value for contours with wide variations in folding frequency and amplitude, as illustrated in Fig. 3d. In reality, folding frequency (inverse of folding wavelength) and folding amplitude represent two distinct dimensions of folding that cannot be adequately captured by a single number confusing both dimensions. To address this issue we introduced 2 measurements of folding: folial width and folial perimeter. These measurements can be directly linked to folding frequency and amplitude, and are comparable to the folding depth and folding wavelength we introduced previously for cerebral 3D meshes (Heuer et al. 2019). By using these measurements, we can differentiate folding patterns that could be confused when using a single value such as the gyrification index (Fig. 3d). Additionally, these two dimensions of folding are important, because they can be related to the predictions made by biomechanical models of cortical folding, as we will discuss now.”
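
      The limitation described in the quoted passage can be illustrated with a minimal Python sketch. It assumes an idealised triangular-wave profile over a flat baseline and takes the straight baseline as the envelope. The `zigzag` and `gyrification_index` helpers are hypothetical names introduced for illustration; they are not the measurement code used in the manuscript.

```python
import math

def polyline_length(points):
    """Total length of an open polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def zigzag(n_folds, amplitude, span=1.0):
    """Triangular-wave profile with n_folds peaks over a baseline of length span."""
    half = span / (2 * n_folds)  # horizontal distance between a trough and a peak
    return [(i * half, amplitude if i % 2 else 0.0) for i in range(2 * n_folds + 1)]

def gyrification_index(points):
    """Contour length divided by envelope length.

    Assumption: the envelope is the straight line joining the two end points.
    """
    envelope = math.dist(points[0], points[-1])
    return polyline_length(points) / envelope

few_deep = zigzag(n_folds=4, amplitude=0.25)         # low frequency, high amplitude
many_shallow = zigzag(n_folds=16, amplitude=0.0625)  # high frequency, low amplitude

gi_deep = gyrification_index(few_deep)
gi_shallow = gyrification_index(many_shallow)
print(gi_deep, gi_shallow)
```

      Both profiles give the same index (the square root of 5, about 2.24) although one has 4 deep folds and the other has 16 shallow ones. A single ratio therefore cannot separate folding frequency from folding amplitude.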

      5) Sultan and Braitenberg (1993) measured cerebella that were sagittally sectioned (instead of coronal), right? Do you think this difference in the plane of the section could be one of the reasons explaining different results on folial width between studies? Why does the foliation index calculated by Sultan and Braitenberg (1993) not provide information about folding frequency?

      The measurement of foliation should be similar as long as enough folds are sectioned perpendicular to their main axis. This will be the case for folds in the medial cerebellum (vermis) sectioned sagittally, and for folds in the lateral cerebellum sectioned coronally. The foliation index of Sultan and Braitenberg does not provide an account of folding frequency similar to ours because they only measure groups of folia (what some call lamellae), whereas we measure individual folia. It is not easy to understand from their paper exactly how Sultan and Braitenberg proceeded. We contacted Prof. Fahad Sultan (we acknowledge his help in our manuscript). Author response image 1 provides a clearer description of their procedure:

      Author response image 1.

      As Author response image 1 shows, each of the structures that they call a fold is composed of several folia, and so their measurements are not comparable with ours which measure individual folia (a). The flattened representation (b) is made by stacking the lengths of the fold axes (dashed lines), separating them by the total length of each fold (the solid lines), which each may contain several folia.

      6) Another point that needs to be clarified is the log transformation of the data. Did the authors use log-transformed data for all types of analyses done in the study? Write this information in the material and methods.

      Yes, we used the log10 transformation for all our measurements. This is now mentioned in the methods section, and again in the section concerning allometry. We are including a link to all our code to facilitate exact replication of our entire method, including this transformation.

      7) The discussion needs to be expanded. The focus of the paper is on the folding pattern of the cerebellum (among different mammalian species) and its relationship with the anatomy of the cerebrum. Therefore, the discussion on this topic needs to be better developed, in my opinion (especially given the interesting results of this paper). For example, with the findings of this study, what can we say about how the folding of the cerebellum is determined across mammals? The authors found that the folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum (for example, even relatively small cerebella are folded)? Is that because the 'white matter' core of the cerebellum is relatively small (thus more stress on it)?

      We have expanded the discussion as suggested, with subsections detailing the measuring of folding, the modelling of folding for the cerebrum and the cerebellum, and the role that cerebellar folding may play in its function. We refer to the literature on cortical folding modelling, and we discuss our results in terms of the factors that this research has highlighted as critical for folding. From the discussion subsection on models of cortical folding:

      “The folding of the cerebral cortex has been the focus of intense research, both from the perspective of neurobiology (Borrell 2018, Fernández and Borrell 2023) and physics (Toro and Burnod 2005, Tallinen et al. 2014, Kroenke and Bayly 2018). Current biomechanical models suggest that cortical folding should result from a buckling instability triggered by the growth of the cortical grey matter on top of the white matter core. In such systems, the growing layer should first expand without folding, increasing the stress in the core. But this configuration is unstable, and if growth continues stress is released through cortical folding. The wavelength of folding depends on cortical thickness, and folding models such as the one by Tallinen et al. (2014) predict a neocortical folding wavelength which corresponds well with the one observed in real cortices. Tallinen et al. (2014) provided a prediction for the relationship between folding wavelength $\lambda$ and the mean thickness ($t$) of the cortical layer: $\lambda = 2\pi t\,(\mu/(3\mu_s))^{1/3}$. (...)”
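
      As a worked illustration of the quoted relationship, the sketch below evaluates the wavelength for a few layer thicknesses. It assumes that µ and µ_s denote the stiffness of the cortical layer and of the underlying core respectively (the quoted excerpt does not define them), and the thickness values are hypothetical numbers chosen only to show the scaling.

```python
import math

def folding_wavelength(thickness, mu_layer, mu_substrate):
    """Wavelength predicted by lambda = 2*pi*t*(mu / (3*mu_s))**(1/3).

    Assumption: mu_layer and mu_substrate are the stiffnesses of the growing
    layer and of the core beneath it, in the same units.
    """
    return 2 * math.pi * thickness * (mu_layer / (3 * mu_substrate)) ** (1 / 3)

# With equal stiffness in both layers, the wavelength is about 4.36 times the thickness.
per_unit_thickness = folding_wavelength(1.0, 1.0, 1.0)

# Hypothetical thicknesses in arbitrary units: a thin layer and one ten times thicker.
thin = folding_wavelength(0.3, 1.0, 1.0)
thick = folding_wavelength(3.0, 1.0, 1.0)
print(per_unit_thickness, thin, thick)
```

      Because the wavelength is proportional to thickness for a fixed stiffness ratio, a layer that is ten times thinner is predicted to fold with a wavelength ten times shorter, so many more folds fit into the same extent.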

      From this biomechanical framework, our answers to the questions of the Reviewer would be:

      • How is the folding of the cerebellum determined across mammals? By the expansion of a layer of reduced thickness on top of an elastic layer (the white matter)

      • Folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? On the contrary, that indicates that the shape of individual folia is stable, providing the smallest level of granularity of a folding pattern. In the extreme case where all folia had exactly the same size, a small cerebellum would have enough space to accommodate only a few folia, whereas a large cerebellum would accommodate many more.

      • What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? It’s the mostly 2D expansion of the cerebellar cortical layer and its thickness.

      • Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum? Because even a cerebellum of very small volume would fold if its cortex were thin enough and expanded sufficiently. That’s why the cerebellum folds even while being smaller than the cerebrum: because its cortex is much thinner.

      8) One caveat or point to be raised is the fact that the authors use the median of the variables measured for the whole cerebellum (e.g., median width and median perimeter across all folia). Although the cerebellum is highly uniform in its gross internal morphology and circuitry's organization across most vertebrates, there is evidence showing that the cerebellum may be organized in different functional modules. In that way, different regions or folia of the cerebellum would have different olivo-cortico-nuclear circuitries, forming, each one, a single cerebellar zone. Although it is not completely clear how these modules/zones are organized within the cerebellum, I think the authors could acknowledge this at the end of their discussion, and raise potential ideas for future studies (e.g., analyse folding of the cerebellum within the brain structure - vermis vs lateral cerebellum, for example). I think this would be a good way to emphasize the importance of the results of this study and what are the main questions remaining to be answered. For example, the expansion of the lateral cerebellum in mammals is suggested to be linked with the evolution of vocal learning in different clades (see Smaers et al., 2018). An interesting question would be to understand how foliation within the lateral cerebellum varies across mammalian clades and whether this has something to do with the cellular composition or any other aspect of the microanatomy as well as the evolution of different cognitive skills in mammals.

      We now address this point in a subsection of the discussion which details the implications of our methodological decisions and the limitations of our approach. It is true that the cerebellum is regionally variable. Our measurements of folial width, folial perimeter and molecular layer thickness are local, and we should be able to use them in the future to study regional variation. However, this comes with a number of difficulties. First, it would require sampling the whole cerebellum (and the cerebrum) and not just one section. Second, even if that were possible, it would increase the number of phenotypes beyond the current scope of this study. Our central question about brain folding in the cerebellum compared to the cerebrum is addressed by providing data for a substantial number of mammalian species. As indicated by Reviewer #3, adding more variables makes phylogenetic comparative analyses very difficult because the models to fit become too large.

      Reviewer #2 (Public Review):

      1) The methods section does not address all the numerical methods used to make sense of the different brain metrics.

      We now provide more detailed descriptions of our measurements of foliation, phylogenetic models, analysis of partial correlations, phylogenetic principal components, and allometry. We have added illustrations (to Figs. 3 and 5), examples and references to the relevant literature.

      2) In the results section, it sometimes makes it difficult for the reader to understand the reason for a sub-analysis and the interpretation of the numerical findings.

      The revised version of our manuscript includes motivations for the different types of analyses, and we have also added a paragraph providing a guide to the structure of our results.

      3) The originality of the article is not sufficiently brought forward:

      a) the novel method to detect the depth of the molecular layer is not contextualized in order to understand the shortcomings of previously-established methods. This prevents the reader from understanding its added value and hinders its potential re-use in further studies.

      The revised version of the manuscript provides additional context which highlights the novelty of our approach, in particular concerning the measurement of folding and the use of phylogenetic comparative models. The limitations of the previous approaches are stated more clearly, and illustrated in Figs. 3 and 5.

      b) The numerous results reported are not sufficiently addressed in the discussion for the reader to get a full grasp of their implications, hindering the clarity of the overall conclusion of the article.

      Following the Reviewer’s advice, we have thoroughly restructured our results and discussion section.

      Reviewer #3 (Public Review):

      1) The first problem relates to their use of the Ornstein-Uhlenbeck (OU) model: they try fitting three evolutionary models, and conclude that the Ornstein-Uhlenbeck model provides the best fit. However, it has been known for a while that OU models are prone to bias and that the apparent superiority of OU models over Brownian Motion is often an artefact, a problem that increases with smaller sample sizes. (Cooper et al (2016) Biological Journal of the Linnean Society, 2016, 118, 64-77).

      Cooper et al.’s (2016) article “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies” suggests that comparing evolutionary models using the model’s likelihood often leads to incorrectly selecting OU over BM, even for data generated from a BM process. However, Grabowski et al. (2023), in their article ‘A Cautionary Note on “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies”’, suggest that Cooper et al.’s (2016) claim may be misleading. The work of Clavel et al. (2019) and Clavel and Morlon (2017) shows that the penalised framework implemented in mvMORPH can successfully recover the parameters of a multivariate OU process. To address the Reviewer’s concern more directly, we used simulations to evaluate the chances that we would select an OU model when the correct model was BM, a procedure similar to the one used by Cooper et al. (2016). However, instead of directly using the likelihood of the fitted models as Cooper et al. (2016) did, which does not control for the number of parameters in the model, we used the Akaike Information Criterion corrected for small sample sizes (AICc). The standard Akaike Information Criterion takes the number of parameters of the model into account, but this is not sufficient when the sample size is small. AICc provides a score which takes both aspects into account: model complexity and sample size. This information has been added to the manuscript:

      “We selected the best fitting model using the Akaike Information Criterion (AIC), corrected for small sample sizes (AICc). AIC takes into account the number of parameters $p$ in the model: $AIC = -2\log(\mathrm{likelihood}) + 2p$. This approximation is insufficient when the sample size is small, in which case an additional correction is required, leading to the corrected AIC: $AICc = AIC + (2p^2 + 2p)/(n - p - 1)$, where $n$ is the sample size.”
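
      The two criteria in the quoted passage can be written as a short Python sketch. The log-likelihoods and parameter counts below are hypothetical values chosen only to show the arithmetic; they are not results from the manuscript, and in a multivariate fit the appropriate values of n and p depend on the model.

```python
def aic(log_likelihood, p):
    """Akaike Information Criterion: -2 log(likelihood) + 2p."""
    return -2.0 * log_likelihood + 2.0 * p

def aicc(log_likelihood, p, n):
    """AIC with the small-sample correction (2p^2 + 2p) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return aic(log_likelihood, p) + (2.0 * p ** 2 + 2.0 * p) / (n - p - 1)

n_species = 56  # sample size used in the illustration

# Hypothetical fits: a simpler model and a model with one extra parameter
# whose log-likelihood is only slightly higher.
simple_score = aicc(log_likelihood=-100.0, p=2, n=n_species)
complex_score = aicc(log_likelihood=-99.5, p=3, n=n_species)

# The model with the lowest score is preferred.
best = "simple" if simple_score < complex_score else "complex"
print(simple_score, complex_score, best)
```

      In this example the extra parameter does not improve the likelihood enough to offset the penalty, so the simpler model is preferred, and the correction term widens the gap between the two scores relative to plain AIC.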

      In 1000 simulations of 9 correlated multivariate traits for 56 species (i.e., 56*9 data points) using our phylogenetic tree, we would have selected OU when the real model was BM only 0.7% of the time.

      2) Second, for the partial correlations (e.g. fig 7) and Principal Components (fig 8) there is a concern about over-fitting: there are 9 variables and only 56 data points (violating the minimal rule of thumb that there should be >10 observations per parameter). Added to this, the inclusion of variables lacks a clear theoretical rationale. The high correlations between most variables will be in part because they are to some extent measuring the same things, e.g. the five different measures of cerebellar anatomy which include two measures of folial size. This makes it difficult to separate their effects. I get that the authors are trying to tease apart different aspects of size, but in practice, I think these results (e.g. the presence of negative coefficients in Fig 7) are really hard or impossible to interpret. The partial correlation network looks like a "correlational salad" rather than a theoretically motivated hypothesis test. It isn't clear to me that the PC analyses solve this problem, but it partly depends on the aims of these analyses, which are not made very clear.

      PCA is simply a rigid rotation of the data: distances among multivariate data points are all conserved. Neither our PCA nor our partial correlation analysis involves model fitting, so the concept of overfitting does not apply. PCA and partial correlations are also not used here for hypothesis testing, but as exploratory methods which provide a transformation of the data aiming at capturing the main trends of multivariate change. The aim of our analysis of correlation structure is precisely to avoid the “correlational salad” that the Reviewer mentions. The Reviewer is correct: all our variables are correlated to a varying degree (note that there are 56 data points per variable = 56*9 data points, not just 56 data points). Partial correlations and PCA aim at providing a principled way in which correlated measurements can be explored. In the revised version of the manuscript we include a more detailed description of partial correlations and (phylogenetic) PCA. Whenever variables measure the same thing, they will be combined into the same principal component (these are the combinations shown in Fig. 8 b and d). Additionally, two variables may be correlated because of their correlation with a third variable (or more). Partial correlations address this possibility by looking at the correlations between the residuals of each pair of variables after all other variables have been covaried out. We provide a simple example which should make this clear, providing in particular an intuition for the meaning of negative correlations:

      “All our phenotypes were strongly correlated. We used partial correlations to better understand pairwise relationships. The partial correlation between 2 vectors of measurements a and b is the correlation between their residuals after the influence of all other measurements has been covaried out. Even if the correlation between a and b is strong and positive, their partial correlation could be 0 or even negative. Consider, for example, 3 vectors of measurements a, b, c, which result from the combination of uncorrelated random vectors x, y, z. Suppose that a = 0.5 x + 0.2 y + 0.1 z, b = 0.5 x - 0.2 y + 0.1 z, and c = x. The measurements a and b will be positively correlated because of the effect of x and z. However, if we compute the residuals of a and b after covarying the effect of c (i.e., x), their partial correlation will be negative because of the opposite effect of y on a and b. The statistical significance of each partial correlation being different than 0 was estimated using the edge exclusion test introduced by Whittaker (1990).”
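
      The example in the quoted passage can be reproduced numerically. The sketch below assumes standard normal random vectors and a fixed seed, and computes the partial correlation by regressing a and b on the single remaining variable c; with more variables, all the others would be covaried out. The helper names are hypothetical.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def residuals(y, x):
    """Residuals of y after a simple linear regression on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]

a = [0.5 * xi + 0.2 * yi + 0.1 * zi for xi, yi, zi in zip(x, y, z)]
b = [0.5 * xi - 0.2 * yi + 0.1 * zi for xi, yi, zi in zip(x, y, z)]
c = list(x)

r_ab = pearson(a, b)                                  # positive, driven by x and z
partial_ab = pearson(residuals(a, c), residuals(b, c))  # negative, driven by y
print(r_ab, partial_ab)
```

      In expectation the plain correlation is 0.22/0.30 (about 0.73), while the partial correlation is -0.03/0.05 = -0.6: once the shared effect of x is removed, the opposite contributions of y dominate.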

      The rationale for our analyses has been made more clear in the revised version of the manuscript, aided by the more detailed description of our methods. In particular, we describe better the reason for our 2 measurements of folial shape – width and perimeter – which measure independent dimensions of folding (this is illustrated in Fig. 3d).

      3) The claim of concerted evolution between cortical and cerebellar values (P 11-12) seems to be based on analyses that exclude body size and brain size. It, therefore, seems possible - or even likely - that all these analyses reveal overall size effects that similarly influence the cortex and cerebellum. When the authors state that they performed a second PC analysis with body and brain size removed "to better understand the patterns of neuroanatomical evolution" it isn't clear to me that is what this achieves. A test would be a model something like [cerebellar measure ~ cortical measure + rest of the brain measure], and this would deal with the problem of 'correlation salad' noted below.

      The answer to this question is in the partial correlation diagram in Fig. 7c. This analysis excludes neither body weight nor brain weight. It shows that the strong correlation between cerebellar area and length is supported by a strong positive partial correlation, as is the link between cerebral area and length. There is a significant positive partial correlation between cerebellar section area and cerebral section length. That is, even after covarying everything else, there is still a correlation between cerebellar section area and cerebral section length (this partial correlation is equivalent to the suggestion of the Reviewer). Additionally, there is a positive partial correlation between body weight and cerebellar section area, but no significant partial correlation between body weight and cerebral section area or length. Our approach aims at obtaining a general view of all the relationships in the data. Testing an individual model would certainly decrease the number of correlations; however, it would provide only a partial view of the problem.

      4) It is not quite clear from fig 6a that the result does indeed support isometry between the data sets (predicted 2/3 slope), and no coefficient confidence intervals are provided.

      We have now added the numerical values of the CIs to all our plots in addition to the graphical representations (grey regions) in the previous version of the manuscript. The isometry slope (0.67) is either within the CIs (both for the linear and orthogonal regressions) or at the margin, indicating that if the relationships are not isometric, they are very close to it.
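
      A minimal sketch of how an isometric slope can be checked against a confidence interval is given below. It assumes idealised spheres, for which surface area scales as volume to the power 2/3, adds a small deterministic perturbation to stand in for measurement scatter, and uses ordinary least squares on log10-transformed values with a normal approximation to the t quantile. It is not the phylogenetic or orthogonal regression used in the manuscript, and all names and values are hypothetical.

```python
import math
from statistics import NormalDist

def ols_slope_ci(x, y, level=0.95):
    """OLS slope of y on x with an approximate confidence interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    return slope, slope - z * se, slope + z * se

radii = range(1, 57)  # 56 synthetic "specimens"
log_volume = [math.log10(4 / 3 * math.pi * r ** 3) for r in radii]
# Alternating +/-0.05 offset in log10 units mimics scatter around the isometric line.
log_area = [math.log10(4 * math.pi * r ** 2) + (0.05 if r % 2 == 0 else -0.05)
            for r in radii]

slope, lo, hi = ols_slope_ci(log_volume, log_area)
isometric = lo <= 2 / 3 <= hi
print(slope, lo, hi, isometric)
```

      When the interval contains 2/3, the data are compatible with isometry; an interval lying entirely above or below 2/3 would indicate positive or negative allometry.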

      Referencing/discussion/attribution of previous findings

      5) With respect to the discussion of the relationship between cerebellar architecture and function, and given the emphasis here on correlated evolution with cortex, Ramnani's excellent review paper goes into the issues in considerable detail, which may also help the authors develop their own discussion: Ramnani (2006) The primate cortico-cerebellar system: anatomy and function. Nature Reviews Neuroscience 7, 511-522 (2006)

      We have added references to the work of Ramnani.

      6) The result that humans are outliers with a more folded cerebellum than expected is interesting and adds to recent findings highlighting evolutionary changes in the hominin human cerebellum, cerebellar genes, and epigenetics. Whilst Sereno et al (2020) are cited, it would be good to explain that they found that the human cerebellum has 80% of the surface area of the cortex.

      We have added this information to the introduction:

      “In humans, the cerebellum has ~80% of the surface area of the cerebral cortex (Sereno et al. 2020), and contains ~80% of all brain neurons, although it represents only ~10% of the brain mass (Azevedo et al. 2009)”

      7) It would surely also be relevant to highlight some of the molecular work here, such as Harrison & Montgomery (2017). Genetics of Cerebellar and Neocortical Expansion in Anthropoid Primates: A Comparative Approach. Brain Behav Evol. 2017;89(4):274-285. doi: 10.1159/000477432. Epub 2017 (especially since this paper looks at both cerebellar and cortical genes); also Guevara et al (2021) Comparative analysis reveals distinctive epigenetic features of the human cerebellum. PLoS Genet 17(5): e1009506. https://doi.org/10.1371/journal.pgen.1009506. Also relevant here is the complex folding anatomy of the dentate nucleus, which is the largest structure linking cerebellum to cortex: see Sultan et al (2010) The human dentate nucleus: a complex shape untangled. Neuroscience. 2010 Jun 2;167(4):965-8. doi: 10.1016/j.neuroscience.2010.03.007.

      The information is certainly important, and could have provided a wider perspective on cerebellar evolution, but we would prefer to keep a focus on cerebellar anatomy and address genetics only indirectly through phylogeny.

      8) The authors state that results confirm previous findings of a strong relationship between cerebellum and cortex (P 3 and p 16): the earliest reference given is Herculano-Houzel (2010), but this pattern was discovered ten years earlier (Barton & Harvey 2000 Nature 405, 1055-1058. https://doi.org/10.1038/35016580; Fig 1 in Barton 2002 Nature 415, 134-135 (2002). https://doi.org/10.1038/415134a) and elaborated by Whiting & Barton (2003) whose study explored in more detail the relationship between anatomical connections and correlated evolution within the cortico-cerebellar system (this paper is cited later, but only with reference to suggestions about the importance of functions of the cerebellum in the context of conservative structure, which is not its main point). In fact, Herculano-Houzel's analysis, whilst being the first to examine the question in terms of numbers of neurons, was inconclusive on that issue as it did not control for overall size or rest of the brain (A subsequent analysis using her data did, and confirmed the partially correlated evolution - Barton 2012, Philos Trans R Soc Lond B Biol Sci. 367:2097-107. doi: 10.1098/rstb.2012.0112.)

      We apologise for this oversight; these references are now included.

    2. Reviewer #1 (Public Review):

      This paper provides valuable (and impressive) data on the geometry of cerebellar foliation among 56 species of mammals and gives novel insights into the evolution of cerebellar foliation and its relationship with the anatomy of the cerebrum. Thus far, the majority of the research on brain folding focuses on the cerebral cortex with little research on the cerebellum. The results from Heuer et al confirm that the evolution of the cerebellum and cerebrum follows a concerted fashion across mammals. Moreover, they suggest that both the cerebrum and cerebellum folding are explained by a similar mechanistic process.

      1. Although I found the introduction well written, I think it lacks some information or needs to develop more on some ideas (e.g., differences between the cerebellum and cerebral cortex, and folding patterns of both structures). For example, after stating that "Many aspects of the organization of the cerebellum and cerebrum are, however, very different" (1st paragraph), I think the authors need to develop more on what these differences are. Perhaps just rearranging some of the text/paragraphs will help make it better for a broad audience (e.g., authors could move the next paragraph up, i.e., "While the cx is unique to mammals (...)").

      2. Given that the authors compare the folding patterns between the cerebrum and cerebellum, another point that could be mentioned in the introduction is the fact that the cerebellum is convoluted in every mammalian species (and non-mammalian spp as well) while the cerebrum tends to be convoluted in species with larger brains. Why is that so? Do we know about it (check Van Essen et al., 2018)? I think this is an important point to raise in the introduction and to bring it back into the discussion with the results.

      3. In the results, first paragraph, what do the authors mean by the volume of the medial cerebellum? This needs clarification.

      4. In the results: When the authors mention 'frequency of cerebellar folding', do they mean the degree of folding in the cerebellum? At least in non-mammalian species, many studies have tried to compare the 'degree or frequency of folding' in the cerebellum by different proxies/measurements (see Iwaniuk et al., 2006; Yopak et al., 2007; Lisney et al., 2007; Yopak et al., 2016; Cunha et al., 2022). Perhaps change the phrase in the second paragraph of the results to: "There are no comparative analyses of the frequency of cerebellar folding in mammals, to our knowledge".

      5. Sultan and Braitenberg (1993) measured cerebella that were sagittally sectioned (instead of coronal), right? Do you think this difference in the plane of the section could be one of the reasons explaining different results on folial width between studies? Why does the foliation index calculated by Sultan and Braitenberg (1993) not provide information about folding frequency?

      6. Another point that needs to be clarified is the log transformation of the data. Did the authors use log-transformed data for all types of analyses done in the study? Write this information in the material and methods.

      7. The discussion needs to be expanded. The focus of the paper is on the folding pattern of the cerebellum (among different mammalian species) and its relationship with the anatomy of the cerebrum. Therefore, the discussion on this topic needs to be better developed, in my opinion (especially given the interesting results of this paper). For example, with the findings of this study, what can we say about how the folding of the cerebellum is determined across mammals? The authors found that the folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum (for example, even relatively small cerebella are folded)? Is that because the 'white matter' core of the cerebellum is relatively small (thus more stress on it)?

      8. One caveat or point to be raised is the fact that the authors use the median of the variables measured for the whole cerebellum (e.g., median width and median perimeter across all folia). Although the cerebellum is highly uniform in its gross internal morphology and circuitry's organization across most vertebrates, there is evidence showing that the cerebellum may be organized in different functional modules. In that way, different regions or folia of the cerebellum would have different olivo-cortico-nuclear circuitries, forming, each one, a single cerebellar zone. Although it is not completely clear how these modules/zones are organized within the cerebellum, I think the authors could acknowledge this at the end of their discussion, and raise potential ideas for future studies (e.g., analyse folding of the cerebellum within the brain structure - vermis vs lateral cerebellum, for example). I think this would be a good way to emphasize the importance of the results of this study and what are the main questions remaining to be answered. For example, the expansion of the lateral cerebellum in mammals is suggested to be linked with the evolution of vocal learning in different clades (see Smaers et al., 2018). An interesting question would be to understand how foliation within the lateral cerebellum varies across mammalian clades and whether this has something to do with the cellular composition or any other aspect of the microanatomy as well as the evolution of different cognitive skills in mammals.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper studies color vision in anemonefish. The central conclusion of the paper is that anemonefish use signals from their UV cones to discriminate colors that would not otherwise be distinguishable; this differs from other fish in which UV cones extend the range of wavelengths of sensitivity but do not add a dimension to color vision. The work fits into a rich history of studies investigating how color vision fits into an animal's ecological niche. My primary concerns regard the microspectrophotometry data from single cones and some aspects of the presentation of the behavioral data.

      Microspectrophotometry

      The spectral properties of the cone types are a key issue for interpreting the results. These were measured using MSP, and fits are shown in Figure 2. The raw data shown in Fig. S1 appears more complicated than indicated in the main text. The templates miss the measurements across broad wavelength bands in each cone type. Particularly concerning is the high UV absorbance across cone types and the long-wavelength absorbance in the UV cone. It is not clear how this picture supports the relatively simple description of cone types and spectral sensitivities given in the main text and which forms the basis of the modeling.

      Microspectrophotometry is an inherently noise-prone measurement technique, particularly for very small photoreceptor outer segments such as those of single cones, which are also difficult to detect as intact, isolated (nonoverlapping) cells. As such, the absorbance curve fitting and derived lambda max (λmax) values should be treated as estimates. The accuracy of these estimates is adequate for this type of study, and visual modelling results have been shown to be robust against small errors (±10 nm λmax) in photoreceptor sensitivity for multiple species [see Lind, O. & Kelber, A. (2009). Vis Res. 49(15), 1939-1947; and Bitton, PP. et al. (2017). PLOS ONE, 12: e0169810]. We consider it highly unlikely that small shifts in cone λmax from measurement error would make a meaningful difference to the colour discrimination thresholds.

      It should be noted that the raw data shown in the original Supplementary Figure 1 included all scans overlain with an average absorbance curve for presentation purposes; however, the actual lambda max values for the different cone types were measured from individual scans fitted with photopigment absorbance curve templates and then averaged. For clarity and transparency, we have now provided three multi-panel plots (see Figure 1 – figure supplements 1-3) showing the individual pre- and post-bleach scans of absorbance spectra, fitted absorbance curve templates, and R² values from the best visual pigment template fit.

      It is worth noting that most of the cone absorbance spectra found in our study closely resemble, in both λmax and quality, those measured in another anemonefish species (Amphiprion akindynos) [see Supplementary Figure 1 in Stieb S. et al. (2019). Sci Rep. 9, 16459]. These cone λmax values can also be reconciled with previous estimates of opsin λmax based on amino acid sequences and cone opsin expression in the A. ocellaris retina characterised in Mitchell LJ et al. (2021). GBE, 13: evab184.

      Evidence that the unusual long-wavelength absorbance detected in a couple of the single cone (pre-bleach) measurements did not originate from visual pigment comes from the post-bleach scans, in which this absorbance persisted (i.e., it showed no photobleaching response); it was instead likely caused by contaminants (e.g., blood, RPE pigment). UV absorbance in some of the double cone measurements (above that expected of the pre-bleach beta peak from chromophore spectral absorption) can be attributed to noise in the scans, as is quite typical of MSP, and/or to partial (accidental) bleaching from stray light sources. Although utmost care was taken to minimise contamination and unintended bleaching, they are sometimes unavoidable.

      We refer the Reviewer to multiple published studies for further examples of typical MSP measurements that share similar levels of noise to ours e.g., see Figure 1 in Knott B. et al. (2013). JEB, 216:4454-4461; Figure 3 in Schott, RK et al. (2015). PNAS, 113(2): 356-361; Figure 2 in Dalton BE et al. (2014). Proc R Soc B. 281; Figure 5 in Tosetto, JE et al. (2021). Brain Behav Evol. 96: 103-123.

      Presentation

      The results are not presented in a straightforward way - at least for this reviewer. What is missing for me is a clear link between the psychometric curves in Figure 3A and the discrimination thresholds indicated in Figure 3B and Figure 4. Figure 3A is only discussed in the text on line 289 - after Figure 4 has been introduced and discussed. It would have been very helpful for me if the psychometric curves were first introduced and described, then the relation to Figure 3B was clearly indicated (perhaps with a single psychometric curve as an example). Similarly for Figure 4 the relationship between specific psychometric curves and the threshold plotted would be quite helpful. Currently it takes a careful reading to understand why being below the dashed line in Figure 4 is important.

      We have made the following changes, including the introduction of the psychometric curves earlier in the results (lines 236-249) and moved the psychometric function comparison before the mention of Figure 4. Additionally, to make the association between the plotted colour loci and psychometric curves clearer, we have added a smaller psychometric curve plot adjacent to the colour space (in Figure 3B) using red as an example which has an averaged psychometric curve overlying the individual fish curves. The figure caption (lines 250-274) explains that the plotted colour loci and given thresholds are mean values calculated from the individual fish behavioural data.

      We have also added a brief reminder that the theoretical limit of colour discrimination is predicted by the RNL model as 1∆S, where in our task fish should be just able to distinguish targets from grey distractors (see lines 222-224). To clarify, the plotted values in Figure 4B are both the individual fish thresholds (points) and average threshold (black bar) per colour set. The individual threshold values are taken at a correct choice probability of 50% from fitted psychometric curves of fish behavioural performance (shown in Figure 3A).
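      The threshold read-out described above can be illustrated with a minimal sketch. It assumes a logistic psychometric function; the functional form, the parameter names (`midpoint`, `slope`, `guess`, `lapse`) and the numerical values are hypothetical and are not taken from the manuscript. Only the 50% correct-choice criterion comes from the response.

```python
import math

def logistic_psychometric(x, midpoint, slope, guess=0.0, lapse=0.0):
    """Probability of a correct choice at stimulus contrast x (e.g. in delta S).

    Assumed logistic form; guess and lapse set the lower and upper asymptotes.
    """
    return guess + (1.0 - guess - lapse) / (1.0 + math.exp(-slope * (x - midpoint)))

def threshold_at(p_target, midpoint, slope, guess=0.0, lapse=0.0):
    """Invert the curve: the contrast at which performance equals p_target."""
    span = 1.0 - guess - lapse
    if not guess < p_target < guess + span:
        raise ValueError("p_target must lie strictly between the curve's asymptotes")
    return midpoint - math.log(span / (p_target - guess) - 1.0) / slope

# Hypothetical fitted parameters for one fish and one colour set.
fit = {"midpoint": 0.9, "slope": 4.0, "guess": 0.0, "lapse": 0.02}

# Threshold at a correct choice probability of 50%, in the same units as x.
threshold = threshold_at(0.5, **fit)
```

      Under these assumptions, a threshold below 1 would mean the fish discriminated the colour at a smaller contrast than the RNL model predicts, and a threshold above 1 the opposite.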

      RNL model

      The data is fit and interpreted in the context of the receptor noise limited model. The paragraph in the discussion about complementary color pairs suggests that this model is incorrect (text around line 332). Consideration of how the results depend on the RNL model is important, especially given the interpretation here.

      The inability of the RNL model to account for the observed asymmetry between color discrimination thresholds implies that they cannot be solely attributed to photoreceptor noise. We can therefore infer from the asymmetry that thresholds are set by a higher-level process; whether that involves post-receptor processes within the inner retina or in the brain remains to be investigated. As explained in lines 396-397, one possibility is that activation of the UV receptor suppresses noise in the visual pathway or enhances the saliency of colors for anemonefish. The high sensitivity to violet-green, which was found in all six of the fish tested, is consistent with the heightened saliency of this color (lines 397-399).

      Figure 3B

      This is the key figure in the paper. But several issues make seeing the data in this figure difficult. First, the important part of the figure is buried near the origin and hard to see. Can you show a surface that connects the thresholds in the different chromatic directions, or otherwise highlight the regions of discriminable and not discriminable colors?

      See previous comment. In short, we have taken the advice of the Reviewer and added highlighted areas around the regions of discriminable colors in Figure 3B to help visually separate them from the non-discriminable regions of colors (from grey). Additionally, we have added an inset showing an enlarged image of the area surrounding the centre of colour space.

      Reviewer #2 (Public Review):

      Mitchell and colleagues examined the contribution of a UV-sensitive cone photoreceptor to chromatic detection in Amphiprion ocellaris, a type of anemonefish. First, they used biophysical measurements to characterize the response properties of the retinal receptors, which come in four spectrally-distinct subtypes: UV, M1, M2, and L. They then used these spectral sensitivities to construct a 4-dimensional (tetrahedral) color space in which stimuli with known spectral power distributions can be represented according to the responses they elicit in the four cone types. A novel five-LED display was used to test the fish's ability to detect "chromatic" modulations in this color space against a background of random-intensity, "achromatic" distractors that produce roughly equal relative responses in the four cone types. A subset of stimuli, defined by their high positive UV contrast, were more readily detected than other colors that contained less UV information. A well-established model was used to link calculated receptor responses to behavioral thresholds. This framework also enabled statistical comparisons between models with varying number of cone types contributing to discrimination performance, allowing inferences to be drawn about the dimensionality of color vision in anemonefish.

      The authors make a compelling case for how UV light in the anemonefish habitat is likely an important ecological source of information for guiding their behavior. The authors are to be commended for developing an elegant behavioral paradigm to assess visual performance and for incorporating a novel display device especially suited to addressing hypotheses about the role of UV light in color perception. While the data are suggestive of behavioral tetrachromacy in anemonefish, there are some aspects of the study that warrant additional consideration:

      1) One challenge faced by many biological imaging systems is longitudinal chromatic aberration (LCA) - that is, the focal power of the system depends on wavelength. In general, focal power increases with decreasing wavelength, such that shorter wavelengths tend to focus in front of longer wavelengths. In the human eye, at least, this focal power changes nonlinearly with wavelength, with the steepest changes occurring in the shorter part of the visible spectrum (Atchison & Smith, 2005). In the fish eye, where the visible spectrum extends to even shorter wavelengths, it seems plausible that a considerable amount of LCA may exist, which could in turn cause UV-enriched stimuli to be more salient (relative to the distractor pixels) due to differences in perceived focus rather than due solely to differences in their respective spectral compositions. Such a mechanism has been proposed by Stubbs & Stubbs (2016) as a means for supporting "color vision" in monochromatic cephalopods (but see Gagnon et al. 2016). It would be worth discussing what is known about the dispersive properties of the crystalline lens in A. ocellaris (or similar species), and whether optical factors could produce sufficient cues in the retinal image that might explain aspects of the behavioral data presented in the current study.

      This is an interesting point, and we appreciate the reviewer’s thoughtful comment regarding this topic, especially as LCA increases exponentially in the UV. Although we certainly cannot disprove such a mechanism in the present study, we are highly sceptical that LCA could be used by reef fish and is involved in the heightened saliency of UV stimuli. Previous work has found that LCA is mostly corrected for in the teleost retina of both marine and freshwater species by graded, multifocal lenses that focus different wavelengths at the same depth as their maximally sensitive cone photoreceptors [e.g., for evidence in African cichlids see Kröger, R. H. H. et al. (1999). J Comp Physiol. A, 184, 361-369; Malkki, P. E. & Kröger, R. H. H. (2005). J Opt. A, 7, 691-700; and for various reef fishes see Karpestam, B. et al. (2007). J Exp Biol., 210, 16: 2923-2931]. In essence, LCA is corrected in the eyes of many teleosts by accurately tuning longitudinal spherical aberration through having a graded density lens. We draw particular attention to the last reference, which comparatively examined the optical properties of reef fish lenses, including diurnal, planktivorous damselfishes (from the same family as anemonefishes, Pomacentridae). They found that not only were the lenses of these species highly UV-transmissive (as we show in anemonefish), but all were multifocal and capable of focusing both visible (non-UV) and UV wavelengths. All of the coastal cephalopod species examined thus far contain only one type of visual pigment, which is packed into long photoreceptors (150-450 µm long outer segments) across the entire retina (Chung and Marshall 2016, Proceedings B). Theoretically, given these long photoreceptors, LCA and the resulting differences in focal length across different patches of photoreceptors, or across different depths of the outer segment, might provide cues for colour discrimination, although no behavioural evidence yet supports this hypothesis. Unlike the cephalopod case, the four spectrally distinct cone types arranged in a mosaic pattern in the anemonefish retina, along with their very short outer segments (5-10 µm), likely make LCA a less effective cue in this retinal design.

      We have added a short paragraph (Lines 400-412) discussing the possibility of an optical mechanism contributing to heightened UV saliency with a particular focus on LCA and our thoughts on why we consider it an unlikely mechanism in anemonefish.

      2) The authors provide a quantitative description of anemonefish visual performance within the context of a well-developed receptor-based framework. However, it was less clear to me what inferences (if any) can be drawn from these data about the post-receptoral mechanisms that support tetrachromatic color vision in these organisms. Would specific cone-opponent processes account for instances where behavioral data diverged from predictions generated with the "receptor noise limited" model described in the text? The general reader may benefit from more discussion centered on what is known (or unknown) about the organization of cone-opponent processing in anemonefish and related species.

      In short, we do not know the specific opponent interactions of anemonefish cones. The RNL model assumes all possible opponent interactions in its calculations. From our results, very little can be said about the post-receptor mechanisms involved in their putative tetrachromatic vision. We would like to avoid overreaching beyond what our data can show. A future directions section has now been added to the discussion (lines 467-497), which briefly mentions the known UV opponency in larval zebrafish and that future investigation in anemonefish should attempt to disentangle the specific opponent (chromatic) and non-opponent (achromatic) circuits in the anemonefish retina.

      Reviewer #3 (Public Review):

      The comments below focus mainly on ways that the data and analysis as currently presented do not, to this reviewer, compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions but clarity on exactly what has been demonstrated and where additional future work is needed would make a strong contribution to the literature.

      • Fig 3A. It might be worth emphasizing a bit more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, since this then means that if RNL model fit the data perfectly, all of the thresholds would fall at deltaS = 1. They don't, so I would like to see some evaluation from the authors' experience with this model as to whether they think the deviations (looks like the delta S range is ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations below, but that the model doesn't fully account for the data should be brought out here I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here at the first point it becomes clear would be helpful.

      We have now stated more explicitly in the figure caption for Figure 3A, that the delta S values presented were calculated by fitting fish behavioral data to the RNL model. To test the overall effect that the sign of the UV contrast had on the discrimination threshold, we have now included ‘contrast’ (positive or negative) as another fixed effect in the linear mixed effects model. We have now included details of this test in the results which shows the systematic effect (lines 338-340). Additionally, as suggested we now briefly introduce in the results the idea that factors other than receptor noise are causing the observed deviations in data from the RNL model.

      • (Line 217 ff, Figure 4, Supplemental Figure 4). If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors, based on line 649), the predictions deviate significantly from the data (relative to the inter-fish variability) for the trichromatic models but not the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.

      The interpretation of the ANOVA by the Reviewer is mostly correct. We had the variables color set and Fish ID, with threshold delta S as the dependent variable. This showed that deviations from the predicted threshold were significant relative to the inter-fish variability for the trichromatic models. Missing details describing the ANOVA have now been added to the methods (lines 789-798).

      Assuming that the above is right about the nature of the test, then I don't think the fact that the tetrachromatic model has an additional parameter (noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and must necessarily fit the data worse. What we want to know is whether the tetrachromatic model is fitting better because its extra parameter is allowing it to account for measurement noise (overfitting), or whether it is really doing a better job accounting for systematic features of the data. This comparison requires some method of taking the different number of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even this doesn't seem like what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.

      Typical model comparison approaches would include a likelihood ratio test, AIC/BIC sorts of comparisons, or a cross-validation approach.
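      As an illustration of the kind of penalised comparison mentioned here, the following is a minimal sketch of AIC and BIC computed from a binomial log-likelihood of correct choices. The function names, the binomial likelihood, the parameter counts and all numbers are assumptions made for illustration; none of them come from the manuscript or the authors' analysis.

```python
import math

def binomial_log_likelihood(successes, trials, predicted_p):
    """Log-likelihood of observed correct-choice counts given predicted probabilities."""
    total = 0.0
    for k, n, p in zip(successes, trials, predicted_p):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite at 0 and 1
        # log of the binomial coefficient, via log-gamma
        total += math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

# Hypothetical data: correct choices out of 20 trials at four contrast levels.
correct = [6, 10, 15, 19]
trials = [20, 20, 20, 20]

# Hypothetical predictions from a model with 3 free parameters and one with 4.
pred_small = [0.35, 0.55, 0.70, 0.85]
pred_large = [0.30, 0.50, 0.75, 0.95]

ll_small = binomial_log_likelihood(correct, trials, pred_small)
ll_large = binomial_log_likelihood(correct, trials, pred_large)

# The larger model is preferred only if its better fit outweighs the penalty
# for its extra parameter.
delta_aic = aic(ll_large, 4) - aic(ll_small, 3)
```

      A negative `delta_aic` would favour the larger model. Cross-validation would address the same question without relying on a parameter count.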

      If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.

      Our visual model comparisons were aimed at assessing whether a trichromatic or tetrachromatic model best fit the colour discrimination data. The trichromatic and tetrachromatic models assume two and three opponency pathways, respectively. If the fish were trichromatic rather than tetrachromatic, then we would expect the RNL model to fit the data better with two opponency mechanisms than with three. We made this assessment because some of the cones might not contribute to colour vision and might instead be used exclusively for achromatic tasks (e.g., luminance vision or motion detection). However, given our finding that the data best fit the tetrachromatic model (i.e., the behavioural discrimination thresholds lay closest to the theoretical prediction of 1∆S), it is likely that anemonefish used all four cones for colour vision.
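      For readers unfamiliar with how such nested receptor-noise models produce a ∆S value, the following is a minimal sketch. It uses a weighted pairwise form that, for independent receptor noise, is algebraically equivalent to the closed-form RNL expressions for two, three and four receptor types. The function names and quantum catches are hypothetical, and the assignment of 0.11 to the UV channel and 0.14 to the other three channels is an assumption for illustration rather than the authors' exact parameterisation.

```python
import math
from itertools import combinations

def receptor_contrasts(catch_a, catch_b):
    """Log-ratio receptor contrasts between two stimuli (one value per cone type)."""
    return [math.log(a / b) for a, b in zip(catch_a, catch_b)]

def rnl_delta_s(delta_f, noise):
    """Chromatic distance (delta S) under a receptor-noise-limited model.

    delta_f: receptor contrasts, one per cone type.
    noise:   receptor noise (standard deviation) per cone type.
    """
    weights = [1.0 / (e * e) for e in noise]
    pair_sum = sum(
        weights[i] * weights[j] * (delta_f[i] - delta_f[j]) ** 2
        for i, j in combinations(range(len(delta_f)), 2)
    )
    return math.sqrt(pair_sum / sum(weights))

# Hypothetical quantum catches for a target and a grey distractor,
# ordered UV, M1, M2, L.
target = [0.60, 0.45, 0.50, 0.55]
grey = [0.50, 0.50, 0.50, 0.50]
noise = [0.11, 0.14, 0.14, 0.14]  # assumed assignment, for illustration only

df = receptor_contrasts(target, grey)

# Tetrachromatic prediction: all four cone types contribute.
delta_s_tetra = rnl_delta_s(df, noise)

# One nested trichromatic alternative: drop the M2 channel (index 2).
keep = [0, 1, 3]
delta_s_tri_no_m2 = rnl_delta_s([df[i] for i in keep], [noise[i] for i in keep])
```

      Because the model compares differences between receptor channels, a stimulus that changes all channels equally yields ∆S = 0 regardless of the noise values.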

      We have also now repeated our analysis using unweighted delta S values, which are calculated using general n-dimensional models of colour vision (using the PAVO2 package). These models essentially follow the same initial steps as the RNL model (and many others) but omit the receptor noise correction stage. After comparing (using ANOVA, see lines 303-311) the predicted thresholds with the data in this non-RNL space, we again found that the tetrachromatic model predictions did not deviate significantly from the data relative to individual fish performance; however, we also found that the trichromatic model without M2 cone input no longer differed from the predicted values. In this case, it seems that the extra noise parameter did contribute to the difference in fit. Whether this is a biologically meaningful comparison (as all photoreceptors contain noise) is an open question. We have added a short statement explicitly framing our interpretation that anemonefish have a 3-D colour space as being in accordance with the closeness of RNL model predictions (lines 370-371, 506-508).

      • Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.

      This is a good point. We have limited our conclusions to specifically address trichromatic models generated within the framework of the RNL model by stating in the conclusion section that fish psychophysical thresholds were best explained by the RNL model when all four cone types contributed to colour vision (see lines 370-371, 506-508). In this same sentence, we have also added, in parentheses, the phrase “suggesting (but not proving) tetrachromacy” (line 508). We have also edited the abstract to state that our results were “…best described by a tetrachromatic model using all four cone types…”, rather than stating we have shown tetrachromacy (lines 36-37).

      • More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though the visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.

      We agree with the reviewer that colour discriminability achieved by tetrachromatic vision could in theory be achieved by the combined effect of localised, distinct forms of trichromacy. Evidence in other fishes suggests that such multiple forms of trichromacy across the retina likely exist in many species. However, the behavioural effects of this retinal setup remain to be studied, likely because they are extremely difficult to investigate. We have added a new section titled “future directions” (Lines 474-489), in which we discuss the possibility that distinct forms of trichromacy in the anemonefish retina could in theory achieve colour discrimination on par with tetrachromatic vision. We also give suggestions on how this could be investigated.

      Although we tried to include as many colour directions as practically possible in our experiment, we have certainly not provided an exhaustive range that completely encompasses anemonefish colour space. Whether 9 colour directions are adequate to assess the dimensionality of their color vision is difficult to say. As addressed in the previous comment, we now acknowledge this limitation by refining our conclusion, saying that our results do not prove tetrachromacy.

      • Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).

      At first, I thought the key point was that the receptor noise inferred via the RNL model was slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model where the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom where the noise parameters are so constrained. That is, a careful model comparison analysis would be needed. Such an analysis is not presented that I see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that in some systematic ways fails to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).

      We have performed various modelling scenarios where receptor noise was adjusted for each channel; however, the UV channel was consistently found to be more sensitive than the other channels. In (the original) Supplementary Figure 6 (now Figure 4 – figure supplements 1 and 2), we show predicted dS values calculated using receptor noise levels in the manner that the Reviewer suggests, ranging from 0.05 to 0.15, and, most importantly, including scenarios where receptor noise was held equal across cone types and others where it was varied between single cones and double cones. None of the models adjusted the data so that sensitivity was equal across all four channels, which means that by an unknown mechanism, the UV channel is more sensitive, but this is unrelated to noise levels. Our best-fit receptor noise values of 0.11 (for single cones) and 0.14 (for double cones) are estimates and should be treated as such until actual receptor noise measurements are made.

      Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being under predicted for discriminations that involve UV cone signals. That's what seems to be being argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument as I read it in more detail in both places switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it's distinct from an argument that favors omnibus enhanced UV sensitivity, since both the UV increments and decrements are conveyed by the UV cone; it's an argument for differential sensitivity for increments versus decrements in UV mediated discriminations. The authors get to this on line 334 of the discussion, but if the point is an increment/decrement asymmetry the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.

      To clarify our argument, we found that the colour discrimination thresholds were systematically lower than predicted by the RNL model for colours which elicited higher UV cone stimulation relative to other cone types. We refer to these colours as UV positive, based on the sign of their contrast against grey distractors produced by a higher UV/V LED channel (i.e., in a positive direction), whereas colours with UV negative chromatic contrast had lower UV cone stimulation relative to the other cone types. Therefore, our interpretation of the importance of UV cone signals for colour discrimination is congruent with the results. In the discussion, we suggest a possibility that activation of the UV receptor suppresses noise downstream in the visual pathway or enhances the saliency of colours (see lines 397-398). This activation of the UV receptor would, of course, be at its highest for colours with positive UV chromatic contrast.

      Note that we have added to the discussion the possibility that colour preferences or a difference in attentiveness might have contributed to differences in discrimination thresholds (see discussion lines 412-413, 427-428, 433-435, 456-466, and 469-473). However, we consider it a less likely explanation due to a couple of reasons, including 1) a lack of difference in responsiveness across colour sets in their timing to peck the target, and 2) any non-learnt bias would have likely been overridden or at least weakened by training prior to the experiment where colours were rewarded equally (see lines 462-466).

      We have edited the results (lines 334-352) to make our point clearer and by changing the subtitle to be more explicit: “Lower discrimination thresholds induced by positive UV contrast”. The subsection begins by explaining the different types of UV chromatic contrast by elevation angle and, finally, how this division among colour sets was a major determinant of colour discrimination thresholds.

      Perhaps the argument with respect to model deviations and UV contrast independent of sign could be elaborated to show more systematically that the way the covariation with the contrasts of the other cone stimulations in the stimulus set goes, the data do favor deviations from the RNL in the direction of enhanced sensitivity to UV cone signals, but if this is the intent I think the authors need to think more about how to present the data in a manner that makes it more compelling than currently, and walk the reader carefully through the argument.

      We have added to the results the linear mixed-effects model output with ‘contrast’ (positive/negative) added as a fixed effect. This analysis shows that the sign direction of UV contrast was a strong predictor of threshold (see address to previous comments and lines 399-401, 790-799).

      • On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, a bit more care about what is meant by "the UV cone has a comparatively high sensitivity (line 313 and throughout)" needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that doesn't account for at least some aspects of the data), is this because of properties of the UV cones, or the way that post-receptoral processing handles the signals from these cones mimicking a cone effect in the model. And if it is thought that it is because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or larger entrance aperture (thus more quantum catches and higher SNR)?

      It is unknown what aspect of the cone morphology or physiology sets the activation or inactivation threshold. Electrophysiological data collected from the UV cones of other fish species e.g., in goldfish and zebrafish [see Hawryshyn & Beauchamp (1985). 25, Vis Res.; and Yoshimatsu et al. (2020). 107, Neuron.] show that they have exceptionally high sensitivity. What has not been shown is that having a UV cone can improve colour discrimination.

      Previous quantitative cone opsin gene expression analysis showed that the single cone opsins (SWS1 and SWS2B) are expressed at lower levels than all double cone opsin genes. This difference in expression combined with the smaller size of single cone outer segments than the double cones make it unlikely that a larger photoreceptor size, higher volume or packing density of visual pigment is responsible. Contrary to our findings, these aspects of the different cone types (if they had an effect) would instead predict that double cones have a higher SNR, and non-UV colours would be more discriminable. We have now added these details to the discussion (see lines 391-397).

      • Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions for other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about implications for fit of RNL model and about implications for generalizations about efficacy of UV receptor mediated discrimination and UV increment/decrement asymmetries, seems important.

      The variation in the psychometric functions is difficult to interpret and cannot be explained by the RNL model. What the RNL model predicts is delta S based on low level factors (namely receptor noise). In the discussion, we completely agree with the notion that the asymmetry in thresholds from predicted values, and the variation in psychometric slopes cannot be explained by the RNL model, e.g., this is heavily implied by “colour discrimination thresholds cannot be directly attributed to noise in the early stages of the visual pathway…” (lines 388-390). To clarify the inability of the RNL model to account for this aspect of the data, we have included a statement (see line 390).

      It is a good point that this could be an indication of heterogeneity in colour space. Heterogeneity in discrimination thresholds across animal colour space (both surrounding the threshold area and for more saturated regions) has been explored in detail using trichromatic triggerfish by Green N. F. et al. (2022). JEB, 7(225):jeb243533. We have added this idea to the discussion (see lines 490-498). For UV, it seems that two of the five fish (#34 and 20) had noticeably shallower curves than the others tested for UV (fish #19, 33, 36). Both also varied more in their ability to distinguish targets, as shown by their wider confidence intervals. One of these two fish (#34) was retested for UV at the end of the experiment, and in the secondary assessment had a steeper psychometric curve more in line with the other fish in the experiment (see Figure 3 – figure supplement 1 and added lines 247-250). Based on this discrepancy in performance between assessments, it is also possible that individual learning effects had a role in impacting the shape of the psychometric curve. Note, this had minimal effect on colour discrimination thresholds and any differences were in the direction of change observed across colour sets in the experiment (i.e., lower dS for UV positive directions).

      • Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the difference in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment might differentiate between a preference explanation and a sensitivity explanation of threshold differences.

      We have now added a paragraph (see lines 469-473) discussing that future work should test for color preferences and suggest how this could be done using a similar foraging task. We also include our thoughts immediately prior on why it is unlikely that a colour preference was a major contribution towards the results. In short, we consider it unlikely as fish showed no evidence of reduced latency for pecking at targets across the colour sets and because the training regime prior to the experiment equally rewarded fish for all colours and would likely have overridden a strong preference (at least in this specific foraging context).

      • RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, and particularly noting that in the full dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.

      There is now added explanation of the RNL model (see lines 141-151), particularly on its assumptions that it only receives chromatic input and that discrimination is limited by noise arising in the photoreceptors and not by any specific opponent mechanisms. We also added the mention of the expected hyper-ellipsoid shape of isothreshold contours if receptor noise is Gaussian. Note, while we appreciate the importance of the reader to understand the basic functionality of the model, we wanted to avoid overloading the introduction with details on the RNL model which is not the focus of the paper. The RNL model is well-established in the field of visual ecology and animal vision research for well over a decade and has been thoroughly dissected by previous methodological reviews. We refer to one of these more recent reviews by Olsson et al. (2018) Behav Ecol. 29(2):273-282, and direct the reader to the methods section for further details on the RNL model.
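
      For readers unfamiliar with the model, the following is a minimal sketch of the standard trichromatic form of the RNL distance (Vorobyev and Osorio, 1998), not the code used in the study. The function name and the input values are hypothetical and chosen only for illustration; the tetrachromatic version discussed in the manuscript follows the same logic with one more channel. The sketch shows the assumption noted above: a change that affects all receptor channels equally (a purely achromatic change) yields a chromatic distance of zero.

```python
import math

def rnl_delta_s_trichromat(delta_f, noise):
    """Chromatic distance (delta S) for three receptor channels.

    delta_f: differences in log receptor signal between two stimuli.
    noise:   receptor noise (Weber fraction) for each channel.
    """
    f1, f2, f3 = delta_f
    e1, e2, e3 = noise
    # Only differences between channels enter the numerator, so a change
    # shared by all channels (achromatic) contributes nothing.
    numerator = (
        (e1 ** 2) * (f3 - f2) ** 2
        + (e2 ** 2) * (f3 - f1) ** 2
        + (e3 ** 2) * (f1 - f2) ** 2
    )
    denominator = (e1 * e2) ** 2 + (e1 * e3) ** 2 + (e2 * e3) ** 2
    return math.sqrt(numerator / denominator)

# Hypothetical inputs, for illustration only.
equal_noise = (0.1, 0.1, 0.1)
achromatic_change = rnl_delta_s_trichromat((0.2, 0.2, 0.2), equal_noise)
chromatic_change = rnl_delta_s_trichromat((0.1, 0.0, 0.0), equal_noise)
```

      By convention, a distance of delta S = 1 corresponds to the predicted discrimination threshold.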

      • Use of cone isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally could they discuss what they see as its advantages and disadvantages for future work.

      The Reviewer is correct in that a targeted approach of isolated cone stimulation would be the optimal approach to demonstrating tetrachromatic colour vision. However, the extreme spectral overlap in the absorption curves of anemonefish cones, particularly in the mid-wavelength region makes this problematic in using the current LED display. We added to the discussion ways that this could be studied in the future (see lines 474-489). This might be possible (but still challenging) using a monochromator, but such technology severely limits the diversity of stimuli which can be created and usually restricts experiments to a simple paired choice design (or grey card experiment). The traditional paired choice experiment requires animals to be trained to distinguish a specific colour, while the Ishihara-like task trains animals to distinguish targets using an odd-one-out approach. This latter approach is highly efficient, as it does not require retraining when testing a new colour (i.e., fish learnt the task not a specific colour). Here, we wanted to assess colour discrimination in multiple directions to compare performance, and the flexible LED display combined with a generalisable task was important.

      The above assumes that anemonefish do not use multiple trichromatic systems. If they do, standard experimental stimuli (e.g., from a monochromator or an LED display) would be unsuitable, as they illuminate the whole retina. To definitively test the range of opponent interactions, it would be necessary to make electrophysiological measurements targeting the transmitting neurons using a retinal multielectrode array (MEA) approach or by in-vivo calcium imaging (lines 484-486).

      We understand that our results are not a direct test of the dimensionality of anemonefish colour vision and should not be interpreted as such, as we do not have direct evidence of tetrachromacy. To recognize this limitation of our data, we have drawn back some of our conclusive statements that claimed to have demonstrated tetrachromacy.

    2. Reviewer #3 (Public Review):

      The comments below focus mainly on ways that the data and analysis as currently presented do not, to this reviewer, compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions but clarity on exactly what has been demonstrated and where additional future work is needed would make a strong contribution to the literature.

      * Fig 3A. It might be worth emphasizing a bit more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, since this then means that if RNL model fit the data perfectly, all of the thresholds would fall at deltaS = 1. They don't, so I would like to see some evaluation from the authors' experience with this model as to whether they think the deviations (looks like the delta S range is ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations below, but that the model doesn't fully account for the data should be brought out here I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here at the first point it becomes clear would be helpful.

      * Line 217 ff (Figure 4, Supplemental Figure 4). If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors based on line 649), the predictions deviate significantly from the data (relative to the inter-fish variability) for the trichromatic models but not the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.

      Assuming that the above is right about the nature of the test, then I don't think the fact that the tetrachromatic model has an additional parameter (noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and must necessarily fit the data worse. What we want to know is whether the tetrachromatic model is fitting better because its extra parameter is allowing it to account for measurement noise (overfitting), or whether it is really doing a better job accounting for systematic features of the data. This comparison requires some method of taking the different number of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even this doesn't seem like what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.

      Typical model comparison approaches would include a likelihood ratio test, AIC/BIC sorts of comparisons, or a cross-validation approach.
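
      As a rough illustration of the AIC/BIC option, the sketch below penalizes each model for its number of fitted parameters. It assumes a least-squares fit with Gaussian residuals, which may not match how the RNL models were actually fitted; the function names, the number of observations, the parameter counts, and the residual sums of squares are all hypothetical and are not taken from the manuscript.

```python
import math

def aic_from_rss(rss, n, k):
    """AIC for a least-squares fit with Gaussian residuals.

    rss: residual sum of squares, n: number of observations,
    k: number of fitted parameters. Additive constants shared by
    all models are dropped, so only differences are meaningful.
    """
    return n * math.log(rss / n) + 2 * k

def bic_from_rss(rss, n, k):
    """BIC under the same assumptions; the penalty grows with log(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical numbers, for illustration only.
n_obs = 45
aic_tri = aic_from_rss(rss=9.0, n=n_obs, k=3)    # three noise parameters
aic_tetra = aic_from_rss(rss=4.5, n=n_obs, k=4)  # one extra noise parameter

# The lower value is preferred; the extra parameter is justified only if
# the drop in RSS outweighs the added penalty.
preferred = "tetrachromatic" if aic_tetra < aic_tri else "trichromatic"
```

      With these made-up values the richer model wins despite its penalty; with a smaller improvement in fit, the comparison would reverse.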

      If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.

      * Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.

      * More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though the visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.

      * Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).

      At first, I thought the key point was that the receptor noise inferred via the RNL model is slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model where the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom where the noise parameters are so constrained. That is, a careful model comparison analysis would be needed. Such an analysis is not presented that I see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that in some systematic ways fails to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).

      Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being under predicted for discriminations that involve UV cone signals. That's what seems to be being argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument as I read it in more detail in both places switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it's distinct from an argument that favors omnibus enhanced UV sensitivity, since both the UV increments and decrements are conveyed by the UV cone; it's an argument for differential sensitivity for increments versus decrements in UV mediated discriminations. The authors get to this on line 334 of the discussion, but if the point is an increment/decrement asymmetry the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.

      Perhaps the argument with respect to model deviations and UV contrast independent of sign could be elaborated to show more systematically that the way the covariation with the contrasts of the other cone stimulations in the stimulus set goes, the data do favor deviations from the RNL in the direction of enhanced sensitivity to UV cone signals, but if this is the intent I think the authors need to think more about how to present the data in a manner that makes it more compelling than currently, and walk the reader carefully through the argument.

      * On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, a bit more care about what is meant by "the UV cone has a comparatively high sensitivity (line 313 and throughout)" needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that doesn't account for at least some aspects of the data), is this because of properties of the UV cones, or the way that post-receptoral processing handles the signals from these cones mimicking a cone effect in the model. And if it is thought that it is because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or larger entrance aperture (thus more quantum catches and higher SNR)?

      * Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions for other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about implications for fit of RNL model and about implications for generalizations about efficacy of UV receptor mediated discrimination and UV increment/decrement asymmetries, seems important.

      * Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the difference in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment might differentiate between a preference explanation and a sensitivity explanation of threshold differences.

      * RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, and particularly noting that in the full dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.
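
      The geometric point above can be sketched numerically. The example assumes independent Gaussian noise in each receptor channel and measures distance in the full receptor space (it is not the chromatic-only RNL metric); the function name and the assignment of noise values to four channels are hypothetical, with 0.11 and 0.14 borrowed from the best-fit values quoted in this review purely as example numbers.

```python
import math

def full_space_distance(delta_f, noise):
    """Noise-scaled distance in the full receptor space.

    With independent Gaussian noise per channel, the set of points at
    distance 1 is a hyper-ellipsoid whose axes lie along the receptor
    directions, with semi-axis length equal to each channel's noise.
    """
    return math.sqrt(sum((d / e) ** 2 for d, e in zip(delta_f, noise)))

# Hypothetical four-channel noise values, for illustration only.
noise = [0.11, 0.11, 0.14, 0.14]

# A step along a single receptor axis reaches threshold (distance 1)
# exactly when its size equals that channel's noise.
on_first_axis = full_space_distance([0.11, 0.0, 0.0, 0.0], noise)
on_last_axis = full_space_distance([0.0, 0.0, 0.0, 0.14], noise)
```

      A channel with lower noise therefore has a shorter semi-axis, meaning a smaller signal change along that receptor direction reaches threshold.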

      * Use of cone isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally could they discuss what they see as its advantages and disadvantages for future work.

    1. Author Response

      Reviewer #1 (Public Review):

      Precise regulation of gamete fusion ensures that offspring will have the same ploidy as the parents. However, breaking this regulation can be useful for plant breeding. Haploid induction followed by chemical-induced genome doubling can be used to fix desirable genotypes, while triparental hybrids where two sperm cells with two different genotypes fertilize an egg cell can be advantageous for bypassing hybridization barriers to create interspecies hybrids with increased fitness. This manuscript follows up on a previous study from the same research group that used a clever high throughput polyspermy detection assay (HIPOD) to show that wild-type Arabidopsis naturally forms triparental hybrids at very low frequencies (less than 0.05% of progeny) and that these triparental hybrids can bypass dosage barriers in the endosperm (Nakel, et al., 2017). Mao and co-authors hypothesized that mutants that conferred polytubey, the attraction of multiple pollen tubes by mutant female gametophytes, would also increase the rate of triparental hybrids. They used a double mutant in the endopeptidase genes ECS1 and ECS2 which had previously been reported to induce supernumerary pollen tube attraction to test this hypothesis with their two-component HIPOD system in which one pollen donor constitutively expresses the mGAL4-VP16 transcription factor while the second pollen donor carries an herbicide resistance gene regulated by the GAL4-responsive UAS promoter. Triparental hybrids are detected as herbicide-resistant progeny from wild-type Arabidopsis flowers that have been pollinated by the two paternal genotypes. The authors convincingly show that the ecs1 ecs2-1 double mutant more than doubled the frequency of triparental, triploid hybrids in HIPOD crosses. 
They next tested the hypothesis that this increase in triparental hybrids was due to a gametophytic effect by using an ecs1-/- ecs2-1/ECS2 maternal parent in the HIPOD assay and testing whether the ecs2-1 mutant allele was preferentially inherited in triparental hybrids. The mutant allele was inherited at a much higher rate than expected, confirming their hypothesis.

      The triparental hybrid results with the ecs1 ecs2 mutant were not that surprising since the presence of extra sperm cells gives more opportunities for triparental hybrids to form, especially if gamete fusion is misregulated. However, an unexpected result came when the authors used aniline blue staining to analyze the ecs1 ecs2 polytubey phenotype. They confirmed that the double mutant had increased levels of polytubey compared to wild-type ovules, but they also noticed that 13% of seeds were not developing normally. This phenotype was confirmed with a second ecs2 allele and was complemented with both ECS1 and ECS2 transgenes under their native promoters. Microscopic analysis revealed normal gametophyte morphology before fertilization, but 8% of pollinated ovules failed to develop an embryo and 7% failed to develop endosperm, suggesting single fertilization events. In a logical set of experiments, they followed up on this result by crossing ecs1 ecs2 with pollen carrying a fluorescent reporter that would be expressed in developing embryos and endosperm. In this experiment, they were again surprised. Some of the wild-type-looking seeds lacked a paternal contribution (i.e. no fluorescent signal from the paternal reporter construct) in the embryo. This prompted them to look more closely at the progeny, upon which they detected small plants that were haploid. They confirmed the haploid nature by chromosome spreads. Finally, they used interaccession crosses between ecs1 ecs2 (Col-0) and Landsberg to verify that haploid progeny only carried maternal alleles of markers on all five chromosomes, indicating that the ecs1 ecs2 genotype can induce maternal haploids.

      This interesting study highlights the importance of following up on unexpected results. The conclusions are well-supported by the data and quite exciting. Paternal haploid inducers have been discovered in several species, but this is one of only two examples of maternal haploid induction. While the percentage of maternal haploids is very low, this phenomenon could be useful for plant breeding.

      Weaknesses

      The data in the manuscript is intriguing, but the question of how the same mutant combination promotes the formation of both triploid and haploid progeny remains unanswered and is not thoroughly discussed, nor is any model suggested for how the ECS1/2 peptidases could play a role in regulating gamete fusion and/or repressing parthenogenesis. A second unanswered question is whether the maternal haploids are a result of failed plasmogamy or karyogamy between the egg and sperm leading to parthenogenesis or a result of paternal genome elimination after plasmogamy. In figure 3B, the authors attempted to test whether plasmogamy occurs between the male and female gametes in ecs1 ecs2 ovules by crosses with pollen that expresses a mitochondrial marker under control of the pRPS5a promoter which is active in sperm cells as well as embryos and endosperm of fertilized ovules. This experiment allowed them to detect sperm cells that had not fused with the egg and central cell at 2 days after pollination. They also counted the percentage of seeds that expressed the mitochondrial marker in both embryo and endosperm at 2 DAP and found that ecs1 ecs2 mutants had a 20% reduction of visible mitochondria in embryo sacs compared to wildtype. They conclude that the result indicates a potential plasmogamy defect. However, the dependability of this marker is questionable since only ~55% of wild-type seeds had detectable signal in the embryo and endosperm. The authors imply that this experiment could be used to test plasmogamy, but it is not clear how any conclusions related to the abnormal seed phenotype could be drawn from examining the rate of signal in both the embryo and endosperm. Since the mitochondrial marker was not expressed from a sperm-specific promoter, the fluorescent signal at 2DAP is likely due to new gene expression from pRPS5a in the fertilized embryo and endosperm, not an indication of the presence of sperm-derived mitochondria. 
      Perhaps an earlier timepoint could be used as well as a sperm-specific promoter instead of pRPS5a to answer the question of whether plasmogamy is happening in the ecs1 ecs2 ovules.

      Thanks for the suggestion. We now provide two new data sets as evidence that ecs1 ecs2 mutant plants indeed exhibit single fertilization, which leads to fertilization recovery.

      We determined fertilization failure by checking the decondensation of HTR10-RFP-labelled sperm nuclei at 8-10 HAP (Figure 3B) and the frequency of heterofertilization through a dual pollination experiment (Figure 3C-E) (see above).

      Reviewer #2 (Public Review):

      The manuscript reports the triploid and haploid productions using an ecs1ecs2 mutant as the maternal donor, in addition to the evaluation of the sexual process observed in the mutant. The indicated data show exquisite quality. To improve the content, I recommend carefully reconsidering the descriptions because some of the insights would cause a stir in the controversy regarding ECS1&2 functions in plant reproduction.

      Strengths

      Triploid production by a combination of ecs1ecs2 mutant and HIPOD system has potential as a future plant breeding tool. Moreover, it's intriguing that both triploid and haploid productions were achieved using the same mutant as a maternal donor. I think authors can claim the value of their results more by adding descriptions about the usefulness of the aneuploid plants in plant breeding history.

      The evidence of the persistent synergid nucleus (Figure 3A) is critical insight reported by this study. As Maruyama et al. (2013) reported by live cell imaging, synergid-endosperm fusion had occurred at the two endosperm nuclei stage. It would be valuable to claim the observed fact by citing Maruyama's previous observation.

      Weakness

      As the authors suggested, the higher triploid frequency observed in ecs1ecs2 than in WT was likely caused by the increased polyspermy. However, it also could be that a reduction of normal seed number in ecs1ecs2 (whether due to failure of fertilization or embryo development arrest) accounts for the increased frequency of the triploid compared to WT.

      The results in Figure 3C-E suggested the single fertilization for both egg and central cells at similar frequencies. This is an exciting result, but it is still possible that the fertilized egg or central cell degenerated after fertilization resulting in the disappearance of paternally inherited fluorescence. Evaluation of fertilization patterns at 7-10HAP in ecs1ecs2 mutant may provide more confident insight, although unfused sperm cell was evaluated at 1DAP (Figure 3-figure supplement 1B). The fertilization states can be distinguished depending on the HTR10RFP sperm nuclei morphology and their positions, as reported by Takahashi et al (2018).

      Thank you for your suggestion. We added the requested experiment (see Figure 3B in the revised manuscript). In addition, we conducted a dual pollination experiment that provides evidence for the activation of the fertilization recovery machinery (Figure 3C-E) (see above).

      Several recent studies have reported exciting insights on ECS1&2 functions; however, various results from different laboratories have raised controversy. Though, the commonly found feature is the repression of polytubey. For readers, it would be helpful to organize the explanation about which insights are concordant or different.

      Thank you for your suggestion. We now indicate, using terms like "in line with" or "in contrast to", where our data confirm or contradict previous data.

      In addition, a drawing that explains the time course in the process from pollination to seed development (up to 6DAP) based on WT would help to understand which point is evaluated in each data.

      Thank you for your suggestion. We added a model figure (Figure 4E) at the end of the manuscript that brings the concepts together and facilitates understanding.

      Reviewer #3 (Public Review):

      In this manuscript, Mao et al. reported that the two proteases ECS1 and ECS2 participate in both polyspermy block and gamete fusion in Arabidopsis thaliana. The authors could observe polytubey phenotype which has been reported previously and obtain both triparental plants and haploids in ecs1 ecs2 mutants. Therefore, they proposed that the triparental plants resulted from the polytubey block defect, whereas the haploids were caused by the gamete fusion defect. Together with two other previous reports, I think it is very interesting to see these two proteases participating in so many different but connected processes. Although they did not provide the molecular mechanism of how ECS participated in polyspermy block and gamete fusion, their findings provide more options for and thus promote plant breeding. The work may have a wide application in the future and will be of broad interest to cell biologists working on gamete fusion and plant breeders.

      We thank the reviewer for their positive comments.

      Although most of the conclusions in this paper are well supported by the data, it could be improved with a minor revision including providing clearer data analysis and descriptions, images with higher resolution, and more discussions.

    1. Author Response

      Reviewer #2 (Public Review):

      Here, a simple model of cerebellar computation is used to study the dependence of task performance on input type: it is demonstrated that task performance and optimal representations are highly dependent on task and stimulus type. This challenges many standard models which use simple random stimuli and concludes that the granular layer is required to provide a sparse representation. This is a useful contribution to our understanding of cerebellar circuits, though, in common with many models of this type, the neural dynamics and circuit architecture are not very specific to the cerebellum, the model includes the feedforward structure and the high dimension of the granule layer, but little else. This paper has the virtue of including tasks that are more realistic, but by the paper’s own admission, the same model can be applied to the electrosensory lateral line lobe and it could, though it is not mentioned in the paper, be applied to the dentate gyrus and large pyramidal cells of CA3. The discussion does not include specific elements related to, for example, the dynamics of the Purkinje cells or the role of Golgi cells, and, in a way, the demonstration that the model can encompass different tasks and stimuli types is an indication of how abstract the model is. Nonetheless, it is useful and interesting to see a generalization of what has become a standard paradigm for discussing cerebellar function.

      We appreciate the Reviewer’s positive comments. Regarding the simplifications of our model, we agree that we have taken a modeling approach that abstracts away certain details to permit comparisons across systems. We now include an in-depth discussion of our simplifying assumptions (Assumptions & Extensions section in the Discussion) and have further noted the possibility that other biophysical mechanisms we have not accounted for may also underlie differences across systems.

      Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems.

      Reviewer #3 (Public Review):

      1) The paper by Xie et al is a modelling study of the mossy fiber-to-granule cell-to-Purkinje cell network, reporting that the optimal type of representations in the cerebellar granule cell layer depends on the type of task. The paper stresses that the findings indicate a higher overall bias towards dense representations than stated in the literature, but it appears the authors have missed parts of the literature that already reported on this. While the modelling and analysis appear mathematically solid, the model is lacking many known constraints of the cerebellar circuitry, which makes the applicability of the findings to the biological counterpart somewhat limited.

      We thank the Reviewer for suggesting additional references to include in our manuscript, and for encouraging us to extend our model toward greater biological plausibility and more critically discuss simplifying assumptions we have made. We respond to both the comment about previous literature and about applicability to cerebellar circuitry in detail below.

      2) I have some concerns with the novelty of the main conclusion, here from the abstract: ’Here, we generalize theories of cerebellar learning to determine the optimal granule cell representation for tasks beyond random stimulus discrimination, including continuous input-output transformations as required for smooth motor control. We show that for such tasks, the optimal granule cell representation is substantially denser than predicted by classic theories.’ Stated like this, this has in principle already been shown, i.e. for example: Spanne and Jörntell (2013) Processing of multi-dimensional sensorimotor information in the spinal and cerebellar neuronal circuitry: a new hypothesis. PLoS Comput Biol. 9(3):e1002979. Indeed, even the 2 DoF arm movement control that is used in the present paper as an application, was used in this previous paper, with similar conclusions with respect to the advantage of continuous input-output transformations and dense coding. Thus, already from the beginning of this paper, the novelty aspect of this paper is questionable. Even the conclusion in the last paragraph of the Introduction: ‘We show that, when learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses.’ was in principle already shown by this previous paper.

      We thank the Reviewer for drawing our attention to Spanne and Jörntell (2013). Our study shares certain similarities with this work, including the consideration of tasks with smooth input-output mappings, such as learning the dynamics of a two-joint arm. However, our study differs substantially, most notably in that we focus on parametrically varying the degree of sparsity in the granule cell layer to determine the circumstances under which dense versus sparse coding is optimal. To the best of our ability, we can find no result in Spanne and Jörntell (2013) that indicates the performance of a network as a function of average coding level. Instead, Spanne and Jörntell (2013) propose that inhibition from Golgi cells produces heterogeneity in coding level which can improve performance, which is an interesting but complementary finding to ours. We therefore do not believe that the quantitative computations of optimal coding level that we present are redundant with the results of this previous study. We also note that a key contribution of our study is a mathematical analysis of the inductive bias of networks with different coding levels, which supports our conclusions.

      We have included a discussion of Spanne and Jörntell (2013) and (2015) in the revised version of our manuscript:

      "Other studies have considered tasks with smooth input-output mappings and low-dimensional inputs, finding that heterogeneous Golgi cell inhibition can improve performance by diversifying individual granule cell thresholds (Spanne and Jörntell, 2013). Extending our model to include heterogeneous thresholds is an interesting direction for future work. Another proposal states that dense coding may improve generalization (Spanne and Jörntell, 2015). Our theory reveals that whether or not dense coding is beneficial depends on the task."

      3) However, the present paper does add several more specific investigations/characterizations that were not previously explored. Many of the main figures report interesting new model results. However, the model is implemented in a highly generic fashion. Consequently, the model relates better to general neural network theory than to specific interpretations of the function of the cerebellar neuronal circuitry. One good example is the findings reported in Figure 2. These represent an interesting extension to the main conclusion, but they are also partly based on arbitrariness as the type of mossy fiber input described in the random categorization task has not been observed in the mammalian cerebellum under behavior in vivo, whereas in contrast, the type of input for the motor control task does resemble mossy fiber input recorded under behavior (van Kan et al 1993).

      We agree that the tasks we consider in Figure 2 are simplified compared to those that we consider elsewhere in the paper. The choice of random mossy fiber input was made to provide a comparison to previous modeling studies that also use random input as a benchmark (Marr 1969, Albus 1971, Brunel 2004, Babadi and Sompolinsky 2014, Billings 2014, Litwin-Kumar et al., 2017). This baseline permits us to specifically evaluate the effects of low-dimensional inputs (Figure 2) and richer input-output mappings (Figure 2, Figure 7). We agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been extensively used in previous studies is almost certainly an unrealistic idealization of in vivo neural activity; this is a motivating factor for our study, which relaxes this assumption and examines the consequences. To provide additional context, we have updated the following paragraph in the main text Results section:

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks."

      4) The overall conclusion states: ‘Our results....suggest that optimal cerebellar representations are task-dependent.’ This is not a particularly strong or specific conclusion. One could interpret this statement as simply saying: ‘if I construct an arbitrary neural network, with arbitrary intrinsic properties in neurons and synapses, I can get outputs that depend on the intensity of the input that I provide to that network.’ Further, the last sentence of the Introduction states: ‘More broadly, we show that the sparsity of a neural code has a task-dependent influence on learning...’ This is very general and unspecific, and would likely not come as a surprise to anyone interested in the analysis of neural networks. It doesn’t pinpoint any specific biological problem but just says that if I change the density of the input to a [generic] network, then the learning will be impacted in one way or another.

      We agree with the Reviewer that our conclusions are quite general, and we have removed the final sentence as we agree it was unspecific. However, we disagree with the Reviewer’s paraphrasing of our results.

      First, we do not select arbitrary intrinsic properties of neurons and synapses. Rather, we construct a simplified model with a key quantity, the neuronal threshold, that we vary parametrically in order to assess the effect of the resulting changes in the representation on performance. Second, we do not vary the intensity/density of inputs provided to the network – this is fixed throughout our study for all key comparisons we perform. Instead, we vary the density (coding level) of the expansion layer representation and quantify its effect on inductive bias and generalization. Finally, our study’s key contribution is an explanation of the heterogeneity in average coding level observed across behaviors and cerebellum-like systems. We go beyond the empirical statement that there is a dependence of performance on the parameter that we vary by developing an analytical theory. Our theory describes the performance of the class of networks that we study and the properties of learning tasks that determine the optimal expansion layer representation.

      To clarify our main contributions, we have updated the final paragraph of the Introduction. We have also removed the sentence that the Reviewer objects to, as it was less specific than the other points we make here.

      "We propose that these differences can be explained by the capacity of representations with different levels of sparsity to support learning of different tasks. We show that the optimal level of sparsity depends on the structure of the input-output relationship of a task. When learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses. To explain this result, we develop an analytic theory that predicts the performance of cerebellum-like circuits for arbitrary learning tasks. The theory describes how properties of cerebellar architecture and activity control these networks’ inductive bias: the tendency of a network toward learning particular types of input-output mappings (Sollich, 1998; Jacot et al., 2018; Bordelon et al., 2020; Canatar et al., 2021; Simon et al., 2021). The theory shows that inductive bias, rather than the dimension of the representation alone, is necessary to explain learning performance across tasks. It also suggests that cerebellar regions specialized for different functions may adjust the sparsity of their granule cell representations depending on the task."

      5) The interpretation of the distribution of the mossy fiber inputs to the granule cells, which would have a crucial impact on the results of a study like this, is likely incorrect. First, unlike the papers that the authors cite, there are many studies indicating that there is a topographic organization in the mossy fiber termination, such that mossy fibers from the same inputs, representing similar types of information, are regionally co-localized in the granule cell layer. Hence, there is no support for the model assumption that there is a predominantly random termination of mossy fibers of different origins. This risks invalidating the comparisons that the authors are making, i.e. such as in Figure 3. This is a list of example papers, there are more: van Kan, Gibson and Houk (1993) Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology. Garwicz et al (1998) Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology. Brown and Bower (2001) Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. The Journal of Comparative Neurology. Na, Sugihara, Shinoda (2019) The entire trajectories of single pontocerebellar axons and their lobular and longitudinal terminal distribution patterns in multiple aldolase C-positive compartments of the rat cerebellar cortex. The Journal of Comparative Neurology.

      6) The nature of the mossy fiber-granule cell recording is also reviewed here: Gilbert and Miall (2022) How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist. Further, considering the re-coding idea, the following paper shows that detailed information, as it is provided by mossy fibers, is transmitted through the granule cells without any evidence of re-coding: Jörntell and Ekerot (2006) Journal of Neuroscience; and this paper shows that these granule inputs are powerfully transmitted to the molecular layer even in a decerebrated animal (i.e. where only the ascending sensory pathways remain) Jörntell and Ekerot 2002, Neuron.

      We agree that there is strong evidence for a topographic organization in mossy fiber to granule cell connectivity at the microzonal level. We thank the Reviewer for pointing us to specific examples. We acknowledge that our simplified model does not capture the structure of connectivity observed in these studies.

      However, the focus of our model is on cerebellar neurons presynaptic to a single Purkinje cell. Random or disordered distribution of inputs at this local scale is compatible with topographic organization at the microzonal scale. Furthermore, while there is evidence of structured connections at the local scale, models with random connectivity are able to reproduce the dimensionality of granule cell activity within a small margin of error (Nguyen et al., 2022). Finally, our finding that dense codes are optimal for learning slowly varying tasks is consistent with evidence for the lack of re-coding: for such tasks, re-coding may be absent because it is not required.

      We have dedicated a section on this issue in the Assumptions and Extensions portion of our Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Chris Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      7) I could not find any description of the neuron model used in this paper, so I assume that the neurons are just modelled as linear summators with a threshold (in fact, Figure 5 mentions inhibition, but this appears to be just one big lump inhibition, which basically is an incorrect implementation). In reality, granule cells of course do have specific properties that can impact the input-output transformation, PARTICULARLY with respect to the comparison of sparse versus dense coding, because the low-pass filtering of input that occurs in granule cells (and other neurons) as well as their spike firing stochasticity (Saarinen et al (2008). Stochastic differential equation model for cerebellar granule cell excitability. PLoS Comput. Biol. 4:e1000004) will profoundly complicate these comparisons and make them less straightforward than what is portrayed in this paper. There are also several other factors that would be present in the biological setting but are lacking here, which makes it doubtful how much information in relation to the biological performance this modelling study provides: What are the types of activity patterns of the inputs? What are the learning rules? What is the topography? What is the impact of Purkinje cell outputs downstream? The Purkinje cell output does not have any direct action; it acts on the deep cerebellar nuclear neurons, which in turn act on a complex sensorimotor circuitry to exert their effect, hence predictive coding could only become interpretable after the PC output has been added to the activity in those circuits. Where is the differentiated Golgi cell inhibition?

      Thank you for these critiques. We have made numerous edits to improve the presentation of the details of our model in the main text of the manuscript. Indeed, granule cells in the main text are modeled as linear sums of mossy fiber inputs with a threshold-linear activation function. A more detailed description of the model for granule cells can now be found in Equation 1 in the Results section:

      "The activity of neurons in the expansion layer is given by: $h = \varphi(J^{\mathrm{eff}} x - \theta)$ (1), where $\varphi$ is a rectified linear activation function $\varphi(u) = \max(u, 0)$ applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold $\theta$ is shared across neurons and controls the coding level, which we denote by $f$, defined as the average fraction of neurons in the expansion layer that are active."
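
      To make Equation 1 concrete, the following is a minimal sketch of a threshold-linear expansion layer and its coding level. The layer sizes, the Gaussian weights and inputs, and the function names are illustrative assumptions, not the values or code used in the manuscript.

```python
import random

def relu(u):
    """Rectified linear activation: max(u, 0)."""
    return u if u > 0.0 else 0.0

def expansion_layer(J_eff, x, theta):
    """Compute h = relu(J_eff x - theta), applied element-wise."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) - theta) for row in J_eff]

def coding_level(J_eff, inputs, theta):
    """Average fraction of expansion-layer units that are active."""
    active = 0
    total = 0
    for x in inputs:
        h = expansion_layer(J_eff, x, theta)
        active += sum(1 for v in h if v > 0.0)
        total += len(h)
    return active / total

# Hypothetical sizes: N inputs, M expansion units, P input patterns.
rng = random.Random(0)
N, M, P = 10, 200, 100
J_eff = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]

# A higher shared threshold can only silence units, so coding level falls.
levels = {theta: coding_level(J_eff, inputs, theta) for theta in (-2.0, 0.0, 2.0)}
```

      Because the threshold is shared across units, sweeping it is one simple way to move between dense and sparse representations in a sketch like this one.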

      Most of our analyses use the firing rate model we describe above, but several Supplemental Figures show extensions to this model. As we mention in the Discussion, our results do not depend on the specific choice of nonlinearity (Figure 2-figure supplement 2). We have also considered the possibility that the stochastic nature of granule cell spikes could impact our measures of coding level. In Figure 7-figure supplement 1 we test the robustness of our main conclusion using a spiking model where we model granule cell spikes with Poisson statistics. When measuring coding level in a population of spiking neurons, a key question is at what time window the Purkinje cell integrates spikes. For several choices of integration time windows, we show that dense coding remains optimal for learning smooth tasks. However, we agree with the Reviewer that there are other biological details our model does not address. For example, our spiking model does not capture some of the properties the Saarinen et al. (2008) model captures, including random sub-threshold oscillations and clusters of spikes. Modeling biophysical phenomena at this scale is beyond the scope of our study. We have added this reference to the relevant section of the Discussion:

      "We also note that coding level is most easily defined when neurons are modeled as rate, rather than spiking units. To investigate the consistency of our results under a spiking code, we implemented a model in which granule cell spiking exhibits Poisson variability and quantify coding level as the fraction of neurons that have nonzero spike counts (Figure 7-figure supplement 1; Figure 7C). In general, increased spike count leads to improved performance as noise associated with spiking variability is reduced. Granule cells have been shown to exhibit reliable burst responses to mossy fiber stimulation (Chadderton et al., 2004), motivating models using deterministic responses or sub-Poisson spiking variability. However, further work is needed to quantitatively compare variability in model and experiment and to account for more complex biophysical properties of granule cells (Saarinen et al., 2008)."
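
      The dependence of a spike-count coding level on the integration window can be illustrated with a short calculation. This is a sketch under the assumption of independent Poisson neurons with fixed rates; the rates and window lengths below are hypothetical, and the function name is ours.

```python
import math

def poisson_coding_level(rates_hz, window_s):
    """Expected fraction of neurons with a nonzero spike count.

    For a Poisson neuron with rate r and window T,
    P(count > 0) = 1 - exp(-r * T).
    """
    return sum(1.0 - math.exp(-r * window_s) for r in rates_hz) / len(rates_hz)

# Hypothetical firing rates in Hz.
rates = [0.0, 1.0, 5.0, 20.0]

short_window = poisson_coding_level(rates, 0.01)  # 10 ms integration window
long_window = poisson_coding_level(rates, 1.0)    # 1 s integration window
```

      With the same underlying rates, a longer window yields a higher measured coding level, which is why the choice of integration time matters when comparing spiking and rate models.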

      A second concern the Reviewer raises is our implementation of Golgi cell inhibition as a homogeneous rather than heterogeneous input onto granule cells. In simplified models, adding heterogeneous inhibition does not dramatically change the qualitative properties of the expansion layer representation, in particular the dimensionality of the representation (Billings et al., 2014, Cayco-Gajic et al., 2017, Litwin-Kumar et al., 2017). We have added a section about inhibition to our Discussion:

      "We also have not explicitly modeled inhibitory input provided by Golgi cells, instead assuming such input can be modeled as a change in effective threshold, as in previous studies (Billings et al., 2014; Cayco-Gajic et al., 2017; Litwin-Kumar et al., 2017). This is appropriate when considering the dimension of the granule cell representation (Litwin-Kumar et al., 2017), but more work is needed to extend our model to the case of heterogeneous inhibition."

      Regarding the mossy fiber inputs, as we state in response to paragraph 3, we agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been used in previous studies is an unrealistic idealization of in vivo neural activity. One of the motivations for our model was to relax this assumption and examine the consequences: we introduce correlations in the mossy fiber activity by projecting low-dimensional patterns into the mossy fiber layer (Figure 1B):

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks.

      We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."
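
      The task subspace described above can be sketched as a linear embedding of D task variables into N input channels. The random Gaussian embedding matrix, the sizes, and the function names are assumptions made for illustration; the manuscript's own construction may differ.

```python
import random

def make_embedding(n_inputs, n_task_dims, seed=0):
    """Return a hypothetical N x D embedding matrix with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_task_dims)]
            for _ in range(n_inputs)]

def embed(A, z):
    """Map a D-dimensional task variable z to an N-dimensional input x = A z."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

N, D = 50, 3                # D is much smaller than N
A = make_embedding(N, D)
z = [0.2, -1.0, 0.5]        # a point in the task subspace
x = embed(A, z)             # the corresponding input pattern
```

      Every input produced this way is a combination of the D columns of A, so the N-dimensional activity is correlated across channels rather than independent.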

      The Reviewer also mentions the learning rule at granule cell to Purkinje cell synapses. We agree that considering online, climbing-fiber-dependent learning is an important generalization. We therefore added a new supplemental figure investigating whether we would still see a difference in optimal coding levels across tasks if online learning were used instead of the least squares solution (Figure 7-figure supplement 2). Indeed, we observed a similar task dependence as we saw in Figure 2F. We have added a new paragraph in the Discussion under Assumptions and Extensions describing our rationale and approach in detail:

      "For the Purkinje cells, our model assumes that their responses to granule cell input can be modeled as an optimal linear readout. Our model therefore provides an upper bound to linear readout performance, a standard benchmark for the quality of a neural representation that does not require assumptions on the nature of climbing fiber-mediated plasticity, which is still debated. Electrophysiological studies have argued in favor of a linear approximation (Brunel et al., 2004). To improve the biological applicability of our model, we implemented an online climbing fiber-mediated learning rule and found that optimal coding levels are still task-dependent (Figure 7-figure supplement 2). We also note that although we model several timing-dependent tasks (Figure 7), our learning rule does not exploit temporal information, and we assume that temporal dynamics of granule cell responses are largely inherited from mossy fibers. Integrating temporal information into our model is an interesting direction for future investigation."
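
      As a rough illustration of an online alternative to a least-squares readout, the sketch below trains a linear readout with a simple error-driven delta rule. This rule is a stand-in chosen for brevity and is an assumption; it is not claimed to be the climbing fiber-mediated rule used in Figure 7-figure supplement 2. The data, learning rate, and names are hypothetical.

```python
import random

def predict(w, h):
    """Linear readout of expansion-layer activity h."""
    return sum(wi * hi for wi, hi in zip(w, h))

def mse(w, data):
    """Mean squared error of the readout over (h, target) pairs."""
    return sum((y - predict(w, h)) ** 2 for h, y in data) / len(data)

def online_delta_rule(data, n_features, lr=0.1, epochs=50):
    """Update weights one sample at a time from an error signal."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for h, y in data:
            err = y - predict(w, h)  # plays the role of a teaching signal
            w = [wi + lr * err * hi for wi, hi in zip(w, h)]
    return w

# Hypothetical noiseless task: targets are a fixed linear function of h.
rng = random.Random(0)
w_true = [0.5, -1.0, 2.0]
data = []
for _ in range(100):
    h = [rng.random() for _ in w_true]
    data.append((h, predict(w_true, h)))

w_init = [0.0] * len(w_true)
w_learned = online_delta_rule(data, len(w_true))
```

      On a noiseless linear task like this one, the online rule approaches the same solution that least squares would give, which is why the least-squares readout can serve as an upper bound on linear readout performance.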

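      The idea of an online, error-driven update for a linear readout can be sketched as follows. This is a hedged illustration only: the climbing-fiber rule used for Figure 7-figure supplement 2 is not specified in this response, so a standard delta rule (least mean squares) stands in for it, and the feature count, learning rate and noiseless linear target are all hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def online_delta_rule(samples, n_features, lr=0.01, epochs=20):
    """Train linear readout weights one sample at a time.

    Each update uses only the current error (target minus output),
    a stand-in for an error-driven teaching signal.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for h, y in samples:
            err = y - dot(w, h)
            w = [wi + lr * err * hi for wi, hi in zip(w, h)]
    return w

def mean_squared_error(w, samples):
    return sum((y - dot(w, h)) ** 2 for h, y in samples) / len(samples)

rng = random.Random(1)
M = 5  # hypothetical number of granule-like features
w_true = [rng.gauss(0, 1) for _ in range(M)]
samples = []
for _ in range(200):
    h = [rng.gauss(0, 1) for _ in range(M)]
    samples.append((h, dot(w_true, h)))  # noiseless linear target (assumption)

w_online = online_delta_rule(samples, M)
print("MSE before training:", mean_squared_error([0.0] * M, samples))
print("MSE after training: ", mean_squared_error(w_online, samples))
```

      With a small learning rate and a target that a linear readout can represent exactly, the online weights approach the least squares solution, which is why the batch solution serves as an upper bound on what such a rule can reach.
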
      Finally, regarding the function of the Purkinje cell, our model defines a learning task as a mapping from inputs to target activity in the Purkinje cell and is thus agnostic to the cell’s downstream effects. We clarify this point when introducing the definition of a learning task:

      "In our model, a learning task is defined by a mapping from task variables x to an output f(x), representing a target change in activity of a readout neuron, for example a Purkinje cell. The limited scope of this definition implies our results should not strongly depend on the influence of the readout neuron on downstream circuits."

      8) The problem of these, in my impression, generic, arbitrary settings of the neurons and the network in the model becomes obvious here: ‘In contrast to the dense activity in cerebellar granule cells, odor responses in Kenyon cells, the analogs of granule cells in the Drosophila mushroom body, are sparse...’ How can this system be interpreted as an analogy to granule cells in the mammalian cerebellum when the model does not address the specifics lined up above? I.e. the ‘inductive bias’ that the authors speak of, defined as ‘the tendency of a network toward learning particular types of input-output mappings’, would be highly dependent on the specifics of the network model.

      We agree with the Reviewer that our model makes several simplifying assumptions for mathematical tractability. However, we note that our study is not the first to draw analogies between cerebellum-like systems, including the mushroom body (Bell et al., 2008; Farris, 2011). All the systems we study feature a sparsely connected, expanded granule-like layer that sends parallel fiber axons onto densely connected downstream neurons known to exhibit powerful synaptic plasticity, thus motivating the key architectural assumptions of our model. We have constrained anatomical parameters of the model using data as available (Table 1). However, we agree with the Reviewer that when making comparisons across species there is always a possibility that differences are due to physiological mechanisms we have not fully understood or captured with a model. As such, we can only present a hypothesis for these differences. We have modified our Discussion section on this topic to clearly state this.

      "Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems."

      9) More detailed comments: Abstract: ‘In these models [Marr-Albus], granule cells form a sparse, combinatorial encoding of diverse sensorimotor inputs. Such sparse representations are optimal for learning to discriminate random stimuli.’ Yes, I would agree with the first part, but I contest the second part of this statement. I think what is true for sparse coding is that the learning of random stimuli will be faster, as in a perceptron, but not necessarily better. As the sparsification essentially removes information, it could be argued that the quality of the learning is poorer. So from that perspective, it is not optimal. The authors need to specify from what perspective they consider sparse representations optimal for learning.

      This is an important point that we would like to clarify. It is not the case that sparse coding simply speeds up learning. In our study and many related works (Barak et al. 2013; Babadi and Sompolinsky 2014; Litwin-Kumar et al. 2017), learning performance is measured based on the generalization ability of the network – the ability to predict correct labels for previously unseen inputs. As our study and previous studies show, sparse codes are optimal in the sense that they minimize generalization error, independent of any effect on learning speed. To communicate this more effectively, we have added the following sentence to the first paragraph of the Introduction:

      "Sparsity affects both learning speed (Cayco-Gajic et al., 2017), and generalization, the ability to predict correct labels for previously unseen inputs (Barak et al., 2013; Babadi and Sompolinsky, 2014; Litwin-Kumar et al., 2017)."

      10) Introduction: ‘Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971).’ In fact, this was precisely the issue that was addressed already by Jörntell and Ekerot (2006) Journal of Neuroscience. The conclusion was that these actual recordings of granule cells in vivo provided essentially no support for the assumptions in the Marr-Albus theories.

      In our reading, the main finding of Jörntell and Ekerot (2006) is that individual granule cells are activated by mossy fibers with overlapping receptive fields driven by a single type of somatosensory input. However, there is also evidence of nonlinear mixed selectivity in granule cells in support of the re-coding hypothesis (Huang et al., 2013; Ishikawa et al., 2015). Jörntell and Ekerot (2006) also suggest that the granule cell layer shares a topographic organization similar to that of mossy fibers, organized into microzones. The existence of topographic organization does not invalidate Marr-Albus theories. As we have suggested earlier, a local combinatorial expansion can coexist with a global topographic organization.

      We have described these considerations in the Assumptions and Extensions portion of the Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Chris Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      We have also included the Jörntell and Ekerot (2006) study as a citation in the Introduction:

      "Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Jörntell and Ekerot, 2006; Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971)."

      11) Results: 1st para: There is no information about how the granule cells are modelled.

      We agree that this information should have been more readily available. We now more completely describe the model in the main text. Our model for granule cells can be found in Equation 1 in the Results section and also the Methods (Network Model):

      "The activity of neurons in the expansion layer is given by: h = φ(J^eff x − θ), (2)

      where φ is a rectified linear activation function φ(u) = max(u,0) applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold θ is shared across neurons and controls the coding level, which we denote by f, defined as the average fraction of neurons in the expansion layer that are active."
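
      The relationship between the threshold θ and the coding level f can be made concrete with a minimal sketch. It is an illustration under stated assumptions rather than the manuscript's code: the Gaussian entries of J_eff, the Gaussian inputs, and the sizes D, M and P are hypothetical, and only the form h = φ(J_eff x − θ) with a rectified linear φ is taken from the text above.

```python
import random

def relu(u):
    """Rectified linear activation: phi(u) = max(u, 0)."""
    return max(u, 0.0)

def expansion_layer(J_eff, x, theta):
    """h = phi(J_eff x - theta), with phi applied element-wise."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) - theta) for row in J_eff]

def coding_level(J_eff, inputs, theta):
    """Average fraction of expansion-layer neurons that are active (h > 0)."""
    active = 0
    total = 0
    for x in inputs:
        h = expansion_layer(J_eff, x, theta)
        active += sum(1 for v in h if v > 0.0)
        total += len(h)
    return active / total

rng = random.Random(0)
D, M, P = 3, 200, 100  # hypothetical: input dimension, expansion-layer size, number of inputs
J_eff = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(M)]   # assumed random weights
inputs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(P)]  # assumed Gaussian inputs

for theta in (-1.0, 0.0, 1.0):
    print("theta =", theta, "| coding level f =", round(coding_level(J_eff, inputs, theta), 3))
```

      Because a single threshold is shared by all neurons, raising θ can only silence neurons, so f decreases monotonically from dense toward sparse codes as θ increases.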

      12) 2nd para: ‘A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space.’ Yes, I agree, and this is in fact in conflict with the known topographical organization in the cerebellar cortex (see broader comment above). Mossy fiber inputs coding for closely related inputs are co-localized in the cerebellar cortex. I think for this model to be of interest from the point of view of the mammalian cerebellar cortex, it would need to pay more attention to this organizational feature.

      As we discuss in our response to paragraphs 5 and 6, we see the random distribution assumption at the local scale (inputs presynaptic to a single Purkinje cell) as being compatible with topographic organization occurring at the microzone scale. Furthermore, as discussed earlier, we specifically model low-dimensional input as opposed to the random and high-dimensional inputs typically studied in prior models.

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks. We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."

      References

      Albus, J.S. (1971). A theory of cerebellar function. Mathematical Biosciences 10, 25–61.

      Apps, R., et al. (2018). Cerebellar Modules and Their Role as Operational Cerebellar Processing Units. Cerebellum 17, 654–682.

      Babadi, B. and Sompolinsky, H. (2014). Sparseness and expansion in sensory representations. Neuron 83, 1213–1226.

      Badura, A. and De Zeeuw, C.I. (2017). Cerebellar granule cells: dense, rich and evolving representations. Current Biology 27, R415–R418.

      Barak, O., Rigotti, M., and Fusi, S. (2013). The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. Journal of Neuroscience 33, 3844–3856.

      Bell, C.C., Han, V., and Sawtell, N.B. (2008). Cerebellum-like structures and their implications for cerebellar function. Annual Review of Neuroscience 31, 1–24.

      Billings, G., Piasini, E., Lőrincz, A., Nusser, Z., and Silver, R.A. (2014). Network structure within the cerebellar input layer enables lossless sparse encoding. Neuron 83, 960–974.

      Bordelon, B., Canatar, A., and Pehlevan, C. (2020). Spectrum dependent learning curves in kernel regression and wide neural networks. International Conference on Machine Learning 1024–1034.

      Brown, I.E. and Bower, J.M. (2001). Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. Journal of Comparative Neurology 429, 59–70.

      Brunel, N., Hakim, V., Isope, P., Nadal, J.P., and Barbour, B. (2004). Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43, 745–757.

      Canatar, A., Bordelon, B., and Pehlevan, C. (2021). Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks. Nature Communications 12, 1–12.

      Cayco-Gajic, N.A., Clopath, C., and Silver, R.A. (2017). Sparse synaptic connectivity is required for decorrelation and pattern separation in feedforward networks. Nature Communications 8, 1–11.

      Chadderton, P., Margrie, T.W., and Häusser, M. (2004). Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860.

      Churchland, M.M., et al. (2010). Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience 13, 369–378.

      Farris, S.M. (2011). Are mushroom bodies cerebellum-like structures? Arthropod structure & development 40, 368–379.

      Garwicz, M., Jörntell, H., and Ekerot, C.F. (1998). Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology 512 (Pt 1), 277–293.

      Gilbert, M. and Chris Miall, R. (2022). How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist 28, 206–221.

      Giovannucci, A., et al. (2017). Cerebellar granule cells acquire a widespread predictive feedback signal during motor learning. Nature Neuroscience 20, 727–734.

      Huang, C.C., et al. (2013). Convergence of pontine and proprioceptive streams onto multimodal cerebellar granule cells. eLife 2, e00400.

      Ishikawa, T., Shimuta, M., and Häusser, M. (2015). Multimodal sensory integration in single cerebellar granule cells in vivo. eLife 4, e12916.

      Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 31.

      Jörntell, H. and Ekerot, C.F. (2002). Reciprocal Bidirectional Plasticity of Parallel Fiber Receptive Fields in Cerebellar Purkinje Cells and Their Afferent Interneurons. Neuron 34, 797–806.

      Jörntell, H. and Ekerot, C.F. (2006). Properties of Somatosensory Synaptic Integration in Cerebellar Granule Cells In Vivo. Journal of Neuroscience 26, 11786–11797.

      Knogler, L.D., Markov, D.A., Dragomir, E.I., Štih, V., and Portugues, R. (2017). Sensorimotor representations in cerebellar granule cells in larval zebrafish are dense, spatially organized, and non-temporally patterned. Current Biology 27, 1288–1302.

      Litwin-Kumar, A., Harris, K.D., Axel, R., Sompolinsky, H., and Abbott, L.F. (2017). Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.

      Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology 202, 437–470.

      Nguyen, T.M., et al. (2022). Structured cerebellar connectivity supports resilient pattern separation. Nature 1–7.

      Saarinen, A., Linne, M.L., and Yli-Harja, O. (2008). Stochastic Differential Equation Model for Cerebellar Granule Cell Excitability. PLOS Computational Biology 4, e1000004.

      Simon, J.B., Dickens, M., and DeWeese, M.R. (2021). A theory of the inductive bias and generalization of kernel regression and wide neural networks. arXiv: 2110.03922.

      Sollich, P. (1998). Learning curves for Gaussian processes. Advances in Neural Information Processing Systems 11.

      Spanne, A. and Jörntell, H. (2013). Processing of Multi-dimensional Sensorimotor Information in the Spinal and Cerebellar Neuronal Circuitry: A New Hypothesis. PLOS Computational Biology 9, e1002979.

      Spanne, A. and Jörntell, H. (2015). Questioning the role of sparse coding in the brain. Trends in Neurosciences 38, 417–427.

      van Kan, P.L., Gibson, A.R., and Houk, J.C. (1993). Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology 69, 74–94.

      Wagner, M.J., Kim, T.H., Savall, J., Schnitzer, M.J., and Luo, L. (2017). Cerebellar granule cells encode the expectation of reward. Nature 544, 96–100.

      Wagner, M.J., et al. (2019). Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24.

      Wolpert, D.M., Miall, R.C., and Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338–347.

      Yu, B.M., et al. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology 102, 614–635.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Huang et al., assess cognitive flexibility in rats trained on an animal model of anorexia nervosa known as activity-based anorexia (ABA). For the first time, they do this in a way that is fully automated and free from experimenter interference, as apparently experimenter interference can affect both the development of ABA as well as the effect on behaviour. They show that animals that are more cognitively flexible (i.e. animals that had received reversal training) were better able to resist weight loss upon exposure to ABA, whereas animals exposed to ABA first show poorer cognitive flexibility (reversal performance).

      Strengths:

      • The development of a fully-automated, experimenter-free behavioural assessment paradigm that is capable of identifying individual rats and therefore tracking their performance.

      • The bidirectional nature of the study - i.e. the fact that animals were tested for cognitive flexibility both before and after exposure to ABA, so that direction of causality could be established.

      • The analyses are rigorous and the sample sizes sufficient.

      • The use of touchscreens increases the translational potential of the findings.

      Weaknesses

      • Some descriptions of methods and results are confusing or insufficiently detailed.

      We have been through all methods and results to include additional details as requested by this reviewer below.

      It seems to me that performance on the pairwise discrimination task cannot be directly (statistically) compared to performance on reversal (as in Figure 4E), as these are tapping into fundamentally different cognitive processes (discrimination versus reversal learning). I think comparing groups on each assessment is valid, however.

      We agree that discrimination and reversal are different cognitive processes, and statistical comparisons between these two components of the task were only made when examining the speed of learning in the validation of the novel testing system. Moreover, the pink and purple bars on graphs such as Figure 4C & 4E represent “main effects of ABA exposure”, regardless of learning phase (PD or reversal), rather than, as you describe, a comparison of PD with R1. Perhaps this was not clear, so we have amended the text to say “main effect of ABA exposure p=.0017” rather than just “exposure”.

      Not necessarily a 'weakness' but I would have loved to see some assessment of the alterations in neural mechanisms underlying these effects, and/or some different behavioural assessments in addition to those used here. In particular, the authors mention in the discussion that this manipulation can affect cholinergic functioning in the dorsal striatum. We (Bradfield et al., Neuron, 2013) and a number of others have now demonstrated that cholinergic dysfunction in the dorsomedial striatum impairs a different kind of reversal learning that is based on alterations in outcome identity and thus relies on a different cognitive process (i.e. 'state' rather than 'reward' prediction error). It would be interesting perhaps in the future to see if the ABA manipulation also alters performance on this alternative 'cognitive flexibility' task.

      This is an excellent suggestion and we have already begun exploring this in other ongoing work in the laboratory. Due to ‘compulsive’ wheel running being a hallmark of ABA, we are interested in determining if this also translates to a goal-directed action impairment using the well-established outcome-specific devaluation task. Perhaps with ABA it may be more relevant to investigate outcome-reversals rather than stimulus-reversals, and if this is the case, it would further support the use of the ABA model for investigating cognitive dysfunction relevant to AN. We have included an additional section in the discussion text relating to our hypotheses regarding outcome-specific reversal learning in the ABA model.

      Nevertheless, I certainly think the manuscript provides a solid appraisal of cognitive flexibility using more traditional tasks, and that the authors have achieved their aims. I think the work here will be of importance, certainly to other researchers using the ABA model, but perhaps also of translational importance in the future, as the causal relationship between ABA and cognitive inflexibility is near impossible to establish using human studies, but here evidence points strongly towards this being the case.

      Reviewer #2 (Public Review):

      Huang and colleagues present data from experiments assessing the role of cognitive inflexibility in the vulnerability to weight loss in the activity-based anorexia paradigm in rats. The experiments employ a novel in-home cage touchscreen system. The home cage touchscreen system allows reduced testing time and increased throughput compared with the more widely used systems, resulting in the ability to assess ABA following testing of cognitive flexibility in relatively young female rats. The data demonstrate that, contrary to expectations, cognitive inflexibility does not predispose to greater ABA weight loss, but instead, rats that performed better in the reversal learning task lost more weight in the ABA paradigm. Prior ABA exposure resulted in poorer learning of the task and reversal. An additional experiment demonstrated that rats that had been trained in reversal learning resisted weight loss in the ABA paradigm. The findings are important and are clearly presented. They have implications for anorexia nervosa both in terms of potentially identifying those at risk and in understanding the high rates of relapse.

      Thanks for a great summary of the manuscript.

      Reviewer #3 (Public Review):

      Activity-based anorexia (ABA), which combines access to a running wheel and restricted access to food, is a most common paradigm used to study anorexic behavior in rodents. And yet, the field has been plagued by persistent questions about its validity as a model of anorexia nervosa (AN) in humans. This group's previous studies supported the idea that the ABA paradigm captures cognitive inflexibility seen in AN. Here they describe a fully automated touchscreen cognitive testing system for rats that makes it possible to ask whether cognitive inflexibility predisposes individuals to severe weight loss in the ABA paradigm. They observed that cognitive inflexibility was predictive of resistance to weight loss in the ABA, the opposite of what was predicted. They also reported reciprocal effects of ABA and cognitive testing on subsequent performance in the other paradigm. Prior exposure to the ABA decreased subsequent cognitive performance, while prior exposure to the cognitive task promoted resistance to the ABA. Based on these findings, the authors argue that the ABA model can be used to identify novel therapeutic targets for AN.

      The strength of this manuscript is primarily as a methods paper describing a novel automated cognitive behavioral testing system that obviates the need for experimentalist handling and single housing, which can interfere with behavioral testing, and that accelerates learning on the task. Together, these features make it feasible to perform longitudinal studies to ask whether cognitive performance is predictive of behavior in a second paradigm during adolescence, a peak period of vulnerability for many psychiatric disorders. The authors also used machine learning tools to identify specific behaviors during the cognitive task that predicted later susceptibility to the ABA paradigm. While the benefits of this system are clear, the rigor and reproducibility of experiments using this paradigm would be enhanced if the authors provided clear guidelines about which parameters and analyses are most useful. In their absence, the large amount of data generated can promote p-hacking.

      The authors use their automated behavioral testing paradigm to ask whether cognitive inflexibility is a cause or consequence of susceptibility to ABA, an issue that cannot be addressed in AN. They provide compelling evidence that there are reciprocal effects of the two behavioral paradigms, but do not perform the controls needed to evaluate the significance of these observations. For example, the learning task involves sucrose consumption and food restriction, conditions that can independently affect susceptibility to the ABA. Similarly, the ABA paradigm involves exercise and restricted access to food, which can both affect learning.

      In the Discussion, the authors hypothesize that the ABA paradigm produces cognitive inflexibility and argue that uncovering the underlying mechanism can be used to identify new therapeutic targets for AN. The rationale for their claim of translational relevance is undermined by the fact that the biggest effect of the ABA paradigm is seen in the pair discrimination task, and not reversal learning. This pattern does not fit clinical observations in AN.

      In summary, the significance of this manuscript lies in the development of a new system to test cognitive function in rats that can be combined with other paradigms to explore questions of causality. While the authors clearly demonstrate that cognitive flexibility does not promote susceptibility to ABA, the experiments presented do not provide a compelling case that their model captures important features of the pathophysiology of AN.

      We thank the reviewer for this detailed review and note that we have now both explicitly defined the most useful parameters for analyses from the novel touchscreen system and removed some comparisons that could be considered superfluous. We argue that the additional information provided by the machine learning analyses is, at this stage, exploratory, and rather than revealing independent descriptions of behavioural change in ABA exposed versus naïve rats, this information will aid in the generation of hypotheses to be tested in future studies. Therefore, the figures pertaining to these analyses have now been provided as supplements to Figures 3 & 4 (Figure 3-figure supplement 3; Figure 4-figure supplements 3&4). We have also clarified our intention to explore possible behavioural differences using this technique in the methods and discussion.

      We have also completed the essential control experiment, defined in the “essential revisions” section of this review, whereby we show only moderate impairments in reversal learning following a matched period of food restriction without rapid weight loss, suggesting that the substantial impairment seen following ABA exposure was not due to food restriction alone (see updated Figure 4 and supplements).

      However, we do not agree with this reviewer “that the biggest effect of the ABA paradigm is seen in the pair discrimination task” and point to the outcomes of both reciprocal experiments.

      In the first experiment, rats that went on to be susceptible or resistant to ABA did not differ on pairwise discrimination learning but specifically on performance at the reversal of reward contingencies (Figure 3B & E). Although this result was not in the hypothesised direction, this suggests that reversal learning specifically, and not pairwise discrimination, can differentiate those rats that go on to be susceptible to weight loss. We have included additional discussion in the text related to this finding (see lines 490-497).

      In the second experiment, it is clear from the number of ABA exposed rats that were unable to learn the reversal component, even after being able to learn pairwise discrimination, that flexible learning is more impaired by ABA. While it is true that ABA exposed rats that were successful in learning the reversal task were slower to learn the pairwise discrimination component than naïve rats (Figure 4E), this was not related to their ability to learn the reversal task overall, as they showed equivalent learning rates in pairwise discrimination to ABA exposed rats that failed to learn the reversal component (Figure 4G-I). The absence of significant differences between ABA exposed and naïve animals in Figure 4F relates to the fact that a large proportion of ABA exposed animals never reached performance criterion in the reversal phase of the task and therefore data from these animals could not be included in the figure. This is where the number of trials completed within each session becomes important for interpretation (i.e. Figure 4-figure supplement 1M-O), whereby ABA exposure caused impaired responding specifically within the reversal phase of the task. The results text has been updated to better reflect this critical point.

      Overall, this suggests that the impairment in cognitive flexibility caused by ABA exposure was related both to an associative learning impairment (slower to learn PD than naïve animals) and an impairment in the integration of new and existing learning (failure to learn R1 in a large proportion of animals).

    1. one must conclude that community is always in/with time, always unfinished,

      Pauline van Mourik Broekman: And community is also always in/with space. In that respect, it seems so important to recognise how hard editors of ‘living books’ actually find it to encourage the reuse/appropriation/disappropriation offered up, and quite how much (material, socialised) time and care it takes to coax – and perform – this activity sensitively, on- and offline, with all the nuances you’ve described (and which run counter to the ‘social’ as the metricised communicating human being is now supposed to perform – and seek – it, and whose conditions of ‘communication’ Jodi Dean has done a lot to theorise).

      My PhD research on early Soviet life made me realise it is just really hard to conceive of the experience of true convulsive collectivity (a loss of individuality that I realise may be different, but that I hope might also be compared to the forms of subjectivation inherent in disappropriation?). And how creativity, let alone ‘authorship’, might be experienced within that. Do we (and I am thinking here especially of scholarly workers) come anything close to Walter Benjamin’s experience, from 1927, of how “Each thought, each day, each life lies here [in Moscow] as on a laboratory table. … No organism, no organisation, can escape this process.” Sensations which are also documented in Richard Stites’ Revolutionary Dreams: Utopian Vision and Experimental Life in the Russian Revolution, Oxford: Oxford University Press, 1989; and similarly, in Kristin Ross’s works on the Paris commune (Ross, 2008, 2015). The Soviet concept of the ‘social condenser’ is fascinating in this respect in that it places architecture, and space/s, right in the centre of psychosocial subjectivation, as a potentially intensifying, opening or collectivising force in social movement and change (as some have commented, these might importantly be separated into ‘planned’ and ‘accidental’ social condensers, meaning those which are forward-looking and intentional, or retroactively recognised for their capacities).

      If, as Teju Cole so memorably described, we have achieved the sort of collective spectacular alienation wherein we can witness ‘death in the browser tab’ while sitting still in front of a computer and toggling between that and other media ‘content’ (The New York Times Magazine, 2015, and online: https://www.nytimes.com/2015/05/24/magazine/death-in-the-browser-tab.html), how are we to expand living books’ writerly ‘space’ such that the tabs which living books’ readers/writers painstakingly write into might truly act as social condensers, in line with the more fervent hopes and dreams of ‘radical’ open access? As we sit at those computers, writing, our bodies slumped in chairs and our eyes tired and glazed, should we, can we, seek an experience of elated social dissolution the likes of which I’ve in recent times only seen described by authors contemplating the psychological experience of riots (e.g. Hannah Black, 2022; Tobi Haslett, 2021; Adrian Wohlleben, 2021)? It is a vain imagining, probably, but I can’t help but wonder how one might try and think of these phenomena together, or at least as potentially related? To me it seems inevitably to point to the fact that we cannot conceive of digital materials outside of the spaces in which they are engaged with. I’ve found Mark Nowak’s Social Poetics (2020) and June Jordan’s Poetry for the People (1995) some of the more helpful sources to think this relationship through (though I realise there are countless others). It also seems telling that they are to a lesser or greater extent centred in interpretations of communal pedagogy.

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths

      This paper is well situated theoretically within the habit learning/OCD literature. Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid. The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

      The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

      We truly appreciate the positive assessment of referee 1, particularly the consideration that our study is theoretically strong and that ‘the results are supportive of the authors' conclusions’. This is an important external endorsement of our conclusions, contrasting somewhat with the views of referee 2.

      Weaknesses

      The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.

      The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

      We agree with the reviewer that the proof of principle established in our study opens new avenues for research into the psychological and behavioral determinants of the heterogeneity of this clinical population. However, considering the study timeline and the pandemic constraints, a bigger sample was not possible. Our sample can indeed be considered small if one compares it with current online studies, which do not require in-person/laboratory testing, thus being much easier to recruit and conduct. However, given the nature of our protocol (with 2 demanding test phases, 1-month engagement per participant and the inclusion of OCD patients without comorbidities only) and the fact that this study also involved laboratory testing, we consider our sample size reasonable and comparable to other laboratory studies (typically comprising on average between 30-50 participants in each group).

      This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

      An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

      Thank you for acknowledging the impact of our study, in particular the unique ability of our task to interrogate the habit system.

      Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1) Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.

      We acknowledge that the research on habit learning is a topic of current controversy, especially when it comes to how to induce and measure habits in humans. Therefore, within this context, referee 2’s criticism could be expected. Across distinct fields of research, different methodologies have been used to measure habits, which represent relatively stereotyped and autonomous behavioral sequences enacted in response to a specific stimulus without consideration, at the time of initiation of the sequence, of the value of the outcome or any representation of the relationship that exists between the response and the outcome. Hence these are stimulus-bound responses which may or may not require the implementation of a skill during subsequent performance. Behavioral neuroscientists define habits similarly, as stimulus-response associations which are independent of reward or outcome, and use devaluation or contingency degradation strategies to probe habits (Dickinson and Weiskrantz, 1985; Tricomi et al., 2009). Others conceptualize habits as a form of procedural memory, along with skills, and use motor sequence learning paradigms to investigate and dissect different components of habit learning such as action selection, execution and consolidation (Abrahamse et al., 2013; Doyon et al., 2003; Squire et al., 1993). It is also generally agreed that the autonomous nature of habits and the fluid proficiency of skills are both usually achieved with many hours of training or practice, respectively (Haith and Krakauer, 2018).

      We consider that Balleine and Dezfouli (2019) made an excellent attempt to bring all these different criteria within a single framework, which we have followed. We also consider that our discussion in fact followed a rather cautious approach to interpretation solely in terms of goal-directed versus habitual control.

      Referee 2 does not actually specify criteria by which they define habits and skills, except for asserting that skilled behavior is goal-directed, without mentioning what the actual goal of the implementation of such skill is in the present study: the fulfillment of a habit? We assume that their definition of habit hinges on the effects of devaluation, as a single criterion of habit, but which according to Balleine and Dezfouli (2019) is only 1 of their 4 listed criteria. We carefully addressed this specific criterion in our manuscript: “We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered…”. Therefore, we took due care in our conclusions concerning habits and thus found the referee’s comment misleading and unfair.

      We note that our trained motor sequences did in fact fulfil the other 3 criteria listed by Balleine and Dezfouli (2019), unlike many studies employing only devaluation (e.g. Tricomi et al 2009; Gillan et al 2011). Moreover, we cited a recent study using very similar methodology where the devaluation test was applied and shown to support the habit hypothesis (Gera et al., 2022).

      Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1). Transitions between habitual and goal-directed control over behavior are quite well established in the experimental literature, especially when choice opportunities become available (Bouton, 2021; Frölich et al., 2023) or a new goal-directed schema is recruited to fulfill a habit (Fouyssac et al., 2022). This switching between habits and goal-directed responding may reflect the coordination of these systems in producing effective behavior in the real world.

      • Fouyssac M, Peña-Oliver Y, Puaud M, Lim NTY, Giuliano C, Everitt BJ, Belin D. (2021). Negative Urgency Exacerbates Relapse to Cocaine Seeking After Abstinence. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.10.009

      • Frölich S, Esmeyer M, Endrass T, Smolka MN and Kiebel SJ (2023) Interaction between habits as action sequences and goal-directed behavior under time pressure. Front. Neurosci. 16:996957. doi: 10.3389/fnins.2022.996957

      • Bouton ME. 2021. Context, attention, and the switch between habit and goal-direction in behavior. Learn Behav 49:349– 362. doi:10.3758/s13420-021-00488-z

      2) Some methodological aspects need more detail and clarification.

      3) There are concerns regarding some of the analyses, which require addressing.

      We thank referee 2 for their detailed review of the methods and analyses of our study and for the helpful feedback, which clearly helps improve our manuscript. We will clarify the methodological aspects in detail and conduct the suggested analysis. Please see below our answers to the specific points raised.

      Introduction:

      4) It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

      We agree that there is no evidence that action sequences become habitual more readily than single actions, although action sequences clearly allow ‘chunking’ and thus likely engage neural networks including the putamen which are implicated in habit learning as well as skill. In our revised manuscript we will instead state: “we have recently postulated that extensive training of sequential actions could be a means for rapidly engaging the ‘habit system’ (Robbins et al., 2019)”.

      5) In the Hypothesis section the authors state: “we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence”. I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      We agree that choice of the familiar response sequence should not be a necessary criterion for habitual control although choice for a familiar sequence is, in fact, not inconsistent with this hypothesis. In a recent study, Zmigrod et al (2022) found that 'aversion to novelty' was a relevant factor in the subjective measurement of habitual tendencies. It should also be noted that this preference was present in patients with OCD. If one assumes instead, like the referee, that the familiar sequence is goal-directed, then it contravenes the well-known 'egodystonia' of OCD which suggests that such tendencies are not goal-directed.

      To clarify our hypothesis, we will amend the sentence to the following: “Finally, we expected that OCD patients would generally report greater habits, as well as attribute higher intrinsic value to the familiar app sequences manifested by a greater preference for performing them when given the choice to select any other, easier sequence”.

      A few notes on the task description and other task components:

      6) It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

      These details will be included in the revised manuscript. Thank you for pointing out the need for further clarification of the task design.

      7) Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

      This additional information will be added to the revised manuscript. If participants omitted to train for more than 2 days, the researcher would send a reminder to the participant to request to catch up. If the participant would not react accordingly and a third day would be skipped, then the researcher would call to understand the reasons for the lack of engagement and gauge motivation. The participant would be excluded if more than 5 sequential days of training were missed. Only 2 participants were excluded given their lack of engagement.

      8) According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.

      If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least provide an explanation for why it has been decided not to analyze them).

      This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

      The task procedure was indeed the same as detailed in Banca et al., 2020. We did not include these extra components in this current manuscript for reasons of succinctness and because the manuscript was already rather longer than a common research article, given that we present three different, though highly inter-dependent, experiments in order to answer key interrelated questions in an optimal manner. However, since referee 2 considers this additional analysis to be important, we will be happy to include it in the supplementary material of the revised manuscript.

      Training engagement analysis:

      9) I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      We acknowledge the referee’s concern on this matter and agree to replace the y-axis variable of Figure 2b with the number of performed practices (thus aligning with Figure 2a). This amendment will remove any potential effect of weaker performance on the engagement measurement and will provide clearer results.

      10) Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

      We will conduct the statistical test and report accordingly.

      Learning results:

      11) When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Thank you for pointing this out. The descriptive stats for MT0 will be added to the revised version of the manuscript.

      12) Sensitivity of sequence duration and IKI consistency (C) to reward:

      I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      This is an important question. Our analysis protocol was designed to ensure that incorrect trials do not contaminate or confound the results. To estimate the trial-to-trial difference in ∆MT (or C) and ∆R, we exclusively included pairs of contiguous trials where participants achieved correct performance and received feedback scores for both trials. For example, if a participant made a performance error on trial 23, we did not include ∆R or ∆MT estimates for the pairs of trials 23-22 and 24-23. Instead of excluding incorrect trials from our analyses, we retained them in our time series but assigned them a NaN (not a number) value in Matlab. As a result, ∆R and ∆MT were not defined for those two pairs of trials. The same applies to C. This approach ensured that our analyses are not confounded by incremental or decremental feedback scores between noncontiguous trials. In the past, when assessing the timing of correct actions during skilled sequence performance, we also considered events that were preceded and followed by correct actions. This excluded effects such as post-error slowing from contaminating our results (Herrojo Ruiz et al., 2009, 2019). Therefore, we do not believe that any further reanalysis is required.

      • Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral cortex. 2009 Nov 1;19(11):2625-39.

      • Bury G, García-Huéscar M, Bhattacharya J, Ruiz MH. Cardiac afferent activity modulates early neural signature of error detection during skilled performance. NeuroImage. 2019 Oct 1;199:704-17.
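
      As an illustration only (this is not the authors' Matlab code), the pairing rule described in the response to point 12 can be sketched in Python. The function name and the sample movement times below are hypothetical; the sketch assumes one measurement per trial and a per-trial flag marking correct performance.

```python
import math

def paired_differences(values, correct):
    """Trial-to-trial differences, defined only when both trials of a pair were correct.

    values  : per-trial measurements (e.g. movement time or feedback score)
    correct : per-trial booleans, True when the sequence was performed correctly
    Returns len(values) - 1 differences; any pair touching an error is NaN.
    """
    # Keep incorrect trials in the series, but mark them as undefined.
    masked = [v if ok else math.nan for v, ok in zip(values, correct)]
    # Arithmetic involving NaN yields NaN, so pairs touching an error drop out.
    return [b - a for a, b in zip(masked, masked[1:])]

# Hypothetical movement times in seconds; the error is on the third trial.
mt = [2.0, 1.8, 2.5, 1.7, 1.6]
ok = [True, True, False, True, True]
delta_mt = paired_differences(mt, ok)
defined = [d for d in delta_mt if not math.isnan(d)]
```

      With one error in the series, the two differences that involve the error trial are undefined, and only the pairs of contiguous correct trials contribute.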

      13) I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.
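
      The regression-to-the-mean concern can be illustrated with a small simulation. This is a hedged sketch under strong assumptions that are not taken from the study: movement times are independent Gaussian draws (no learning and no sensitivity to reward), and the score is simply the negative of movement time. All names and parameter values are hypothetical.

```python
import random

def simulate(n_trials=20000, mean_mt=2.0, sd=0.2, seed=0):
    """Mean change in movement time on the trial after a score drop or a score rise."""
    rng = random.Random(seed)
    # Independent draws: behaviour never reacts to the score in this toy model.
    mt = [rng.gauss(mean_mt, sd) for _ in range(n_trials)]
    # Assumed scoring rule: faster trials earn higher scores.
    reward = [-t for t in mt]
    after_drop, after_rise = [], []
    for t in range(1, n_trials - 1):
        d_reward = reward[t] - reward[t - 1]
        d_mt_next = mt[t + 1] - mt[t]
        (after_drop if d_reward < 0 else after_rise).append(d_mt_next)
    return sum(after_drop) / len(after_drop), sum(after_rise) / len(after_rise)

drop_mean, rise_mean = simulate()
```

      In this toy model, the average change after a score drop is a speed-up and the average change after a score rise is a slow-down, even though the simulated participant ignores the score entirely. This is why a control for the reference movement time is informative.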

      Thank you for raising this point. Figure 5b illustrates two distinct effects of reward changes on behavioral adaptation, which are expected based on previous research.

      I. Practice effects: Firstly, we observe that as participants progress across bins of practice, the degree of improvement in behavior (reflected by faster movement time, MT) following a decrease in reward (∆R−) diminishes, consistent with our expectations based on previous work. Conversely, we found that ∆MT does not change across bins of practices following an increase in reward (∆R+). We appreciate the reviewer's suggestion regarding controlling for the reference movement time (MT) in the previous trial when examining the practice effect in the p(∆T|∆R−) and p(∆T|∆R+) distributions. In the revised manuscript, we will conduct the proposed control analysis to better understand whether the sensitivity of MT to score decrements changes across practice when normalising MT to the reference level on each trial. But see below for a preliminary control analysis.

      II. Asymmetry of the effect of ∆R− and ∆R+ on performance: Figure 5b also depicts the distinct impact of score increments and decrements on behavioural changes. When aggregating data across practice bins, we consistently observed that the centre of the p(∆T|∆R−) distribution was smaller (more negative) than that of p(∆T|∆R+). This suggests that participants exhibited a greater acceleration following a drop in scores compared to a relative score increase, and this effect persisted throughout the practice sessions. Importantly, this enhanced sensitivity to losses or negative feedback (or relative drops in scores) aligns with previous research findings (Galea et al., 2015; Pekny et al., 2014; van Mastrigt et al., 2020).

      We have conducted a preliminary control analysis to exclude the potential impact that reference movement time (MT) values could have on our analysis. We have assessed the asymmetry between behavioural responses to ∆R− and ∆R+ using the following analysis: We estimated the proportion of trials in which participants exhibited speed-up (∆T < 0) or slow-down (∆T > 0) behaviour following ∆R− and ∆R+ across different practice bins (bins 1 to 4). By discretising the series of behavioural changes (∆T) into binary values (+1 for slowing down, -1 for speeding up), we can assess the type of changes (speed-up, slow-down) without the absolute ∆T or T values contributing to our results. We obtained several key findings:

      • Consistent with expectations (sanity check), participants exhibited more instances of speeding up than slowing down across all reward conditions.

      • Participants demonstrated a higher frequency of speeding up following ∆R− compared to ∆R+, and this asymmetry persisted throughout the practice sessions (greater proportion of -1 events than +1 events). 53% of events were speed-up events in the p(∆T|∆R+) distribution for the first bin of practices, and 55% for the last bin. Regarding p(∆T|∆R−), there were 63% speed-up events throughout each bin of practices, with this proportion exhibiting no change over time.

      • Accordingly, the asymmetry of reward changes on behavioural adaptations, as revealed by this analysis, remained consistent across the practice bins.

      Thus, these preliminary findings provide an initial response to referee 2 and offer valuable insights into the asymmetrical effects of positive/negative reward changes on behavioural adaptations. We plan to include these results in the revised manuscript, as well as the full control analysis suggested by the referee. We will further expand upon their interpretation and implications.
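
      The sign-based control analysis described above can be sketched as follows. This is an illustrative Python reconstruction, not the authors' code: the function name and the sample values are hypothetical, and the handling of zero or undefined changes (skipped here) is an assumption.

```python
import math

def speedup_proportions(delta_t, delta_r):
    """Proportion of speed-up events after score drops and after score rises.

    delta_t[i] is the change in movement time that follows reward change delta_r[i].
    """
    counts = {"drop": [0, 0], "rise": [0, 0]}  # [speed-ups, classified events]
    for dt, dr in zip(delta_t, delta_r):
        # Assumption: undefined (NaN) or zero changes are left unclassified.
        if math.isnan(dt) or math.isnan(dr) or dt == 0 or dr == 0:
            continue
        key = "drop" if dr < 0 else "rise"
        sign = -1 if dt < 0 else 1  # -1 = speeding up, +1 = slowing down
        if sign == -1:
            counts[key][0] += 1
        counts[key][1] += 1
    return {k: s / n for k, (s, n) in counts.items() if n}

# Hypothetical data: four score drops followed by four score rises.
delta_r = [-5, -3, -1, -2, 4, 2, 6, 1]
delta_t = [-0.2, -0.1, 0.05, -0.3, -0.1, 0.2, 0.1, -0.05]
props = speedup_proportions(delta_t, delta_r)
```

      Because only the sign of each change is kept, the absolute movement times do not enter the comparison between the two conditions.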

      14) Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.

      The reviewer’s concern is addressed in the previous question. Also, this analysis would not be possible because our Gaussian fit analyses use the time series of continuous reward scores, in which ∆R− or ∆R+ are embedded. These events cannot be analyzed once reward feedback is removed because we do not have behavioral events following ∆R− or ∆R+ anymore.

      15) This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, the similar potential confounding effects could still be present.

      We will conduct this analysis for the revised manuscript, similarly to the control analysis suggested by referee 2 on MT. Our preliminary control analysis, as explained above, suggests that the fundamental asymmetry in the effect of ∆R− and ∆R+ on behavioral changes persists when excluding the impact of reference performance values in our Gaussian fit analysis.

      16) Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

      We have now conducted this analysis. There is in fact no evidence to conclude that the continuously rewarded sequence was the preferred one. The result shows that 54.5% of HV and 29% of the OCD sample considered the continuous sequence to be their preferred one. Of note, this preference may not necessarily be linked to the trial-by-trial reward sensitive analysis. The latter assesses how learning may be affected by reward. The overall preference may be influenced by many other factors, such as, for example, the aesthetic appeal of particular combinations of finger movements.

      Regarding both experiments 2 and 3:

      17) The change in context in experiments 2 and 3 is substantial and includes many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

      Following the referee’s advice, we will move these details (currently written in the Methods section) to the Results section, when we introduce Phase B and before describing the results of experiments 2 and 3.

      Experiment 2:

      18) In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      We clearly defined the theoretical framing of experiment 2 as a test of whether trained action sequences gain intrinsic value and we are pleased to hear that the referee agrees with this framing. If the referee is referring to the paragraph below (in the Discussion), we actually do acknowledge within this paragraph that a preference for the trained sequences can either be conceptually aligned with a habit OR a goal-directed behavior.

      “On the other hand, we are describing here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes”

      This narrowing of goals model of OCD refers to a hypothetically transitional stage of compulsion development driven by behavior having an abnormally strong, goal-directed nature, typically linked to specific values and concerns.

      If the referee is referring to the penultimate sentence of the Hypothesis section, this has been amended in response to Q5. We cannot find any other possible instances in this manuscript stating that experiment 2 is a test of habitual or goal-directed behavior.

      Experiment 3:

      19) Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this reevaluation led participants to switch from habitual to goal-directed behavior.

      Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      This comment is aligned with (and follows) the referee’s criticism of experiment 1 not achieving automatic and habitual actions. We have addressed this matter above, in response 1 to Referee 2.

      Mobile-app performance effect on symptomatology: exploratory analyses:

      20) Maybe it would be worth testing if the patients with improved symptomatology (that attribute some of their symptom improvement to the app) also chose to play more during the training stage.

      We have conducted an analysis to address this relevant question. There is no correlation between the YBOCS score change and the number of total practices, meaning that the patients who improved symptomatology post training did not necessarily choose to play the app more during the training stage (rs = 0.25, p = 0.15). Additionally, we have statistically compared the improvers (patients with reduced YBOCS scores post-training) and the non-improvers (patients with unchanged or increased YBOCS scores post-training) in their number of completed app practices during the training phase and no differences were observed (U = 169, p = 0.19).

      Discussion:

      21) Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      We do not agree that the work is either "inadequate or mis-framed" and will not therefore be substantially revising the Discussion. We will however clarify further the interpretation we have made and make explicit the alternative viewpoint of the referee. For example, we will retitle experiment 3 as “Re-evaluation of the learned action sequence: possible test of goal/habit arbitration” to acknowledge the referee’s viewpoint as well as our own interpretation.

      22) In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      We recognize that the term "disadvantageously" may be semantically ambiguous for some readers and therefore we will remove it.

      Materials and Methods:

      23) The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Details of the learning procedure of the novel sequence (in condition 3, experiment 3) will be provided in the methods of the revised version of the manuscript.

      Minor comments:

      24) In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      We will follow this referee’s advice and will rephrase the sentence for clarity.

      25) With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

      The word "further" will be removed.

    2. Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing a learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1. Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.<br /> 2. Some methodological aspects need more detail and clarification.<br /> 3. There are concerns regarding some of the analyses, which require addressing.

      Please see details below, ordered by the paper sections.

      Introduction:<br /> It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

      In the Hypothesis section the authors state: "we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence." I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      A few notes on the task description and other task components:<br /> It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

      Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

      According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.<br /> If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least an explanation should be provided for why it has been decided not to analyze them).<br /> This is also true for the reward removal (extinction) from the 21st day onwards, which is potentially of particular relevance for the research questions.

      Training engagement analysis:<br /> I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

      Learning results:<br /> When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Sensitivity of sequence duration and IKI consistency (C) to reward:<br /> I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.<br /> Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.<br /> This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, similar potential confounding effects could still be present.
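
      The regression-to-the-mean concern above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: movement times (MT) are drawn independently around a fixed mean, so performance has no sensitivity to reward at all, and reward is assumed to be any decreasing function of MT, so that a slower trial means a reward decrease. All parameter values and function names are hypothetical. The last step regresses the next-trial change on the current MT, one simple way of "controlling for performance" in the sense suggested by the reviewer.

```python
import random


def simulate_mt(n_trials, mean_mt=2.0, sd=1.0, seed=0):
    """Independent movement times: no learning, no reward sensitivity."""
    rng = random.Random(seed)
    return [rng.gauss(mean_mt, sd) for _ in range(n_trials)]


def mean(xs):
    return sum(xs) / len(xs)


def split_by_reward_change(mt):
    """Per trial t: (MT change on trial t+1, MT on trial t, reward decreased?)."""
    rows = []
    for t in range(1, len(mt) - 1):
        # Assumption: reward falls whenever the trial is slower than the last.
        reward_decreased = mt[t] > mt[t - 1]
        rows.append((mt[t + 1] - mt[t], mt[t], reward_decreased))
    return rows


def ols_residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


mt = simulate_mt(20000)
rows = split_by_reward_change(mt)
d_mt = [r[0] for r in rows]   # next-trial change in MT
cur = [r[1] for r in rows]    # current-trial MT (baseline performance)
dec = [r[2] for r in rows]    # did reward decrease on the current trial?

# Raw comparison: apparent "improvement" after a reward decrease.
raw_after_decrease = mean([d for d, f in zip(d_mt, dec) if f])
raw_after_increase = mean([d for d, f in zip(d_mt, dec) if not f])

# Same comparison after removing the part explained by current MT.
res = ols_residuals(d_mt, cur)
adj_after_decrease = mean([r for r, f in zip(res, dec) if f])
adj_after_increase = mean([r for r, f in zip(res, dec) if not f])

print("raw mean dMT after decrease/increase:",
      round(raw_after_decrease, 3), round(raw_after_increase, 3))
print("adjusted mean dMT after decrease/increase:",
      round(adj_after_decrease, 3), round(adj_after_increase, 3))
```

      In this toy model the raw comparison shows faster performance after a reward decrease even though reward plays no causal role, while the difference largely vanishes once current MT is accounted for. Whether the same holds in the real data is exactly what the requested control analysis would test.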

      Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

      Regarding both experiments 2 and 3:<br /> The change in context in experiments 2 and 3 is substantial and includes many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

      Experiment 2:<br /> In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      Experiment 3:<br /> Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this re-evaluation led participants to switch from habitual to goal-directed behavior.<br /> Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      Mobile-app performance effect on symptomatology: exploratory analyses:<br /> Maybe it would be worth testing if the patients with improved symptomatology (that contribute some of their symptom improvement to the app) also chose to play more during the training stage.

      Discussion:<br /> Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      Materials and Methods:<br /> The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Minor comments:<br /> In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

    1. Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%). They also performed playback experiments and concluded that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in-between responses to 22-kHz and 50-kHz playbacks".

      Strengths: Detailed spectrographic analysis of a substantial data set of ultrasonic vocalizations recorded during prolonged fear conditioning, combined with playback experiments.

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons: 1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions. 2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 
1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section. For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)? 2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)? 3) How did this ratio differ between experiments and experimental conditions? 4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4). Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times? A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in-between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others? 2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls. Of note, in case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback" but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and I wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows the responses to the three main call types to be compared in an unbiased manner. Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment. More concrete information is needed.

    1. Reviewer #2 (Public Review):

      Theta-nested gamma oscillations (TNGO) play an important role in hippocampal memory and cognitive processes and are disrupted in pathology. Deep brain stimulation has been shown to affect memory encoding. To investigate the effect of pulsed CA1 neurostimulation on hippocampal TNGO the authors coupled a physiologically realistic model of the hippocampus comprising EC, DG, CA1, and CA3 subfields with an abstract theta oscillator model of the medial septum (MS). Pathology was modeled as weakened theta input from the MS to EC simulating MS neurodegeneration known to occur in Alzheimer's disease. The authors show that if the input from the MS to EC is strong (the healthy state) the model autonomously generates TNGO in all hippocampal subfields while a single neurostimulation pulse has the effect of resetting the TNGO phase. When the MS input strength is weaker the network is quiescent but the authors find that a single CA1 neurostimulation pulse can switch it into the persistent TNGO state, provided the neurostimulation pulse is applied at the peak of the EC theta. If the MS theta oscillator model is supplemented by an additional phase-reset mechanism a single CA1 neurostimulation pulse applied at the trough of EC theta also produces the same effect. If the MS input to EC is weaker still, only a short burst of TNGO is generated by a single neurostimulation pulse. The authors investigate the physiological origin of this burst and find it results from an interplay of CAN and M currents in the CA1 excitatory cells. In this case, the authors find that TNGO can only be rescued by a theta frequency train of CA1 pulses applied at the peak of the EC theta or again at either the peak or trough if the MS oscillator model is supplemented by the phase-reset mechanism.

      The main strength of this model is its use of a fairly physiologically detailed model of the hippocampus. The cells are single-compartment models but do include multiple ion channels and are spatially arranged in accordance with the hippocampal structure. This allows the understanding of how ion channels (possibly modifiable by pharmacological agents) interact with system-level oscillations and neurostimulation. The model also includes all the main hippocampal subfields. The other strength is its attention to an important topic, which may be relevant for dementia treatment or prevention, which few modeling studies have addressed.

      The work has several weaknesses. First, while investigations of hippocampal neurostimulation are important there are few experimental studies from which one could judge the validity of the model findings. All its findings are therefore predictions. It would be much more convincing to first show the model is able to reproduce some measured empirical neurostimulation effect before proceeding to make predictions. Second, the model is very specific. Or if its behavior is to be considered general it has not been explained why. For example, the model shows bistability between quiescence and TNGO, however what aspect of the model underlies this, be it some particular network structure or particular ion channel, for example, is not addressed. Similarly for the various phase reset behaviors that are found. We may wonder whether a different hippocampal model of TNGO, of which there are many published (for example [1-6]) would show the same effect under neurostimulation. This seems very unlikely and indeed the quiescent state itself shown by this model seems quite artificial. Some indication that particular ion channels, CAN and M are relevant is briefly provided and the work would be much improved by examining this aspect in more detail. In summary, the work would benefit from an intuitive analysis of the basic model ingredients underlying its neurostimulation response properties. Third, while the model is fairly realistic, considerable important factors are not included and in fact, there are much more detailed hippocampal models out there (for example [5,6]). In particular, it includes only excitatory cells and a single type of inhibitory cell. This is particularly important since there are many models and experimental studies where specific cell types, for example, OLM and VIP cells, are strongly implicated in TNGO. 
Other missing ingredients that one may think might have a strong impact on the model's response to neurostimulation (in particular stimulation trains) include the well-known short-term plasticity between different hippocampal cell types and active dendritic properties. Fourth, the MS model seems somewhat unsupported. It is modeled as a set of coupled oscillators that synchronize. However, there is also a phase reset mechanism included. This mechanism is important because it underlies several of the phase reset behaviors shown by the full model. However, it is not derived from experimental phase response curves of septal neurons, of which there is no direct measurement. The work would benefit from the use of a more biologically validated MS model.

      [1] Hyafil A, Giraud AL, Fontolan L, Gutkin B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends in neurosciences. 2015 Nov 1;38(11):725-40.

      [2] Tort AB, Rotstein HG, Dugladze T, Gloveli T, Kopell NJ. On the formation of gamma-coherent cell assemblies by oriens lacunosum-moleculare interneurons in the hippocampus. Proceedings of the National Academy of Sciences. 2007 Aug 14;104(33):13490-5.

      [3] Neymotin SA, Lazarewicz MT, Sherif M, Contreras D, Finkel LH, Lytton WW. Ketamine disrupts theta modulation of gamma in a computer model of hippocampus. Journal of Neuroscience. 2011 Aug 10;31(32):11733-43.

      [4] Ponzi A, Dura-Bernal S, Migliore M. Theta-gamma phase-amplitude coupling in a hippocampal CA1 microcircuit. PLOS Computational Biology. 2023 Mar 23;19(3):e1010942.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

    1. Author Response

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study.

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and will submit a revised version in the future that addresses the reviewers’ comments as detailed below in response to each concern. We will include a more open discussion of alternative possibilities. Further, we will repeat the optogenetic experiments assessing AMPAergic scaling in our mouse cortical cultures in order to demonstrate independent regulation of mEPSCs and mIPSCs as suggested.

      Reviewer #1 (Public Review):

      In the manuscript titled "GABAergic synaptic scaling is triggered by changes in spiking activity rather than transmitter receptor activation," the authors present an investigation of the role of GABAergic synaptic scaling in the maintenance of spike rates in networks of cultured neurons. Their main findings suggest that GABAergic scaling exhibits features consistent with a key homeostatic mechanism that contributes to the stability of neuronal firing rates. Their data demonstrate that GABAergic scaling is multiplicative and emerges when postsynaptic spike rates are altered. Finally, their data suggest that, in contrast to their prior data on glutamatergic scaling, GABAergic scaling is driven by spike rates. The authors set the paper up as an argument that GABAergic scaling, rather than glutamatergic scaling, serves as the critical homeostatic mechanism for spike rate regulation.

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).

      We certainly understand the criticism here (similar to reviewer 2’s third point). In our resubmission we will do a better job discussing these complications, which we now summarize. First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own) that apply drugs to alter activity and assess mPSC amplitude at the final time point, here we are actually showing CTZ’s effect on spiking activity within the culture over time. This is critical because it has informed us of the drug’s true effect on spiking, the variability that is associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover initial levels. This was important because it revealed that the drugs do not always influence activity in the way we assume, and this provides greater context to our results. Second, we are showing all of our data, and presenting it using estimation statistics, which go beyond the dichotomy of a simple p value yes or no (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics have become a more standard statistical approach in the last 15 years and are the preferred method for the Society for Neuroscience’s eNeuro Journal. This method shows the effect size and the confidence interval of the distribution. For the 3 hr time point in Fig. 5B the CTZ/ethanol vs. ethanol data points exhibit very little overlap, the effect size demonstrates a near doubling of spike frequency, and the confidence interval shows a clear separation from 0. This was a pairwise comparison as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but there is also clear variability. It is important to note that while these recordings help us to understand effects on spiking across the cultured network, they cannot directly speak to spiking activity in the principal neurons that we target. This complication, along with the variability inherent in these cultures, could make simple comparisons difficult to interpret. Regardless, we do see some increase in spiking with CTZ and we clearly see increases in mIPSC amplitude, thus providing some support for the idea that spiking could be a critical player in terms of GABAergic scaling, particularly when put in the context of our other findings. However, it is important to recognize that something other than total spike rate may contribute to GABAergic scaling, such as the pattern of spiking that produces a particular calcium transient, and this will be discussed in the resubmission.
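      The exchange above turns on two ideas: reporting an effect size with a bootstrap confidence interval, and whether that interval still excludes 0 once several time points are compared. The following is a minimal sketch of both, not the authors' analysis. The data values, the function name `bootstrap_mean_diff_ci`, and the use of a Bonferroni-style adjustment are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_diff_ci(control, treated, alpha=0.05, n_boot=5000, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(control).

    Hypothetical helper for illustration; resamples each group with replacement.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = statistics.fmean(treated) - statistics.fmean(control)
    return observed, lo, hi


# Hypothetical normalized spike frequencies, one value per culture.
# These numbers are invented and are NOT the data from the manuscript.
ethanol = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0]
ctz = [1.6, 2.3, 1.4, 2.1, 1.9, 2.4]

n_comparisons = 4  # assumed: one pairwise comparison per time point

diff, lo, hi = bootstrap_mean_diff_ci(ethanol, ctz, alpha=0.05)
_, lo_adj, hi_adj = bootstrap_mean_diff_ci(ethanol, ctz, alpha=0.05 / n_comparisons)

print(f"mean difference: {diff:.2f}")
print(f"95% CI (single comparison): [{lo:.2f}, {hi:.2f}]")
print(f"CI adjusted for {n_comparisons} comparisons: [{lo_adj:.2f}, {hi_adj:.2f}]")
```

      The adjusted interval is never narrower than the unadjusted one, so a time point whose interval only just excludes 0 may no longer do so after adjustment. Whether such an adjustment is appropriate for estimation statistics is a judgment call that this sketch does not settle.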

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPA-R activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We understand this point when considering the CTZ/TTX experiments by themselves. However, spiking appears to be a more straightforward trigger when the CTZ/TTX results are coupled with the prevention of GABAergic downscaling by optogenetic restoration of spiking in the presence of AMPAR antagonists. Further, an important point here is that our results with TTX vs. TTX + CTZ are different for GABAergic scaling (no difference) and AMPAergic scaling (CTZ diminished upward scaling) suggesting different triggers for the two forms of scaling. We will make this more clear in our resubmission.

      Specific points:

      • The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate).

      We agree with the reviewer and should not have suggested that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been attributed to AMPAergic scaling, which is thought to be a spike rate homeostat. We will correct this in the resubmission.

      • Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition.

      Agreed, we will do this.

      • The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this case scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling. The study that the reviewer refers to hyperpolarizes a majority of cells in the network and therefore will also alter neurotransmission throughout the network, which does not separate the importance of spiking and receptor activation as in the above-mentioned study. We will make this point more clearly in the resubmission.

      • Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was interesting but our main point here was that this line had a positive slope, meaning ratios (CNQX mPSC amplitudes/control mPSC amplitudes) got bigger for the larger CNQX-treated mPSCs. Alternatively, a multiplicative relationship where mPSCs are all increased by a single factor (e.g. 2X) would be a flat line with 0 slope at the multiplicative value (e.g. 2). In terms of the left side of the plot, we do see values that rise abruptly from 1 - this is partially obstructed by the Y axis in this figure and we will adjust this. This left part of the plot is likely due to the CNQX-induced increases in mPSC amplitudes of minis that were below our detection threshold of 5 pA. Therefore, minis that were 4 pA could now be 5 pA after CNQX treatment and these are then divided by the smallest control mPSCs, which are 5 pA (ratio of 1). We will try to do a better job describing this in the resubmission.
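      The distinction drawn here, a flat ratio line for uniform multiplicative scaling versus a positively sloped line when larger events are scaled more, can be sketched with simulated amplitudes. This is an illustrative sketch only. It assumes the ratio plot is built by rank-ordering the two samples and dividing matched ranks, and the amplitude distribution, the scaling rules, and the helper names are hypothetical.

```python
import random


def ratio_plot(control, treated):
    """Rank-order both samples and return (treated amplitude, ratio) pairs.

    Assumes equal sample sizes; the pairing by rank is an assumption of this sketch.
    """
    c = sorted(control)
    t = sorted(treated)
    return [(ti, ti / ci) for ci, ti in zip(c, t)]


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


rng = random.Random(1)

# Hypothetical control amplitudes in pA, all at or above a 5 pA detection threshold.
control = [5.0 + rng.expovariate(1 / 10.0) for _ in range(500)]

# Case 1: every event scaled by the same factor (purely multiplicative).
uniform = [2.0 * a for a in control]

# Case 2: larger events scaled by a larger factor (not a single multiplier).
graded = [a * (1.0 + 0.02 * a) for a in control]

slope_uniform = slope(ratio_plot(control, uniform))
slope_graded = slope(ratio_plot(control, graded))

print(f"slope of ratio plot, uniform scaling: {slope_uniform:.4f}")
print(f"slope of ratio plot, graded scaling:  {slope_graded:.4f}")
```

      In the uniform case every ratio equals the scaling factor, so the fitted slope is zero. In the graded case the ratio rises with amplitude, which is the pattern the response describes for the CNQX-treated mPSCs. The sketch does not model events crossing the detection threshold, which is the explanation offered for the left end of the plot.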

      Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted?

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs are dropping into the noise after CNQX treatment (similar to above argument), while most of the distribution suggests mIPSCs are reduced to 50% by CNQX treatment. On the right side of the ratio plot the values appear to mostly increase. We are not sure why this is happening, but it looks like some mIPSCs are not purely multiplicative at 0.5, particularly in TTX. It is also important to point out that this is a relatively small percent of the total population and the biggest mPSCs can vary to a great degree from one cell to the next. We will discuss this in the resubmission.

      • The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit.

      We will address these issues in the resubmission.

      • I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are lognormally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network-level - given that a network level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.

      As described above, we do not have the capacity to know what the actual firing rate of a particular neuron was before and after introducing a drug and so we cannot absolutely say that we have restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation is only 50-100 ms long activating a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full blown network burst where the majority of spikes occur, but this is after the light is off. In other words, the optogenetic light pulse only initiates what becomes a normal network burst that fortunately allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study we show that this is the case for individual units - the spiking of an individual unit during a burst is similar before and after CNQX/optostim (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015 Nat. Comm.). We are not claiming that we have restored spiking to exactly the pre-drug state, but bring it back toward those levels and we see this is associated with a return of the mIPSC amplitude to near control levels. We will include a description of this in the resubmission.
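      The reviewer's concern about network-level means under a lognormal firing-rate distribution can be illustrated numerically. The sketch below draws hypothetical firing rates from a lognormal distribution and compares the mean with the median and with the share of total spiking carried by the most active cells. The distribution parameters and population size are assumptions chosen for illustration, not estimates from the cultures discussed here.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical per-neuron firing rates in Hz, drawn from a lognormal distribution.
# mu and sigma are illustrative assumptions, not fitted values.
rates = [rng.lognormvariate(0.0, 1.5) for _ in range(10000)]

mean_rate = statistics.fmean(rates)
median_rate = statistics.median(rates)

# Fraction of neurons firing below the population mean.
frac_below_mean = sum(r < mean_rate for r in rates) / len(rates)

# Share of all spikes contributed by the 10% most active neurons.
top_decile = sorted(rates, reverse=True)[: len(rates) // 10]
top_share = sum(top_decile) / sum(rates)

print(f"mean rate:   {mean_rate:.2f} Hz")
print(f"median rate: {median_rate:.2f} Hz")
print(f"fraction of neurons below the mean: {frac_below_mean:.2f}")
print(f"share of spikes from top 10% of neurons: {top_share:.2f}")
```

      With a heavy right tail, most neurons fire below the population mean and a small group of cells accounts for a large share of total spikes. A restored network-wide spike count therefore says little, on its own, about whether any given neuron has returned to its own baseline.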

      • Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      • Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      Agreed, see above discussion of homeostat requirement. Will adjust these statements in our resubmission.

      • 277: do you mean AMPAR?

      We were not clear enough here. We actually do mean GABAR. The idea is that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We will clarify this in the resubmission.

      • Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of burst is shown in B and C.

      • Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes.

      We will adjust these issues in the resubmission.

      • The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture.

      We agree and will adjust the discussion. Also, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      • The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development.

      We will discuss caveats of cortical cultures at DIV 14-20.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play.

      Agreed.

      Reviewer #2 (Public Review):

      Synaptic scaling has long been proposed as a homeostatic mechanism for the regulation of the activity of individual neurons and networks. The question of whether homeostasis is controlled by neuronal spiking or by the activation of specific receptor populations in individual synapses has remained open. In a previous work, the Wenner group had shown that upscaling of glutamatergic transmission is triggered by direct blockade of glutamate receptors rather than by the concomitant reduction in firing rate (Nat Comm 2015). In this manuscript they investigate the mechanisms regulating scaling of GABA-mediated responses in cortical cell cultures using whole-cell recordings to detect GABAergic currents and multielectrode arrays to monitor global firing activity, and find that spiking plays a fundamental role in scaling.

      Initially, the authors show that chronic blockade (24 h) of glutamatergic transmission by CNQX first reduces spontaneous spiking (at 2 h), but later (24 h) firing grows back towards higher frequencies, suggesting a compensatory mechanism. Then it is shown that either chronic CNQX treatment or TTX causes a reduction in the amplitude of GABAergic mIPSCs. Effects of CNQX on IPSCs are then reverted by replacing spontaneous network firing by chronic optogenetic stimulation of the entire culture, also indicating that GABAergic transmission is homeostatically regulated by global firing. Enhancing glutamatergic transmission with CTZ increases mIPSC amplitude, while addition of TTX in the presence of CTZ causes the opposite effect. Finally, increasing spiking activity using bicuculline also increases mIPSC amplitude, and the authors conclude that spiking activity rather than neurotransmission controls homeostatic GABA scaling. The manuscript shows interesting properties in the regulation of global GABAergic transmission and highlights the important role of spiking activity in triggering GABA scaling. However, it is strongly recommended to address some caveats in order to better support the conclusions presented in the manuscript.

      Major points:

      1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission, and what is the mechanism by which spiking frequency grows back after 24h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting in AMPAR blockade is due to the remaining NMDA receptor mediated transmission. We showed this in our previous study in Suppl. Figure 2 and 6 of Fong et al., 2015 Nat. Comm. Our ability to optically induce normal looking bursts of spikes was also dependent on NMDAR activation. Further, in Dr Fong’s PhD dissertation it was shown that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. There are likely many factors that contribute to the recovery of activity, and certainly one of them is likely to be the weakening of inhibitory GABAergic currents. These points will be discussed in the resubmission.

      2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why was this not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration of the burst (see Suppl. Figure 6 Fong et al., 2015 Nat. Comm). The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. It is likely that NMDAR signaling is reduced in the presence of CNQX since burst frequency was reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling could contribute to the alterations in GABAergic mIPSCs and will discuss this in the resubmission. However, previous work suggests that 24/48 hour block of NMDARs (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some point, had no influence on AMPAergic scaling (Fong et al., 2015, Nat. Comm.). Regardless, we cannot rule out a role for NMDAergic transmission in GABAergic scaling and this discussion will be included in the resubmission.

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was turned off before the vast majority of spiking that occurred in the bursts, which played out in a relatively natural manner (see lower panel of Figure 3B optogenetic stimulation – short duration only at onset of burst – we will make this clearer in resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns were very similar to reviewer 1’s first comments. We will address these issues in the resubmission, but to briefly repeat our responses: We are going a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of the particular cells that we target for patch recordings in terms of their firing before and after a drug. Further, we see considerable variability in effect on firing rate from culture to culture, which we will better recognize in the resubmission. Finally, while the CTZ results are not conclusive, taken together with the optogenetic results we think our results are most consistent with the idea that GABAergic scaling is a strong candidate as a spike rate homeostat.

      4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures for figures were: Figure 2 – con = 10, TTX = 3, CNQX = 6; Figure 4 – CNQX = 4, con = 10, CNQX/photostim = 6; Figure 5 – ethanol = 3, CTZ = 3, CTZ + TTX = 3; Figure 6 – con = 10, bicuculline = 4. We will include the number of cultures for mIPSC amplitude experiments in the figure legends upon resubmission.

      5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We will include this discussion in the resubmission. Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we will expand our description of the variability in our results, including that based on cellular heterogeneity.

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the concern of the reviewer as we do see significant variability within our cultures and they were plated in different places, by different people, in different species (rat vs mouse). Therefore, in the resubmission to strengthen the conclusions we will repeat our optogenetic studies restoring activity in the presence of AMPAergic blockade in our mouse cortical cultures and measuring AMPA mEPSCs to assess scaling.

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX cause identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling and that this is different than our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling they do always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking that translates into an altered calcium transient that is important for triggering the plasticity. Again, it is important to note that we are going a step beyond most homeostatic plasticity studies that add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). Based on the variability that we observe and the nature of our MEA recordings we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole cell recordings. However, we believe our results are more consistent with our proposal that GABAergic scaling is a strong candidate as a spike rate homeostat. Regardless, in the resubmission we will include a broader discussion about these possibilities, and the reality that there could be multiple homeostatic mechanisms that act to recover spiking activity.

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered, that GABAergic scaling has already occurred. We will state this more clearly in the resubmission.

    2. Reviewer #1 (Public Review):

      In the manuscript titled "GABAergic synaptic scaling is triggered by changes in spiking activity rather than transmitter receptor activation," the authors present an investigation of the role of GABAergic synaptic scaling in the maintenance of spike rates in networks of cultured neurons. Their main findings suggest that GABAergic scaling exhibits features consistent with a key homeostatic mechanism that contributes to the stability of neuronal firing rates. Their data demonstrate that GABAergic scaling is multiplicative and emerges when postsynaptic spike rates are altered. Finally, their data suggest that, in contrast to their prior data on glutamatergic scaling, GABAergic scaling is driven by spike rates. The authors set the paper up as an argument that GABAergic scaling, rather than glutamatergic scaling, serves as the critical homeostatic mechanism for spike rate regulation.

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction). Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPA-R activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      Specific points:

      - The logic of the basis for the argument is somewhat flawed: a homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate).
      - Line 63 parenthetically references an important but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition.
      - The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.
      - Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection? Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted?
      - The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, and text that appears skewed, as if the figures were quickly thrown together and stretched to fit.
      - I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log-normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network level. Given that a network-level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.
      - Lines 198-99: multiplicativity is not a requirement of a homeostatic mechanism.
      - Lines 264-265: again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.
      - Line 277: do you mean AMPAR?
      - Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of bursts is shown in B and C.
      - Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes.
      - The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture.
      - The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development.
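
The point about log-normally distributed firing rates can be illustrated with a short simulation. This is a sketch under assumed parameters (`MU`, `SIGMA` and the sample size are arbitrary choices, not estimates from the paper); it only shows why a population mean is a poor summary of what a typical neuron is doing.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed log-space parameters, chosen only for illustration
MU, SIGMA = 0.0, 1.5
N_NEURONS = 10_000

# Simulated per-neuron mean firing rates drawn from a lognormal distribution
rates_hz = [random.lognormvariate(MU, SIGMA) for _ in range(N_NEURONS)]

mean_rate = statistics.mean(rates_hz)
median_rate = statistics.median(rates_hz)
# Fraction of neurons whose own rate sits below the population mean
frac_below_mean = sum(r < mean_rate for r in rates_hz) / len(rates_hz)

print(f"mean = {mean_rate:.2f} Hz, median = {median_rate:.2f} Hz")
print(f"fraction of neurons firing below the mean: {frac_below_mean:.2f}")
```

Because a few fast-firing cells dominate the sum, the mean sits well above the median and most simulated neurons fire below it. Restoring the network-level total therefore says little about whether any individual neuron is back at its own set point.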

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play.

    1. Reviewer #1 (Public Review):

      In this paper, Scholz and colleagues introduce a new paradigm aimed to bridge the gap between two domains that rely on hierarchical processing: language and memory. They find that, generally in line with their hypotheses, hierarchical processing is associated with activation in hippocampus (especially anterior), medial prefrontal cortex (mPFC), posterior superior temporal sulcus (pSTS), and inferior frontal gyrus (IFG). They also report that these effects in IFG are particularly strong late in the task, once participants have had a lot of experience and processing is presumably more automatic.

      This work has many strengths. The goal to bridge these literatures by developing a new task is commendable. I appreciate also that the authors separately validated their new task behaviorally by comparing it to another accepted as tapping hierarchical processing. I also liked that the authors were transparent about their hypotheses, and certain analyses like the grid coding one that was planned but did not work out. I do however have a number of concerns about the interpretations of the findings, such as whether some patterns are ambiguous as to the true underlying effects. I also have a number of clarification questions. All concerns are described below.

      1. Broadly, I would like to see the authors provide more information and logic on why hierarchical processing should be associated with a large reduction in univariate activation between P1 and P2. Why would this signify item-in-context binding? How does this relate to existing work using other methods (e.g., animal studies, which seem to make predictions more about representational structures)?

      2. There are many differences between what kind of information participants are processing between Position 1 and Position 2 for the HIER but not ITER conditions, and these may not be related to the hierarchical structure specifically. Related to but I think distinct from some of the limitations mentioned in the Discussion is the fact that in the HIER condition, what is happening cognitively between Position 1 and Position 2 items is more distinct (attending to color for position 1, and shape for position 2), whereas the two positions are equivalent in the ITER condition. This is a bit different from the authors' intended manipulation of hierarchy, because it involves a specific dimension. A stronger design might have been to flip the dimensions with respect to position specifically, to make shape sometimes important for position 1, and color for position 2 (perhaps by counterbalancing across subjects, so half would see the current P1=color and P2=shape rules, and the other half P1=shape and P2=color rules). Another important difference between color and shape is that while color is a simple binary distinction that participants can make based on their preexisting knowledge of red versus green, and to which they can assign a verbal label; whereas, the shape distinction was something novel they acquired during the experiment, has no real-world validity or meaning, and would presumably rely more on visuospatial processing. The shape dimension was also much more variable, I believe. I should say that I do find comfort in a few things - (1) that behavior on this task is correlated with another one that also indexes hierarchy processing, and (2) that the results show regional specificity in a pattern at least not easily explained by this distinction. 
However, I do think future work will be needed to ask whether it is hierarchy processing per se or rather something to do with the particular cognitive states engaged during each phase in this particular task that is eliciting activation in this set of regions. It would strengthen the paper to discuss this issue directly so readers are alerted to the caveat.

      3. I did not understand what data went into creating the schematic in Figure 2E. First, I think this depiction of a gradient might be easily misinterpreted because it seems to imply that the authors have a higher resolution analysis than they actually do. I believe the data were just analyzed in three subregions of hippocampus - head, body, and tail. Variability within each subregion (as seems to be implied by certain parts of a region being more grey and others more red/orange), is not something that could be assessed in this analysis. For example, why does the medial part of the head seem to be more "unspecific" whereas lateral regions look more HIER Pos1 specific? This type of depiction would only make sense in my mind if the authors had performed something like a voxelwise analysis to determine where specifically the interaction "peaks." I would recommend this visualization be cut or significantly changed to do away with the gradient.

      4. I believe the authors have not reported enough information for us to know that hippocampus involvement indeed does not change with experience. It is interesting that hippocampus in the task x experience ROI analysis shows, if anything, bigger differentiation between the two tasks (numerically) for the late trials. This seems to go against the authors' hypothesis, and a lot of existing data, that hippocampus is preferentially involved in early (vs. late) learning. Given that the key signature in this region, though, is that it differentiates between position 1 and position 2 in HIER but not ITER, and doesn't show a big difference in magnitude across the two tasks, it makes me wonder whether the task x experience interaction collapsing across the two positions makes sense for this region. Did the authors consider a similar task x experience interaction within hippocampus, but additionally considering position? I think there are multiple ways to look at this question (e.g., either looking for a task x experience x position interaction, a task x experience within position 1, a task x position interaction separately in early vs. late portions of the task, or even a position x experience interaction only within the HIER task), and I'm sure the authors would be in a better place to decide on a specific path forward. The same logic might go for mPFC, which shows an interaction but no main effect of task. This relates to claims in the discussion as well, such as that "hippocampus was equally active in early and late trials," but given this analysis is collapsing across the dimension hippocampus (and mPFC) seem to be sensitive to (position), it seems like this could be masking an underlying effect in which hippocampus/mPFC might still be differentially involved early vs. late (i.e., they might show the task x position interaction preferentially during some task phases).
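
The analysis options listed in point 4 are closely related, which a small worked example may help show. The sketch below uses made-up mean betas (illustrative only, not values from the study) and computes the task x position interaction separately for early and late trials. Their difference is the task x experience x position contrast.

```python
# Hypothetical mean betas indexed by (task, phase, position); illustrative only
betas = {
    ("HIER", "early", "P1"): 0.8, ("HIER", "early", "P2"): 0.2,
    ("HIER", "late", "P1"): 0.7, ("HIER", "late", "P2"): 0.3,
    ("ITER", "early", "P1"): 0.5, ("ITER", "early", "P2"): 0.5,
    ("ITER", "late", "P1"): 0.4, ("ITER", "late", "P2"): 0.4,
}


def task_by_position(phase):
    """Task x position interaction within one phase: (H1 - H2) - (I1 - I2)."""
    hier_diff = betas[("HIER", phase, "P1")] - betas[("HIER", phase, "P2")]
    iter_diff = betas[("ITER", phase, "P1")] - betas[("ITER", phase, "P2")]
    return hier_diff - iter_diff


early_interaction = task_by_position("early")
late_interaction = task_by_position("late")
# Three-way contrast: does the task x position effect change with experience?
three_way = early_interaction - late_interaction

print("task x position (early):", round(early_interaction, 3))
print("task x position (late):", round(late_interaction, 3))
print("task x experience x position:", round(three_way, 3))
```

In practice one would compute these contrasts per participant and test them against zero. The point of the sketch is only that collapsing across position discards exactly the term the hippocampal signature depends on.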

      5. For the IFG regions, the task x experience interaction seems to be driven mainly by change (decrease in activation) for the ITER, rather than change in the HIER. The authors are at times careful to talk about this as "sustained" activity in IFG, which I appreciated, but at other times talk about a "relative increase." I am not sure how I feel about that. I see the compelling evidence that there are task differences by experience, and that there is a reduction for ITER that is interestingly not present for HIER, but I am still uncomfortable with the term "increase" or even "relative increase" for HIER. For example, couldn't it simply be that the ITER task requires less processing with experience, whereas the HIER does not (perhaps because it requires more processing to begin with)? That is, we do not know whether the reduction for ITER is simply a neural signal thing (i.e., activations diminish over time/experience) or a cognitive thing, specific to the ITER task. I think the authors want to interpret the reductions as the former, but it would be more convincing to show a baseline task that also showed reductions but for which not much would be expected in the way of cognitive change. Can the authors provide more justification for their choice of terminology (through either more logic or analyses), or if not, simply talk about it as sustained activity for HIER, which is especially interesting in the face of reductions for the ITER task?

      6. Please define what is meant by the term "automaticity" in the introduction. A clearer definition of the concept would make the paper generally easier to follow, and it would also help foreshadow the hypotheses about mPFC activity in the introduction. To this end, it could be useful to elaborate on how learning takes place in this task, how it could foster increasing automaticity, and how automaticity maps onto behaviour (e.g., is it RT decrease alone, which happens for both conditions in this task?) and the brain regions discussed.

      7. There was no association between brain and behavior, which the authors interpret as a positive (as therefore task difficulty differences could not explain the effects). However, in light of these null findings, it is on the flip side hard to know whether this neural engagement carries any behavioral significance. It seems to me as though the authors' framework makes predictions about brain-behavior correlations that were not tested in the manuscript. For example, I believe the authors asked whether behavior overall was correlated with activation. However, wouldn't the automaticity-in-IFG explanation, for example, predict that more engagement or an increase in engagement from early to late should be associated with, e.g., faster RTs, rather than a relationship overall?

      8. On p. 8, it is stated that "In the hippocampus, this effect is driven by higher betas for the presentation of the first object (H1 > I1) and lower betas for the second object (H2 < I2) when comparing across tasks." Can the authors confirm whether the pairwise comparisons following up on the interaction here are significant, or rather if they are referring to a numerical difference in the betas? It looked like the same (numerically) would be true for mPFC; is there a reason why the same information is not included for the mPFC ROI? Also, might the authors provide more speculation as to why one might see both enhanced and reduced activation for P1 and P2, respectively?

      9. I was expecting some discussion of how hippocampus does not seem to show preferential involvement early, given that its potential role being restricted to early in learning (i.e., during acquisition only) was one of the primary motivators for using this task. As noted in my above comment (#4), I am not quite sure that I think there is evidence that the hippocampal role remains constant over this task, given the analyses provided (i.e., that they did not look at the position effect for early vs. late). However upon further analysis if it does seem to be more stable, and/or if it even increases over experience, the authors might want to talk about that in the Discussion.

      10. The fact that the hierarchies in this paradigm unfolded over time makes them distinct on some level from the hierarchies present in the VRT task that was used to validate the HIER task's hierarchical processing demands. For example, there might be additional computations required to process these temporally ordered structures, support online maintenance, and so on. It may be worth considering this aspect of the task, and whether/to what extent the results could be related to it, in the paper.

      11. I also have many methodological and analytic clarification questions, which I detail in the recommendations for authors.

    1. It may already be clear that ethical conflict in psychological research is unavoidable. Because there is little, if any, psychological research that is completely risk free, there will almost always be conflict between risks and benefits. Research that is beneficial to one group (e.g., the scientific community) can be harmful to another (e.g., the research participants), creating especially difficult trade-offs. We have also seen that being completely truthful with research participants can make it difficult or impossible to conduct scientifically valid studies on important questions.   Of course, many ethical conflicts are fairly easy to resolve. Nearly everyone would agree that deceiving research participants and then subjecting them to physical harm would not be justified by filling a small gap in the research literature. But many ethical conflicts are not easy to resolve, and competent and well-meaning researchers can disagree about how to resolve them. Consider, for example, an actual study on “personal space” conducted in a public men’s room (Middlemist, Knowles, & Matter, 1976). The researchers secretly observed their participants to see whether it took them longer to begin urinating when there was another man (a confederate of the researchers) at a nearby urinal. While some critics found this to be an unjustified assault on human dignity (Koocher, 1977), the researchers had carefully considered the ethical conflicts, resolved them as best they could, and concluded that the benefits of the research outweighed the risks (Middlemist, Knowles, & Matter, 1977). For example, they had interviewed some preliminary participants and found that none of them was bothered by the fact that they had been observed.   The point here is that although it may not be possible to eliminate ethical conflict completely, it is possible to deal with it in responsible and constructive ways. 
In general, this means thoroughly and carefully thinking through the ethical issues that are raised, minimizing the risks, and weighing the risks against the benefits. It also means being able to explain one’s ethical decisions to others, seeking feedback on them, and ultimately taking responsibility for them.

      It would be beneficial to say a bit more about what has been achieved through unethical studies. For example, we do tests on rats and that's not completely ethical, right? So are there any studies that weren't ethical but that we learned a lot from, which we could add to the conversation? Was there a benefit to deceiving participants? I think an example of this could make readers consider whether there's a reason some researchers fight ethics boards to do studies that may not be entirely ethical. You could also add that most of the time there is a way to get rid of the unethical part of a study. For example, in the study by Lahaut, was there a need to visit people's houses multiple times, or could they have just offered an incentive?

    1. Background: Reproducibility of data analysis workflow is a key issue in the field of bioinformatics. Recent computing technologies, such as virtualization, have made it possible to reproduce workflow execution with ease. However, the reproducibility of results is not well discussed; that is, there is no standard way to verify whether the biological interpretation of reproduced results is the same. Therefore, it still remains a challenge to automatically evaluate the reproducibility of results. Results: We propose a new metric, a reproducibility scale of workflow execution results, to evaluate the reproducibility of results. This metric is based on the idea of evaluating the reproducibility of results using biological feature values (e.g., number of reads, mapping rate, and variant frequency) representing their biological interpretation. We also implemented a prototype system that automatically evaluates the reproducibility of results using the proposed metric. To demonstrate our approach, we conducted an experiment using workflows used by researchers in real research projects and the use cases that are frequently encountered in the field of bioinformatics. Conclusions: Our approach enables automatic evaluation of the reproducibility of results using a fine-grained scale. By introducing our approach, it is possible to evolve from a binary view of whether the results are superficially identical or not to a more graduated view. We believe that our approach will contribute to more informed discussion on reproducibility in bioinformatics.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad031 ) , which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      **Reviewer Stian Soiland-Reyes** Hi, I am Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718 and have pledged the Open Peer Review Oath https://doi.org/10.12688/f1000research.5686.2:

      - Principle 1: I will sign my name to my review
      - Principle 2: I will review with integrity
      - Principle 3: I will treat the review as a discourse with you; in particular, I will provide constructive criticism
      - Principle 4: I will be an ambassador for the practice of open science.

      This review is licensed under a Creative Commons Attribution 4.0 International License.

      This article presents a method for comparing reproducibility of computational workflow runs captured as RO-Crates, by calculating a set of genomics metrics ("features") and adding these to the crate's metadata. Overall I find this a valuable contribution and worthy of publication with GigaScience, primarily as a way for users of workflow systems such as CWL, Nextflow, Cromwell or Snakemake to ensure reproducibility, but also for workflow engine developers who may want to build on this methodology to improve their provenance support. In general the method proposed is sound; however, it does have some limitations and inherent assumptions that are not highlighted sufficiently in the current manuscript, particularly concerning the selection of features and the reproducibility of the metrics calculation itself. I have detailed this with some points below that I would like the authors to clarify in a minor revision.

      Note: the questions below, from the GigaScience Reviewer Guidelines, mainly relate to data, but here I also interpret them for the software described.

      Q1: Is the rationale for collecting and analyzing the data well defined? The authors' workflow executions https://doi.org/10.5281/zenodo.7098337 are based on three third-party bioinformatics workflows. Although they are not particularly "large-scale", they are representative best-practice pipelines in this field (data sizes from 200 MB to 6 GB) and also fairly representative of scalable workflow systems (Nextflow, CWL and WDL) used by bioinformaticians.

      Q2: Is it clear how data was collected and curated? It is not explicit in the text why these particular workflows were selected, beyond being realistic pipelines used in research. I would suggest something like "these workflows have been selected as fairly representative and mature current best-practice for sequencing pipelines, implemented in different but typical workflow systems, and have similar set of genomics features that we can assess for provenance comparison." The workflows have each been cited, but I would appreciate some consistency so that each workflow is cited both by its closest journal article and as their original download sources (e.g. GitHub).

      Q3: Is it clear - and was a statement provided - on how data and analyses tools used in the study can be accessed? Yes, full availability statements have been provided both for data and software, archived on Zenodo for longevity.

      Q4: Are accession numbers given or links provided for data that, as a standard, should be submitted to a community approved public repository? Yes, the tools have been added to https://bio.tools/ -- I don't think it's necessary to further register the data outputs with accession numbers. RRIDs for tools can be considered at a later stage, perhaps only for Sapporo.

      Q5: Is the data and software available in the public domain under a Creative Commons license? Yes, the software and dataset are open source under Apache License, version 2.0. The dataset https://doi.org/10.5281/zenodo.7098337 embeds existing workflows and data; however, this is OK as included resources such as the rnaseq Nextflow workflow have compatible licenses (MIT) or are also Apache-licensed. The manuscript has software citations for two of the workflows, but this is missing for the CWL workflow, which is only cited by manuscript (33) (also missing DOI). It is unclear if any of the workflows are registered in https://workflowhub.eu/ but that should primarily be done by their upstream authors. The RO-Crates in https://doi.org/10.5281/zenodo.7098337 don't include any licensing and attribution for the embedded workflows, and their metadata file misleadingly declares the crate license as CC0 public domain. While CC0 is appropriate for examples and the metadata file itself, the embedded MIT/Apache workflows from third parties can't legally be relicensed in this way and should have their original licenses declared. See https://www.researchobject.org/ro-crate/1.1/contextualentities.html#licensing-access-control-and-copyright I understand these RO-Crates are generated automatically by Sapporo, which does not directly understand licensing, and for documenting the test runs with Sapporo, I think these should not be modified post-execution. Pending further license support by Sapporo, perhaps a manual outer RO-Crate that aggregates these (e.g. adding a direct top-level ro-crate-metadata.json to the Zenodo entry) can provide more correct metadata as well as workflow citations. The authors could add to the Discussion some consideration of the (lack of) propagation of such metadata for auto-generated crates as part of workflow run provenance. For instance, if a workflow run was initiated from a Workflow Crate https://w3id.org/workflowhub/workflow-ro-crate/ at WorkflowHub, its license, attributions and descriptions could be carried forward to the final Workflow Run Crate provenance together with the Sapporo-calculated features.

      Q6: Are the data sound and well controlled? Yes, the data is sound. The testing on Mac gives null results, but the authors explain that the workflows failed to execute there due to architectural differences, which is flagged as a valid concern for reproducibility. It may be worth investigating further whether this is due to misconfiguration on that particular test machine, in which case these columns should be removed.

      Q7: Is the interpretation (Analysis and Discussion) well balanced and supported by the data? The authors' discussion has some implicit assumptions that should be made clearer, together with their implications:

      - The Tonkaz tool assumes the workflow execution has already extracted the features and added them to the RO-Crate
      - This assumes the right features have been correctly extracted by each execution
      - Feature extraction also depends on bioinformatics tools that are subject to change/updates
      - Newer versions of Sapporo-service, and in particular any non-Sapporo executors also making Workflow Run Crates, may have a different feature selection
      - Being able to fairly compare two workflow runs therefore depends on careful control of the Sapporo executor versions so that they have consistent feature selection
      - This means the reproducibility metric proposed has a potential reproducibility challenge itself

      This is not to say that the approach is bad, as the feature extraction is using predictable measures such as counting sequences, rather than heuristics. This means Future Work should point out the need for guidelines on what kind of features should be selected, to ensure they are consistent and reproducible. The set of features also depends on the type of data and class of analysis. As a minimum, the RO-Crate should therefore include provenance of that feature extraction, noting the Sapporo version, and ideally the version of the tools used for that. The authors may want to consider if feature extraction should be a separate workflow (e.g. in CWL) that itself can be subject to the same reproducibility preservation measures, and therefore can also be performed post-execution as part of Tonkaz' comparison or as a curation activity when storing Workflow Run Crates.

      Q8: Are the methods appropriate, well described, and include sufficient details and supporting information to allow others to evaluate and replicate the work? Yes, it was very easy to replicate the Tonkaz analysis of the workflow run crate that is already provided, as it is provided also as a Docker container. The Docker container is provided as part of GitHub releases, and so is not at risk of Docker Hub's automatic deletion. I have not tried installing my own Sapporo service to re-execute the workflow, but detailed installation and run details are provided in the README of both Tonkaz https://github.com/sapporowes/tonkaz#readme and sapporo-service https://github.com/sapporowes/sapporo/blob/main/docs/GettingStarted.md

      Q9: What are the strengths and weaknesses of the methods? The method provided is strong compared to naive checksum-based comparison of workflow outputs, which has been pointed out as a challenge by previous work. The advantage of the feature extraction is that the statistics can be compared directly and any discrepancies can be displayed to the user at a digestible high level. The disadvantage is that this depends wholly on the selection of features, which must be done carefully to cover the purpose of the particular workflow and its type of data. For instance, a workflow that generates diagrams of sequence alignments could not be sufficiently tested in the suggested approach, as analyzing the diagram for correctness would require tools that may not even exist. Perhaps feature extraction should be a part of the workflow itself, so it can self-determine what is important for its analysis? The current approach is also quite sensitive to output data filenames, so changes in filename would mean features are not compared, even where such files are equivalent. This should be made more explicit in the manuscript; for instance, workflows should ensure they don't include timestamps or random identifiers in their filenames. Further work could have a deeper understanding of the workflow structure to compare outputs based on their corresponding FormalParameter in the RO-Crate.
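
The contrast between checksum comparison and feature comparison, and the filename sensitivity noted above, can be sketched as follows. This is not Tonkaz's actual algorithm or API: `compare_features`, the three-level labels, the 1% tolerance, the file names, and the feature names other than `totalReads` are all assumptions made for illustration.

```python
def compare_features(run_a, run_b, rel_tol=0.01):
    """Compare per-file feature values from two workflow runs.

    run_a, run_b: dicts mapping output file name -> {feature name: number}.
    Returns (report, unmatched). report maps (file, feature) to "same",
    "similar" or "different"; unmatched lists files found in only one run.
    """
    report = {}
    shared_files = run_a.keys() & run_b.keys()
    # Files are matched purely by name, so a renamed file is never compared
    unmatched = sorted(run_a.keys() ^ run_b.keys())
    for name in sorted(shared_files):
        feats_a, feats_b = run_a[name], run_b[name]
        for feature in sorted(feats_a.keys() & feats_b.keys()):
            a, b = feats_a[feature], feats_b[feature]
            if a == b:
                level = "same"
            elif abs(a - b) <= rel_tol * max(abs(a), abs(b)):
                level = "similar"  # within the assumed relative tolerance
            else:
                level = "different"
            report[(name, feature)] = level
    return report, unmatched


# Hypothetical feature values for two runs of the same workflow
run_1 = {"sample.bam": {"totalReads": 1000000, "mappingRate": 0.952}}
run_2 = {
    "sample.bam": {"totalReads": 1000000, "mappingRate": 0.951},
    "sample_20230801.vcf": {"variantCount": 4200},  # timestamped name
}

report, unmatched = compare_features(run_1, run_2)
for key, level in report.items():
    print(key, level)
print("not compared:", unmatched)
```

A byte-level checksum would flag the two BAM files as simply different, whereas the feature view reports identical read counts and a near-identical mapping rate. The timestamped VCF illustrates the weakness: it has no same-named counterpart, so its features are silently left out of the comparison.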

      Q10: Have the authors followed best-practices in reporting standards? Yes, the details provided are at a sufficient level, and the authors have re-used the RO-Crate data packaging. The RO-Crates created by Sapporo-service add several terms for the metrics, which are declared in the @context according to RO-Crate specs https://www.researchobject.org/rocrate/1.1/appendix/jsonld.html#extending-ro-crate However, the terms point to GitHub "raw" pages, which are not particularly stable, and may change depending on Sapporo versions and GitHub's repository behaviour. I recommend changing the ad-hoc terms to PIDs such as a namespace under https://w3id.org/ or https://purl.org/ so that these terms can be stable semantic artefacts, e.g. submitting them to https://github.com/ResearchObject/ro-terms to register https://w3id.org/ro/terms/sapporo#WorkflowAttachment that can be used instead of https://raw.githubusercontent.com/sapporo-wes/sapporo-service/main/sapporo/roterms.csv#WorkflowAttachment or alternatively https://w3id.org/sapporo#WorkflowAttachment could be set up to redirect to the ro-terms.csv on GitHub (discussed with the authors at ELIXIR Biohackathon). In doing so you should separate the terms into two namespaces: the general Sapporo terms like "sha512", and the particular genomics feature sets including "totalReads" (e.g. https://w3id.org/datafeatures/genomics#WorkflowAttachment), as the latter are a) not Sapporo-specific and b) domain-specific. RO-Crate is developing Workflow Run profiles https://www.researchobject.org/workflow-runcrate/profiles/. Although these had not been released at the time of my review, they are now stable, so the authors may want to check https://www.researchobject.org/workflow-runcrate/profiles/workflow_run_crate to ensure "FormalParameter" entities are declared correctly in the generated RO-Crate as separate entities, linked from the "File" using "exampleOfWork".
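
As a rough illustration of the two-namespace suggestion, the sketch below builds an extended `@context` for an `ro-crate-metadata.json` file. The namespace URIs are the ones proposed in the review and are assumptions here: nothing in this sketch checks that they have been registered or that they resolve, and the assignment of each term to a namespace simply follows the examples the review gives.

```python
import json

# Proposed persistent namespaces from the review (assumed, not verified to resolve)
SAPPORO_NS = "https://w3id.org/ro/terms/sapporo#"
GENOMICS_NS = "https://w3id.org/datafeatures/genomics#"

# RO-Crate 1.1 base context, followed by a map of the additional terms
context = [
    "https://w3id.org/ro/crate/1.1/context",
    {
        # General Sapporo terms
        "WorkflowAttachment": SAPPORO_NS + "WorkflowAttachment",
        "sha512": SAPPORO_NS + "sha512",
        # Domain-specific genomics feature terms
        "totalReads": GENOMICS_NS + "totalReads",
    },
]

# Skeleton metadata document; the @graph is left empty in this sketch
metadata = {"@context": context, "@graph": []}
print(json.dumps(metadata, indent=2))
```

Keeping the term map separate from the base context means the ad-hoc GitHub "raw" URLs could be swapped for persistent identifiers without touching the rest of the crate.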

      Q11: Can the writing, organization, tables and figures be improved? The language and readability of this article are generally very good. Light copy-editing may improve some of the sentences, e.g. reducing the use of "Thus" phrases.

      Q12: When revisions are requested. See suggestions from above for minor revisions:

      - Make explicit why these 3 workflows were selected (see Q2)
      - Make pipeline software citations consistent in manuscript (see Q2, Q5)
      - Avoid declaring CC0 within generated RO-Crate -- move this to only apply to the ro-crate-metadata.json
      - Add an outer RO-Crate metadata file to Zenodo deposit to carry the correct licenses and pipeline licenses for each of rnaseq_1st.zip, trimming.zip etc.
      - Improve discussion to better reflect limitations of the features and its own reproducibility issues (see Q7, Q9)
      - Consider improvements to the RO-Crate context (see Q10) - this may just be noted as Future Work in the manuscript rather than regenerating the crates

      In addition:

      - p2: Add citation for claim on file checksums different depending on software versions etc., for instance https://doi.org/10.1145/3186266
      - p3. "We converted Sapporo's provenance into RO-Crate" -- re-cite (20) as this is the paragraph explaining what it is.
      - p10. Citations 7, 8 are missing authors
      - p10. Citation 15 is now published, replace with https://doi.org/10.1145/3486897
      - p0. Citations 28, 33 are missing DOIs

      Q13: Are there any ethical or competing interests issues you would like to raise? No, the third-party pipelines selected for reproducibility testing are already published and are here represented fairly, and only used as executable methods (as intended by their original authors), which I would say do not need ethical approval.

    1. Background: Integration of data from multiple domains can greatly enhance the quality and applicability of knowledge generated in analysis workflows. However, working with health data is challenging, requiring careful preparation in order to support meaningful interpretation and robust results. Ontologies encapsulate relationships between variables that can enrich the semantic content of health datasets to enhance interpretability and inform downstream analyses. Findings: We developed an R package for electronic Health Data preparation ‘eHDPrep’, demonstrated upon a multi-modal colorectal cancer dataset (n=661 patients, n=155 variables; Colo-661). eHDPrep offers user-friendly methods for quality control, including internal consistency checking and redundancy removal with information-theoretic variable merging. Semantic enrichment functionality is provided, enabling generation of new informative ‘meta-variables’ according to ontological common ancestry between variables, demonstrated with SNOMED CT and the Gene Ontology in the current study. eHDPrep also facilitates numerical encoding, variable extraction from free-text, completeness analysis and user review of modifications to the dataset. Conclusion: eHDPrep provides effective tools to assess and enhance data quality, laying the foundation for robust performance and interpretability in downstream analyses. Application to a multi-modal colorectal cancer dataset resulted in improved data quality, structuring, and robust encoding, as well as enhanced semantic information. We make eHDPrep available as an R package from CRAN [[URL will go here]].

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad030 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer Janna Hastings

      The manuscript describes a toolkit for the automated semantic enrichment and quality control of electronic health data using ontologies. This is a much needed utility that will add value to electronic data sharing and re-use for many different purposes including the development of machine learning for medical applications and personalised medicine. Overall the manuscript is well written and the functionality offered by the toolkit is well thought out and motivated. The internal consistency checks and the use of ontology-based information content to semantically aggregate variables into more informative meta-variables are particularly welcome functions.

      However, I recommend that the description of the tool functionality be clarified in some points, and the evaluation could be strengthened.

      Page 6-7, internal consistency:

      1. How should the user specify semantic dependencies between variable pairs? Would it not be helpful to use a standard format for this specification to enable interoperability and re-use of such specifications?

      2. Should the specification of semantic relationships between variables not be linked to the knowledge from the ontologies? Ontologies are able to represent many different types of logical relationships between classes, which make them ideal for then serving as a standard and interoperable format for specifying this type of constraint. Rules are another promising standard approach for logic-based knowledge representation.

      Page 11, figure 4 a: I think it would be informative for evaluating the operation of the tool if the heatmap of variable missingness after application of the tool could also be illustrated beside the current Fig 4a.

      Page 13, ontology preparation: The paragraph describes what the authors have done to prepare ontologies for use with the tool. Is this preparation procedure also necessary for users to follow when they use the eHDPrep tool? How can alternative ontologies be incorporated (which may be useful for other domains)?

      Evaluation: The biggest shortcoming of the presented manuscript is that the evaluation is limited to the application of the tool to one dataset and subsequent manual evaluation of the outcome by one group, the study authors.

      The results as presented are positive, but there is a significant risk that the tool performs well on this task, as assessed by these study authors, but then fails to generalise to other tasks and datasets that future users might wish to use it with. To mitigate this challenge, it would be optimal if somewhat more independent methods could be found for evaluating the performance of the different aspects of the tool. One approach could be a rigorous comparison of this tool's performance against the performance of other tools that have similar functionality, e.g. comparison of the semantic aggregation function with other tools that find and recommend MICAs. An alternative approach might be to apply the tool to an additional dataset for which a group outside of the study authors would be prepared to provide an independent evaluation.
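      For readers unfamiliar with the term, the following is a minimal sketch of how a most informative common ancestor (MICA) can be found. It assumes a hypothetical toy ontology (`PARENTS`) and a simple structure-based information content (the negative log of the fraction of terms at or below a given term). eHDPrep and the other tools mentioned above may define information content differently, so this only illustrates the general idea.

```python
import math

# Hypothetical toy ontology for illustration only: term -> list of parent terms.
PARENTS = {
    "root": [],
    "disease": ["root"],
    "neoplasm": ["disease"],
    "colon_cancer": ["neoplasm"],
    "rectal_cancer": ["neoplasm"],
    "diabetes": ["disease"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Assumed IC: -log of the fraction of terms that are `term` or a descendant."""
    covered = sum(1 for t in PARENTS if term in ancestors(t))
    return -math.log(covered / len(PARENTS))


def mica(a, b):
    """Common ancestor of a and b with the highest information content."""
    common = ancestors(a) & ancestors(b)
    return max(common, key=lambda t: (information_content(t), t))


print(mica("colon_cancer", "rectal_cancer"))
print(mica("colon_cancer", "diabetes"))
```

      In this toy example the two cancer terms share a more specific (more informative) ancestor than a cancer term and diabetes do, which is the property a semantic aggregation step would exploit.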

    1. Background Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., ChIP-seq, ATAC-seq, or DNase-seq) and RNA-seq data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results. Results We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multi-modal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE data sets for cell lines K562 and MCF-7, including twelve histone modification ChIP-seq as well as ATAC-seq and DNase-seq datasets, where we observe and discuss assay-specific differences. Conclusion TF-Prioritizer accepts ATAC-seq, DNase-seq, or ChIP-seq and RNA-seq data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad026 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer Kaixuan Luo

      This paper develops a novel pipeline, TF-Prioritizer, to prioritize condition-specific TFs through integrative analysis of histone modification (HM) ChIP-seq and RNA-seq data. The pipeline integrates multiple computational tools: it calculates TF binding site affinities and links candidate binding sites to genes using TRAP and TEPIC. It uses DYNAMITE, a sparse logistic regression classifier, to infer TFs related to differential gene expression between conditions. It computes an aggregated score, the "TF-TG score", to score TFs from multiple types of evidence, and obtains a prioritized list of TFs from all histone modifications using a discounted cumulative gain ranking approach. It also provides additional functionality and a web interface to visualize the results.

      Overall, the pipeline could be very useful for biologists with a user-friendly web application to automate the entire process from data preprocessing to statistical analysis and obtain interactive reports to gain novel biological insights. However, more systematic evaluations are needed to demonstrate the benefits of this pipeline.

      Major comments:

      1. In the computation of an aggregated score "TF-TG score", it uses a multiplicative function to combine differential expression (absolute log2FC), TF-Gene scores computed from TEPIC, and the total coefficients computed from DYNAMITE. One concern about this approach is that it may miss some TFs with support from only one or two types of evidence. In Fig 5, we see diffTF identifies a lot more TFs than TF-Prioritizer. I don't think we can conclude that diffTF is less specific than TF-Prioritizer simply based on the number of TFs prioritized. Could some of the TFs identified only by diffTF be important but missed by TF-Prioritizer? I would like to see more detailed analysis comparing the lists of TFs identified by diffTF and TF-Prioritizer. Other evidence or metrics in addition to the number of prioritized TFs would be helpful to evaluate the plausibility of the prioritized lists of TFs.

      2. It is hard to interpret and evaluate the contribution of the evidence for prioritized TFs. Figure 6b is helpful, but it is unclear how the users would be able to evaluate the contribution of the components. Does the software run each of the combinations separately and output a list of prioritized TFs under each combination?

      3. The TEPIC2 paper has already developed a very comprehensive pipeline, including TF affinity calculation by TRAP and computation of TF gene scores by TEPIC, as well as logistic regression to identify TFs between conditions by DYNAMITE, and it is already well parallelized. The authors should clearly list the novel contributions from this work. It would be helpful to have a table comparing the functionalities and technical features between TF-Prioritizer and TEPIC2.

      4. The software takes histone modification ChIP-seq and RNA-seq data as input. It would significantly improve the usability of the software if it supported DNase-seq and/or ATAC-seq, which are widely used. If this software can take ATAC-seq or DNase-seq data as input, it is important to include those data types and provide some examples to illustrate the usage and performance.

      5. The software combines multiple histone modification ChIP-seq datasets using a discounted cumulative gain ranking approach. However, different types of histone modifications have different epigenomic functions and different combinations indicate different chromatin states. Some TFs may be only enriched in a small subset of histone modifications (already discussed by the authors) and may be missed by the simple discounted cumulative gain ranking approach. The authors should provide prioritized TFs from each histone modification ChIP-seq dataset, and evaluate which TFs were prioritized by all the combined datasets, and which TFs by only one dataset. Also, some ChIP-seq datasets may be of poor quality. Does the software provide other options to rank the TFs from different epigenomic datasets? e.g. set different weights for different epigenomic datasets, etc.

      6. The authors conducted co-occurrence analysis based on the overlapping of peaks. It is unclear if the method would calculate some statistical measure (e.g. p-value) for the significance of co-occurrence. Also, since the TRAP model generates a quantitative measure of TF binding affinity, I am curious to see if the quantitative TF binding affinities are also correlated for those co-occurring binding sites.
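      To make major comments 1 and 5 concrete, here is a minimal sketch under stated assumptions. `tf_tg_score` is a hypothetical product of the three evidence types named in comment 1, and `dcg_rank` is a generic discounted-gain aggregation over per-assay ranked lists with an optional per-assay weight, as suggested in comment 5. Neither function is taken from TF-Prioritizer; the names, signatures, and placeholder identifiers (`TF_A`, `assay_1`, ...) are illustrative only.

```python
import math


def tf_tg_score(abs_log2fc, tepic_score, dynamite_coef):
    """Hypothetical multiplicative combination of three evidence types."""
    return abs_log2fc * tepic_score * abs(dynamite_coef)


def dcg_rank(rankings, weights=None):
    """Aggregate per-assay ranked TF lists with a discounted gain.

    rankings: dict mapping assay name -> list of TFs, best first.
    weights:  optional dict mapping assay name -> weight (assumed extension);
              assays without an entry get weight 1.0.
    """
    weights = weights or {}
    totals = {}
    for assay, ranked in rankings.items():
        w = weights.get(assay, 1.0)
        for position, tf in enumerate(ranked, start=1):
            # Gain is discounted logarithmically with the rank position.
            totals[tf] = totals.get(tf, 0.0) + w / math.log2(position + 1)
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(totals, key=lambda tf: (-totals[tf], tf))


# A TF with strong expression change but no TEPIC support scores zero.
print(tf_tg_score(3.0, 0.0, 2.0))

rankings = {
    "assay_1": ["TF_A", "TF_B", "TF_C"],
    "assay_2": ["TF_B", "TF_A"],
}
print(dcg_rank(rankings))
print(dcg_rank(rankings, weights={"assay_1": 0.1}))
```

      The first call shows the concern in comment 1: with a product, a single missing evidence type removes a TF entirely. The last call shows how down-weighting one assay (for instance a low-quality dataset) can change the aggregated order.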

      Minor comments:

      1. In Figure 1, it would be helpful to highlight which steps were already implemented in existing tools (and label the tools used), and which steps are novel in this study.

      2. H3K4me3 data seems to be missing in the L10 time point. How does the method handle missing data?

      3. It is unclear how the Pol2 ChIP-seq data was used in this study. Was it included in the model or only in the downstream analysis?

      4. It is hard to interpret the browser tracks of the TF predictions ("Predicted xxx") in Figures 3 and 4. Please add more details about those tracks.

      5. Figure 6: the authors should provide more details to help understand this figure, especially panel b. The figure legend is too short.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1: Major comments: The key point of the manuscript is to provide resources for the plant community. The motivation for selecting these specific promoters, how they were obtained and cloned, what they are in detail and how they will be made publicly available is all clearly described. The infection experiments presented in it are an added bonus and a proof of concept of the applicability of the system.

      Thank you very much.

      Minor comments: The promoter sequences will probably be included in the AddGene submission; however, it might be helpful to also deposit the promoter sequences at e.g. GenBank.

      Indeed, we have sent all sequence files to AddGene and they will be available for download there. We will look into transferring them to GenBank as well. We have not done this before, but are generally always supportive of maintaining data in open repositories.

      Line 133: "There are few exceptions to this rule...". It would probably be helpful to list/mark these exceptions in Table 1.

      We agree. We have now marked them in the table, and included the sentence “There are a few exceptions to this rule (marked with a * in the ‘Bases’ column in table 2), where we used a defined stretch of DNA that has previously been described to complement a mutant” in lines 135-137.

      Line 138: "A overhangs". In the GreenGate system, A-modules (promoters) are flanked by A- (5') and B- (3') overhangs (applies to line 144, too). Also, the B-overhang listed here (TTGT) is the reverse complement, which might be confusing for readers.

      A very good point. We have modified these lines to “standard four base pair GreenGate promoter module overhangs (5´-ACCT and TTGT-3´) were added via primers during amplification of the promoter sequences (see Supplementary Table 1 for a list of primer sequences. Note that TTGT is the complementary sequence of the A-to-B-module overhang, as this is added via the reverse primer)” in lines 141-144.
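      The distinction between a complement and a reverse complement, which is the source of the possible confusion raised above, can be checked with a short sketch. The helper names are hypothetical, and the only sequence taken from the text is TTGT; the other strings are simply what the functions compute from it.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-wise complement, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Complement read in the opposite direction (the other strand, 5' to 3')."""
    return complement(seq)[::-1]


print(complement("TTGT"))
print(reverse_complement("TTGT"))
```

      The two results differ, so whether an overhang is reported as the complement or as the reverse complement of the primer-encoded sequence depends on which strand and reading direction is meant.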

      Line 149 ff.: How many lines have been established per promoter tested? Did they all yield a similar expression pattern?

      This is indeed a very important point which was somehow lost along the way during manuscript preparations, after being moved around between results and methods section. We have put it back in in lines 162-165 as “We recovered several independent transgenic lines for the PEP1 and 2, PEPR1 and 2, as well as BIK1 and RBOHD reporters. Out of those, a minimum of three (RBOHD) and up to seven (PEPR2) independent lines showed fluorescence, and out of those, all individual lines for each reporter showed the same expression patterns.”

      Line 163: As someone not familiar with microscopy of Arabidopsis roots, I'm wondering how the authors can be sure that the tissue in question is the vasculature. Is this obvious for experts in the field?

      Of course, we can’t give a totally objective answer here, but we believe that by including the transmitted light image next to the fluorescence image, it is indeed visible that the fluorescence is limited to the center of the root, not the complete circumference. At the same time, it is important to note that all images are stereomicroscopic images, not confocal images. Thus, it is indeed not possible to, e.g., conclude if pericycle cells are included or excluded in the region with expression. So, while it is, we believe, safe to assume that it is vascular cells, we can’t determine which cell types in the vascular cylinder are expressing the reporters. This would require confocal imaging, which would increase the resolution, but at the expense of a good overview, which we think is more valuable for such a proof-of-principle.

      Discussion: Is there by any chance prior (cell-resolution) knowledge about the expression behaviour of any of the investigated promoters? E. g. by in-situ hybridizations? If so, do the expression patterns match?

      No, the expression of these reporters in direct response to fungal infection has so far only been studied by transcriptomics.

      Presentation and quality of the images need be improved. Scale bars are missing in all confocal images. In Figure 3 and 4, the name of genes examined can be labeled on the image, which will make it easier for readers. In addition, key information such as the inoculum and sampling time point after fungal inoculation should be described in the legend or the main text.

      We have added the scale bars and gene names into the images. We agree that the gene names make it easier for the reader. Further, we have added the inoculum and sampling time to the legend.

      More importantly, a "mock" inoculation or "before fungal inoculation" should be performed to reveal the expression changes of the marker genes after fungal inoculation.

      This information was provided in the text and via the supplemental figures, but I assume we didn’t make it clear that these results and images were indeed specific control/mock experiments, and not some ‘general’ expression analysis. We have now tried to make this clearer, specifically in lines 192-194.

      Lines 172-174, the pictures are too small to see these details. The same for BIK1 (line 187).

      We have split up figure 3 into two separate figures (figures 3 and 4), to allow for them to be displayed larger, so that more details can be observed. Of course, it would also be helpful to do some confocal microscopy on specific regions of interest of these stereomicroscopic images to obtain high-resolution images of these regions, but, unfortunately, we did not reach this point in this project, before our team was disbanded, and we therefore only have the overview images to get a general idea of the responsiveness of the different reporters.

      Line 174-176, which results are these referring to? The same for line 200-203.

      We assume that this was not clear because we previously failed to make it clear that the control supplementary figures are from experimental controls/mock. We have reworded both paragraphs to, hopefully, explain it a bit better, and included the number of the supplementary figure each paragraph refers to. It’s now in lines 212-215 and 237-242.

      This study provides a valuable collection of vectors/constructs for investigation of transcriptional dynamics of plant immunity genes and should attract broad interest of the plant immunity field.

      Thank you very much.

      The current study by Calabria et al., entitled "pGG-PIP: A GreenGate (GG) entry vector collection with Plant Immune system Promoters (PIP)," reported the development of a set of GreenGate-compatible entry plasmids that contain promoter sequences of a series of immunity-related genes. This tool enables live-cell observation of immune responses at a cellular resolution. Being compatible with many other GreenGate tools, it opens up a door toward simultaneous visualization of different but overlapping immune pathways and ultimately describes the 4D dynamics of plant immunity. It is more than expected that these constructs will be used by a wide range of researchers and contribute to the ultimate understanding of plant innate immunity.

      Thank you very much.

      It is exciting that the authors observed the marker expression by a fluorescent stereomicroscope. This allows for non-destructive observation of response over time, keeping the system gnotobiotic. However, it was partly disappointing that the authors did not take full advantage of this. It would have been much nicer if the authors observed the infection process over time, such that one could tell when and where the response starts, and whether local and systemic reactions occur simultaneously or instead require local-to-systemic signal transduction. They indeed seem to have done such time-course observation (line 378) but did not provide the results. I am curious to know what the authors could have found from those experiments. It would also be a strong appealing point of this method and is therefore highly encouraged.

      We absolutely agree that this temporal data would be valuable and interesting. So far, we always imaged the colonization sites in the root tips from the first day when they become visible, until the day when the entire root was colonized/dying. However, we only recorded the infection sites directly, and did not image the entire plants, and local as well as systemic responses. This is, of course, something that we would have liked to do, and planned to do in the future, but, so far, we have not gotten to that point. We also attempted to use the images of the infection sites that we have recorded over time to obtain information about disease progression, e.g., colonization speed of the fungus, but this data is not (yet) at a point, where we feel confident that we have enough information to draw solid conclusions. So, while we absolutely agree that this kind of whole-plant imaging with both, high spatial and temporal resolution, must be the aim, at this point, unfortunately, we simply are not at that place yet.

      Immune responses are not always induction of expression but sometimes reduction. Some genes up-regulated in the first phase will also be down-regulated afterward in order to go back to the initial non-responding state. During such down-regulation, the expression of a fluorescence marker gene might not accurately reflect the real expression levels, because the translated proteins might stay longer even while its transcription is suppressed. To address this point, it is suggested that the authors observe the marker lines in the presence of a translation inhibitor, such as cycloheximide, and quantitatively analyze the dynamics of protein degradation when no new protein is synthesized.

      This is indeed an excellent point. Unfortunately, we have to first say that due to funding issues we are currently unable to do this experiment. However, we did include two things in the revised manuscript: First, we have put in a note that this is indeed a caveat of the system that must be acknowledged (lines 334-337). Second, we have included some information from a different study, which at least addresses this point to some degree. We have imaged the transcriptional response of the WRKY11 transcription factor in response to colonization by Fo5176, and in this case, we not only see a local upregulation next to the colonization site, but we see a complete switch in expression pattern. As part of this switch, WRKY11, which was expressed in all root tissues and cells in uninfected control experiments, is switched off in all tissues and cells except the vascular cells close to the infection site. So here, we indeed have a downregulation of the reporter. In these experiments, signal from the fluorescent WRKY11 reporter disappears from the cells within a day. As we imaged once per day, we can, unfortunately, not get more specific than this one-day window. The day before colonization of the tip, signal is seen in all tissues; one day later, if/when the vasculature is colonized in the tip, there is no weak/residual fluorescence left in the cells of the outer tissues. So we can at least state that we would probably also detect downregulation of expression, despite the protein lifetime. Importantly, all our imaging is done on a regular stereomicroscope, and thus, camera sensitivity is moderate. I could imagine that we may be able to detect some residual fluorescence with ultra-sensitive cameras at a spinning disc, or a sensitive detector at a laser-scanning microscope, but we have not tested this. We have added this information in lines 337-347. I apologize that we can’t add more information than this.

      It is remarkable that the authors managed to clone 75 promoter sequences. However, whether all promoters work as expected was not clearly assessed in the present study. Did the authors only transform plants with PEP1, PEP2, PEPR1, and PEPR2 marker constructs? How would they know that the other promoters also work appropriately? In terms of providing these constructs to the research community, it is necessary to disclose to what extent the expression has been validated in planta and which promoters have not been assessed.

      This is indeed important information. We have not used the promoters in mutant complementation assays, and have added this caveat in lines 348-350.

    1. Considerate

      My reflections here build on Lino Pertile’s 2010 essay, ‘L’inferno, il lager, la poesia’. Pertile notes the profound correspondence between the opening poem of the book (OC I, 139) and this chapter. He points out how the main theme of Levi’s book, the dehumanising experience in the Lager, based on the annihilation of people’s identity, is expressed in the poem and resurfaces explicitly again in the chapter dedicated to Dante’s Ulysses. The key term revealing the correspondence of themes and intentions is ‘Considerate [consider]’, used twice in Levi’s poem (‘Consider if this is a man | … | Consider if this is a woman’) and rooted in the memory of Dante’s famous tercet where Ulysses addresses his crew as they sail towards the horizon of their last journey beyond the pillars of Hercules: ‘Considerate la vostra semenza: | fatti non foste a viver come bruti, | ma per seguir virtute e canoscenza’ (Inf. 26, 118-20 and OC I, 228).

      There are many other correspondences between the chapter of Ulysses and the opening poem, besides the ‘Considerate’, and they are profound and filtered through the theme of memory, an eminently Dantean theme: the urgency to fix in the memory itself what is or will be necessary to tell, or the urgency to express and recount what is deposited in memory. Indeed, for Levi, the memory of each individual person contains that person’s humanity.

      Memory is immediately activated as Primo and Jean exit the underground gas tank (‘He [Jean] climbed out and I followed him, blinking in the brightness of the day. It was warm [tiepido] outside; the sun drew a faint smell of paint and tar from the greasy earth that made me think of [mi ricordava] a summer beach of my childhood'). Temporarily escaping hell by means of a ladder (a sort of Dantesque ‘natural burella’), it is the tiepido sun and a characteristic smell that evoke the childhood memory and that at the same time the reader cannot avoid connecting to the tiepide case of the initial poem (‘You who live safe | in your heated houses [tiepide case]’ [my emphasis]). It is then around the memory ‘of our homes, of Strasbourg and Turin, of the books we had read, of what we had studied, of our mothers’ that another theme in the chapter coalesces, the theme of friendship (‘He and I had been friends for a week’), a theme that had already emerged in a more general connotation in the opening poem (‘visi amici’). Warmth, friendship (visi amici…Jean), the kitchens as destination for Primo and Jean’s walk (the walk from the tank with the empty pot is ‘the ever welcomed opportunity of getting near the kitchens’, not for that hot food [cibo caldo] evoked in the poem, but for the soup of the camp, an alienating incarnation of Dantesque ‘pane altrui’ whose various names are dissonant). During the respite of the one hour walk from the tank to the kitchens, the intermittent memory of Dante’s canto emerges as if from an underground consciousness, the memory of Inferno as a partial and imperfect mirror of the human condition in the Lager, Ulysses as poetic memory, a sudden epiphany of a semenza, a seed, of humanity that the Lager is made to suppress, and Primo’s wondering in the face of this sudden internal revelation of still possessing an intact humanity. 

      Primo’s memory of his home resurfaces as if springing from the memory of Dante’s text: the ‘montagna bruna’ of Purgatory is reflected in the memory of ‘my mountains, which would appear in the evening dusk [nel bruno della sera] when I returned from Milan to Turin!’ But the real, familiar landscape is too heartbreaking a memory of ‘sweet things cruelly distant’, one of those hurtful thoughts, ‘things one thinks but does not say’. There is an epiphanic memory then, the poetic memory that surfaces during the walk and that reveals to Primo that he still is a man, a memory to which he clings despite the sense of his own audacity (‘us two, who dare to talk about these things with the soup poles on our shoulders’); there is also a more intimate memory, equally pulsating with life and humanity - but dangerous, because it makes Primo vulnerable to despair, threatening his own survival in the camp.

      The urgent need to remember Dante’s verses in this chapter develops the theme of memory, which has been central from the opening poem. In Levi’s poem, though, memory is perceived from a different angle: the readers (who live safe…) must honour that memory and transmit it as an imperative testimony of what happened in the concentration camp from generation to generation, testifying to the suffering of the man and the woman ‘considered’ in the poem. This is a memory to be carved in one’s heart, which must accompany those who receive it in every action and in every moment of each day like a prayer. Not coincidentally the poem follows the text of the most fundamental prayer of Judaism, the Shemà Israel, which is read twice a day, a memory to be passed on to one’s own children, a responsibility which is a sign of one’s humanity. The commandment to remember of the opening poem (‘I consign these words to you. | Carve them into your hearts’) issues a potential curse to the reader, threatening the destruction of what most fundamentally characterises their humanity - home, health, children: ‘Or may your house fall down, | May illness make you helpless, | And your children turn their eyes from you’. Finally, Primo’s act of remembering during the walk to the kitchens is submerged by the Babelic soup (‘Kraut und Rüben…cavoli e rape…Choux et navets…Kàposzta és répak…Until the sea again closed – over us’) and yet the memory of it becomes part of his testimony in such a central chapter of the book written after surviving the Shoah. If the memory of Dante’s verses contributed to Primo’s faith in his own humanity and his psychological and physical survival in the camp, he then accomplishes the commandment of memory and his responsibility as a man through his own writing.

      CS

    2. non lasciarmi pensare alle mie montagne

      Very often, when we think about ‘Il canto di Ulisse’, we tend to recall only the most famous pages in which Levi tries to remember Dante’s canto. The depth and sense of urgency of the Ulyssean passages are so overwhelming and passionate that they may distract us from other elements in the chapter. However, if we go back to the text and read it closely, we cannot avoid noticing that, after a brief opening in which Levi introduces Pikolo and narrates how he came to be Pikolo’s ‘fortunate’ chaperone to collect the soup for the day, ‘Il canto di Ulisse’ also dwells quite significantly on a moment of domestic memories. While going to the kitchens, Levi writes: ‘Si vedevano i Carpazi coperti di neve. Respirai l’aria fresca, mi sentivo insolitamente leggero’. This is the first moment in the chapter in which Levi refers to the mountains as something that revitalises him and makes him feel fresh and light, both physically and mentally.

      This moment foreshadows another, also in this chapter, when Levi goes back to his mountains, those close to Turin, and compares them to the mountain that the protagonist of Dante’s canto, Ulysses, encounters just before his shipwreck with his companions:

      ... Quando mi apparve una montagna, bruna

      Per la distanza, e parvemi alta tanto

      Che mai veduta non ne avevo alcuna.

      Sì, sì, ‘alta tanto’, non ‘molto alta’, proposizione consecutiva. E le montagne, quando si vedono di lontano... le montagne... oh Pikolo, Pikolo, di’ qualcosa, parla, non lasciarmi pensare alle mie montagne, che comparivano nel bruno della sera quando tornavo in treno da Milano a Torino! Basta, bisogna proseguire, queste sono cose che si pensano ma non si dicono. Pikolo attende e mi guarda. Darei la zuppa di oggi per saper saldare ‘non ne avevo alcuna’ col finale.

      The significance of the mountains in Levi’s narration is confirmed in this passage. For him, the mountains represent his experience of belonging, his youthful years, and his work as a chemist – the job he was doing when he commuted by train from Turin to Milan. At the same time, Levi’s own memories of the mountains intertwine and overlap with another mountain, Dante’s Mount Purgatory. Here, a deep and perhaps not fully conscious intertextual game starts to emerge and to characterise Levi’s writing. The lines that Levi does not remember are these (compare, on the Dante page):

      Noi ci allegrammo, e tosto tornò in pianto,

      ché de la nova terra un turbo nacque,

      e percosse del legno il primo canto.

      For Dante’s Ulysses, Mount Purgatory signifies the final moment of his adventure and his desire for knowledge. The marvel and enthusiasm that Ulysses and his company feel when they see the mountain is suddenly transformed into its contrary. From the mountain, a storm originates that will destroy the ship and swallow its crew: ‘Tre volte il fe’ girar con tutte l’acque, | Alla quarta levar la poppa in suso | E la prora ire in giù, come altrui piacque’. Dante’s Mount Purgatory, so majestic and spectacular, represents the end of any desire for knowledge that aims to find new answers to and interpretations of human existence in the world without God’s word.

      Going back to Levi’s text, we find that, instead, in a kind of reverse overlapping between his image and that of Ulysses, the image of the mountain of Purgatory suggests to Levi a very different set of thoughts that, although seemingly and similarly overwhelming, opens up new interpretations: ‘altro ancora, qualcosa di gigantesco che io stesso ho visto ora soltanto, nell’intuizione di un attimo, forse il perché del nostro destino, del nostro essere oggi qui’. For a moment, it is almost as if Levi, a new Dantean Ulysses in a new Inferno, stands in front of Mount Purgatory and forgets the terzine and the shipwreck. Maybe Levi cannot or does not want to remember those terzine because the mountain in Purgatory represents something very different for him than for Dante’s Ulysses. Levi’s view of the mountain does not lead to a moment of recognition of sin, as it does in Dante’s Ulysses. For him, the mountain, like his mountain range, is the gateway to knowledge, enrichment, and illumination and to a world that lies beyond the imposed limits of traditional, constricting, and distorted views and that awaits discovery (‘qualcosa di gigantesco che io stesso ho visto ora soltanto’). Something about and beyond the Lager.

      To better understand how the mountains are central in ‘Il canto di Ulisse’, we have to remember that Levi’s view of the mountains strongly depends on his anti-Fascism, which he expressed particularly vigorously in two moments of his life: during his months in the Resistance, just before he was captured and sent to Fossoli, and, even more intensely, during the adventures of his youth, when he was a free young man who enjoyed climbing the mountains surrounding Turin. As Alberto Papuzzi has suggested, ‘le radici del suo rapporto con la montagna sono ben piantate in quella stagione più lontana: radici intellettuali di cittadino che cercava sulla montagna, nella montagna, suggestioni e risposte che non trovava nella vita, o meglio nell’atmosfera ispessita di quella vita torinese, senza passato e senza futuro’ (OC III, 426-27). Indeed, reports Papuzzi, Levi confirms that:

      Avevo anche provato a quel tempo a scrivere un racconto di montagna […]. C’era tutta l’epica della montagna, e la metafisica dell’alpinismo. La montagna come chiave di tutto. Volevo rappresentare la sensazione che si prova quando si sale avendo di fronte la linea della montagna che chiude l’orizzonte: tu sali, non vedi che questa linea, non vedi altro, poi improvvisamente la valichi, ti trovi dall’altra parte, e in pochi secondi vedi un mondo nuovo, sei in un mondo nuovo. Ecco, avevo cercato di esprimere questo: il valico.

      The heart of that epic story made its way into the chapter ‘Ferro’ in Il sistema periodico. The discovery of this (brave) new world, ‘mondo nuovo’, is an integral part and a direct achievement of Levi’s experience in the mountains. The mountains open a new understanding and a new perspective on the world.

      Something that escapes common understanding is revealed through the experience of the mountains, both in Levi’s memories of his youth and in his literary recounting of Auschwitz. Reciting Dante in ‘Il canto di Ulisse’ is therefore not only an intertextual exercise for Levi. Only by inserting Levi’s literary references in the complexity of his own experience – before, during, and after Auschwitz – can we fully capture the depth of his reflections. Levi mentally and metaphorically brought to Auschwitz not only Dante but also his ‘metafisica dell’alpinismo’. Together, they contributed to his attempt to come to terms with that reality.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      […] Overall, the authors build a convincing case for TEs being an important source of regulatory information. I don't have any issues with the analysis, but I am concerned about the sweeping claims made in the title. Once you get rid of eQTLs that could be altered by either SNPs or TIPs and include only those insertions that show strong evidence of selection, the number of genes is reduced to only 30. And even in those cases, the observed linkage is just that, not definitive evidence for the involvement of TEs. Although clearly beyond the scope of this analysis, transgenic constructs with the TEs present or removed, or even segregating families, would have been far more convincing. 

      We note that the referee thinks that we "built a convincing case for TEs being an important source of regulatory information". This is what we wanted to convey in the title, where we were cautious not to claim that TEs are the most important contributor to gene expression variability in rice populations. However, we agree with the referee that the title may be improved to better describe the results presented. We have therefore changed the title to "Transposons are an important contributor to gene expression variability under selection in rice populations".

      With respect to demonstrating causality by removing or introducing the TEs, this is indeed work we plan to do but that, as stated by the referee, is beyond the scope of this analysis.

      The fact that many of the eQTL-TIPs were relatively old is interesting because it suggests that selection in domesticated rice was on pre-existing variation rather than new insertions. This may strengthen the argument because those older insertions are less likely to be purged due to negative effects on gene expression. Given that the sequence of these TEs is likely to have diverged from others in the same family, it would have been interesting to see if selection in favor of a regulatory function had caused these particular insertions to move away from more typical examples of the family. 

      The TIP-eQTLs are from different classes, superfamilies and families, and the number of TIP-eQTLs of the same family is too small to deduce sequence commonalities (4.6 TIP-eQTLs/family in indica and 3.6 TIP-eQTLs/family in japonica). On the other hand, the effect of TIPs on expression can be positive or negative (we actually show that it is often negative). In the latter case, a plausible scenario would be the insertion inactivating a promoter element, and in this case it would be the insertion itself, and not the actual sequence of the TE, that would be selected.

      Also, previous work done in our lab has shown that TEs can amplify and mobilize transcription factor binding sites that are bound by the TF even when they are not close to a gene and therefore probably not directly affecting gene expression (Hénaff et al.,2014. The Plant Journal). In that case, the sequence of the eQTL TEs and those that are far away from genes will not necessarily differ. 

      Reviewer #2 (Public Review):

      In this manuscript, Castanera et al. investigated how transposable elements (TEs) altered gene expression in rice and how these changes were selected during the domestication of rice. Using GWAS, the authors found many TE polymorphisms in the proximity of genes to be correlated to distinct gene expression patterns between O. sativa ssp. japonica and O. sativa ssp. indica and between two different growing conditions (wet and drought). Thereby, the authors found some evidence of positive selection on some TE polymorphisms that could have contributed to the evolution of the different rice subspecies. These findings are underlined by some examples, which illustrate how changes in the expression of some specific genes could have been advantageous under different conditions. In this work, the authors manage to show that TEs should not be ignored when investigating the domestication of rice as they could have played an important role in contributing to the genetic diversity that was selected. However, this study stops short of identifying causations as the used method, GWAS, can only identify promising correlations. Nevertheless, this study contributes interesting insights into the role TEs played during the evolution of rice and will be of interest to a broader audience interested in the role TEs played during the evolution of plants in general. 

      We agree with the referee that the results presented do not allow conclusions on causality, and we have been careful not to claim in the manuscript that they do. We plan to add or remove TEs using CRISPR/Cas9 approaches to address this but, in line with referee 1's comment, we think this is beyond the scope of this analysis.

      ---------- 

      Reviewer #1 (Recommendations For The Authors): 

      Everything that I need to say is provided in the public portion of my review. 

      Reviewer #2 (Recommendations For The Authors): 

      Major concerns:

      1. The authors compare the proportion of the variance explained by the most significant TIP and SNP on the observed eQTLs associated with TIPs and SNPs. Thereby the authors conclude that TIPs explain more variance than SNPs. If I am not mistaken the GWAS was run separately for TIPs and SNPs, however, I am wondering if running the GWAS on the combined TIP and SNP dataset might be the better way to compare the variance explained by TIPs and SNPs on gene expression differences. It would be nice to see if these results also hold true if a TIP and SNP combined dataset is used as the most significant marker in a GWAS might not be the causal mutation but might just be linked to the causal mutation. Further in the TIP dataset, the number of markers is only 45k and in the SNP dataset, it is 1 000k, which could bias the GWAS toward finding markers that explain more of the variation in the dataset with fewer markers. 

      We addressed the reviewer's concern by using two complementary approaches, whose results are described in the text (lines 119-121) and in the new Figure 1-figure supplement 1.

      First, we addressed the concern regarding the independent GWAS for TIPs and SNPs vs a combined strategy. For this, we built new japonica/indica genotype matrices combining the TIP and SNP matrices and ran eQTL mapping again. Using the same strategy (association + FDR adjustment), we found 100% of the previous TIP-eQTLs and 99% of the previous SNP-eQTLs. We repeated the same analysis (proportion of expression variance), and the results were mostly the same (Figure 1-figure supplement 1A).

      Second, we addressed the two concerns (combined genotypes and different numbers of TIP and SNP markers) using a single approach. SNP matrices were LD pruned using an r2 = 0.9 and later subsampled to the exact number of TIPs (Indica = 30,396, Japonica = 25,168). We verified that these SNPs covered the 12 rice chromosomes well. SNP and TIP genotypes were later merged into a single matrix, and eQTL mapping was repeated for each of the subspecies and conditions using the same parameters as in the previous version of the manuscript. 100% of the previously reported TIP-eQTL associations were found using this new approach. Nevertheless, we found a very important drop in sensitivity for the SNP-eQTLs (only 15-20% of the previous associations were detected), possibly due to the strong reduction in the number of SNPs (> 95%), which results in a much lower number of markers at < 5 Kb from genes. We repeated the analysis of Figure 1D, and observed very similar results (Figure 1-figure supplement 1D). A very large number of TIP-eQTL associations do not coincide with SNP-eQTLs (74% in indica, 83% in japonica), indicating that TIP-eQTL mapping is complementary to SNP-eQTL mapping as it uncovers additional associations (note that in this case the overlap between TIP-eQTLs and SNP-eQTLs is lower than in the previous analysis due to the lower sensitivity of SNP-eQTL mapping using fewer markers). In the cases where both a TIP and a SNP coincide as eQTL, TIPs explained slightly more variance than SNPs in both indica and japonica (in 54% of the cases TIP variance > SNP variance).
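
      The subsample-and-merge step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `subsample_and_merge`, the row layout, and the toy genotypes are hypothetical and are not taken from the authors' pipeline, and LD pruning is assumed to have been done beforehand.

```python
import random


def subsample_and_merge(snp_rows, tip_rows, seed=0):
    """Randomly subsample SNP rows to the number of TIP rows, then merge.

    Each row is a (marker_id, genotypes) tuple, where genotypes holds one
    0/1 call per accession. LD pruning is assumed to have happened upstream.
    """
    if len(snp_rows) < len(tip_rows):
        raise ValueError("fewer SNPs than TIPs; cannot subsample")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept_snps = rng.sample(snp_rows, len(tip_rows))
    # Tag each row with its marker type so both kinds share one matrix.
    merged = [("SNP", marker_id, calls) for marker_id, calls in kept_snps]
    merged += [("TIP", marker_id, calls) for marker_id, calls in tip_rows]
    return merged


# Toy data (hypothetical): 10 SNPs and 3 TIPs genotyped in 4 accessions.
snps = [(f"snp_{i}", [i % 2, 0, 1, (i + 1) % 2]) for i in range(10)]
tips = [(f"tip_{i}", [1, 0, 0, 1]) for i in range(3)]
merged = subsample_and_merge(snps, tips)
```

      Equalising the marker counts in this way removes the difference in marker density as an explanation, at the cost of SNP-eQTL sensitivity, which is the trade-off reported in the response.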

      2. Line 146 to 152: in this section, the authors describe overlaps between TIP-eQTLs in two different growth conditions, however, in the text it is not mentioned if the TIPs have the same effect on gene expression in the two conditions or if the gene expression is up-regulated in one condition but down-regulated in the other. This information would be interesting to have here, especially as the authors go on to say that only a small number of TIP-eQTLs are stress-specific. The same comment also goes for the eQTL overlap described on lines 167 to 170. 

      We checked the effect type (positive or negative) of TIP-eQTLs in both scenarios (associations shared between wet/dry conditions, and associations shared between subspecies). In both cases, 100 % of the shared TIP-eQTLs have the same effect type in the two conditions or subspecies. We have updated the text accordingly (Lines 55-157 and Lines 179-181)

      3. Lines 192 to 196: the authors mention that the frequency of non-eQTL-TIPs was at the same frequency in indica and japonica, which is in contrast to eQTL-TIPs. However, on line 132 it is mentioned that eQTL-TIPs were overrepresented in 1 kb regions upstream of genes. Hence, is the pattern of the frequency of non-eQTL-TIPs being at the same frequency in indica and japonica also observed in the 1 kb regions upstream of genes and/or if the distribution of non-eQTL-TIPs is matched to one of the eQTL-TIPs? Or is this pattern driven by non-eQTL-TIPs far away from genes?

      We checked the frequencies of TIPs at 1 Kb upstream of genes and found that the general pattern is maintained, with the frequencies of non-eQTL TIPs being more correlated than those of TIP-eQTLs. We have included this information (lines 204-206) and added a new supplementary file (Figure 2-figure supplement 2).

      4. In the discussion, the authors could briefly discuss how linked selection affecting TIPs could contribute to the observed results. After reading the second example in the result section where one of the example TIPs (TIP_50059) is found on the Hap B which contains "some additional structural differences" (line 290), I was left wondering how much of the increase in TIP frequency can be attributed to genetic hitchhiking? And how much of the results could be caused by linked selection, especially when considering that structural variations are not included in the GWAS analyses. 

      We agree with the referee that some of the TIP eQTLs described here might not be the actual cause of expression variability (e.g., a TIP linked with the causal mutation), although we cannot know the exact fraction. This is stated in several places of the results and discussion sections. However, the fact that TIPs tend to explain more variance than SNPs and that TIP eQTLs, but not SNP eQTLs, tend to concentrate in the upstream proximal region of genes where most transcription regulatory sequences are located (Figure 1), suggests that TIP eQTLs could be causal more frequently than SNP eQTLs. We revised the text to ensure that we convey this message appropriately.

      Minor comments: 

      • Lines 80 to 83: the description of the rice phylogeny should be moved to the introduction. 

      Done (Lines 68-72)

      • Line 177 to 186: It was unclear to me whether the authors checked if the ancestral rice population lacked the TIPs described in this section as recently inserted in the indica and japonica ssp. It would be nice to add this information to this section. 

      Thanks to the referee's comment we noticed an imprecision in the text. The approximately 1/3 of subspecies-specific TIP-eQTLs refers to the TIPs at 3% MAF (i.e., some of these insertions could be present at > 3% in indica, but at < 3% MAF in japonica). We now indicate only the TIPs that are truly specific to either of the two subspecies (frequency is zero in one of the two) and looked for their presence in rufipogon:

      59 insertions are indica-specific. Of those, 33 are present in rufipogon.

      21 insertions are japonica-specific. Of those, 5 are present in rufipogon.

      We have incorporated this information in the manuscript (Lines 185-189). The species-specific TIPs are also available in the Supplementary File 3.

      • Line 353: "have two of more TIPs" should be "two or more" 

      Done (Line 369)

      • Figure 1D: Using a square layout instead of a rectangle layout for the plot will make it easier to interpret. 

      Done.

    1. wide variety of methods for any given project.

      Having a variety of methods to get to a solution is extremely important, as many people can come at a problem from different angles and solve it differently. However, this can breed confusion as to which way is the "right" way and which way is the "wrong" way. It may also seem like their way might not work, but we should do a thorough examination from their side to see why they think it might work and maybe square that up with the harsh reality.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2023-01932

      Corresponding author(s): Dennis KAPPEI

      We would like to thank all reviewers for their recognition of our approach and the quality of our work as well as their constructive criticism.

      Reviewer #1

      Reviewer #1: The manuscript by Yong et. al. describes a comparison of various chromatin immunoprecipitation-mass spectrometric (ChIP-MS) methods targeting human telomeres in a variety of systems. By comparing antibody-based methods, crosslinkers, dCas9 and sgRNA targeted methods, KO cells and various controls, they provide a useful perspective for readers interested in similar experiments to explore protein-DNA interactions in a locus-specific manner.

      Response: We would like to thank the reviewer for the feedback and the appreciation of our work.

      Reviewer #1: While interesting, I found it somewhat difficult to extract a clear comparison of the methods from the text. It was also difficult to compare as data and findings from each method were discussed in their own context. Perhaps it is not in their interest to single out a specific method and it is indeed true that there are caveats with each of the methods.

      Response: Across our manuscript we have established one single workflow, for which we present some technical comparisons (e.g. using single or double cross-linking in Fig. 2a/b), technical recommendations such as the use of loss-of-function controls (e.g. Fig. 1c vs. Fig. 2a and Extended Data Fig. 3g vs. 3i) and an application to unique loci using dCas9 (Fig. 3f). Based on the suggestions below, we believe that we will improve the clarity of communicating our approach.

      Reviewer #1: I think the manuscript would be of interest but I believe that there are remaining questions that need to be addressed before publication. In particular, I found it difficult to reconcile the discrepancy in protein IDs between most experiments vs. the WT/KO experiment in Fig 2. The authors make a big deal about the importance of the KO control but I think the fewer proteins identified there may be experiment-specific and not general to the KO system. I ask that this be investigated more carefully by the authors in their revisions.

      Response: We thank the reviewer for highlighting this point. We do not think that the ChIP-MS comparison between U2OS WT and ZBTB48 KO clones (Fig. 2a) has experiment-specific caveats. Instead the KO controls as well as the dTAGV-1 degron system for MYB ChIP-MS (Extended Data Fig. 3) reveal antibody-specific off-targets, which are indeed false-positives. Please see below for further details.

      Reviewer #1: Ln 57: What is "standard double cross-linking ChIP reactions" in this context? Is it the two different crosslinkers? The two proteins? The reciprocal IPs of one protein, and blotting for another? It's not clear here or from Extended Fig 1A. Upon further reading, it seems to pertain to the two crosslinkers - if so, the authors should briefly describe their workflow to help readers.

      Response: As the reviewer correctly concludes, we indeed intended to highlight the use of two separate crosslinkers (formaldehyde/FA and DSP). This combination is important, as illustrated in the side-by-side comparison of Fig. 2a and Fig. 2d. Here, we performed ZBTB48 ChIP-MS in five U2OS WT and five U2OS ZBTB48 KO clones. While the bait protein ZBTB48 was abundantly enriched in both experiments, in the samples that were fixed with formaldehyde only we lose about half of the telomeric proteins that are known to directly bind to telomeric DNA independently of ZBTB48, as well as all of their interaction partners. For instance, while the FA+DSP reaction in Fig. 2a enriched all six shelterin complex members, the FA-only reaction in Fig. 2d only enriches TERF2. These data suggest that the use of a second cross-linker helps to stabilise protein complexes on chromatin fragments. This is a critical message of our manuscript, as ChIP-MS only truly lives up to its name if we can enrich proteins that genuinely sit on the same chromatin fragment without protein interactions to the bait protein. We will expand on this in both the text and our schematics in Fig. 1a and 3a to make this clearer for the readers.

      Reviewer #1: Ln 95: It is surprising and quite unclear to me why it is that the WT ZBTB48 U2OS pulldown in Fig 1B shows 83 hits for the WT vs Ig control experiment but 27 hits for the WT vs KO condition in Fig 2A. The two WT experiments have the same design and reagents, shouldn't they be as close as technical replicates and provide very similar hits?

      The authors seem to make the claim that most of the 'extra' proteins in WT vs Ig are abundant and false positives, but if this is so, shouldn't they bind non-specifically to the beads and be enriched equally in Ig control and ZBTB48 WT IPs?

      Response: We again thank the reviewer for raising this point and the need to explain in more detail why we interpret the difference between 83 hits (anti-ZBTB48 antibody vs. IgG; Fig. 1c) and 27 hits (anti-ZBTB48 antibody used in both U2OS WT and ZBTB48 KO cells; Fig. 2a) primarily as false-positives. The KO controls in Fig. 2a allow us to keep the ZBTB48 antibody as a constant variable while instead comparing the presence (WT) or absence (KO) of the bait protein. Hence, proteins that were enriched in the IgG comparison in Fig. 1c but that are lost in the WT vs. KO comparison in Fig. 2a are likely directly (or indirectly) recognised by the ZBTB48 antibody, akin to off-targets of this particular reagent. In a Western blot this would be equivalent to seeing multiple bands at different molecular weights with only the band belonging to the protein-of-interest disappearing in KO cells. To illustrate this we would like to refer to Extended Data Fig. 2, in which we have replotted the exact same data from Fig. 2a. In addition, we have highlighted there the proteins that were enriched in the IgG comparison in Fig. 1c. 46 proteins (in pink) are indeed quantified in the WT vs. KO comparison, but these proteins are found below the cut-offs (and most of them with very poor fold changes and p-values). In contrast to the other several hundred proteins common between both experiments that can be considered common background non-specifically bound to the protein G beads, these 46 proteins represent antibody-specific false-positives.
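
      The logic of this comparison can be written down as simple set operations. The following is a minimal sketch under stated assumptions: the function `classify_igg_hits` and the placeholder protein identifiers are hypothetical and do not come from the manuscript's data or analysis code.

```python
def classify_igg_hits(igg_hits, ko_hits, ko_quantified):
    """Split hits from an antibody-vs-IgG comparison using a WT-vs-KO control.

    igg_hits:      proteins above the cut-offs in antibody vs. IgG
    ko_hits:       proteins above the cut-offs in WT vs. KO (same antibody)
    ko_quantified: all proteins quantified in the WT vs. KO experiment
    """
    igg_hits, ko_hits = set(igg_hits), set(ko_hits)
    ko_quantified = set(ko_quantified) | ko_hits
    return {
        # Enriched in both comparisons: depends on the bait being present.
        "confirmed": igg_hits & ko_hits,
        # Quantified with the same antibody but not enriched once the bait
        # is removed: candidate antibody-specific false positives.
        "antibody_specific": (igg_hits & ko_quantified) - ko_hits,
        # Not measured in the KO experiment: no conclusion possible.
        "not_assessed": igg_hits - ko_quantified,
    }


# Placeholder identifiers (hypothetical), not real measurements.
result = classify_igg_hits(
    igg_hits={"P1", "P2", "P3", "P4"},
    ko_hits={"P1", "P2"},
    ko_quantified={"P1", "P2", "P3", "P9"},
)
```

      In this toy example `P3` lands in the antibody-specific group because it is quantified in the WT vs. KO experiment yet falls below the cut-offs there.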

      The above consideration is not unique to ChIP-MS as illustrated by the Western blot example. We also do not claim novelty on the experimental logic, e.g. pre-CRISPR in 2006 Selbach and Mann demonstrated the usefulness of RNAi controls in immunoprecipitations (IPs) (PMID: 17072306). However, our data suggests that ChIP-MS is particularly vulnerable to this type of false-positives given that the approach requires (double-)cross-linking to sufficiently stabilise true-positives on the same chromatin fragment.

      To supplement the WT vs. ZBTB48 KO comparison, we had included a second experiment in the manuscript that illustrates the same point in even more dramatic fashion. First, KO controls are very clean in principle, but they themselves might come with caveats if e.g. the expression levels between WT and KO samples differ greatly. This might create a situation that the reviewer hinted at, i.e. differential expression of abundant proteins that would stick to the beads in proportion to their expression levels, resulting in “fold enrichments”. The resulting false positives could e.g. be controlled by matched expression proteomes. For ZBTB48 we have previously measured this (PMID: 28500257) and demonstrated that only a small number of genes are differentially expressed (~10) and hence we can interpret the WT vs. ZBTB48 KO comparison quite cleanly. However, for other classes of proteins such as transcription factors that regulate a large number of genes, E3 ligases etc. this might present a more serious concern. Therefore, we extended our loss-of-function comparison to such a transcription factor, MYB, by using the dTAGV-1 degron system. Importantly, the MYB antibody has been used in previous work for ChIP-seq applications (e.g. PMID: 25394790). Here, instead of the 186 hits in the MYB vs. IgG comparison, we detect only 9 hits when using the same MYB antibody in control-treated and dTAGV-1-treated cells (after only 30 min of treatment). Again, similar to the WT vs. ZBTB48 KO comparison, 180 proteins are quantified in the DMSO vs. dTAGV-1 comparison, but these proteins fall below the cut-offs (Extended Data Fig. 3g vs. 3i). Again, we believe that this quite drastically illustrates how vulnerable ChIP-MS data is to large numbers of false-positives. This is not only a technical consideration as such datasets are frequently used in downstream pathway/gene set enrichment analyses etc. Such large false discovery rates would obviously lead to error-carry-forward and additional (unintended) misinterpretations. We will carefully expand our textual description across the manuscript to make these points much clearer. In addition, we will move the previous Extended Data Fig. 3 into the main manuscript to more clearly highlight this important point.

      Reviewer #1: Volcano plots in Figs 1, 2, and Suppl. Tables etc: Are the plotted points the mean of 5 replicates? Was each run normalized between the replicates in each group, for e.g. by median normalization of the log2 MS intensities? This does not appear to be the case upon inspection of the Suppl Tables. Given the variability in pulldown efficiency, gel digest and peptide recovery, this would certainly be necessary.

      Response: All volcano plots are indeed based on 4-5 biological replicates (most stringently in the WT vs. KO comparisons in Fig. 2, which are based on 5 independent WT and 5 independent ZBTB48 KO single cell clones). The x-axis of each volcano plot represents the ratio of mean MS1-based intensities between both experimental conditions in log2 scale. However, precisely to account for the variation that the reviewer highlighted, we did not base our analysis on raw MS1 intensities but used the MaxLFQ algorithm (PMID: 24942700) as part of the MaxQuant analysis software (PMID: 19029910) for genuine label-free quantitation across experimental conditions and replicates. In this context, we would also like to refer to a related comment by reviewer #2 based on which we will now add concordance information for each replicate (heatmaps for Pearson correlations and PCA plots). We will improve this both in the text and methods section accordingly.
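
      As a small worked illustration of the x-axis definition given here (the log2 ratio of mean intensities between the two conditions), consider the sketch below. It assumes intensities that have already been normalised; it does not reproduce MaxLFQ, and the function name and numbers are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(target, control):
    """log2 of the ratio of mean intensities (target over control).

    Both arguments are per-replicate intensities for one protein and are
    assumed to be normalised already; MaxLFQ itself is not reproduced.
    """
    return math.log2(mean(target) / mean(control))


# Hypothetical intensities for one protein across replicates.
fold_change = log2_fold_change([8.0, 8.0], [2.0, 2.0])  # means 8 and 2
```

      A value of 2 on this axis therefore corresponds to a four-fold higher mean intensity in the target condition.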

      Reviewer #1: Ln 125: The authors make the claim that the ChIP-MS experiments are inherently noisy, with examples from WT cells, dTAG system and IgG controls. This is likely the case, yet their experiments with WT vs KO cells do not identify as many proteins overall. I find this inconsistency somewhat unclear and does not seem to match the claim of ChIP-MS experiments and crosslinking adding to non-specificity. Can the authors add the total number of identified proteins in each volcano plot, for easier reference?

      Response: The number of identified proteins does not vary majorly between matched IgG and loss-of-function comparisons and for instance the single cross-linking (FA only) experiment in Fig. 2c has the largest number of quantified proteins among all ZBTB48 IPs. But we will of course add the requested information to all plots.

      Reviewer #1: I think the manuscript is of interest as it provides important benchmarks for ChIP-proteomics experiments. I believe that there are remaining questions that need to be addressed before publication. In particular, I found it difficult to reconcile the discrepancy in protein IDs between most experiments vs. the WT/KO experiment in Fig 2. The authors make a big deal about the importance of the KO control but I think the fewer proteins identified there may be experiment-specific and not general to the KO system. I ask that this be investigated more carefully by the authors in their revisions.

      Response: We would like to thank the reviewer for recognising our work as a source for important benchmarks for ChIP-MS experiments. We hope that with a more detailed description and discussion the highlighted aspects will be more clearly communicated. We originally conceived our manuscript as a short report and now realised that some of the information became too condensed and might therefore benefit from more extensive explanations.

      Reviewer #2

      Reviewer #2: Summary: In this manuscript, Yong and colleagues have introduced an optimized technique for studying actors on chromatin in specific regions with a localized approach, based on a revisited ChIP-mass spectrometry (MS) workflow with label-free quantification (LFQ). The authors demonstrated the utility of their approach at telomeres, from cell culture (human U2OS cells) to tissue samples (liver, mouse embryonic stem cells). As a proof of concept, this technique was tested by the authors with proteins from the shelterin complex specific to telomeres (TERF2 and ZBTB48), transcription factors (MYB), and through dCas9-driven locus-specific enrichment. Notably, the authors created a U2OS dCas9-GFP clone and then introduced sgRNAs to target either telomeric DNA (sgTELO) or an unrelated control (sgGAL4). The cells expressing sgTELO exhibited a significant localization at telomeres and an enriched amount of telomeric DNA in ChIP with dCas9. They also found the proteins previously known to be enriched at telomeres (for example, the 6 shelterin members).

      Moreover, the authors illustrated the importance of double crosslinking (formaldehyde (FA) and dithiobis(succinimidyl propionate) (DSP)) in ChIP-MS. Their data also demonstrated that ChIP-MS is prone to false-positives, possibly owing to its inherent cross-linking. However, by utilizing loss-of-function conditions specific to the bait, this can be tightly controlled.

      • Can you show the concordance between biological replicates for each ChIP with LFQ? (heatmap of Pearson correlation and PCA plot). This will confirm the robustness of the use of LFQ.

      Response: We will add the requested concordance data for all volcano plots both in the form of heatmaps of Pearson correlation and PCA plots. Across our datasets, the replicates from the same experimental condition clearly cluster with each other and replicates have high concordance values of >0.9. As expected replicates for the target/bait samples have slightly higher concordance values compared to the negative controls (IgG or loss-of-function samples). We thank the reviewer for this suggestion as the new Extended Data panel will strengthen the illustration of our robust LFQ data.
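
      The pairwise Pearson correlations behind such a concordance heatmap can be computed as in the sketch below. It is a plain-Python illustration under stated assumptions: the function names, replicate labels, and intensity values are hypothetical, and it is not the authors' analysis code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(replicates):
    """Map every pair of replicate names to its Pearson correlation."""
    names = list(replicates)
    return {(a, b): pearson(replicates[a], replicates[b])
            for a in names for b in names}


# Hypothetical log2 LFQ intensities for four proteins in three replicates.
replicates = {
    "bait_rep1": [20.0, 22.0, 25.0, 30.0],
    "bait_rep2": [20.5, 22.5, 25.5, 30.5],
    "ctrl_rep1": [30.0, 25.0, 22.0, 20.0],
}
matrix = correlation_matrix(replicates)
```

      Each entry of `matrix` corresponds to one cell of the heatmap; replicates of the same condition are expected to give values close to 1.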

      Reviewer #2: You say that your technique is "a simple, robust ChIP-MS workflow based on comparably low input quantities" (line 139). What would be really interesting for a technical paper would be: a schematic and a table illustrating the differences between your method and the previously published methods (amount of material, timeline,...) to really highlight the novelty in your optimized techniques.

      Response: We will add a comparison table with previous publications using ChIP-MS and for reference include some complementary approaches as requested by reviewer #3. On this note, we would like to stress that we are not “only” intending to use less material and to have an easy-to-adopt protocol. A cornerstone of our manuscript is to apply rigorous expectations to ChIP-MS experiments, in particular the ability to enrich proteins that independently bind to the same chromatin fragments as the bait protein (regardless of whether this is an endogenous protein or an exogenous, targeted bait such as dCas9). Otherwise, such experiments risk being regular protein IPs under cross-linking conditions, which as illustrated by our loss-of-function comparisons are prone to yield particularly large fractions of false-positives.

      Reviewer #2: It would be interesting to perform the dCas9 ChIP experiment in telomeric regions with and without LFQ. Since the novelty lies in this parameter, at no time does the paper show that LFQ really allows to have as many or more proteins identified but in a simpler way and with less material. A table allowing to compare with and without LFQ would be interesting.

      Response: We do not fully understand what the suggestion “without LFQ” refers to exactly. We assume that this reviewer might be suggesting the use of a quantitative mass spectrometry approach other than LFQ, e.g. SILAC labelling, TMT labelling etc. Please note that we do not claim that LFQ quantification is per se superior to the various quantification methods that had been developed and widely used across the proteomics community, especially before instrument setups and analysis pipelines were stable enough for label-free quantification (a name that is strongly owed to this historic order of development). However, a central goal of our workflow is to make robust and rigorous ChIP-MS accessible to the myriad of laboratories that use ChIP-qPCR/-seq and that may not be extensively specialised in mass spectrometry. Both metabolic and isobaric labelling not only come at a higher cost but also present an experimental hurdle to non-specialists compared to performing biological replicates without any labelling, essentially the same way as for any ChIP-qPCR etc. experiment. We will further elaborate on these points in the manuscript to more clearly convey these notions.

      In general, with the right effort different quantitative methods should and will likely yield qualitatively similar results. However, comparisons between LFQ approaches (MaxLFQ, iBAQ,…) and labelling approaches (SILAC, TMT, iTRAQ) have already been better explored and verbalised elsewhere (e.g. PMID: 31814417 & 29535314). Therefore, we believe that this will add relatively little value to our manuscript.

      Reviewer #2: Put a sentence to explain "label free quantification". For a reader who is not at all familiar with this technique, it would be interesting to explain it and to quote the advantages compared to PLEX.

      Response: Thanks for highlighting this. In line with the point above as well as a similar comment by reviewer #1 we will improve this both in the main text and manuscript to clearly explain the terminology, the MaxLFQ algorithm (PMID: 24942700) used and to highlight the advantages compared to labelling approaches.

      Reviewer #2: What does the ranking on the right of each volcano plot represent (figure 1 b-e, figure 2a,d,e for example)? Is it the top of the most enriched proteins in the mentioned categories? This is not very clear when looking at the volcano plot. It must be specified in the legend.

      Response: The numbering in these panels is meant to link protein names to the data points on the volcano plots. The order of hits is ranked based on strongest fold enrichment, i.e. from right to center. We will clarify this in the figure legends.
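
      The ranking described in this response can be sketched as follows. This is only a minimal illustration: the protein names, values and cut-offs (`min_log2_fc`, `min_neg_log10_p`) are hypothetical placeholders and are not taken from the manuscript.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    protein: str
    log2_fc: float        # log2 fold enrichment (x-axis of a volcano plot)
    neg_log10_p: float    # -log10 p-value (y-axis of a volcano plot)


def rank_hits(hits: List[Hit],
              min_log2_fc: float = 2.0,
              min_neg_log10_p: float = 2.0) -> List[Tuple[int, str]]:
    """Keep hits above both (assumed) cut-offs and number them by decreasing fold enrichment."""
    passing = [h for h in hits
               if h.log2_fc >= min_log2_fc and h.neg_log10_p >= min_neg_log10_p]
    ordered = sorted(passing, key=lambda h: h.log2_fc, reverse=True)
    return [(rank, h.protein) for rank, h in enumerate(ordered, start=1)]


# Invented example values, for illustration only.
example = [
    Hit("protein_A", 3.1, 4.0),
    Hit("protein_B", 0.9, 5.0),   # below the fold-enrichment cut-off
    Hit("protein_C", 6.4, 3.2),
    Hit("protein_D", 4.5, 1.0),   # below the significance cut-off
]
ranking = rank_hits(example)
print(ranking)
```

      Rank 1 is the hit with the strongest fold enrichment, i.e. the right-most point on the plot, and higher numbers move towards the center.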

      Reviewer #2: General assessment/Advance: The authors explain in their article that the ChIP exploiting the sequence specificity of nuclease-dead Cas9 (dCas9) to target specific chromatin loci by directly enriching for dCas9 was already published. Here, the novelty of this study lies in the use of LFQ mass spectrometry to optimize the technique and make it easier to handle. Some comparisons with previous papers or data generated by the lab will be interesting to really show the improvement and the advantage to use LFQ and therefore, to highlight better the novelty of the study.

      Response: We thank the reviewer for this assessment and as mentioned above we will include such a comparison table. dCas9 has been used previously in a ChIP-MS approach termed CAPTURE (PMID: 28841410). While this is clearly a landmark paper that illustrated the dCas9 enrichment concept across multiple omics applications (i.e. not limited to proteomics), in their application to telomeres the authors enriched only 3 out of the 6 shelterin proteins with quite moderate fold enrichments (POT1: 0.99, TERF2: 2.13, TERF2IP: 1.06; in log2 scale). Based on this alone, POT1 and TERF2IP would not have qualified for our cut-off criteria. In addition, while the authors had performed three replicates, detection is only reported in 1-2 out of 3 replicates. While it is difficult to reconstruct statistical values based on the publicly accessible data, it is therefore unlikely that even these 3 proteins would have been robustly considered hits in our datasets. Similarly, using recombinant dCas9 with a sgRNA targeting telomeres that was in vitro reconstituted with sonicated chromatin extracts from 500 million HeLa cells (CLASP; PMID: 29507191), the authors identified only up to 3 shelterin subunits (TERF2, TERF2IP and TPP1/ACD) based on only 1 unique peptide each. For comparison, in our dCas9 ChIP-MS dataset all 6 shelterin subunits are identified with 9-19 unique peptides, contributing to our robust quantification. Even when considering cell line-specific differences (HeLa cells have shorter telomeres and hence provide less biochemical material for enrichment per cell), these comparisons illustrate that prior attempts struggled to robustly replicate even the most abundant telomeric complex members.

      Based on these findings, others had suggested that dCas9 “might exclude some relevant proteins from telomeres in vivo” (PMID: 32152500), implying that dCas9 ChIP-MS might inherently not be feasible including at repetitive regions such as telomeres. Therefore, we believe that our dCas9 ChIP-MS data is a proof-of-concept that the method has the genuine ability to robustly enrich key proteins at individual loci. In concordance with the comment above we will include a comparison table with previous papers and expand on these points in the discussion.

      Reviewer #2: By presenting this technical paper, the authors allow laboratories across different fields to use this technique to gain insights into protein enrichment in specific chromatin regions such as the promoter of a gene of interest or a particular open region in ATACseq in an easier way and with less material. This paper holds value in enabling researchers to answer many pertinent questions in various fields.

      Response: We again thank the reviewer for this encouraging assessment and we do indeed hope that this manuscript makes a contribution to a much wider use of ChIP-MS approaches as a promising complement to existing genome-wide epigenetics analyses.

      Reviewer #3

      Reviewer #3: Strengths of the study:

      The study is well-structured and provides a robust workflow for the application of ChIP-MS to investigate chromatin composition in various contexts.

      The use of telomeres as a model locus for testing the developed ChIP-MS approach is appropriate due to its well-characterized protein composition.

      The comparison of WT vs KO lines for ZBTB48 is a rigorous way to control for false-positives, providing more confidence in the results.

      The direct comparison of double vs only FA-crosslinking provides valuable insights into the benefit of additional protein-protein crosslinking in ChIP-MS workflows.

      Response: We thank the reviewer for this assessment and we agree that the above are several of the key features of our manuscript.

      Reviewer #3: Areas for improvement: The novelty of the method is more than questionable as both ChIP-MS coupled to LFQ and dCas9 usage for locus-specific proteomics have been previously reported. The fact that the authors directly pulldown dCas9 instead of using a dCas9-fused biotin ligase and subsequent streptavidin pulldown is only a very minor change to previous methods (not even improvement). It would be more accurate for the authors to present their study as an optimization and rigorous validation of existing techniques rather than a novel approach.

      Response: While we appreciate where the reviewer is coming from, it occurs to us that most of the reviewer’s comments equate ChIP approaches with other complementary methods, in particular proximity labelling. The latter is indeed a powerful experimental strategy and in fact we are ourselves avid users. As highlighted to reviewer #1 as well, our manuscript was originally conceived as a shorter report and based on the feedback we will now expand our discussion to more broadly incorporate related approaches.

      However, we would like to stress that dCas9 ChIP-MS and dCas9-biotin ligase fusions are not the same thing and this is not a minor tweak to an existing protocol. While both approaches have converging aims – to identify proteins that associate with individual genomic loci – the experimental workflows differ fundamentally. Biotin ligases use a “tag and run” approach by promiscuously leaving a biotin tag on encountered proteins. Subsequently, cellular proteins are extracted and in fact proteins can even be denatured prior to enrichment with streptavidin beads. While this is an in vivo workflow that (depending on the biotin ligase used) may provide sensitivity advantages, it does not retain complex information. The latter is inherently part of ChIP workflows due to the use of cross-linkers. One obvious future application would be to maintain (= not to reverse as we have done here) the crosslink during the mass spectrometry sample preparation in order to read out cross-linked peptides to gain insights into interactions and structural features. We will now more clearly incorporate such notions into our discussion.

      In addition, we would like to stress that while this reviewer focuses primarily on the dCas9 aspect of our manuscript, we believe that our general ChIP-MS workflow including the combination with label-free quantitation is useful and important already by itself as e.g. recognised by both reviewers #1 and #2.

      Reviewer #3: The authors should more thoroughly discuss previous works using ChIP-MS and dCas9 for locus-specific proteomics. This would give readers a better understanding of how the current work builds on and improves these earlier methods. For a paper that aims on presenting an optimized ChIP-MS workflow it is crucial to showcase in which use cases it outperforms previously published methods.

      E.g., compare locus-specific dCas9 ChIP-MS to CasID (doi.org/10.1080/19491034.2016.1239000) and C-Berst (doi.org/10.1038/s41592- 018-0006-2); how does your method perform in comparison to these?

      Response: Again, while we will now incorporate more extensively comparisons with previous ChIP-MS publications (and the few prior manuscripts that included dCas9) as well as related techniques, we would like to stress that dCas9 ChIP-MS is not the same approach as CasID and C-BERST, which rely on dCas9 fusions to BirA* and APEX2, respectively. dCas9-APEX2 strategies were also published by two additional groups as CASPEX (back-to-back with the C-BERST manuscript; PMID: 29735997) and CAPLOCUS (PMID: 30805613). All of these methods target specific loci with dCas9 and promiscuously biotinylate proteins that are in proximity to the dCas9-biotin ligase fusion protein. As described above, while the application of the BioID principle (PMID: 22412018) to chromatin regions has converging aims with the dCas9 ChIP-MS part of our manuscript, the two do not test the same thing. ChIP carries chromatin complexes through the entire workflow while the CasID approaches are independent of that. This is the same scenario as if we were to compare IP-MS reactions (such as the ChIP-MS reactions presented here for endogenous proteins) and BioID-type experiments for proximity partners of the same bait proteins.

      Reviewer #3: Compare likewise the described protein interactomes to previously published interactomes.

      Response: We will add comparisons in form of Venn diagrams with previously published interactomes. However, we would like to stress that a key aspect of our manuscript is the smaller yet rigorous hit lists based on e.g. loss-of-function controls, higher stringencies and specificity. Simply comparing final interactomes remains reductionist relative to the importance of other variables such as experimental design, number of replicates, data analysis etc.

      Reviewer #3: The authors use sgGAL4 as a control for the telomeric targeting of dCas9. The IF results (Fig3b) show that sgGAL4 barely localizes to the nucleus with very faint signals. It would be helpful to use a control with homogenous nuclear localization of dCas9 to further strengthen the author's conclusions.

      Response: dCas9-EGFP in the presence of sgGAL4 localises diffusely to the nucleus as expected. We have here used a widely adopted non-targeting sgRNA control that was originally used for imaging purposes (PMID: 24360272) and has since been used in a variety of studies (e.g. PMID: 26082495, 32540968, 28427715) including a previous dCas9 ChIP-MS attempt (PMID: 28841410). In addition to the diffuse nuclear, non-telomeric localisation, we provide complementary validation of clean enrichment of telomeric DNA specifically in the sgTELO samples. Therefore, we do not see how other non-targeting sgRNAs would provide better controls or improve our data.

      Reviewer #3: The extrapolation of results from the use of telomeres as a proof-of-concept to other loci is not a given considering the highly repetitive structure of telomeric DNA. The authors should either be more cautious about generalizing the results to other loci or demonstrate that their method can also capture locus-specific interactomes at non-repetitive regions.

      Response: We agree that the adoption of any locus-specific approach to single genomic loci is a steep additional hurdle and warrants rigorous data on well characterised loci with very clear positive controls. We will expand on these challenges in our discussion. However, we would like to stress that we did not make any such statement in our original manuscript apart from simply referring to our telomeric experiment as proof-of-concept evidence that locus-specific approaches are feasible by ChIP.

      Reviewer #3: What are concrete biological insights from this optimized ChIP-MS workflow that previous methods failed to show?

      Response: We explicitly used telomeres as an extensively studied locus with clear positive controls that at the same time allows us to evaluate likely false positives. As such the intention of the manuscript was not to yield concrete biological insights but to develop a new methodological workflow.

      As also highlighted in a response to reviewer #2, based on other prior attempts to enrich telomeres in ChIP-like approaches with dCas9 (PMID: 28841410 & 29507191), it had been suggested that dCas9 “might exclude some relevant proteins from telomeres in vivo” (PMID: 32152500), implying that dCas9 ChIP-MS might inherently not be feasible including at repetitive regions such as telomeres. Therefore, recapitulating the set of well-described telomeric proteins was no trivial feat and our ChIP-MS workflow (both targeted and applied to individual proteins) represents a well-validated method to systematically interrogate changes in chromatin composition in the future. As one example at telomeres, this may include chromatin changes upon the induction of telomeric fusions or general DNA damage.

      Reviewer #3: For instance, the authors could compare their mouse and human TERF2 interactomes and discuss similarities and differences between both species.

      Response: We thank the reviewer for this suggestion, but the comparison between mouse and human TERF2 interactomes is not suitable across the datasets that we generated. U2OS is a human osteosarcoma cell line that relies on the Alternative Lengthening of Telomeres (ALT) pathway while our mouse data is based on embryonic stem cells (mESCs) and mouse liver tissue. Even the latter, in contrast to adult human tissue, expresses telomerase. We can certainly still pinpoint (as already done in our original manuscript) individual differences among known factors, e.g. the fact that proteins such as NR2C2 are more abundantly found at ALT telomeres (PMID: 19135898, 23229897, 25723166) vs. the detection of the CST complex as telomerase terminator (PMID: 22763445) in the mouse samples. However, the TERF2 datasets contain hundreds of proteins as “hits” above our cut-offs and a key message of our manuscript is that the majority of them are likely false positives. Here, differences are likely extending to expression differences between U2OS cells, mESCs and liver samples. So while appealing in theory, this cross data set comparison would remain rather superficial and error prone at this point. As a biology focused follow-up study, this would need to be rigorously conceived based on an appropriate choice of human and murine cell line models. In addition, this would likely require the generation of FKBP12-TERF2 knock-in fusion clones to allow for rapid depletion of TERF2 for a clean loss-of-function control since sustained loss of TERF2 leads to chromosomal fusions and eventually cell death in most cell types.

      Reviewer #3: The authors should also describe which interaction partners are novel and try to validate some of these using orthogonal methods.

      Response: We will now highlight more explicitly two proteins, POGZ and UBTF, that are most robustly and reproducibly enriched on telomeric chromatin across datasets, including the U2OS WT vs. ZBTB48 KO comparison (Fig. 2a). However, we would like to abstain from a molecular characterization at this point. As mentioned above, the discovery of novel telomeric proteins is not the focus of this manuscript, which is primarily dedicated to method development. In addition, these types of validation in methods papers are often limited to a few assays (e.g. can 1 or 2 proteins be enriched by ChIP? Do you see some localisation by IF? etc.). However, our research group has a history of publishing in-depth mechanistic papers on the characterisation of novel telomeric proteins (e.g. PMID: 23685356, 28500257, 20639181, doi.org/10.1101/2022.11.30.518500). Therefore, a genuine validation of such factors would require functional insights and clearly warrants independent follow-up work.

      Reviewer #3: Human Terf2 ChIP-MS (Fig1A) seems to be much more specific than the mouse counterpart (Fig1D) (32 TERF2 interactors out of 176 hits in human vs 12 TERF2 interactors out of 500 hits in mouse). Could the authors explain this notable difference?

      Response: As alluded to above, Fig. 1A and 1D cannot be directly compared, starting with the difference in complexity in the input material – cell line vs. tissue. For comparison, the Terf2 ChIP-MS data from mouse embryonic stem cells tallies up to 19 out of 169 hits, which is much closer to the U2OS results. Again, we deem the majority of hits from the TERF2 ChIP-MS data to be false-positives and the more complex input material from mouse livers likely accounts for the difference in these numbers.

      Reviewer #3: The authors used much higher cell numbers than previously published ChIP-MS experiments; while this is understandable for dCas9-based pulldowns, the cell number is expected to be down-scalable for the other IPs (TERF2, ZBTB48, MYB). Since this work primarily describes an optimized Chip-MS workflow, the authors should show that they can reasonably downscale to at least 15 Mio cells per replicate; one way of achieving this could be through digesting on the beads and not in-gel.

      Response: As we will illustrate in the comparison table that was also requested by reviewer 2, our approach does not use higher cell numbers than previous ChIP-MS approaches – quite the contrary. In addition, we would like to highlight that while we state 50 million cells in Fig. 1a, we only inject 50% of our samples for MS analysis to retain a back-up sample in case of technical issues with the instruments. In other words, our workflow is already effectively based on 25 million cells and thereby pretty close to the requested 15 million cells while simultaneously requiring substantially fewer reagents.

      Importantly, our examples are based on rather lowly expressed bait proteins such as ZBTB48 (not detected within DDA-based proteomes of ~10,000 proteins in U2OS cells). While the workflow can be applied across proteins, exact input numbers might vary depending on the bait protein, e.g. histones and their modifications would likely require less for the same absolute sample enrichment. For instance, PMID 25990348 and 25755260 performed ChIP-MS on common histone modifications but still used 300-800 million cells per replicate. Considering that we worked on substantially less abundant proteins, we here present a workflow with comparably low input samples.

      Reviewer #3: It is not clear from the text or figure what the authors are trying to show in Fig2c. They should either explain this further or take the figure out.

      Response: We are trying to illustrate the following: As in any IP reaction the bait protein is the most enriched protein with very high relative intensities, e.g. TERF2 in the TERF2 ChIP-MS data. Direct protein interaction partners – here the other shelterin members – follow at about 1 order of magnitude lower signal intensities. In contrast, proteins that are enriched via an interaction with the same DNA molecule (i.e. that do not physically interact with the bait protein) such as NR2C2, HMBOX1 and ZBTB48 further trail by at least 1 more order of magnitude. This is information that is not easily visualised within the volcano plots and is mainly “buried” within the Supplementary Tables. However, these relative intensities displayed in Fig. 2c clearly illustrate the dynamic range challenge that ChIP-MS poses for proteins that independently bind to the same chromatin fragment. We have now modified our text to make this point clearer.

      Reviewer #3: Was there any benefit in using a Q Exactive HF vs timsTOF flex?

      Response: Yes, measuring the same samples (e.g. the 50% backup mentioned above) on both instruments enriches more telomeric proteins/shelterin proteins in e.g. the dCas9 ChIP-MS data set on the timsTOF fleX. However, given the difference in age of these instruments/technologies between a Q Exactive HF and a timsTOF fleX (in the context of these experiments the equivalent of a timsTOF Pro 2), this is not a fair comparison beyond concluding that a more recent instrument like the timsTOF fleX achieves better coverage and is more sensitive with otherwise comparable measurement parameters. As we did not have the opportunity to run matched samples on e.g. an Exploris 480, we would not want to make claims across vendors. As stated in the discussion, we expect that even newer generations of mass spectrometers, such as the very recently released Orbitrap Astral or timsTOF Ultra, would further improve the sensitivity and/or allow a reduction in the amount of input material. Therefore, the main conclusion is that improvements in the mass spec generations improve proteomics data quality and our samples are no exception, i.e. this is not specifically pertinent to our approach.

      Reviewer #3: How did the authors analyze the PTM data? This is not described in the methods section. In addition, it would be important to validate the novel PTMs described for NR2C2.

      Response: We apologise for the oversight and we will add the description of PTMs as variable modifications during our MaxQuant search in the methods section. The originally deposited datasets already include this and we had simply missed this in our methods text.

      While we are not 100% sure that we understand the request for validation correctly, we would like to point out that the PTMs on NR2C2 have been previously reported in several high-throughput datasets and for S19 in functional work on NR2C2 (PMID: 16887930). However, the relevance in our data set is as follows: While the PTMs on TERF2 as the bait protein could occur both on telomere-bound TERF2 as well as on nucleoplasmic TERF2, NR2C2 is only enriched in the TERF2 ChIP-MS reactions due to its direct interaction with telomeric DNA. The co-detection of its modifications therefore implies that at least some of the telomere-bound NR2C2 carries these modifications. We showcase this example as an additional angle of how such ChIP-MS datasets can be analysed.

      While the robust, MS2-based detection of these modified peptides in our data set and several other publicly available datasets provides strong evidence that these modifications are genuine, further functional validation would involve rather labour-intensive experiments and resource generation (e.g. phospho-site specific antibodies). We hope that the reviewer agrees with us that this would require an independent follow-up study and that this goes beyond the scope of our current manuscript.

      Reviewer #3: For this kind of methods paper one would expect to see the shearing results of the ChIP-MS experiments since variations in DNA shearing can impact the detection of false-positives in the ChIP-MS experiments

      Response: We will include agarose gel pictures of our sonicates, which we indeed routinely quality controlled prior to ChIP experiments as stated in our methods description.

      Reviewer #3: Overall, the current state of the manuscript neither provides direct evidence that the "optimized" ChIP-MS workflow is better in certain aspects/use cases than previously published methods nor does it provide novel biological insights. At the current state it even cannot be considered as a validation of previously published methods since it does not discuss them.

      Response: We politely disagree with this conclusion. Again, as mentioned above we are under the impression that this reviewer somehow equates our entire manuscript to a comparison with dCas9-biotin ligase fusions.

      Instead, we here provide a workflow for ChIP-MS that incorporates label-free quantification as the experimentally easiest, most intuitive quantification method for non-mass spectrometry experts. This offers a particularly low barrier to entry aimed at making ChIP-MS more widely accessible as a complement to commonly used ChIP-seq applications. Furthermore, we showcase that as a gold standard ChIP-MS – to truly live up to its name – should have the ability to enrich proteins independently binding to the same chromatin fragment. We demonstrated that double cross-linking is critical for these assays and in return illustrate how rigorous loss-of-function controls (both KOs and degron systems) can mitigate prevalent false-positives that are exacerbated due to the cross-linking. Finally, we applied this workflow to different types of endogenous proteins (transcription factors, telomeric proteins) in cell lines and tissue and extend our work to dCas9 ChIP-MS as a targeted method.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank the Reviewers for their detailed and constructive comments. As we describe below, we have now amended the manuscript to address their concerns and suggestions.

      2. Point-by-point description of the revisions

      Reviewer #1

      In the first paragraph the reviewer states that our study is well presented and convincing, but that it seems “an incremental advance to the previous ones, which properly accounted for PLK4 symmetry breaking and are based on similar assumptions”.

      We apologise for not explaining properly why our work is an important advance on these previous studies. Although both previous models can account for some aspects of PLK4 symmetry breaking, they both have significant issues. For example, Takao et al. perform no analysis of the robustness of their model, and from the small number of simulations shown it is clear that some very odd behaviours emerge, e.g. the oscillation of the dominant PLK4 site around the 6 compartments (Figure 3C, Example 3) and the bizarre manner in which PLK4 overexpression drives the formation of multiple PLK4 peaks (Figure 4B, first two examples). The authors do not comment on, analyse, or explain these strange phenomena. This model also relies on STIL being added to the system only after PLK4 has already broken symmetry; this is not plausible in rapidly dividing systems such as the fly embryo where Ana2/STIL levels remain constant through multiple rounds of centriole duplication (Steinacker et al., JCB, 2022). The Leda et al. model predicts that inhibiting PLK4 kinase activity will deplete PLK4 from the centriole, but it is now clear that PLK4 accumulates at centrioles when its kinase activity is inhibited (e.g. Yamamoto and Kitagawa, Nat. Comms., 2019). Moreover, this model supposes no spatial relationship between PLK4-binding compartments; this has important implications for the system’s behaviour (see point 1 in our response to Reviewer #2), and is biologically highly implausible. Thus, neither of the previous models can properly account for several important aspects of PLK4 symmetry breaking.

      Moreover, the two previous studies are not based on similar assumptions. It is only through our analysis that we discover that the underlying biological process driving symmetry breaking in both previous models can be described in the same terms: with short-range activation and long-range inhibition causing diffusion-driven instability. This crucial conclusion was not obvious from, nor claimed by, either of the previous publications. We believe this is an important step in model development for these systems.

      The reviewer raises a number of minor concerns, the first of which is a previous study from Chau et al. (Cell, 2012), which studies how two component systems break symmetry. Differential diffusion is not essential for symmetry breaking in some of the models considered by Chau et al., and so they wonder if it is really essential in our system.

      We thank the reviewer for pointing us to this study. It can be proven mathematically that differential diffusion is essential for symmetry breaking in the Turing-type framework. In the systems studied by Chau et al., symmetry can be broken without differential diffusion if one of the two components can be depleted from the cytoplasm. Such cytoplasmic depletion does not occur in traditional Turing-type systems, and it is almost certainly not occurring during PLK4 symmetry breaking: e.g. FRAP experiments show that PLK4 continuously turns over at centrioles (Cizmecioglu et al., JCB, 2010; Yamamoto and Kitagawa, Nat. Comms., 2019). We discuss this point (p8, para.3).
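
      The role of differential diffusion in a Turing-type framework can be checked numerically on a generic linearised two-component system. The sketch below is only an illustration under stated assumptions: the Jacobian `J`, the diffusion coefficients and the wavenumber grid are invented values for a textbook activator-inhibitor linearisation, not parameters of the PLK4 models discussed here. For each wavenumber k, a perturbation grows if the matrix J - k^2 D has an eigenvalue with positive real part.

```python
import numpy as np


def max_growth_rate(jacobian, d_u, d_v, wavenumbers):
    """Largest real part of the eigenvalues of J - k^2 * D over the given wavenumbers."""
    diffusion = np.diag([d_u, d_v])
    rates = [np.linalg.eigvals(jacobian - (k ** 2) * diffusion).real.max()
             for k in wavenumbers]
    return max(rates)


# Hypothetical reaction Jacobian at a homogeneous steady state:
# self-activating first component, self-inhibiting second component.
# trace = -3 < 0 and determinant = 2 > 0, so the state is stable without diffusion.
J = np.array([[1.0, -2.0],
              [3.0, -4.0]])
ks = np.linspace(0.0, 5.0, 501)

equal_diffusion = max_growth_rate(J, d_u=1.0, d_v=1.0, wavenumbers=ks)
unequal_diffusion = max_growth_rate(J, d_u=1.0, d_v=40.0, wavenumbers=ks)

print("equal diffusivities, max growth rate:  ", equal_diffusion)
print("unequal diffusivities, max growth rate:", unequal_diffusion)
```

      With equal diffusivities, diffusion only shifts both eigenvalues of J downwards by k^2 D, so a stable homogeneous state stays stable at every wavenumber. Only when the second component diffuses sufficiently faster than the first does a band of wavenumbers acquire a positive growth rate.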

      The reviewer states that it is unclear which term in equations (3-4) and (5-6) corresponds to the self-activation and activation/inhibition of the other component that are indicated in the schematic summary of the models shown in Figure 1C.

      As we now clarify, in general it is not always possible to pinpoint a single term in an equation that corresponds to activation/inhibition. Mathematically, a positive feedback of one component on another means that the partial derivative of the latter's rate of change with respect to the former is positive, and a negative feedback means that this partial derivative is negative. Hence, activation and inhibition can change depending on the values of these derivatives during the dynamics, as these inequalities may be achieved with complex expressions that extend beyond the usual proportional relationships. We have amended the manuscript to make this clearer (p10, para.2).
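
      That the sign of a feedback can depend on the current state, rather than on a single term, can be illustrated with a toy example. The rate law below is hypothetical and is not one of equations (3-6) of the manuscript; the sketch simply estimates the partial derivative of a rate of change with respect to the component's own concentration at two different states.

```python
def rate_of_change(a, b):
    """Toy rate law (hypothetical): saturating production driven by b, minus linear loss."""
    return a * b / (1.0 + a ** 2) - a


def self_feedback(a, b, step=1e-6):
    """Central finite-difference estimate of d(rate_of_change)/da at the state (a, b)."""
    return (rate_of_change(a + step, b) - rate_of_change(a - step, b)) / (2.0 * step)


# Same equation, same value of b, two different values of a.
low_state = self_feedback(0.5, 4.0)    # derivative is positive here: self-activation
high_state = self_feedback(2.0, 4.0)   # derivative is negative here: self-inhibition

print("feedback at a = 0.5:", low_state)
print("feedback at a = 2.0:", high_state)
```

      The same expression therefore acts as an activation at low concentration and as an inhibition at high concentration, which is why no single term can be labelled as "the" activating or inhibiting one.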

      The reviewer pointed out an error in the arrows in Figure 2 (we believe this is actually Figure 4). We thank the reviewer for pointing this out and have now corrected this mistake.

      Reviewer #2

      Major Comments:

      __ 1. The reviewer points out that in all models of PLK4 symmetry breaking the overexpression of PLK4 should be able to generate multiple PLK4 peaks (as, experimentally, PLK4 overexpression can generate up to 6 procentrioles around the mother centriole). The Reviewer suggests that the two previous models can do this, but we only show examples where PLK4 overexpression generates two peaks, and the reviewer questions whether this is a general limitation that would invalidate our approach. __We are grateful to the reviewer for pointing this out, and we now expand our analysis and discussion of this important issue (p13-15). It is indeed possible to produce more peaks in our model using different parameters—e.g. decreasing diffusivity leads to thinner peaks, allowing more peaks to form (Figure 3B, Figure 5B). Importantly, however, when diffusion is decreased, the region of the parameter space in which only a single peak will form inevitably becomes smaller—as diffusion can no longer efficiently suppress the formation of additional peaks around the rest of the centriole surface. Hence, in both our original models we struggled to find a parameter regime in which PLK4 robustly formed a single peak, but also formed >3 peaks when PLK4 was overexpressed. As we now discuss in detail, we believe that this is a general problem, as any model of PLK4 symmetry breaking must involve information being communicated around the centriole surface. We now show that a possible solution to this problem is to postulate that increasing PLK4 levels leads to a decrease in PLK4 diffusivity (Figure 3C, Figure 5C)—a biologically plausible possibility (p15, para.2).

      In addition, it is not correct to say that the previous formulations of these models do not have this problem (or, in the case of Leda et al., the model actually has a related problem). This problem must apply to the Takao et al. model, as it also relies on information travelling around the centriole surface. This problem is far from obvious, however, because Takao et al. do not analyse the robustness of their model. This problem does not apply to the Leda et al. model, but this is because their model supposes no spatial relationship between the individual compartments and instead assumes that communication between compartments is instantaneous. This allows their system to overcome this communication problem and so robustly form a single peak at low PLK4 concentrations, while forming multiple peaks at high concentrations (as shown in Figure 6B). However, this requires that diffusion is sufficiently fast that concentration gradients are negligible between centriolar compartments, but not so fast that the relevant species are diluted in the much larger cytoplasm. It seems implausible that both of these effects may be achieved with a single diffusion rate in the real-world physical system.

      __ 2. The reviewer points out that in our modelling any multiple PLK4 peaks formed will tend to be evenly spaced around the centriole surface whereas, in their original formulations, the two previous models predict that any multiple ‘winning’ PLK4 compartments will not have any preferential spatial location with respect to each other. They ask that we address this difference and justify why we think our prediction is a better representation of PLK4 symmetry breaking. __Although it is not obvious, neither of the previous models makes clear predictions about the spacing of multiple PLK4 peaks. As described above, Leda et al. assume no spatial relationship between PLK4-binding compartments, so relative peak-spacing cannot be assessed. Moreover, from the limited analysis shown, it is not clear that Takao et al. predict random spacing. The authors show only two simulations of PLK4 overexpression (Figure 4B, first two simulations) and the behaviour of PLK4 is very odd: the initial noise in the system fades away before PLK4 levels rapidly and near-simultaneously rise at multiple, reasonably well-spaced, peaks, before fading away to low levels—even after STIL addition. At the end of the simulation the “winning” compartments contain very low levels of PLK4 (often lower than the noise initially introduced into the system), but these compartments are reasonably (simulation 1) or very (simulation 2) evenly spaced.

      Nevertheless, the reviewer is correct that the even spacing of multiple peaks is a feature of our model. Unfortunately, it is not possible to compare this prediction to reality because the spacing of multiple PLK4 peaks in cells overexpressing PLK4 has not been quantified yet. Thus, one has to interpret published images, some of which support equal spacing while others do not (e.g. Kleylein-Sohn et al, Dev. Cell, 2007). Moreover, this analysis is likely to be complicated because CEP152 can form incomplete rings. This can be appreciated in Figure 2C in Hatch et al., (JCB, 2010) where the extra centrioles induced by PLK4 overexpression do not appear to be evenly spaced around the centriole, but are quite evenly spaced around the partial CEP152 ring. Therefore, equal spacing of peaks in ideal conditions is a feature predicted by our model that still needs to be fully explored experimentally. We believe that part of the power and value of our model is to suggest such hypotheses. We now discuss this important point (p26, para.2).

      __ 3. The reviewer questions our attempt to discretise our continuum model (where we convert the continuous centriole surface to a series of discrete compartments on the centriole surface and show that symmetry breaking can still occur). They note that we only show one example (9 compartments), they ask for more information about how the discretisation was done, and they question the independence of the compartments as PLK4 appears to accumulate in compartments adjacent to the dominant compartment. __We apologise for the lack of clarity here. We now state that our models can break symmetry provided that there are at least two compartments, and we now include simulations showing that this happens for 2 – 10 compartments (Figure S2). The discrete model is a finite-difference discretisation of the continuum model (described in Appendix V). We also now clarify that the compartments are ‘independent’ in the sense that all chemical reactions only occur between components that are within the same compartment. The compartments are still spatially linked via a discretized diffusion (as would likely be the case at the centriole), which explains the observed relationship between neighbouring compartments.
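
      To make the idea of compartments coupled by discretised diffusion concrete, a minimal sketch of one finite-difference diffusion step on a ring of compartments is given here. It is an illustration only and is not taken from Appendix V: the function name `ring_diffusion_step`, the explicit Euler update, and all numerical values are assumptions, and the reaction terms (which act within each compartment) are omitted.

```python
def ring_diffusion_step(conc, diffusivity, dx, dt):
    """Return concentrations after one explicit Euler diffusion step on a ring.

    conc        -- list of concentrations, one per compartment
    diffusivity -- diffusion coefficient (hypothetical value)
    dx          -- spacing between neighbouring compartments
    dt          -- time step; the explicit scheme needs diffusivity*dt/dx**2 <= 0.5
    """
    n = len(conc)
    rate = diffusivity * dt / dx**2
    # Each compartment exchanges material only with its two neighbours;
    # the modulo wraps the last compartment back onto the first.
    return [
        c + rate * (conc[(i - 1) % n] + conc[(i + 1) % n] - 2.0 * c)
        for i, c in enumerate(conc)
    ]


# Illustrative run: nine compartments, all material initially in compartment 0.
n_compartments = 9
conc = [1.0] + [0.0] * (n_compartments - 1)
dx = 1.0 / n_compartments
dt = 0.2 * dx**2  # keeps diffusivity*dt/dx**2 = 0.2 for diffusivity = 1
for _ in range(500):
    conc = ring_diffusion_step(conc, diffusivity=1.0, dx=dx, dt=dt)
```

      In this sketch material spreads to adjacent compartments first, which is the kind of neighbour-to-neighbour coupling described above, while the total amount summed over the ring is conserved by the diffusion step.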

      __ 4. The reviewer asks whether all the parameter values that satisfy the mathematical constraints we calculate for our models will break symmetry. If so, they suggest we are using a circular argument when demonstrating that the models break symmetry as we use parameter values chosen specifically to satisfy these constraints. __In Turing-systems, one can mathematically calculate parameter constraints that allow symmetry breaking. As we now clarify, all parameters that satisfy these constraints can break symmetry, while any parameters outside these constraints cannot break symmetry. Thus, it was never our intention to claim something new or surprising when we illustrated the symmetry-breaking properties of our models (Figures 2 and 4, and associated parameter space analysis in Figures 3 and 5), so we apologise that our intention on this point was unclear. Rather, these Figures illustrate the detailed behaviour of each system under different conditions—something that is not possible to intuit from the equations alone.
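
      As an illustration of what such parameter constraints look like, the sketch below checks the classical linear-stability conditions for a generic two-species Turing system. It is a textbook example under stated assumptions, not the constraints derived in the manuscript: the function name `turing_unstable`, the Jacobian entries `fu`, `fv`, `gu`, `gv` (partial derivatives of the reaction terms at the uniform steady state), and the numerical values are all hypothetical.

```python
def turing_unstable(fu, fv, gu, gv, du, dv):
    """Check the classical two-species conditions for diffusion-driven instability.

    fu, fv, gu, gv -- entries of the reaction Jacobian [[fu, fv], [gu, gv]]
                      at the uniform steady state (hypothetical inputs)
    du, dv         -- positive diffusivities of the two species
    """
    trace = fu + gv
    det = fu * gv - fv * gu
    # The uniform state must be stable when diffusion is switched off.
    stable_without_diffusion = trace < 0 and det > 0
    # Diffusion must then destabilise at least one spatial mode.
    cross = dv * fu + du * gv
    unstable_with_diffusion = cross > 0 and cross**2 > 4 * du * dv * det
    return stable_without_diffusion and unstable_with_diffusion


# Illustrative activator-inhibitor Jacobian (values chosen for the example only).
jacobian = dict(fu=1.0, fv=-1.0, gu=2.0, gv=-1.5)
fast_inhibitor = turing_unstable(**jacobian, du=1.0, dv=20.0)
equal_diffusion = turing_unstable(**jacobian, du=1.0, dv=1.0)
```

      With these example values the check passes when the second species diffuses twenty times faster than the first and fails when the two diffusivities are equal, since in that case `cross` is a positive multiple of `trace`, which must be negative.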

      5. The Reviewer requests more information about how we chose the particular parameter values we use to illustrate each model and asks that we convince readers that other sets of values that satisfy the derived mathematical requirements would result in the same qualitative outcomes. As described in point 4 above, and as we now state more clearly, it is a mathematical fact that parameter values that satisfy the derived mathematical requirements can break symmetry. We now discuss our reasons for choosing specific parameters in more detail (see point 6, below).

      __ 6. The Reviewer asks whether the dimensionless parameters we use in our models have any biological relevance, and requests a biological interpretation of all of them. They also request that we relate the Diffusivity ratios of the Activator and Inhibitor species (____) to the experimental observations made by Yamamoto and Kitagawa. __Relating our dimensionless parameters to biologically-relevant dimensional parameters is a complex issue. For example, one can see from equations (5) and (6) that simultaneously doubling (A), (I), and (a), and decreasing (b) by a factor of 4 leaves the system unchanged. Since the concentrations of A and I are unknown at the centriole surface, this means that it is not possible to determine the dimensional values of the rate of production of I (a) and its rate of conversion to A (b). This limitation is the root of the mathematical fact that FRAP experiments can reveal “off” rates but not “on” rates. Moreover, to convert the rate of loss of A (c) and I (d) into dimensional parameters it is necessary to know the timescale of symmetry-breaking. This is unknown, but was assumed to be on the order of hours in the previous models. This corresponds to a degradation/loss rate of minutes with our current choice of parameters, which is consistent with FRAP data (e.g. Yamamoto and Kitagawa, Nat. Comms., 2019). Regarding the ratio, the effective diffusion in our model depends on both the bulk diffusion and the binding/unbinding/degradation rates – a complexity also noted by Yamamoto and Kitagawa. This makes it very difficult to relate the “effective” surface diffusivity to the bulk diffusivity. We are currently investigating the form of this dependency, but this is a complex mathematical problem that is beyond the scope of this manuscript. 
These issues are difficult to discuss succinctly, so we now simply state that we chose specific parameter values based, in part, on the values and ratios used in the previous modelling papers (p10, para.2; p17, para.2).

      Unfortunately, we could not find any experimental measurements of diffusivity in the Yamamoto and Kitagawa paper, as the Reviewer suggests. We now clarify, however, that the ratio we use in both models (2500) is chosen to lie between the effective diffusivity ratios (effective because the previous models used binding/unbinding rates rather than diffusivity) used by Takao et al. (10000) and Leda et al. (200). We also include a phase diagram showing how varying the diffusivity of both factors influences symmetry breaking in both models (Figure 3B, Figure 5B), and we state that we have chosen all remaining parameter values to reflect the parameter values in the original models, when adjusted to the same timescale.

      __ 7. The Reviewer asks for more information about how we normalised time in our simulations and whether the time in different simulations is comparable. __We now clarify that the simulations run for a single unit of dimensionless time (so they can be compared), and that the reaction/diffusion parameters in the system are sufficiently large by comparison with unity that all simulations achieve steady state within a unit of time (p11, para.2).

      8. The Reviewer asks whether concentrations of _and can be compared between simulations, and also questions our description of _ being uniformly accumulated in Figure 4D, rather than uniformly depleted. __We clarify that concentrations can be compared within a model, but not between models. This is because the dimensional values depend on the dimensional reaction rates, which differ between the models. This is not just a theoretical limitation; experimental fluorescence signals are typically compared in relative arbitrary units so the absolute values of different systems cannot be easily compared for the same reason. We agree with the reviewer that it is better to describe Figure 4D as showing uniform depletion of the activator, and we have adjusted the legend accordingly.

      The reviewer makes a number of minor points that are not numbered.

      __The reviewer asks for clarification of what we mean by “robustness”: does this refer to the ability to produce the same result in multiple simulations, or to the ability to produce the same result when parameter values are varied? If the latter, then the reviewer suggests our models are not very robust. __We apologise for this confusion and now more clearly define what we mean by robust (p13, para.2). As we discuss in point 1 of our response to this Reviewer, our initial models are indeed not very robust at producing a single PLK4 peak over a range of PLK4 concentrations. We now discuss why this lack of robustness is likely to be intrinsic to any PLK4 symmetry breaking system, and how robustness in all such models can be improved by allowing diffusivity to vary with PLK4 expression levels (p13-p15).

      __The Reviewer points out that the original models introduce a noise term at every iteration, whereas we only introduce an initial noise term; they ask us to discuss this difference. __We have run simulations introducing a noise term at every iteration and find that this makes negligible difference (Reviewer Figure 1, attached to the end of this letter). We do not take this approach, however, as this would significantly complicate the mathematical analysis that we perform (the additional noise term turns the system of PDEs into a system of SDEs which do not fit the Turing framework as readily). We now mention this in Appendix V.

      The Reviewer states that the reaction schemes are unnecessarily repeated in Figures 1, 2 and 4. We would like to keep these schematics, as in Figure 1 we show a generic scheme (illustrating the two possible Turing-type reaction diffusion systems) whereas in Figures 2 and 4 we show specific reaction regimes (specifying the relevant species) that we test in each model. We feel this information will be useful to readers in this visual format.

      The Reviewer states that it is confusing that we refer to the specific reaction parameters (k11 and k12) that need to be swapped to convert the Leda et al. model to the Takao et al. model, as this information will not mean anything to readers who are not familiar with the models. We agree and have now removed this information.

      The Reviewer suggests several textual amendments and/or corrections. We thank the reviewer for spotting these and have amended them all accordingly.

      __Finally, the Reviewer states in their significance summary that although our key conclusions are convincing, they are not new as Takao et al. describe their model as analogous to a “reaction-diffusion system (also known as a Turing model)”. __We were aware that Takao et al. make this statement, but this does not invalidate the novelty or significance of our work. Although Takao et al. described their model as being analogous to a “Turing model”, it is not actually a reaction-diffusion system, and it does not exhibit the long-range inhibition that is central to all Turing systems and that is needed to produce a single PLK4 peak. Instead, they use lateral inhibition (in which the influence of the inhibiting species does not extend beyond the neighbouring compartments) to reduce the number of potential PLK4 binding sites from ~12 to ~6. A single winning site is subsequently selected when STIL is added to the system, with additional positive feedback (not involving reaction-diffusion) ensuring that the compartment with most PLK4 becomes the dominant site. Their analysis of the reaction-diffusion version of their system is limited to a single supplementary figure (Figure S2D), and they do not perform or refer to any of the relevant mathematical analyses that make these well-studied systems such powerful tools. We believe that the model presented here is simple enough to draw the attention of the applied mathematics community while being robust and complete enough to provide a mechanistic explanation of many interesting features and to suggest new possible phenomena. We now discuss these points (p22, para.1).

      Reviewer #3

      __The Reviewer found our manuscript well-written, and judged it of interest to centriole duplication enthusiasts. __We interpret this to mean that the Reviewer did not think it of more general interest. This seems a harsh assessment, as the precise one-for-one duplication of centrioles is generally considered to be one of the great mysteries of cell biology. It is now widely appreciated that robustly breaking PLK4 symmetry to form a single PLK4 peak is crucial to this process. Thus, our discovery that this process can be described using a well-studied mathematical framework that has already been applied to a vast range of biological processes is potentially of significance even to non-centriole enthusiasts.

      The Reviewer made a number of specific comments:

      Figure 1. The Reviewer felt the graphic in Figure 1A could be improved by combining it with Figure 1B, and noted that the centrioles look strange. We thank the reviewer for these suggestions and we have now rearranged this Figure. We also now clarify that the schematic depicts Drosophila centrioles, which are simpler than human centrioles.

      __Figure 2. The Reviewer suggests that to make the system depicted in Figure 2A fit as a Type I Turing system we have to assume that (I) must dissociate from the centriole or be degraded at higher rates than (I) converts (A) to (I). They suggest this assumption is implicit in the model and they request further explanation. __The reviewer is correct that, in Model 1, the degradation/dissociation of () is the root of its self-inhibition. However, we do not need to make any assumption about the relationship between the rate at which converts to (b), and the dissociation/degradation rate of (d) for this system to work (as the Reviewer implies). This is because, whatever these rates are, the system will approach a steady-state where the production and degradation terms balance, and it is the stability/instability of this state that determines whether the system can break symmetry. Since the degradation rate of (the - term in equation 4) increases more rapidly than its production rate (the term in equation 4) as increases, this results in a stable (i.e. self-inhibiting) system regardless of the parameter values. We have rewritten the sections explaining these equations to try to make these points more clearly and to point readers to Appendix II where we explain the form of the equations.

      __The Reviewer asks if in Model 1 it is realistic to assume no turnover or loss of PLK4 (A), and will the system still work if this is altered? __This is a good point. In Model 1, we set c=0 as this makes the analysis significantly simpler, enabling us to display the mathematical predictions alongside the numerical simulation. We have now added the (c,d) phase diagram to show the effect of varying these parameters on the symmetry breaking properties of the system (Figure 3D). We find that the value of c has a relatively weak effect on the symmetry breaking properties of the model since it does not affect the function of (A) as an activator.

      __The Reviewer asks if our 1D model would work in 2D, and notes the PLK4 peaks in our models are broad, likely limiting the number of peaks formed. They also note that in our Model 1 it is the unphosphorylated form of PLK4 that accumulates in the peak, which seems unlikely as it is widely believed that PLK4 must be active to phosphorylate STIL to promote its interactions with SAS6 and CPAP. __From a mathematical perspective, modelling our system in 2D would produce very similar results. Symmetry breaking is driven by long-range inhibition/short-range activation, and these behaviours will work analogously in 2D. As discussed in our response to Reviewer #2 (point 1), the broad peaks do indeed limit the number of centrioles that can form, but by altering the parameters we can generate more peaks that are less broad (Figures 3 and 5). The Reviewer is correct that Model 1 (based on Takao et al.) predicts that non-phosphorylated PLK4 () accumulates in the peak. This is also true of the original Takao et al. model, although this was not highlighted or commented on by the authors. We now expand our discussion of this point (p25-p26).

      The Reviewer asks if our models can form multiple peaks at higher PLK4 levels. This is again related to Reviewer #2, point 1, and we now show that this is indeed possible under the appropriate parameter regime (Figure 3C and Figure 5C).

      The Reviewer asks for more description of how lateral diffusion works in our system. For example, do we consider that not every molecule of (I) will diffuse laterally (as some will be lost to the cytoplasm), or that the probability of a molecule leaving the surface will increase as distance/time increases. We apologise for our lack of clarity. We now state that the proportion of molecules not rebinding to the surface is accounted for in the reaction components of all our models (p7, para.1). In reality, and as we now state, the relationship between this loss and the diffusion rates (and their relation to distance/time, for example) is complicated. We are investigating this relationship in more detail, but this is beyond the scope of the current paper.

      The Reviewer asks if symmetry breaking might eventually occur if the system in which we reduce the kinase activity of PLK4 (Figure 2D) were given more time. They also ask whether reducing PLK4 levels by half would lead to a failure in site-selection. The kinase inhibited scenario we show here will not break symmetry over any period of time; this can be proven mathematically, and is verified in the numerical simulations (Figure 3A and 5A, bottom left regions of graphs), which we now state more clearly are always run for a long enough period to reach a steady-state (p11, para.2). The effect of reducing PLK4 levels in our models is analysed in the phase diagrams shown in Figure 3 and 5 (and analysed in more detail in Figure S1), where it can be seen that there are multiple PLK4 concentrations that can be halved without a failure in site selection (although, see also our response to Reviewer #2, point 1).

      The Reviewer pointed out some errors in our presentation of Figure 3, (and suggested some improvements in presentation in a point further below) and also asked for more information about the parameters used to generate the data in Figures 2B-D and 4B-D. We thank the Reviewer for these suggestions and have made these changes and provided the additional information requested (e.g. marking the specific parameters used in our simulations on the phase diagrams shown in Figure 3 and Figure 5 with coloured dots).

      The Reviewer points out that when PLK4 levels and activity are both high no centrioles are produced in Model 2, whereas 1 centriole is produced in Model 1, neither of which is consistent with experimental observation. We now show an expanded parameter space (new Figures 3A and 5A) where it can be seen that this is not a problem for Model 1. For Model 2, the region of high kinase levels and activity (dark blue, top right, Figure 5A) corresponds to the uniform accumulation of the activator species. Thus, while there are no peaks, this region might produce multiple centrioles, as it is equivalent to a compartmental model in which all of the compartments are occupied. We now discuss this point (p19, para.1).

      __The Reviewer questions how the biology fits a Type II Turing system, pointing out that current data suggests that active PLK4 turns over more rapidly at centrioles, whereas in the Type II model we describe (based on the Leda et al. model) it is the phosphorylation state of STIL that determines which species of PLK4:STIL turns over rapidly. They also question the logic of the Model 2 Type II circuit (Figure 3A), questioning how A could drive the dephosphorylation of STIL to promote the production of I. __We agree that current data is more consistent with phosphorylated species of PLK4 turning-over more rapidly at centrioles, but this is not what Leda et al. proposed, and so this is not what we implemented in trying to reformulate their model (although this is effectively the change we make that turns the Leda et al. model into the Takao et al. model). As to the second point, the Reviewer has correctly spotted a problem with our model that arises because the direction of the arrows linking and were inadvertently flipped in Figure 4A. This mistake has been corrected, and we now explain more clearly how the biology of this system fits a Type II Turing system in the legend.

      __The Reviewer points out that although we can convert the Leda et al. Model (Model 2) to the Takao et al. Model (Model 1) simply by changing the identity of the _ and _ species, the underlying assumption of the Takao et al. model (that non-phosphorylated PLK4 promotes its own accumulation) was not an inherent assumption of the Leda et al. model. __We apologise for this confusion. As we now clarify (p20, para.1), the Reviewer is correct that when we make mathematical changes to the Leda et al. model we must also assume changes in the underlying biology, so that non-phosphorylated species of PLK4 are now slow diffusing, rather than non-phosphorylated species of STIL, as originally proposed. As the Reviewer points out, current data suggests that non-phosphorylated species of PLK4 do turn over more slowly, although it is not clear why; for example, liquid-liquid phase separation driving the formation of PLK4 condensates has been postulated, but is far from proven. This remains an interesting problem that will be further probed mathematically and experimentally.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We thank both reviewers for their comments, which have suggested changes that have improved the manuscript.

      Reviewer #1 (Public Review): 

      […] A weakness in the methodology is the link to tissue tension and conclusions about tissue mechanics. Methods that directly affect tissue tension and a more thorough and systematic application of laser ablation experiments would be needed to profoundly investigate mechanosensation and consequential effects on tissue tension by the various genetic perturbations.

      Response: In revision, we have added some additional experiments that examine altered tension.

      While the in-silico analysis of competing for F-actin binding sites for βH-Spec and myosin appears logical and supports the authors' claims, no point mutation or truncations were used to test these results in vivo.

      In its current structure the manuscript's strength, the genetic perturbations, is compromised by missing clear assessments of knockdown efficiencies early in the manuscript and other controls such as the actual effect on myosin by ROCK overactivation. 

      Response: In revision, we reorganized the manuscript and figures to document the knockdown efficiency earlier in the manuscript, and have added additional figure panels illustrating the effects of altered tension on myosin levels.

      Reviewer #2 (Public Review):

      […] The authors suggest that Ajuba is required for the effect of beta-heavy spectrin. However, it is still formally possible that this could be a parallel pathway that is being masked by the strong phenotype of Ajuba RNAi flies. 

      Response: While it is formally true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      One of the major points of the manuscript is the observation that alpha- and beta-heavy-spectrin are potentially working independently and not as part of a spectrin tetramer. This is mostly dependent on the observation that alpha- and beta-heavy-spectrin appear to have non-overlapping localizations at the membrane and the fact that alpha- and beta-heavy-spectrin localize at the membrane seemingly independently. It is not entirely obvious that a potential lack of colocalization and the fact that protein localization at the membrane is not affected when the other partner is absent is sufficient to argue that alpha- and beta-heavy-spectrin do not form a complex. Moreover, it is possible that the spectrin complexes are only formed in specific conditions (e.g. by modulating tissue tension). 

      Response: Our results argue that alpha- and beta-heavy-spectrin do not form a detectable complex in the wing disc under the conditions examined, and thus that they act independently in this context. However, we agree that it is possible that they could function together in other contexts, e.g. in other tissues or under different conditions, and we have revised the text in the Discussion to note this.

      If indeed spectrins function independently, would it not be expected to see additive effects when both spectrins are depleted? 

      Response: Not necessarily, since both alpha- and beta-heavy-spectrin act through Jub, and there may be a limit as to how much Yki activity can be increased by Jub (e.g. the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019)).

      Related to the two previous points, the fact that the authors suggest that both alpha- and beta-heavy-spectrin regulate Hippo signaling via Ajuba would be consistent with the necessity of an alpha- and beta-heavy-spectrin complex being formed. How would the authors explain that both spectrins require Ajuba function but work independently? 

      Response: The different spectrins both affect Jub because they both affect cytoskeletal tension, but our results suggest that they act in different ways to affect tension. We have made some revisions to the Discussion section to try to make this clearer.

      Another major point of the manuscript is the potential competition between beta-heavy-spectrin and myosin for F-actin binding. The authors suggest that there is a mutual antagonism between the two proteins regarding apical F-actin. However, this has not been formally assessed. Moreover, despite the arguments put forward in the discussion, it seems hard to justify a competition for F-actin when beta-heavy-spectrin seems to be unable to compete with myosin. Myosin can displace beta-heavy-spectrin from F-actin but the reciprocal effect seems unlikely given the in vitro data. 

      Response: We show in vivo, in vitro, and in silico data that are all consistent with the inference that beta-heavy-spectrin and myosin compete for binding to F-actin. As the reviewer notes, and as we discuss, the in vitro competition experiments were limited because, for technical reasons, we were unable to increase the protein concentrations further. We also note that our in vitro experiments used an active form of myosin, which binds F-actin much more strongly than inactive myosin.

      Reviewer #1 (Recommendations For The Authors):

      While the flow of experiments is logical in general, I see major problems regarding the structure of the manuscript and essential controls:

      • It is very confusing to have samples (kst-CRISPRa) in figures 1-3 that were not introduced in the text until the second-last paragraph of the results. I would suggest introducing this elegant overexpression experiment early in the manuscript as it fits well in the scope of these experiments or alternatively (if the authors prefer) make a new figure containing all the data regarding the overexpression in the end. 

      Response: We have now moved these results to a new figure (new Fig 7) that is described later in the text.

      • At the beginning of the manuscript, essential controls regarding the knockdown efficiency are missing in the main figure. Many of the key experiments are based on KD and as a reader, I want to assess their efficiency. Only in Figure 4, at the end of the manuscript, KST and α-Spec KD efficiency is revealed - this should be shown earlier and quantified properly. While reading the manuscript in its current form, the doubt remains that differences e.g. in α-Spec and KST KD can be explained by varying knockdown efficiencies as their levels can't be assessed. 

      Response: We have now moved these results to a new supplemental figure (Fig 1-supplement 1) that is cited earlier in the text.

      • On a similar line, in Figure 5 where myosin activity is perturbed, induction or repression of myosin activity is only suggested but not formally shown. The authors have to demonstrate that this is indeed the case by showing the myosin signal, ideally accompanied by measurement of tissue tension. 

      Response: This was not included because we and others have assessed these manipulations in earlier publications. However, as requested we have now added a supplemental figure (Fig 6 supplement 1) showing myosin levels in these genotypes.

      • On p. 7, the authors claim that "The epistasis of jub to kst suggests that βH-Spec regulates wing size through its tension-dependent regulation of Jub." While the authors show that KST KD increases myosin and junctional Jub, and that the wing overgrowth phenotype of KST KD depends on Jub, the tension-dependency was not demonstrated. To make that claim, the tension profile should be perturbed e.g. by overexpression of rok, myosin mutants (as the authors do in Fig 5) and the effect on Jub should be analyzed. Induction of tension in these conditions should be measured by laser ablation or a suitable alternative method. It might well be that the induction of Jub in KST KD is not via tension but an alternative mechanism such as the release of steric hindrance, interaction competition, etc. Also: Does KD of Jub affect spectrin localization? 

      Response: The effect of tension on Jub, and the effects of the myosin activity changes we employed on tension, have been analyzed in prior publications (eg Rauskolb et al 2014). To further address the issue raised by the reviewer here as to whether Kst affects Jub and wing growth via tension, we have also now added an additional experiment (Fig 3 supplement 1) in which we decreased tension in a βH-Spec RNAi wing disc by simultaneously expressing RNAi targeting Rok. The results show that the wing growth and Jub accumulation associated with βH-Spec RNAi are suppressed by Rok RNAi, consistent with our conclusion that these effects are mediated via cytoskeletal tension.

      As KD of Jub alters the pattern of myosin accumulation in wing discs (Rauskolb et al 2019) it could be expected to have a complementary influence on βH-Spec localization, but we have not examined this.

      • The authors make a very strong point in saying "The influence of βH-Spec on junctional tension is thus a direct consequence of its competition with myosin for overlapping binding sites on F-actin." While the authors provide some in vitro and in silico evidence, it was for example not possible to outcompete myosin by increasing levels of KST CH1-CH2 domains in vitro (for possible reasons the authors discuss). More importantly, the hypothesis that competition for actin binding is the definite cause of the antagonizing effect was not tested in vivo. Overexpression of a mutant version of KST that is unable to bind F-actin, or that has an increased affinity (etc) for actin was not tested. Such an experiment would be very valuable to enrich this manuscript but at least, claims like that have to be less bold and need to be written in a more speculative language. 

      Response: We consider creating and analyzing mutant forms of Kst in vivo to be beyond the scope of this manuscript, but as suggested we have now modified the text highlighted by the Reviewer to be more cautious.

      Further points: 

      • Why does the thickness of the wing disc epithelium change due to KST and α Spec KD, the authors should introduce this experiment better and draw a proper conclusion. Is there any relocalization of myosin along the apical-basal axis? Can the authors speculate about the differences between KST and α Spec KD? 

      Response: The epithelium thickness changes with α-Spec KD, but does not change with Kst KD. We think the explanation is provided by work from the Pan lab (done mainly in pupal eyes), which reported decreased cortical tension and increased apical area when α-Spec is lost. The interpretation in essence is that with the loss of attachment of F-actin to membranes along the lateral sides of the cells, the sides of the cells are "softer" and the cells expand laterally and thus also (by conservation of volume) shorten apical-basally. This is somewhat speculative, and it's not a focus of our study, but we have added some text to try to explain this better. Myosin along the apical-basal axis was not visibly altered, but it is harder to analyze there because it is very weak compared to junctional myosin.
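      As a rough illustration of the volume-conservation argument, consider an idealized columnar cell of constant volume V, with cross-sectional area A and apical-basal height h. This is only a sketch: the symbols and the columnar approximation are illustrative assumptions and do not come from the manuscript.

```latex
V = A\,h = \text{const.}
\quad\Longrightarrow\quad
\frac{h_{2}}{h_{1}} = \frac{A_{1}}{A_{2}}
```

      Under this assumption, a 25% increase in cross-sectional area (A_2 = 1.25 A_1) would correspond to a 20% decrease in height (h_2 = 0.8 h_1).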

      • Given the authors' observation of differences in the relative localization of KST and α Spec (Figure 4), proper quantification of KST, α Spec and myosin levels along the apical-basal cell axis would be important. This would also ease data interpretation. 

      Response: We have now added a higher resolution image and also a line scan of Kst, α-Spec and Myo in a new supplemental figure (Fig 6 supplement 1).

      • KD of α Spec seems to induce myosin activity more, causes a bigger reduction of wing thickness, a stronger induction of Jub, and a similar effect on wing size. What led the authors to focus on KST rather than α Spec regarding the detailed analysis of myosin competition? 

      Response: Our observations identify a competition between Kst and myosin, but we have no indication that α-Spec competes with myosin. (It's conceivable that β-Spec might also compete with myosin in some contexts, but wing discs would not be a good place to examine this because the localization profiles of β-Spec and Myosin are so different).

      • A big criticism regarding the figures is the bad color choice which makes it difficult to decipher the fluorescent signals. Likewise, the labels are difficult to read with the present coloring. They should really be changed. 

      Response: We have now changed the single color images to gray scale (for multi-color images we retain RGB coloring).

      A minor point: 

      • To make the manuscript more accessible for researchers outside the Drosophila field, I'd suggest adding explanatory labels for Drosophila-specific terms such as hyperactive myosin for sqhEE, a scheme to show where UAS-dcr2 is active, explain the purpose of Rfp expression as a control for tissue specificity, etc. 

      Response: We have added some explanations to the text to try to make this clearer.

      Reviewer #2 (Recommendations For The Authors):

      Major points: 

      In lines 99-101, the authors mention that Deng et al., 2015 report that the depletion of spectrins leads to an increase in pMLC, with no associated changes in the colocalization of myosin and F-actin. It is more accurate to mention that Deng et al. suggest that the levels of a GFP-tagged rescue construct of MLC (Sqh) are unchanged in alpha-spectrin mutants, although this was not formally quantified. Moreover, there was not a formal assessment of colocalization between MLC and F-actin, but rather a suggestion that F-actin levels are unaffected by the alpha-spectrin mutation. Finally, Deng et al. mostly analyzed alpha-spectrin so it remains possible that the new results shown by the authors are compatible with the initial observations from Deng and colleagues. 

      Response: As suggested, we revised the text to note that Deng et al., 2015 specifically examined Sqh:GFP. While we agree that our focus is more on Kst and Deng et al focused on α-Spec, we also examined α-Spec, and as described our results examining Myosin and Jub differ from what was reported by Deng et al 2015.

      As mentioned above, it is still possible that spectrins and Ajuba are working in parallel and Ajuba is not necessarily downstream of spectrins. The strong phenotype of Ajuba RNAi flies in adult wings could mask the effect of spectrins. Are the results similar in other settings, such as in the absence of Dicer2? Also, can Ajuba RNAi phenotypes be modified by overexpression of spectrins? This would provide further evidence of a link to Ajuba function. 

      Response: While formally it is true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      We would not expect over-expression of spectrins in a jub RNAi background to further reduce Hippo signaling, and as the jub RNAi phenotype is much stronger than the Kst over-expression phenotype even if there were an effect it would likely be difficult to detect.

      Regarding the potential independent functions of spectrins, it would be interesting to determine if alpha- and beta-heavy-spectrin can still interact at the level of the AJ despite the fact that their distributions appear to be partly non-overlapping. Would it be possible to assess this using PLA? If an interaction is not detected via PLA, it would be more convincing that spectrins are functioning independently. 

      Response: We have now performed this experiment, and no significant signal was detected by PLA. As a control, we used identical antibodies (GFP and α-Spec) to conduct PLA on α-Spec and β-Spec, and we did detect signal by PLA. These results (included in a revised Figure 4) further support the conclusion that α-Spec and βH-Spec are not physically associated in wing discs.

      Related to this point, if the spectrins work independently, it is reasonable to assume that they could display additive effects. Is this the case? If alpha- and beta-heavy-spectrin are simultaneously depleted are the phenotypes more severe than either depletion alone? 

      Response: We disagree here. Since both alpha- and beta-heavy-spectrin act through tension and Jub, there is likely a limit as to how much Yki activity can be increased by this pathway. For example, the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019), which may thus represent the maximum increase that can be induced through this pathway (as there are multiple, independent factors that regulate Hippo signaling).

      Authors should modulate membrane tension and assess if this affects the localization of alpha- and beta-heavy-spectrin and, specifically, their colocalization, as their interaction could be regulated. 

      Response: As reported, we do see effects of tension on βH-Spec localization. We would not expect significant effects of membrane tension on α-Spec localization, but we consider analysis of this outside the scope of this manuscript.

      In lines 185-187, the authors mention that beta-spectrin depletion does not affect beta-heavy-spectrin localization. Interestingly, Figure 4E appears to show that the levels of Kst-YFP appear to be lower in the beta-spectrin-depleted tissue. The localization of beta-heavy-spectrin is not necessarily affected but the overall levels could be. 

      Response: Indeed the levels appear slightly lower, but elucidating the reason for this will require further experiments that are beyond the scope of this manuscript (we suspect it is because cytoskeletal tension increases in β-Spec-depleted tissue as it does in α-Spec-depleted tissue, which based on our observations should decrease levels of Kst near junctions). The key point of these experiments was to show that α-Spec localization does not require βH-Spec, but does require β-Spec, which supports our conclusion that in wing discs α-Spec forms a complex with β-Spec but not with βH-Spec.

      In lines 200-203, the authors state that beta-heavy-spectrin and myosin colocalize extensively at the apical region. However, this colocalization is not as clear as stated. Do the authors have alternative data that suggests that the two proteins are indeed colocalizing? Would it be possible to perform PLA to detect a potential colocalization? 

      Response: Unfortunately we do not have antibodies against both proteins that work well enough for PLA. However, we quantified the co-localization by analysis of Pearson's correlation coefficient, as reported in the manuscript. We also added an additional higher magnification image, and a line scan, in a supplemental figure (Fig. 6 supplement 1).
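      For readers unfamiliar with the measure, a minimal sketch of Pearson's correlation coefficient computed over paired pixel intensities follows. It is illustrative only: the function name, the nested-list image representation, and the toy intensity values are assumptions, not the analysis pipeline used for the manuscript.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b):
    """Pearson's r between two equally sized 2D intensity images (lists of rows).

    Hypothetical helper for illustration; real analyses usually restrict the
    calculation to a region of interest and subtract background first.
    """
    # Flatten both images into paired lists of pixel intensities.
    a = [float(v) for row in channel_a for v in row]
    b = [float(v) for row in channel_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("channels must be non-empty and contain the same number of pixels")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)

    # Covariance and variances (unnormalized; the normalization cancels in r).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")

    return cov / sqrt(var_a * var_b)


# Toy example: the second channel is a scaled copy of the first.
channel_1 = [[1, 2], [3, 4]]
channel_2 = [[2, 4], [6, 8]]
r = pearson_colocalization(channel_1, channel_2)
```

      Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.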

      Authors should try to assess and quantify colocalization with F-actin for both beta-heavy-spectrin and myosin in wild-type conditions and when the levels (and/or activity) for each of them are modulated. 

      Response: We have added quantification of the co-localization of βH-Spec with F-actin and of myosin with F-actin to the revised manuscript.

      Minor points: 

      In lines 122-124, the authors should clarify the relevance of the observation that alpha-spectrin knockdown affects the thickness of the wing disc epithelium. 

      Response: We have added some text to try to elaborate on this.

      In the intro, it is perhaps necessary to mention that there are conflicting reports regarding the role of spectrins in the regulation of cell proliferation, at least in the follicular epithelium. For instance, Ng et al., 2016 argued that spectrins do not regulate cell proliferation in FECs. 

      Response: Rather than wading into a detailed discussion of issues that are peripheral to this study, we modified the text in the Introduction to avoid implying that spectrins control cell proliferation in the ovary.

      In Figures 1, 2, 3, and 4 (and respective supplements), it is encouraged that, wherever appropriate, the authors mark the different compartments or the relevant boundary using dashed lines, to more clearly indicate the regions to compare. 

      Response: We have now done this.

      In Figure 2, supplement 1 panels C and D should have an indication of the genotype for clarity. 

      Response: We have now added this.

      In lines 362-367, the authors suggest that other actin-binding proteins are likely to influence the role of beta-heavy-spectrin. Have the authors tested the role of spectrin interactors such as Ankyrin and Adducin?

      Response: No, we have not examined this.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We were pleased with the overall enthusiastic comments of the reviewers:

      • Reviewer #1: “This manuscript by Mahlandt, et al. presents a significant advance in the manipulation of endothelial barriers with spatiotemporal precision”

      • Reviewer #2: “The immediate and repeatable responses of barrier integrity changes upon light-on and light-off switches are fascinating and impressive.”

      • Reviewer #3: “, these molecular tools will be of broad interest to cell biologists interested in this family of GTPases.”

      We thank the reviewers for their fair and constructive comments that helped us to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) This paper is likely to attract a diverse audience. However, the order of data presented in this manuscript can be confusing or challenging to follow for the naive reader. This is because the tool characterization is split into two parts: before the barrier strength assay (selection of optogenetic platform and tool expression) and after (characterization of cell morphology with global and local optogenetic stimulation). Reorganizing the results such that the barrier strength results follows from an understanding of individual cell responses to stimulation may improve the ability of this readership to understand the factors at play in the changes in barrier strength observed when opto-RhoGEFs are activated.

      We appreciate this idea. We initially structured the paper in the proposed order, but then decided that we wanted to put more focus on the barrier strength results by presenting them already in the second figure. Therefore, we prefer to keep this order of figures.

      2) While the description of the selection of iLID as the study's optogenetic platform is clear, a better job could be done motivating the need for engineering new optogenetic tools for the control of GEF recruitment. Given that iLID-based tools for GEFs of RhoA, Rac1, and Cdc42 already exist, some of which are cited in the introduction, more information on why these tools were not used would be helpful: were these tools tested in endothelial cells and found lacking?

      The original system has the domain structure DHPH-tagRFP-SspB. However, we wanted to work with an SspB-FP-GEF construct, which would allow easy exchange of the FP and the DHPH domain. This modular approach allowed us to generate and compare the mCherry, iRFP647 and HaloTag versions. We don’t want to claim that we engineered an entirely new optogenetic tool, but rather that we optimized an existing one with different tags. To make this clearer we added: ‘The membrane tag of the original iLID was changed to an optimized anchor. In addition, we modified the sequence of the domains to SspB, tag, GEF to simplify the exchange of GEF and genetically encoded tag. A set of plasmids with different fluorescent tags was created for more flexibility in co-imaging.’

      3) Comment on the reason behind using DHPH vs. DH domains for each GEF is needed.

      We have previously found (and this is supported by biochemical analysis of GEF activity) that the selected domains provide the best activity. We will add a reference and the following to the text: ‘Their catalytic active DHPH domains were used for ITSN1 and TIAM1 (Reinhard et al., 2019). In case of p63 the DH domain only was used, because the PH domain of p63 inhibits the GEF activity (Van Unen et al., 2015) (Fig. 1E).’

      4) Since multiple Rho GTPases (e.g., RhoA, RhoB, RhoC) exist and Rho is used as the name of the GTPase family, please use RhoA where applicable for clarity.

      Since the RhoGEFp63 will activate RhoA/B/C we would rather not refer to RhoA only. We will clarify this in the text: ‘Three GEFs were selected, ITSN1, TIAM1 and RhoGEFp63, which are known to specifically activate respectively Cdc42, Rac and Rho and their isoforms.’

      5) A brief comment on the use of HeLa cells for protein engineering and characterization (versus the endothelial cells motivated in the introduction) may be helpful.

      We added the following to the text: ‘HeLa cells were used for the tool optimization because of easier handling and higher transfection rate in comparison to endothelial cells.’

      Minor suggestions:

      In figure 1C, line sections showing intensity profiles before and after protein dimerization might further emphasize the change in biosensor localization.

      We are not fans of intensity profiles, as the profile depends strongly on the position of the line and it basically turns a 2D image into 1D data for a single image. So, we prefer to stick to the quantification as shown in panel 1B (which shows data from multiple cells).

      Reviewer #2 (Recommendations For The Authors):

      1) The study has analyzed the effects of light-induced activation of the three optogenetic constructs in endothelial cells on their barrier function (electrical resistance) at high cell density and correlated the findings with the cellular overlap-producing effects on endothelial cells cultured at sparse cell density. It should be tried to show these effects at a cell density where these light-induced effects increase electrical resistance. Lifeact with different chromophores in adjacent cells might be useful.

      We had attempted to measure the overlap in a monolayer by taking advantage of the Halotag and the variety of dyes available by staining one pool of cells red with JF 552 nm and the other far red with the JF 635 nm dye. However, the cells need at least 24 h to form a monolayer and by then they had exchanged the dye and red and far red pool could not be distinguished any longer.

      Therefore, we used the Lck-mTq2-iLID construct, which already marks the plasma membrane of the cells. We created a mosaic monolayer of cells expressing mScarlet-CaaX and cells expressing Lck-mTq2-iLID + SspB-HaloTag-TIAM(DHPH). We observed an increase in the overlap between cells under this condition. The results have been added to figure 4 - figure supplement 2I&J. To the text we added:

      ‘Additionally, cell-cell membrane overlap increased about 20%, upon photo-activation of OptoTIAM, in a mosaic expression monolayer (figure 4 - figure supplement 2I,J, Animation 22)’

      2) The authors correctly state that some reports have shown that S1P can increase endothelial barrier function in VE-cadherin independent ways and these are related to Rac and Cdc42. This was also shown for Tie-2 in vitro and even in vivo in the absence of VE-cadherin and should also be mentioned.

      We added the following to the text: ‘Not only S1P promotes endothelial barrier independent from VE-cadherin, also Tie2 can increase barrier resistance in the absence of VE-cadherin (Frye et al. 2015).’

      Since a blocking antibody against VE-cadherin was used, a negative control antibody should be tested which also binds to endothelial cells.

      To visualize the cell-cell junctions in the experiment shown in Supplemental Fig 3.1, we added a non-blocking VE-cadherin antibody that is directly labeled with ALEXA 647 and shows normal junction morphology. These experiments already give an indication that the live labeling antibody of VE-cadherin does not disturb the junction morphology. However, when we added the blocking antibody against VE-cadherin, known to interfere with the trans-interactions of VE-cadherin, a rapid disruption of the junctions is observed.

      Additionally, previous work has shown that the VE-cadherin labeling antibody does not interfere with junction dynamics and function (see Figure 2.A, Kroon et al. 2014 ‘Real-time imaging of endothelial cell-cell junctions during neutrophil transmigration under physiological flow’, jove.). We have added the figures below, showing that addition of the control IgG and VE-cadherin 55-7H1 Abs at the timepoint marked by the dotted line did not interfere with the resistance, whereas the blocking Ab drastically reduced resistance. We have added this reference to the results: ‘Previous work has shown the specific blocking effect of this antibody in comparison to the VE-cadherin (55-7H1) labeling antibody (Kroon et al., 2014).’

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      Additional comments for the authors:

      1) The introduction is very long and would benefit from a more concise emphasis on the information required to put the work and results in context and understand their importance.

      Comment: we appreciate the comment of the reviewer. However, we wish to introduce the topic and the tools thoroughly and therefore we chose to keep the introduction as it is.

      2) The N-terminal membrane-binding domain does not homogeneously translocate to the plasma membrane, since lck is a raft-associated kinase. Please comment on this.

      In our hands, the Lck is among the most selective and efficient tags for plasma membrane localization (https://doi.org/10.1101/160374). We do observe homogeneous translocation, but our resolution is limited to ~200 nm and so we cannot exclude that the Lck concentrates in structures smaller than 200 nm. Given the robust performance of the lck-based iLID anchor in the optogenetics experiments, we think that the Lck anchor is a good choice.

      3) Figure 1D is not very clear. What does 25 or 36% change mean? If iLID tg is conjugated to these sequences, its cytosolic localization should be reduced versus iLID alone. Is this what the graph wants to express? If so, please, label properly the ordinate axis in the graph (% of non-tagged iLID values?)

      The graph represents the recruitment efficiency of SspB to the plasma membrane for the two different membrane tags that target iLID to the plasma membrane. The recruitment efficiency was measured as the depletion of SspB-mScarlet intensity in the cytosol upon light activation, and is expressed as a percentage change.

      We added the following to the title of the graph: _SspB recruitment efficiency for Plasma Membrane tagged iLID._
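      The percentage described above can be sketched as a simple calculation. This is a minimal illustration that assumes the change is normalized to the cytosolic intensity measured before light activation; the function name and the example intensities are hypothetical and are not taken from the manuscript.

```python
def recruitment_efficiency(cytosol_before, cytosol_after):
    """Percent depletion of cytosolic intensity after light activation.

    Assumes both values are background-corrected mean cytosolic intensities
    from the same cell, and that the pre-activation value is the reference.
    """
    if cytosol_before <= 0:
        raise ValueError("pre-activation intensity must be positive")
    return (cytosol_before - cytosol_after) / cytosol_before * 100.0


# Hypothetical example: cytosolic signal drops from 100 to 75 arbitrary units.
example_efficiency = recruitment_efficiency(100.0, 75.0)
```

      With this definition, a larger value means that more SspB left the cytosol, which is read as more efficient recruitment to the membrane-anchored iLID.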

      4) Supplemental figures in the main text. Fig S1D in the text refers to data in Fig S1E and Fig S1E is supposed to be Fig S1F? (page 11).

      That is correct. The mistakes have been corrected (and this is now renamed to figure 1 - figure supplement 1E and 1F).

      5) Figure 3. Contribution of VE-cadherin. Other junctional complexes, such as tight junctions may also intervene. However, these results would also suggest that cell-substrate adhesion rather than cell-cell junctions may modulate the barrier properties, as it has been previously demonstrated for example by imatinib-mediated activation of Rac1 (Aman et al. Circulation 2012). The ECIS system used to measure TEER in the quantitative barrier function assays can modulate these measurements and discriminate between paracellular permeability (Rb) and cell-substrate adhesion (alpha). Please, provide whether the optogenetic modulation of these GTPases does indeed regulate Rb or alpha.

      ‘The measured impedance is made up of two components: capacitance and resistance. At relatively high AC frequencies (> 32,000 Hz) more current capacitively couples directly through the plasma membranes. At relatively low frequencies (≤ 4000 Hz), the current flows in the solution channels under and between adjacent endothelial cells’ (https://www.biophysics.com/whatIsECIS.php).

      Therefore, the high frequency impedance is representing cell-substrate adhesion whereas the low frequency responds more strongly to changes in cell-cell junction connections.

      We only measured at 4000 Hz, representing the paracellular permeability. We chose a single frequency to maximize time resolution.

      We have added this extra comment to the legend of the figure: ‘(B) Resistance of a monolayer of BOECs stably expressing Lck-mTurquoise2-iLID, solely as a control (grey), and either SspB-HaloTag-TIAM1(DHPH) (purple)/ITSN1(DHPH) (blue) or p63RhoGEF(DH) (green) measured with ECIS at 4000 Hz, representing paracellular permeability, every 10 s.’

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2023-01939

      Corresponding authors: Jiro Toshima, Junko Y. Toshima

      1. General Statements

      We are grateful for the reviewers’ evaluation of our study. In the new manuscript, we have answered all of the points raised by the two reviewers (the altered or added text is indicated in red in the new manuscript). Reviewer #1 pointed out that the definition of "Vps21 activity" is unclear throughout the manuscript. In this study we have developed a novel biochemical method capable of detecting Vps21p activity with high sensitivity (Fig. 2) and utilized this method to measure Vps21p activity, which is now clearly stated in the new manuscript. Reviewer #1 also pointed out that we had not clearly explained the difference between the two Vps21p-residing structures, the small endosome-like puncta and the aberrant large structures. To clearly distinguish them, we have added data showing the size distribution of Vps21p-residing structures (Fig. S2) to the new manuscript. Regarding comment #2, we think that the reviewer may have misunderstood the data (please see our response to this comment below). Reviewer #2 did not request any additional experiments but gave us many helpful comments to improve the manuscript. In the new manuscript, we have revised all the places that the reviewer pointed out.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      (Reviewers’ comments are in italics)

      *Summary:*

      *In the present study Nagano et al. identify an overlapping function of clathrin adaptors in the activation of the yeast Vps21 Rab GTPase. This activation is regulated in a concerted manner by two TGN cargo adaptors, AP-1 and GGA1/2. The basis of this study is derived from the previous work Nagano et al., 2019, where the authors reported that Ent3p and Ent5p are important for the formation of the Vps21p-positive endosome. By utilizing a synthetic genetic approach, the authors observed that disruption/loss of the AP-1 complex (apl4 mutant), Ent3p, Ent5p or Pik1 decreased the fluorescence intensity of GFP-Vps21p and increased the number of Vps21p puncta. They found that these effects for AP-1 disruption are additive, that is, each makes a distinct contribution, at least in ent3∆/ent5∆ mutant cells. They next examined the role of factors required for TGN localization of Ent3p/5p and AP-1 in Vps21p activation. The authors reported that GGA1/2, Pik1p and the Ypt31/32 Rab GTPases make modest contributions to targeting of AP-1 and Ent3/5 to the TGN. The observation that GFP-Vps21 accumulates next to vacuolar compartments in pik1-1 ent3∆ mutants, similar to ent3∆ ent5∆ apl4∆ mutants, led the authors to conclude that both PI(4)P-dependent and PI(4)P-independent Ent3p recruitment to the TGN play a crucial role in Vps21p activation. Further, they found that compared to the pik1-1 ypt31ts mutant (41%), activity of Vps21p (14%) was severely reduced in the pik1-1 ypt31ts gga1∆ gga2∆ mutant, pointing towards redundancy among these factors in Vps21p activation. Finally, using a class E Vps mutant, the authors found a fall in the endosomal population of GFP-Vps9p (~29%) in the ent3∆ ent5∆ mutant, which was further reduced to 0% in the ent3∆ ent5∆ apl4∆ mutant. Collectively this study suggests a differential role of TGN adaptors, AP-1 and GGA, in early endosome formation. Ent3p/5p and AP-1 are proposed to activate Vps21p by localizing Vps9p on endosomes and thus facilitating its transport, whereas GGAs act redundantly along with Pik1p and Ypt31/32 in regulating TGN localization of Ent3p/5p and AP-1.*

      Major comments:

      There is a considerable amount of data that address the roles of AP-1, Ent3, Ent5, Gga1/2, and Pik1 in targeting of Vps21 and related trafficking pathway components to the TGN/endosome. The experiments are essentially genetic epistasis tests that compare the fluorescence patterns of GFP-Vps21 in a sophisticated set of strains. The genetic data are interpreted in terms of spatiotemporal dynamics of Vps21: proportion Vps21GTP on a compartment and number of GFP-Vps21 positive compartments. *Being genetic in nature, the data are open to wide interpretations in terms of molecular mechanisms that target candidate proteins Vps21p and Vps9 to the TGN/endosome. The authors' presentation (Fig. 7) is based on well controlled experiments and is logical, but key questions regarding Vps9 trafficking as it relates to Vps21 endosome formation are not resolved.*

      Response:

      In this study, in addition to comparison of the fluorescence patterns of GFP-tagged yeast Rab5 (Vps21p), we have developed a novel biochemical method capable of detecting the amount of active Vps21p with high sensitivity. The amount of active Vps21p obtained by this method correlated well with the results obtained by imaging analysis, and we think this approach significantly increased the reliability of our results.

      Using this new biochemical method and fluorescence imaging analysis, we have clarified the overall regulatory mechanisms of Vps21p by vesicle transport from the TGN. In particular, we believe that this is an important study that links the activation of Vps21p that mediates endosome formation with numerous previous studies involving vesicle transport from the TGN to the endosome.

      Comment #1(a)

        • Throughout their study the authors conflate measurements of GFP-Vps21 puncta intensity and number of Vps21p puncta as readouts of Vps21 "activity". Figure 7 exemplifies this especially: "Vps21p Activity: 100%; Vps21p Activity: 45%; Vps21p Activity: 10%". *
      1. *a) Would the authors please explicitly define how they use "activity" in the manuscript? * Response:

      We appreciate the reviewer pointing out our error. As the reviewer noted, we used the word “activity” when describing results based on the fluorescence intensity and the number of Vps21p puncta in lines 312-315 (in the new manuscript). We have therefore revised the sentence “~ a decreased PI(4)P level reduces Vps21p activity and thus inhibits fusion of Vps21p compartments.” to “~a decreased PI(4)P level seems to inhibit fusion of Vps21p compartments.” (lines 314-315).

      In other parts of the manuscript, we have used the word “activity” only when describing results obtained by measuring the amount of active Vps21p with the biochemical method (Fig. 2). The “Vps21p Activity” values depicted in Fig. 7A-C are also based on the results of the biochemical assay, and thus we have added explanatory sentences to the Discussion section (lines 432-433, 447) and the figure legend (lines 996-998) in the new manuscript.

      Comment #1(b)

      1. *b) The amounts of Vps21-GTP were measured for the ent3D ent5 and ent3D ent5 apl4D mutants (Fig. 2). Other mutant backgrounds should be analyzed in order to address the specific requirements of gga1/2, pik1 and ypt31/32 genes and to challenge the assumption that aspects of GFP-Vps21 localization correlate with the proportion of Vps21GTP. * Response:

      We agree with the reviewer’s comment that it is crucial to confirm that aspects of GFP-Vps21 localization correlate with the proportion of Vps21GTP. In the previous manuscript, we had already measured the amount of active Vps21p (the GTP-bound form of Vps21p) in the pik1-1 and pik1-1 ent3D mutants (Fig. 4E) and shown that it decreases to ~62% in the pik1-1 mutant and to ~22% in the pik1-1 ent3D mutant relative to wild-type cells (Fig. 4E). The relative amount of the GTP-bound form of Vps21p in these mutants correlated well with the results obtained by imaging analyses of GFP-Vps21p (Fig. 4B and C). To make this clearer, we have added the sentence “and the amounts of active Vps21p in these mutants correlate well with the results obtained by imaging analyses of GFP-Vps21p (Fig. 4B, C, and H).” in lines 326-327. We have also demonstrated that the amount of active Vps21p correlated with the fluorescence intensity of GFP-Vps21p at puncta in the pik1-1 ypt31ts and pik1-1 ypt31ts gga1D2D mutants (Figs 4F-J, S4E), and explained this in lines 334-341.

      Comment #1(c)

      1. *c) Regarding the measurements of fluorescence intensity of GFP-Vps21 puncta, how were distinct puncta identified, particularly in the large clusters of puncta shown in Figs. 1D, 3A, 4F, 5A, 5C. * Response:

      As the reviewer pointed out, the previous manuscript did not clearly explain how we distinguished the two Vps21p-residing structures, small endosome-like puncta and the aberrant large structure. To clearly distinguish them, in the new manuscript we examined the size and number of these structures and showed the data in Fig. S2. This result revealed that the ent3D5D apl4D mutant contains a single large Vps21p-residing structure with a size of >100 pixels and many small Vps21p-residing puncta with a size of ~50 pixels. To explain this, we have added sentences in lines 235-239. Regarding Fig. 5A and 5C, since these figures do not show the localization of Vps21p, we have not added an explanation for them.

      Comment #2

      • In the representative micrographs shown in Fig. 1A (Vph1-mCH), 1B (Hse1-tdTom), 1D (Sec7-mCH) and 5A, why do only (roughly) half of the cells in each micrograph express the tagged organelle marker protein? Shouldn't all of the cells? What is especially concerning is that the appearance of GFP-Vps9 in cells that express Sec7-mCH is different than in cells that do not. Specifically, there are fewer GFP-Vps9 puncta in expressing cells and GFP-Vps9 appears to be largely cytosolic in these cells. Have the authors noted the same? *

      Response:

      In Fig. 1, we expressed mCherry/tdTomato-tagged protein only in wild-type cells (Fig. 1A and B) or in ent3D5D mutants (Fig. 1D) to distinguish the mutant cells from the wild-type cells, as described in the Result section (lines 156-159) and figure legends. As explained in the text (lines 156-159), by labeling only wild-type or mutant cells, we precisely evaluated the differences in the localization of GFP-Vps21p by comparing mutant cells directly alongside wild-type cells.

      In Fig. 5A, we expressed Sec7-mCH only in the ent3D5D mutants to distinguish the mutants from wild-type cells (the upper panels) or the ent3D5D apl4D mutants (the lower panels), as described in figure legend. Therefore, the reviewer’s comment that “the appearance of GFP-Vps9 in cells that express Sec7-mCH is different than in cells that do not. Specifically, there are fewer GFP-Vps9 puncta in expressing cells and GFP-Vps9 appears to be largely cytosolic in these cells.” is exactly what we wanted to show in this figure. To show this more clearly, we labeled cells with “WT” or “mutant” in these micrographs (Fig. 1A, 1B, 1D, and 5A).

      Comment #3

      • Figure 4A: How were the proportional contributions of each factor to the TGN localization of Ent3/5, AP-1 determined? What do the percentiles indicate? *

      Response:

      As described in the Results section (lines 293-297), we have shown that deletion of the GGA1 and GGA2 genes significantly decreased the localization of Ent3-GFP at the TGN to ~33% of wild-type cells, without changing the localization of Ent5-GFP and Apl2-GFP (Fig. S3A, B). Based on these results, the contribution of Gga1/2p to the localization of Ent3p, Ent5p, or AP-1 was evaluated to be 37%, 0%, or 0%, respectively (Fig. 4A). To make this clearer, we have added the sentence “~ and thus, we evaluated the contribution of Gga1p/2p to the localization of Ent3p, Ent5p, or AP-1 to be 37%, 0%, or 0%, respectively (Fig. 4A)” in lines 296-297. Similarly, we have determined the contribution of PI(4)P by assessing the localization of Ent3p, Ent5p and Apl2p at the TGN in the pik1-1 mutant (Fig. S3C and D), as described in lines 297-305. Regarding Rab11s (Ypt31p/32p), we have evaluated the contribution based on the data in our previous study, as described in lines 305-309.

      Comment #4

      • In the model presented in Figure 7, the authors proposed that AP-1 is required to target Vps9 from the late TGN to the early TGN. The best characterized function of AP-1 is to concentrate integral membrane proteins to form the inner layer of a clathrin coated vesicle. Vps9 is a soluble protein that fractionates with cytosolic proteins (Burd et al., 1996). Despite measuring intensity and localizing Vps9p with different endosomal markers (Fig. 6), the basis of membrane recruitment of Vps9 by TGN clathrin adaptors is unclear. How do the authors envision AP-1 to function in targeting of Vps9, a soluble protein, between compartments? *

      Response:

      Like many other Rab GEFs (e.g., Sec2p, the GEF for Sec4p, or Mon1p/Ccz1p, the GEF for Rab7), we think that Vps9p transiently localizes to the donor organelle to activate Rab proteins and load them onto the transport vesicle. We have previously demonstrated that Arf1p, a Golgi-resident GTPase, plays an important role in the recruitment of Vps9p to the Golgi (Nagano et al., Comm. Biol., 2019). In this study we have shown that deletion of AP-1 in the ent3D5D mutant increases the localization of Vps9p at the TGN (Fig. 5A and B). These results suggest that AP-1, like Ent3p/5p (Nagano et al., Comm Bio, 2019), is dispensable for the recruitment of Vps9p to the TGN but required for the transport of Vps9p from the TGN to endosomes.

      In a recent study, Casler et al. proposed that AP-1 maintains Golgi-resident proteins by mediating an intra-Golgi recycling pathway (Casler et al., JCB, 2021). Based on this model, we speculated that AP-1 also functions to maintain Vps9p in the TGN by recycling it from the late TGN to the early TGN, and we discussed this in the second paragraph of the Discussion section (lines 434-454 in the new manuscript). However, as reviewer #2 pointed out (please see comment #6 of reviewer #2), Casler et al. proposed a role for AP-1 in transport from the TGN back to an earlier Golgi compartment but did not discuss compartmentalization within the TGN. We have therefore modified the sentence in the Discussion from “~ the role of AP-1 that recycles Vps9p back to the early TGN might become apparent” to “~ the role of AP-1 that recycles Vps9p back to the earlier Golgi compartment might become apparent” (lines 444-445).

      __Minor Comment: __

      • The interchangeable terminology used to refer to Rab GTPases throughout the manuscript made it exceptionally difficult for me to focus on the presentation of the experiments. Vps21 and Rab5 are used interchangeably, but this study investigated Vps21, not Rab5. Vps21 does not even appear in the title or abstract. Similarly, Ypt31/32 is used interchangeably with Rab11, but this study investigated Ypt31/32, not Rab11. The accurate names of the yeast proteins should be used. A discussion regarding significance of the yeast proteins for understanding mammalian Rab5 and Rab11 belongs in the Discussion. *

      Response:

      In accordance with the reviewer’s suggestion, we have replaced Rab5 with yeast Rab5 or Vps21p. We have also replaced Rab11 with yeast Rab11 or Ypt31p/32p.

      __Reviewer #1 (Significance (Required)): __

      *General assessment: In general, this is a well-executed and controlled study. The major strengths are the large quantity of data from complementary experiments that provide a rationale for the proposed mechanistic model proposed (Fig. 7). The major weaknesses lie with the genetic approach, which does not lend itself to the mechanistic interpretations that the authors propose, and the narrow scope of the work such that the study will be of interest to a small group of colleagues. The audience will likely include researchers who use yeast to investigate proteins sorting in the endo-lysosome network of organelles and colleagues who investigate signaling by Rab GTPases. *

      Response:

      We cannot agree with the reviewer’s comment that “the narrow scope of the work such that the study will be of interest to a small group of colleagues”, because the regulation of endosome formation by Rab5 is one of the major topics in the field of membrane traffic, and many mechanisms still remain to be elucidated. Moreover, the model we have proposed in this study is applicable not only to yeast but also to higher organisms, as discussed in the last paragraph of the Discussion section. The endolysosomal pathway is important for the regulation of a wide variety of crucial cellular processes, including mitosis, antigen presentation, cell migration, cholesterol uptake, and many intracellular signaling cascades. Our work thus also has implications for development, immunity, and oncogenesis. We believe that the studies described in our paper represent an advance in our understanding of the cell biology of endocytic trafficking and therefore would be of interest to researchers in other fields, as well as to those in the membrane traffic field.

      __ __

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)): __

      (Reviewers’ comments are in italics)

      *Summary: *

      *The manuscript by Nagano et al. describes the results of extensive analysis on the roles of clathrin adaptors for activation of Rab5 during TGN-to-endosome traffic in budding yeast. They examined the localization and activation status of Vps21, a major Rab5 member in yeast, in a variety of mutants and showed that AP-1 had a cooperative role with Epsin-related Ent3/5 in transport of Vps9 (Rab5 GEF) to endosomes. GGAs, PI4 kinase Pik1, and Ypt31/32 (Rab11) had partially overlapping functions in recruitment of AP-1 and Ent3/5 to TGN. *

      *It is an indeed extensive study but the interpretation of the results is complicated and somewhat speculative. It is most probably because the differences between mutants are partial (even though the authors tried to show statistics) and the logics to lead conclusions are not always compelling. To be honest, I had a hard time to follow rationales to justify arguments. The conclusions the authors make, that is, multiple clathrin adaptors cooperate in the TGN-to-endosome traffic, are reasonable, but I have several questions as follows, which I would like the authors to address. *

      Comment #1

        • The description about Vps21 fluorescence is often quite confusing. When the authors say fluorescence intensity, is it the total intensity of a whole cell or the average fluorescence intensity of individual puncta? For example, in Fig. 1D, it doesn't look to me at all that the GFP intensity of ent3/ent5 is lower than WT. How did the authors obtain the data of Fig. 1E? If the authors measured the fluorescence of individual puncta, how did they do it? * Response:

      We agree that the previous manuscript did not sufficiently explain how we measured Vps21p fluorescence intensity. In this study, we measured the total fluorescence intensity of each single GFP-Vps21p punctate structure, subtracted the cytoplasmic fluorescence background, and reported the result as the fluorescence intensity of the Vps21p compartment (the aberrant large GFP-Vps21p structures (Fig. 3A) were excluded). The graphs of GFP-Vps21p fluorescence intensity show the mean of three values, one from each of three independent experiments, where each value is the average of 50 puncta. To clarify where and how Vps21 fluorescence was measured, in the new manuscript we have revised the text (lines 160-161, 163, 166, 177, 179) and added explanatory sentences to “Materials and Methods” (lines 542-546).

      Regarding Fig. 1D and E, since the fluorescence intensity of GFP-Vps21p in the cytosol was increased in the ent3D5D mutant (Fig. 1D), the fluorescence intensity in the mutant may not have appeared lower than that in wild-type cells. To show the decrease in the fluorescence intensities of individual Vps21p puncta in the mutant cells more clearly, we have added a higher magnification view of GFP-Vps21p puncta to Fig. 1D in the new manuscript.
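The quantification described in this response (background-subtracted intensity per punctum, averaged over 50 puncta per experiment, then averaged across three independent experiments) can be sketched roughly as follows. This is only an illustration, not the analysis code used in the study: the function names are hypothetical, and scaling the cytoplasmic background by the punctum area is an assumption about how the subtraction might be done.

```python
from statistics import mean

def corrected_intensity(punctum_total, cytoplasm_mean_per_pixel, punctum_area_px):
    """Integrated punctum intensity minus cytoplasmic background.

    Assumption: background is the mean cytoplasmic signal per pixel,
    scaled by the area of the punctum in pixels.
    """
    return punctum_total - cytoplasm_mean_per_pixel * punctum_area_px

def experiment_mean(puncta):
    """Average corrected intensity over the puncta of one experiment.

    `puncta` is a list of (total, background_per_pixel, area_px) tuples;
    the response describes 50 puncta per experiment.
    """
    return mean(corrected_intensity(*p) for p in puncta)

def summarize(experiments):
    """Return (mean of per-experiment means, list of per-experiment means)."""
    per_experiment = [experiment_mean(e) for e in experiments]
    return mean(per_experiment), per_experiment
```

Averaging per experiment first, and only then across experiments, keeps each independent experiment as one data point, which matches the "average of three data" wording in the response.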

      Comment #2

      • Related to the previous question, how the images were taken is very important. In the legend to Fig.1, there is no description about the image analysis. Are they epifluorescence images or confocal images, and if the latter, are they ones of 2D confocal images or maximum intensity projections of Z stacks as mentioned in the legend to Fig. 3A? It matters very much. *

      Response:

      We appreciate the reviewer’s helpful suggestion. In Fig. 1, we have used epifluorescence images for analyzing the fluorescence intensity or number of GFP-Vps21p puncta, because Vps21p puncta have high mobility (please see also the responses to comment #9). In accordance with the reviewer’s suggestion, we have added the description about imaging method in the legend of Fig. 1 (lines 831-832, 837 and 843).

      Comment #3

      • It is also confusing when the authors say increase or decrease of fluorescence. Is it the intensity or the number of puncta? Please clarify which the authors intend to mention whenever relevant. There are many places that bother readers. *

      Response:

      We appreciate the reviewer’s helpful suggestion. In accordance with the reviewer’s suggestion, we have revised the manuscript (lines 274 and 316).

      Comment #4

      • The method the authors developed to estimate the activation states of Vps21 is intriguing. It may provide important information without direct measurements of the GTP-binding activity. However, the results should be carefully interpreted because this kind of tricky experiments may not reflect the exact biochemical statuses in the cell. For example, I am concerned about whether release of GTP or spontaneous GTPase activity during the preparation processes is ignored. *

      Response:

      As the reviewer pointed out, we cannot rule out the possibility that the GTP-bound status might change during the preparation processes. However, this problem also occurs in the conventional pull-down assay, which assesses the amount of the GTP-bound form of Rab proteins. To confirm whether the activity of Vps21p assessed by this method reflects the in vivo activation level, we have demonstrated that the level of active Vps21p correlated with in vivo phenotypes that indicate a defect in endosomal fusion, such as the fluorescence intensity of GFP-Vps21p at the endosome and the number of GFP-Vps21p puncta. Thus, in the new manuscript we have added some sentences to explain this (lines 221-222).

      Comment #5

      • In Discussion (p. 20, line 410), the authors describe that "Gga2p is localized predominantly at the Tlg2-residing compartment," but this is wrong. In the BioRxiv paper (2022), the authors showed that "Gga2p appears around the Sec7p-subcompartment and disappears at a similar time as Sec7p." I understand that, to explain the roles of GGAs in endosomal transport, it is reasonable to assume their presence in the Tlg2 compartment (and I agree on that), but the above description is wrong and must be corrected. *

      Response:

      We appreciate the reviewer’s helpful suggestion. As the reviewer described, we have recently demonstrated that Gga2p localization well overlapped with the Tlg2p-residing TGN sub-compartment that is structurally distinct from the Sec7p-residing sub-compartment (Toshima et al., BioRxiv, 2022). Thus, in accordance with reviewer's suggestion, we have changed this sentence to “Interestingly, Gga2p appears to reside at the Tlg2p sub-compartment, which is distinct from the Sec7p sub-compartment.” in the new manuscript (lines 427-428).

      Comment #6

      • Hypothesizing the role of AP-1 in the recycling from the late TGN to the early TGN is new. Glick's group proposed its role in transport from the TGN back to earlier compartment (Golgi) but did not discuss compartmentalization within the TGN. The authors' speculation is a fancy idea, but I am afraid there is no direct evidence for that. *

      Response:

      We appreciate the reviewer’s appropriate and helpful suggestion. As the reviewer pointed out, Glick's group has proposed a role for AP-1 in transport from the TGN back to an earlier Golgi compartment, but has not discussed compartmentalization within the TGN (Casler et al., 2021, JCB), and thus we modified the sentence in the Discussion section from “~ the role of AP-1 that recycles Vps9p back to the early TGN might become apparent.” to “~ the role of AP-1 that recycles Vps9p back to the earlier Golgi compartment might become apparent.” (lines 444-445).

      Comment #7

      • The role of Ypt31/32 (Rab11) is also puzzling to me. It could be an indirect effect, which might be due to the complex network of GTPases as proposed by Chris Fromme (2014). Am I correct? *

      Response:

      As the reviewer pointed out, Fromme’s group has shown that Ypt31/32 forms the complex networks with several GTPases and their GEFs (McDonold and Fromme, 2014, Dev Cell; Thomas and Fromme, 2016, JCB, Thomas et al., 2019, Dev Cell), in which Ypt31/32 promotes the activation of Arf1p via its GEF Sec7p. We have previously shown that Arf1p plays an important role in the recruitment of Vps9p to the Golgi (Nagano et al., Comm. Biol., 2019). These findings suggest that disruption of Ypt31p/32p may affect the localization of Vps9p through reduced activity of Arf1p. However, arf1D and ypt31ts mutants exhibit different effects on the Vps9p localization: in arf1D mutant the recruitment of Vps9p to the TGN is impaired and in ypt31ts mutant Vps9p localization at the TGN is increased (Nagano et al., 2019, Comm Biol.). Thus, the role of Ypt31/32 in the Vps9p localization appears to be independent of Arf1p activity. In the new manuscript, we have added a brief discussion about this (lines 466-473).

      Comment #8

      • In the legend to Fig. 3D, the authors state that the red arrowheads indicate 50 nm vesicles and black arrowheads indicate vesicle clusters. However, the electron micrograph clearly shows that their morphologies are different. Red ones, which I estimate to be a little larger than 50 nm, often appear to have dense material inside, while those in black are even larger (probably around 200 nm) and do not look like a cluster of the same type of vesicles (I do not even think that such large structures should be called vesicles). How do the authors explain these differences? *

      Response:

      In the previous manuscript, the explanation of the electron microscopy analysis was insufficient. In the new manuscript, to clearly distinguish the two Vps21p-residing structures observed in the ent3D5D apl4D mutant by fluorescence microscopy (Fig. 3A), namely small endosome-like puncta and an aberrant large structure, we examined the size and number of these structures and showed the data in Fig. S2. This result revealed that the ent3D5D apl4D mutant contains a single aberrant large aggregate with a size of >100 pixels adjacent to the vacuole and endosome-like structures with a size of

      Comment #9

      • In Fig. 4F, the authors show different sets of images, Focal plane and Z projection. What is the purpose to do it? The results with Z projection should be more informative. Why the authors use only Focal plane data for the analysis in panel G? *

      Response:

      We measured the fluorescence intensity or number of individual GFP-Vps21p puncta using single focal plane images (Figs. 1C, 1E, 3I, and 4B), because the small Vps21p-residing puncta are highly mobile and the same endosome often appears in multiple planes of a Z-stack taken by a conventional epifluorescence microscope. In contrast, we analyzed the aberrant large aggregate using Z-projection images (Figs. 3B, S3G) because this structure is relatively stable and has low motility, and it is not observed unless it lies in the focal plane. In Fig. 4F, since both the small puncta and the large aggregate are analyzed, we have shown both the focal plane image and the Z-projection image. In the new manuscript, we have added a description of the imaging method to each figure legend or the text (lines 230-232, 332-334).

      __Reviewer #2 (Significance (Required)): __

      *It is a complicated story but I find most of the conclusions reasonable. It provides important knowledge to the understanding on the Rab5 GTPase regulation in trafficking from the TGN. *

      Response:

      We are very grateful for this reviewer’s favorable evaluation of our studies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      1. General Statements [optional]

      We would like to thank all reviewers for their constructive feedback and for raising specific points that have helped to improve our manuscript. We accept that the initial submission did not include some quantitative aspects of the observed effects. These are now included together with all the suggested experiments from the reviewers with the use of additional mutants and appropriate protein markers. We believe that the manuscript offers a conceptual advance and a molecular mechanism for the effects of caffeine on cell cycle progression of eukaryotic cells and is of interest to geneticists working on cell cycle, cancer and biogerontology.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      In the manuscript “The AMPK-TORC1 signaling axis regulates caffeine-mediated DNA damage checkpoint override and cell cycle effects in fission yeast,” the authors studied the role of genes that are potentially involved in the caffeine-mediated override of a cell cycle arrest caused by activation of the DNA damage checkpoint. The methylxanthine substance caffeine has been known to override the DNA damage checkpoint arrest and enhance sensitivity to DNA damaging agents. While caffeine was reported to target the ATM ortholog Rad3, the authors previously reported that caffeine targets TORC1 (Rallis et al, Aging Cell, 2013). Inhibition of TORC1, like caffeine, was also reported to override DNA damage checkpoint signaling. Therefore, in the present study, the authors compared the effects of caffeine and torin1 (a potent inhibitor for TORC1 and TORC2) on cell cycle arrest caused by phleomycin, a DNA damaging agent, using various gene deletion S. pombe mutants.

      The authors concluded that they identified a novel role of Ssp1 (calcium/calmodulin-dependent protein kinase) and Ssp2 (catalytic subunit of AMP-activated kinase) in the cell cycle effects caused by caffeine, based on the following findings; (1) the caffeine-mediated DNA damage checkpoint override requires Ssp1 and Ssp2; (2) Ssp1 and Ssp2 are required for caffeine-induced hypersensitivity against phleomycin; (3) under normal growth conditions, caffeine leads to a sustained increase of the septation index in a Ssp2-dependent manner; (4) Caffeine activates Ssp2 and partially inhibits TORC1.

      Major comments:

      I do not think that many of the authors’ claims are supported by the results of the present study. The corresponding parts are detailed below.

      1. The conclusion of the first paragraph in the Results (top in page 6; Our findings indicate that caffeine and torin1 indirectly and directly inhibit TORC1 activity respectively.) is not supported by the data in Figure 1. The result that caffeine, but not torin1, requires Ssp1 and Ssp2 to override the phleomycin-induced cell cycle arrest does not necessarily indicate that caffeine indirectly inhibits TORC1 via Ssp1 and Ssp2. Rather, the authors should mention that this conclusion is based on the authors’ previous reports by citing them (e.g., Rallis et al, Sci Rep, 2017). To add to Figure 1, an additional experiment using a constitutively active AMPK mutant, a temperature-sensitive TORC1 mutant, and a srk1 deletion mutant will help the authors claim their original conclusion as one possibility.

      Torin1 inhibits TORC1 and 2 leading to G2 cell cycle arrest following accelerated mitosis. In contrast, caffeine has been reported to enhance the inhibitory effect of rapamycin on TORC1 signaling but does not inhibit growth. It has not been reported that TORC1 is a direct target of rapamycin. We previously demonstrated that caffeine induces Srk1 in a Sty1 dependent manner (Alao et al., 2014). Furthermore, Ssp1 plays a role in regulating Srk1/ Cdc25 activity. It is therefore possible, that Ssp1 influences the ability of caffeine to promote mitotic progression as part of the stress response while also affecting TORC1 activity via Ssp2. As ssp2∆ cells have higher intrinsic TORC1 activity, this could also attenuate the effect of caffeine on mitosis.

      We have modified the first paragraph of the results section to address the reviewer’s concerns.

      We have previously reported that Srk1 modulates the ability of caffeine to drive cells into mitosis (Alao et al., 2014).

      1. The conclusion of the second paragraph in the Results (lower-middle in page 6; Our results indicate that caffeine induces the activation of Ssp2.) is not based on the results of Figure 2. Figure 2 simply illustrates that both caffeine and torin1 cause hypersensitivity to phleomycin dependent on Ssp1 and Ssp2.

      We appreciate the reviewer’s contention and have modified the text.

      1. The conclusion of the fourth paragraph in the Results (middle in page 7) is not clearly supported by the result, due to an insufficient data analysis. As the cell length and the progress through mitosis are the key assay parameters in Figure 3, the average cell length should be shown next to each micrograph of Figure 3A and 3B. In Figure 3C, a mitotic index and the average cell length should be shown next to each micrograph. A statistical analysis is necessary for the authors to compare the measurements and to claim as the headline (Caffeine exacerbates the ssp1D phenotype under environmental stress conditions), as the effect of caffeine was not evident.

      We have conducted additional experiments to measure cell length and modified the figure to include these data. We believe our observation that caffeine alone induces increased cell length in ssp1 mutants confirms a role for the Ssp1 protein in modulating the effects of caffeine. We previously showed that caffeine activates Srk1, which in turn inhibits Cdc25 activity, similar to other environmental stresses (Alao et al., 2014). Ssp1 negatively regulates Srk1 following exposure to stress. In contrast, caffeine advances mitosis in wt cells and thus does not result in increased cell length. We also demonstrate that caffeine greatly enhances cell length in ssp1 mutants exposed to heat stress, in marked contrast to rapamycin and torin1. These findings indicate that Ssp1 mediates the effect of caffeine on mitosis.

      1. In the middle of page 8, the statement “Accordingly, the effect of caffeine and torin1 on DNA damage sensitivity was attenuated in gsk3D mutants (Figure 5C and 5D).” is not supported by the corresponding results. Rather, Figure 5C and 5D look almost the same.

      We agree with this and other reviewers that demonstrating enhanced sensitivity to caffeine is problematic. Nonetheless, our cell cycle data clearly indicate a differential role for Gsk3 in mediating the cell cycle effects of caffeine and torin1. In terms of DNA damage sensitivity, we have reproducibly observed a lower degree of DNA damage sensitivity in gsk3 mutants relative to wt cells. Hence, while caffeine is less effective at enhancing DNA damage sensitivity relative to torin1 in wt cells; we observed that caffeine and torin1 increase DNA damage sensitivity to a similar degree in gsk3 mutants.

      1. The description and the conclusion of the last paragraph in the Results (bottom in page 8 – page 9) are not supported by the results of Figure 6, due to an insufficient data analysis. The extent of phosphorylation must be quantified as a ratio of the phosphorylated species (e.g., pSsp2) to all species of the protein (e.g., Ssp2).

      We have carefully repeated our experiments under various conditions. Our results clearly indicate caffeine induced Ssp2 phosphorylation. These observations have not been reported previously.
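The quantification the reviewer asks for, the phosphorylated signal expressed as a ratio to the total protein signal, could be computed along the following lines. This is a minimal sketch under stated assumptions, not an analysis from the study: the function names are hypothetical, and the inputs are assumed to be background-corrected band intensities from densitometry of the same sample.

```python
def phospho_ratio(phospho_signal, total_signal):
    """Ratio of phosphorylated-species signal (e.g. pSsp2) to total protein signal (e.g. Ssp2)."""
    if total_signal <= 0:
        raise ValueError("total protein signal must be positive")
    return phospho_signal / total_signal

def fold_change(treated, control):
    """Fold change of the phospho/total ratio in a treated sample relative to a control.

    Each argument is a (phospho_signal, total_signal) pair.
    """
    return phospho_ratio(*treated) / phospho_ratio(*control)
```

Normalizing to total protein in each lane separates a change in phosphorylation from a change in protein abundance or loading.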

      From Figure 6, the authors claim that caffeine (10 mM) partially inhibits TORC1 signaling. However, the authors previously showed that the same concentration of caffeine inhibited phosphorylation of ribosome S6 kinase as strongly as rapamycin, the potent TOR inhibitor (Rallis et al, Aging Cell, 2013). The authors are advised to assess phosphorylation of S6 kinase again in the present study and compare to the results of the present results in Figure 6, because addition of that data may allow the authors to discuss that caffeine affects TORC1 downstream pathways at different intensities.

      While rapamycin is a strong inhibitor of TORC1 in budding yeast, this is not the case in fission yeast. Our previous assessments of p-S6 levels and polysomal profiles as well as cell-cycle progression kinetics have shown this (Rallis et al, Aging Cell, 2013). In addition, gene expression analyses from our previous studies have shown that caffeine treatment results in a gene expression profile similar to that of cells in nitrogen starvation (TORC1 inhibition).

      We have now used an Sck1-HA strain to further enhance our study and address the reviewer’s concerns. Previous studies have shown that 100 ng/mL rapamycin does not affect Sck1 phosphorylation. We demonstrate that, in contrast to rapamycin (100 ng/mL), 10 mM caffeine affects Sck1-HA expression and/or phosphorylation. This effect was also observed with 5 µM torin1, albeit to a greater degree.

      Also, immunoblotting of the same proteins looks somehow different from panel to panel (e.g., pSsp2 in panel A and D; Actin in panel A, C, and D). Therefore, the blotting result before clipping had better be shown as a supplementary material.

      We repeated the blots where necessary and used Ponceau S as a loading control. The original blots can be made available to all.

      Minor comments:

      1. (Figure 1) The septation index of the phleomycin-treated cells (without any further additional drugs) should be shown, as a baseline.

      We have included data for untreated cultures and phleomycin-only treated cultures.

      1. (Figure 1D, Optional) As a ppk18D cek1D double deletion mutant is reported, the authors are advised to add and test that mutant in this experiment.

      We have added the related data for the _ppk18_Δ _cek1_Δ double mutant.

      1. (Figure 2) The authors need to clarify the number of cell bodies spotted (e.g., in the Figure legend).

      We have modified the figure legend accordingly.

      1. (Figure 3) The different number of cells in micrographs may give an (wrong) impression on the cell proliferation rate. Therefore, it is advisable to use the micrographs in which the similar number of cells are shown for conditions with the similar cell proliferation rates.

      We have included data to show the cell lengths under different conditions. We find that different conditions greatly affect proliferation rates. For instance, cells do not proliferate in the presence of torin1. We initially sought to investigate if caffeine induces a phenotype in ssp1 mutants by virtue of its interaction with the DNA damage response. The micrographs were included as representative examples and have now been complemented with cell length data.

      1. (Figure 4B) ssp2D, not spp2D.

      The figure legend has been edited.

      1. (Figure 4) The septation index of the non-treated cells should be shown as a baseline.

      We have included baseline data for untreated wt cells in figure 1. We have no reason to suspect any of the mutants would provide different results over the time investigated.

      1. (Figure 6B, 6E) What do the black arrows indicate? Figure Legend does not seem to explain them.

      The legend has been modified to indicate what the arrows refer to.

      1. (Figure 6C) Indicate which part of the Maf1-PK blot corresponds to the phosphorylated species, because Maf1-PK is probed with an anti-V5 (not a phosphorylation-specific) antibody.

      These experiments have been carefully repeated under different conditions and the figure is now modified accordingly.

      1. (Figure 6D) gsk3Dssp1D, not gs3Dssp1D.

      We have deleted this figure and have now replaced it with data we believe is more appropriate.

      Reviewer #1 (Significance):

      As caffeine is implicated in protective effects against diseases including cancer and improved responses to clinical therapies, the topic of the present study is of interest and importance to the broad audience.

      In the present study, the most significant finding is that caffeine- and torin1-induced hypersensitivity to phleomycin is dependent on Ssp1 and Ssp2 (Figure 2). This result may be important in chemotherapy against cancers. On the other hand, caffeine is known to activate AMPK (e.g., Jensen Am J Physiol Endocrinol, 2007). Besides, as detailed in the Major comments, many of the major conclusions are not supported by the present results. Therefore, based on my field of expertise (cell cycle, cell proliferation, and TOR signaling), I conclude that the present study hardly extends the knowledge in the field of "the cell biology of caffeine."

      We thank the reviewer for their helpful comments. We accept the constructive criticisms and have carried out extensive additional experiments to provide further evidence for the roles of Ssp2 and TORC1 in mediating the cell cycle effects of caffeine. We stress that caffeine has previously been proposed to exert its effects via inhibition of Rad3 activity. Our previous work showed that caffeine did not inhibit Rad3-mediated checkpoint signaling. As later studies suggested caffeine inhibited TORC1 activity, the major goal was to investigate if caffeine is an indirect inhibitor of TORC1 via Ssp2, which is activated by several stresses. It has never been demonstrated that caffeine signals via Ssp2. This study provides the first evidence that caffeine modulates cell cycle progression by at least partially signaling via Ssp2 and TORC1. After nearly 30 years, it is vital that its precise activity, in particular in enhancing DNA damage sensitivity, is properly characterized. Such work would open the way for additional studies on how caffeine affects cell physiology. For instance, we show that caffeine at 10 mM is more effective at inhibiting Sck1 activity than rapamycin at 100 ng/mL. In contrast, rapamycin at this concentration is more effective at inhibiting Maf1 activity. Hence, further studies are needed on how exactly the combination of caffeine and rapamycin influences ageing and other TORC1-regulated processes.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: In this paper, Alao and Rallis analyze the role of AMPK and TORC1 pathways, and the respective crosstalk, in regulating cell cycle progression in the presence of DNA damage in S. pombe. The authors show, almost exclusively through chemo-genetic epistasis assays, that caffeine inhibits TORC1 indirectly activating AMPK, in contrast to the specific ATP-competitive TORC1 inhibitor torin1. Specifically, it is shown that in the absence of a functional AMPK pathway caffeine is unable to revert the TORC1-inhibition-dependent override of cell-cycle arrest caused by the DNA-damaging agent phleomycin, henceforth partially suppressing the growth inhibition caused by the co-treatment.

      Major comments: The overall story of the paper is convincing. However, the choice of an almost exclusively chemo-genetic approach, lack of controls in some experiments and some discrepancy in data presentation suggest that the manuscript undergoes revision before the authors claim that their conclusions are fully supported by the results. In detail:

      In Figure 1, graphs of septation indexes are presented separately for each strain. This presentation prevents the reader from clearly comparing the differences of septation caused by genetic background rather than the treatment, i.e. the septation happening by treatment with torin1. I feel it would be better to group the results by drug rather than by strain/mutant. If the results are presented this way because the experiments on different strains were run separately, I further suggest that they are re-run so to always include at least the wt in every run.

      We have included data for untreated and phleomycin-only treated wt cells as a reference. Additionally, all experiments were repeated at least 2 times. We have used this assay for over 10 years and have found it to be reproducible and reliable. We are not able to include wt cells in every run as this would be beyond the manpower capacity and time constraints involved. It is also likely that torin1 activity is influenced by the ssp1/2 backgrounds due to increased basal TORC1 activity as previously reported. The main goal was to illustrate that caffeine differs from a direct inhibitor such as torin1.

      Furthermore, torin1 inhibits both TORC1 and TORC2 and thus cannot be directly compared to caffeine. We do prove, however, in this and other figures that, in contrast to torin1 and rapamycin, caffeine signals via targets upstream of TORC1. We can therefore deduce that it functions in a manner similar to other environmental and nutrient stresses, which require the Ssp1- and Sty1-regulated pathways to advance mitosis and other processes such as autophagy induction.

      In Figure 2C-D, an inconsistency is observable between the phleo+caffeine sensitivity of ssp1Δ and ssp2Δ, the latter retaining a higher sensitivity. Provided that this is not only due to this specific replicate, how would the authors explain such a difference and fit it into their conclusion of a "cascade" signaling with Ssp1 acting upstream of Ssp2?

      We agree that analyzing the different interacting pathways involved is complex. For instance, Ssp1 is required for suppressing Srk1 following Sty1 activation independently of its effects on Ssp2 and TORC1. Furthermore, basal TORC1 activity is higher in ssp2 mutants as previously reported. It is likely that Ssp1 exerts a more definitive role as it is required to directly reactivate Cdc25 activity following exposure to stress. In contrast, Ssp2 activation eventually results in increased Cdc25 activity via inhibition of PP2A (Figure 8). These experiments are, thus, intended to complement those in figure 1, but the DNA damaging effects of caffeine must also be taken into account.

      In Figure 2I, a huge discrepancy is observable compared to panel 2A in terms of phleo+caffeine (no ATP) sensitivity of wt cells. Here, cells seem to cope well with the phleomycin treatment even if co-treated with caffeine. This renders the main finding of the panel (the effect of phleo+caffeine+ATP) rather uninterpretable.

      We have noted that relevant assays, at least in fission yeast, are influenced by the culture vessels (e.g., plastic type/glass) as well as the vessel volume (probably due to differences in aeration and oxygen availability, which affect growth and metabolism parameters). We have corrected figure 1a. In terms of ATP, these experiments are highly reproducible even if the exact mechanism remains unclear.

      In Figure 3A, the simple observation of elongation is sometimes hard to assess, for example in the ATP-caused suppression of the effect of torin 1, as also acknowledged by the authors in the text. I feel it would be really necessary to quantify such results on an adequate number of cells.

      We have reproducibly observed this uncharacterized effect of ATP. We have analysed the cell length in additional experiments to show that ATP influences average cell length under these conditions. It is important to note that the effects of phleomycin are pleiotropic. For instance, it likely induces cell cycle arrest at various cell cycle phases as well as in early and late G2. Additionally, it may influence other cellular processes such as DNA or compete with drug targets such as TORC1 which is influenced by ATP.

      In Figure 3B,C wt is missing to compare the results in the presence of the same treatments. I understand the focus on Ssp1, but the authors should show the same treatments on wt cells. Similarly, it would be better to show the drug treatments in panel C also at 30°C. For the same reasons as in the previous point, quantifications would greatly enhance the credibility of the claims here.

      Previous work by other investigators has shown that wt cells proliferate normally under these conditions. We also show in figure 1 that cell proliferation is not affected under normal cycling conditions in these assays. We have added cell length data that convincingly prove that Ssp1 is required to mediate the mitotic effects of caffeine. It appears that caffeine induces a cell cycle delay that requires Ssp1 to suppress Srk1-mediated Cdc25 inhibition. Furthermore, recent studies have demonstrated that rapamycin (which targets TORC1 downstream of Ssp1) allows cell proliferation at higher temperatures in S. pombe.

      A major point is the almost complete absence of molecular data. Except for Figure 6, the data do not include a detection of the relative activation of the relevant pathways. Figure 6 could hardly fill this gap, since the samples therein analyzed are not the ones utilized in most of the other figures, but simple, single time-point treatment with a single drug. The authors usually refer in the text to previous knowledge about how a treatment influences a pathway. However, they should show it here in their experimental conditions.

      We have performed extensive additional experiments including those suggested by the reviewer. These experiments conclusively show caffeine induces Ssp2 phosphorylation in an Ssp1-dependent manner. We also demonstrate that caffeine attenuates TORC1 signaling. Together with the cell cycle data, our findings strongly suggest caffeine indirectly inhibits TORC1 signaling in a manner analogous to other environmental stresses. We also note that the inhibitory effect of caffeine on TORC1 has been demonstrated in several studies. We have provided further evidence for this and have, for the first time, demonstrated that caffeine affects Ssp2.

      Minor comments:

      • A different grouping of the experiments/panels would help the reader. For example, Fig. 2I would fit better together with Fig. 3A, to match the composition of the various chapters of the results.

      We have performed additional experiments as suggested by the other reviewers. We believe the data is now easier to understand.

      Torin 1 is sometimes referred to with a capital T or with a lowercase t, especially in the Figures. I suggest to uniform the nomenclature.

      We have edited the text.

      In the results, the authors state that "ATP may increase TORC1 activity or act as a competitive inhibitor towards both compounds.". It's a little bit odd to refer to ATP as a competitive inhibitor of drugs. I would rather be ATP, the physiological agonist, outcompeting two compounds which are working as ATP-competitive inhibitors.

      We have modified the text accordingly.

      Reviewer #2 (Significance):

      The interplay between TORC1 and AMPK is of great interest in the cell signaling field, basically in every model organism.

      The paper provides a conceptual advance in the field showing a genetic interaction between the two pathways using a model organism which has probably been overlooked so far, which is a pity because S. pombe is the best organism to study G2/M cell cycle/size regulation. The story would be of interest especially for an audience working in cell signaling in microorganisms, but not so much (at least at this stage) for the community working on aging, disease and chemo-/radio-sensitization, contrary to what the authors claim. Furthermore, for the above-mentioned reasons, I feel like the authors are a little bit overshooting when claiming (for example in the abstract and in the discussion), that their work provides a clear understanding of the mechanism.

      As requested by Review Commons, I specify that my expertise is on TORC1/AMPK/PKA pathways, on their crosstalk and their regulation by metabolic intermediates.

      We believe that the additional requested experiments have adequately improved the manuscript and support our presented mechanistic model.

      Caffeine is of interest in cancer biology and the biogerontology field, as shown by recent reports on metabolic phenotyping, liver function testing, induction of autophagy and interplay with HIF-1, just to mention a few.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      This manuscript examines the genetic requirements for checkpoint override by caffeine in the fission yeast model organism. The main outcome is to show that checkpoint override, which has previously been linked to the downregulation of TORC1, is dependent on the AMPK pathway (Ssp1/Ssp2). Additional analysis of downstream factors and the cross-talking Sty1 pathway implicates Greatwall kinases and Igo1 (PP2A inhibitor - endosulfine analogue) although the pleiotropic nature of these pathways and the rather blunt endpoints of septation index and phleomycin sensitivity makes robust data interpretation difficult.

      Major comments

      For clarity the manuscript would benefit from some restructuring. In particular it would help the reader if the diagram presented in figure 7 was presented first as this would help orientate the reader with the pathways. The mammalian equivalents should be indicated.

      Figure 8 (previously figure 7) summarizes our findings schematically. We believe that it works well at the end as a conclusion to the work and the discussion. Wherever appropriate we have mentioned the mammalian equivalent (e.g., for Rad3).

      For scientific accuracy and clarity the manuscript requires significant attention. For example in the abstract where Rad3 is introduced it is not made clear that this is the fission yeast gene. It would be better to introduce ATR at this point? Another example in the abstract: 'Deletion of ssp1 and ssp2 suppresses...' should read 'Deletion of ssp1 or ssp2 suppresses...' as the two genes are not deleted in the same strain. I would recommend that the authors carefully revise the manuscript paying close attention to each statement. For example on page 4: 'Downstream of TORC1, caffeine failed to accelerate ppk18D but not igo1D and partially overrode DNA damage checkpoint signalling'. It is unclear what the authors mean by accelerate. I assume they mean accelerate cell cycle progression, but there is no direct analysis of cell cycle kinetics in the results. Similarly on page 5: '... ppk18D mutant displayed slower cell cycle kinetics than wild type cells exposed to phleomycin and caffeine or torin1 (Figure 1D)'. However, the figure shows no cell cycle kinetic analysis.

      We have modified the wording of the abstract according to the reviewer’s suggestions.

      We refer to accelerated progression into mitosis and have edited the text where appropriate. Depending on the type of DNA damage, S. pombe cells transiently or permanently arrest cell cycle progression. It is well known that caffeine overrides these cell cycle DNA damage checkpoints. We previously proved that this was not due to Rad3 inhibition. Additionally, TORC1 (which controls the timing of mitosis) inhibition overrides checkpoint signaling. Our aim was to investigate if caffeine mimics this effect, at least partially, via activation of Ssp2. We have demonstrated this is the case, although the basal state of the various mutants can complicate the data analysis in terms of cell cycle progression. In phleomycin-treated cells, the septation index peaks 60 minutes after exposure to caffeine. In ppk18 mutants this peak was delayed by 30 minutes. Thus, wt and ppk18 mutants proceed through mitosis and cytokinesis at different rates (as determined by measuring the septation index).
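      The comparison of peak timing described above can be sketched as follows. This is an illustrative sketch only: it assumes the usual definition of the septation index as the percentage of counted cells that carry a septum, the function names are hypothetical, and the cell counts are invented solely to reproduce the peak times stated in the response (60 minutes for wt, a 30-minute delay for ppk18 mutants). They are not measured data.

```python
def septation_index(septated_cells, total_cells):
    """Percentage of counted cells that carry a septum."""
    if total_cells <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * septated_cells / total_cells


def peak_time(time_course):
    """Return the time point (minutes) with the highest septation index.

    time_course maps minutes after drug addition to a
    (septated_cells, total_cells) tuple.
    """
    return max(time_course, key=lambda t: septation_index(*time_course[t]))


# Invented counts for illustration only (not data from the study).
wt_counts = {0: (10, 200), 30: (30, 200), 60: (80, 200), 90: (40, 200)}
ppk18_counts = {0: (10, 200), 30: (16, 200), 60: (36, 200), 90: (70, 200)}

# Delay of the mutant peak relative to wt, in minutes.
peak_delay = peak_time(ppk18_counts) - peak_time(wt_counts)
```

      With these placeholder counts the wt peak falls at 60 minutes and the mutant peak at 90 minutes, so `peak_delay` is 30. Reporting the peak time alongside the full curve makes the difference in kinetics between strains explicit.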

      The authors appear to make the assumption that 'Inhibition of DNA damage signalling by caffeine and torin1 enhanced phleomycin sensitivity...' (page 6) but then clearly go on to show that the mutants used are sensitive for other unknown reasons. To make this link it would be necessary to artificially impose a G2 delay and show how much and in which circumstances this reverses the effect on sensitivity of caffeine/torin1. The authors should thus be very clear that they cannot equate sensitivity to 'checkpoint over-ride' and adjust their wording and assumptions accordingly. Assumptions on epistasis need to use the same assay and not equate between assays. As an example F1C and F2D do not equate as phleo+caffeine would be expected to be sensitised above phleo+torin1. This is not commented on in the text. Also on page 7 '... ATP also suppressed the ability of torin1 to override DNA damage checkpoint signalling albeit to a lesser degree (Figure 2I).' However, this figure only shows sensitivity, not septation index.

      We accept that these results can be difficult to interpret. Firstly, caffeine appears to modulate cell cycle progression by various means. We previously demonstrated that it stabilizes Cdc25 independently of checkpoint signaling. However, it also activates Ssp2 which subsequently affects Cdc25 activity via PP2A. Its effect on mitosis can thus differ depending on the context. For instance, igo1 mutants already have high PP2A activity which would affect the subsequent effect of caffeine on Cdc25 activity. Ssp2 on the other hand appears to regulate cell fate according to the nutritional state. Its sensing of nutritional cues is not limited to ATP/AMP levels as it also regulates the response to amino acid quality (e.g., glutamate versus torin1).

      We have carried out additional experiments on the effect of ATP. While it did affect progression into mitosis, the results were complicated and have not been shown. Instead, we have provided additional data to show that it affects cell length, which is an indicator of the time spent in G2. In other words, longer cells spend more time in G2 prior to septation.

      We also suspect that caffeine is itself a DNA damaging agent as previously reported in the early 1970s. More recent studies have also indicated a role for Rad3 and DNA repair proteins for tolerance to caffeine. In fact, TORC1 itself has been reported to be required for DNA damage repair. Thus, TORC1 inhibition could potentially enhance DNA damage sensitivity independently of mitotic progression as shown in some of our experiments.

      While we have clearly identified a role for Ssp2 in mediating the cell cycle effects of caffeine, we accept that these findings will require further studies (beyond the scope of this one) to give more insights on how these caffeine-mediated effects occur. What is clear is that caffeine overrides DNA damage checkpoint signaling by at least partially inhibiting TORC1 signaling.

      All the septation index graphs require an untreated (i.e., no caffeine or torin1) control.

      We now show in figure 1a that the septation index does not change over the time period studied when cells were left untreated. These assays have been routinely used for many years now and are very reproducible. The graphs clearly show the differential effects caffeine and torin1 exert on cell cycle progression in wt and mutant strains exposed to phleomycin.

      Figure 3 is not quantitative and cannot support the conclusions drawn from it. If, for example, the authors wish to demonstrate ATP can suppress checkpoint override (Figure 3A) they should use the same septation assay used before. If this is not possible, then it should be explained why not and an alternative quantitative assay should be developed. It is unclear why the authors include Figure 3B,C at all.

      Ssp2, on the other hand, appears to regulate cell fate according to the nutritional state. Its sensing of nutritional cues is not limited to ATP/AMP levels as it also regulates the response to amino acid quality (e.g., glutamate versus torin1). Additionally, exposure to stress may induce a transient decline in ATP levels. We thus investigated how ATP might affect caffeine or torin1. We could not detect any major changes in the septation index (not shown). Cells exposed to ATP in the presence of caffeine and phleomycin were shorter. We cannot tell how exactly ATP suppresses the effect of caffeine and torin1 on DNA damage sensitivity.

      It is unclear to this reviewer what the significance of the data with gsk3D cells is (Figure 5). The authors should introduce the protein, why there is an expectation that it would have a role in the pathway and explain its relevance. Similarly when discussing the resulting data.

      Gsk3 lies downstream of TORC2 which is inhibited by torin1 but not caffeine. Gsk3 regulates Pub1 stability which is the E3 ligase for Cdc25. We showed previously that caffeine stabilizes Cdc25, suggesting it might interfere with Pub1 activity. Additionally, we are comparing caffeine, as an indirect inhibitor of TORC1, with torin1, which directly inhibits both complexes. Our data provide further evidence for a differential effect of caffeine and torin1 on TORC1 signaling. We have modified the text accordingly.

      Figure 5A shows a similar response of wild type cells to phleomycin regarding checkpoint override as was shown in Figure 1A. However Figure 5C is not recognisable as equivalent to Figure 2A, yet both report sensitivity to phleomycin of wild type cells under equivalent circumstances. This is a major concern as to reproducibility of these data. It is also not possible to conclude from either Figure 5C or 5D that caffeine or torin1 treatment is, or is not, sensitising cells to phleomycin treatment, yet this conclusion is made when discussing the data.

      We agree with this and other reviewers that demonstrating enhanced sensitivity to caffeine is problematic. Nonetheless, our cell cycle data clearly indicate a differential role for Gsk3 in mediating the cell cycle effects of caffeine and torin1. In terms of DNA damage sensitivity, we have reproducibly observed a lower degree of DNA damage sensitivity in gsk3 mutants relative to wt cells. Hence, while caffeine is less effective than torin1 at enhancing DNA damage sensitivity in wt cells, caffeine and torin1 increase DNA damage sensitivity to a similar degree in gsk3 mutants.

      Figure 6A shows that caffeine, but not torin1 results in Ssp2 phosphorylation. Is this experiment reproducible and does the total level of Ssp2 increase reproducibly? This should be done and the results discussed. Ideally, the bands would be quantified against actin intensity and presented as a bar graph with standard deviation.

      We have repeated these experiments alone and in combination with phleomycin. These data convincingly show that caffeine but not torin1 induces Ssp2 phosphorylation. In fact, torin1 suppresses Ssp2 phosphorylation, likely due to inhibition of a feedback mechanism resulting from TORC1 inhibition. In contrast, caffeine likely activates Ssp1 via the stress response, which in turn phosphorylates Ssp2.

      Figure 6B, when introduced should explain the background as to why eIF2alpha phosphorylation is a readout of TORC1 activity. Importantly, the figure should be supported by an actin control and 3 repeats quantified. Figure 6C purports to establish that caffeine moderately attenuates Maf1 phosphorylation. To be able to state this, it would be essential to quantify the gel and report repeated results relative to actin and the total levels of Maf1. Similarly Figure 6D and 6E require an actin control and would benefit from proper quantification.

      We have repeated the Maf1 experiments to clarify the data and show that caffeine suppresses Sck1, an additional TORC1 phosphorylation target.

      Minor comments

      p3 'cigarette smoke and other gases'?

      We have edited the statement.

      P4 torin1 was dissolved in DMSO (not were)

      We have edited the text.

      p5 phospho not phosphor Ssp2

      We have edited the text.

      p6 explain why ppk18 deletion results are surprising. Also this result could be discussed.

      It had been proposed previously that Ppk18 is the Greatwall homologue in S. pombe and thus the major regulator of PP2A and mitosis downstream of TORC1. Later studies suggested a redundant role for Cek1 in this pathway. While deletion of cek1 in a ppk18 background modulated the effect of torin1 on cell cycle progression, it did not interfere with the effects of caffeine. At present we cannot account for this observation. We cannot rule out that caffeine activates an additional kinase that regulates Igo1 activity.

      Together our data show that caffeine advances progression into mitosis in a manner that differs from direct inhibition of TORC1 by torin1.

      We have now added the relevant comments on this unexpected observation within the discussion.

      Explain why Cek1 is not tested

      We have now tested a ppk18 cek1 double mutant.

      p6 introduce what pap1 is when first mentioned

      We have introduced PP2APab1 as requested.

      Reviewer #3 (Significance):

      The data show that fission yeast Ssp1/2 has a role in inhibiting TORC1 in response to caffeine and this influences checkpoint override. This is an incremental, but potentially interesting, observation contributing to understanding mechanism(s) of caffeine action. The lack of quantification, the pleiotropic nature of the mutants used and the rather blunt endpoints assayed make it hard to establish to what extent the direct TORC1 inhibition by Ssp2 causes the checkpoint override, which limits its potential impact. The core observation may, however, be of interest to the wider caffeine field. The referee has the perspective of a yeast cell cycle geneticist.

      We thank the reviewer for identifying the significance of the study in understanding the mechanisms of caffeine effects on the cell cycle. We have added all the suggested experiments with additional mutants and protein markers as well as quantitative approaches that have appropriately improved the manuscript. We believe that the mechanism provided is of more general interest and not limited to the caffeine field: manipulating the cell cycle and understanding the interplays between growth and stress are of general interest and importance.

      Reviewer #4 (Evidence, reproducibility and clarity):

      The authors provide a series of genetic studies identifying a role for Ssp1-Ssp2 signaling in TORC1-dependent responses to DNA damage. The main assays are cell division (i.e. septation index) and cell viability (i.e. serial dilution spot assays) following treatment with the DNA damaging agent phleomycin. The authors perform these assays in a number of genetic mutant backgrounds to determine which genes and pathways are required for the relevant cellular response. Supporting data also include microscopy images and western blots to test protein phosphorylation. In general, the results support a role for Ssp1-Ssp2 acting upstream of TORC1. However, in several cases the data do not support a straightforward relationship, and it is confusing to parse through a number of intermediate effects, which often vary between different assays. I have provided some specific comments below that might be addressed to strengthen the technical aspects of the manuscript.

      Major

      1. The authors conclude "that caffeine and torin1 indirectly and directly inhibit TORC1 activity respectively" based on Figure 1. This conclusion seems quite strong given the indirect nature of assays in Figure 1, which test septation in the presence of DNA damage. The conclusion would require experiments that assay TORC1 activity itself.

      Both caffeine and torin1 have previously been reported to inhibit TORC1 which controls the timing of mitosis. We sought to investigate if caffeine mediates its effects via the stress response pathway. We have conducted additional experiments which clearly demonstrate that caffeine inhibits TORC1 at least partially via the activation of Ssp2. These observations make sense as we have previously shown that caffeine activates the stress response pathway to activate Srk1 which inhibits Cdc25. More recent studies by others indicate that Ssp1 is required to suppress Srk1 to allow progression into mitosis. This accounts for the failure of ssp1 mutants to advance mitosis under stress conditions. Additionally, Ssp1 activates Ssp2 which leads to the downstream inhibition of TORC1.

      1. Figure 2 needs some explanation to introduce the idea that cell growth reflects an intact DNA damage response that prevented division in the presence of phleomycin. I also felt that the conclusions were very strong given the data, and the authors should discuss each case more carefully. For example, deletion of ssp1 does not really suppress the ability of torin1 to enhance phleo sensitivity (Figure 2C).

      We would not expect the deletion of ssp1 to suppress the effect of torin1 under stress conditions. We have provided further evidence to show that Ssp1 is required to facilitate progression into mitosis at least in the presence of phleomycin or heat stress.

      1. Microscopy imaging in Figure 3 nicely complements some of the other assays. However, it seems important to know if the cells are actively growing in each of these cases. An example is torin and rapamycin shortening ssp1 mutants at 35 degrees: are these cells actively cycling?

      Our aim was to demonstrate that caffeine exacerbates the ssp1 phenotype. This would provide further evidence that caffeine exerts its effects at least in part by activating Ssp1. Cells do not cycle in the presence of torin1, as it inhibits both TORC complexes. We have provided additional evidence to show that caffeine does indeed interact with Ssp1. As the primary aim of the study was to determine whether caffeine overrides DNA damage via Ssp1, we have not investigated whether the cells are cycling. Their shortened size suggests that rapamycin and torin1 affect cell division in a different manner from caffeine.

      1. From Figure 6A, the authors conclude that caffeine induces phosphorylation of Ssp2. However, it appears that Ssp2 protein levels and its phosphorylation levels are both increased, which seems an important distinction.

      We have repeated these experiments several times under different conditions. Some proteins become more stable when phosphorylated as has been previously demonstrated for Srk1 for instance.

      1. In Figure 6D, the authors should show separate gsk3 and ssp1 mutants. It seems likely that all phosphorylation of Ssp2 is due to Ssp1, but this should be shown.

      We have replaced the figure with a ssp1 single mutant.

      1. I am confused about Maf1 phosphorylation in Figure 6C. It is increased upon torin1 treatment, but it is discussed as an indicator of TORC1 activity. Does that mean that loss of its phosphorylation correlates with increased TORC1 activity? As written, I thought it was a TORC1 substrate, which led to confusion about its increased phosphorylation upon torin1 treatment.

      Maf1 is phosphorylated by TORC1. Inhibition of TORC1 would thus lead to a loss of phospho-Maf1 moieties and the accumulation of the unphosphorylated form. We have conducted additional experiments under various conditions to show that caffeine weakly inhibits Maf1 phosphorylation. We note, however, that different stresses result in differential outcomes following TORC1 inhibition. As such, we have included new data to show that caffeine suppresses the TORC1 target Sck1. In S. pombe, Sck1 and Sck2 regulate progression into mitosis.

      Minor

      1. An untreated control should be shown for assays in Figure 1.

      We have included this data for figure 1a.

      1. An untreated control should be shown for assays in Figure 4.

      We have noted in the results for figure 1, that untreated cells and phleomycin only treated cells do not show any changes in septation index over the time course studied in these experiments.

      Reviewer #4 (Significance):

      The study has significance in connecting several conserved and central signaling pathways including TORC1, AMPK, and PP2A. Also, the study uses caffeine and torin1 that have effects in many different cell types. The connection between caffeine and torin1 effects on phleomycin-treated cells was previously established by these researchers. The significance of the current study is providing a genetic pathway for this connection. The significance is partly limited by some of the technical points raised in the previous section, such as some inconsistencies in the strength of results from different assays. Also, the role of these pathways in DNA damage response signaling is not new. While the main significance of this work might relate to a more specialized audience, it does add to a broader body of literature regarding these conserved pathways and processes.

      My expertise is yeast cell biology.

      While the roles of these pathways in DNA damage have been reported using genetic and pharmacological combinations, we dissect their relationships and provide mechanistic connections.

      We thank the reviewer for identifying the significance of this study. We believe we have now addressed the technical issues raised.

    1. Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pair. Instead, the model assumes such pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      Although, as the authors mention, there have not been formal empirical tests of the relationship between learning speed and differentiation/integration, I also wonder to what degree the prediction that fast learning is necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. On the other hand, Guo et al (2021), for example, showed that humans may need a few blocks of training and test to start showing differentiation.

      Related to the point above, the high learning rate prediction also seems to be at odds with the finding that the cortex, which has slow learning (according to the theory of complementary learning systems), also shows differentiation in Wammes et al (2022).

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      In the simulation, the NMPH function has two turning points. I wonder if that is necessary. On the right side of the function, strong activation leads to strengthening of the connectivity, which I assume will lead to stronger activation on the next time point. The model has an upper limit of connection strength to prevent connection from strengthening too much. The same idea can be applied to the left side of the function: instead of having two turning points, it can be a linear function such that low activation keeps weakening connection until the lower limit is reached. This way the NMPH function can take a simpler form (e.g., two line-segments if you think the weakening and strengthening take different rates) and may still simulate the data.
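The two preceding comments (a request for explicit update equations, and a simpler NMPH shape) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' model: the function names, the thresholds `dip` and `cross`, the slopes, and the weight bounds are all hypothetical values chosen for readability. It contrasts a piecewise-linear U-shaped curve with two turning points against the two-line-segment alternative suggested above, and shows one plausible way a learning rate and a bounded weight could combine with either curve.

```python
# Hypothetical sketch: none of these names or numbers come from the paper under review.

def nmph_u_shaped(a, dip=0.3, cross=0.6, max_weaken=-1.0, max_strengthen=1.0):
    """U-shaped curve on activation a in [0, 1].

    0 at a=0, falls linearly to max_weaken at `dip`, rises back through 0 at
    `cross`, then rises linearly to max_strengthen at a=1.
    """
    if a <= dip:
        return max_weaken * a / dip
    if a <= cross:
        return max_weaken * (cross - a) / (cross - dip)
    return max_strengthen * (a - cross) / (1.0 - cross)


def nmph_two_segment(a, cross=0.6, weaken_rate=1.0, strengthen_rate=2.5):
    """Simplified alternative: any activation below `cross` weakens the
    connection, any activation above it strengthens the connection."""
    if a < cross:
        return -weaken_rate * (cross - a)
    return strengthen_rate * (a - cross)


def update_weight(w, a, lr, curve, w_min=0.0, w_max=1.0):
    """One learning step: move w by lr * curve(a), then clip to [w_min, w_max]."""
    return min(w_max, max(w_min, w + lr * curve(a)))


if __name__ == "__main__":
    for a in (0.1, 0.3, 0.5, 0.9):
        print(a, nmph_u_shaped(a), nmph_two_segment(a))
```

In the two-segment version the lower bound `w_min` plays the role that the left-hand turning point plays in the U-shaped curve: weakly active connections keep weakening until the bound is reached rather than being left unchanged.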

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):____

      Summary: In this manuscript by Berg et al the authors demonstrate that RNA polymerase activity is important for the formation of nuclear blebs. This is an interesting and significant finding because prior work has suggested nuclear bleb formation is a result of changes in nuclear rigidity (lamins) or chromatin (via histone modifications). Overall I thought the manuscript was quite interesting and the data well presented. I think the inclusion of multiple mechanisms of blebbing (VPA treatment, as well as lamin B KO) helps to further support the importance of RNA polymerase/transcription activity in the blebbing process. However, I do have some concerns regarding the conclusions of the data that I think should be addressed as a revision.__

      We appreciate that the Reviewer states that “the manuscript was quite interesting and the data well presented”, it is a “significant advancement”, and “the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.”

      In the points below, the Reviewer specifically suggests that we: 1) clarify possible contributions from RNA pol III, 2) address how global vs. local chromatin motion might contribute to our findings, and 3) discuss the force production capabilities of RNA pol II. We also appreciate the feedback regarding the conclusions and have made the specific changes requested in the revision.

      Major Comments:____ 1. One concern I have is that the alpha-amanitin inhibitor has been shown to also inhibit RNA polymerase III. In an old study (1974 Weinmann PNAS) it appears that the inhibitor begins to inhibit RNA polymerase III at 1 to 10 ug/ml. In this study the authors are using 10 uM alpha-amanitin, which is ~ 9 ug/ml and within the range of inhibiting some RNA polymerase III. Additionally, the other drug (actinomycin D) is even less specific for RNA polymerase II. I would suggest that the authors consider one of the following approaches 1) acknowledge in the manuscript the potential for RNA polymerase III to be important in the blebbing process 2) try a 10-fold lower dose of alpha-amanitin and see if that also inhibits blebbing, 3) try to find a way to demonstrate that RNA polymerase III activity is not inhibited at the 10 uM alpha-amanitin dosage, or 4) consider an alternate method to perturb RNA polymerase II activity (see Zhang Science Advances 2021 for an auxin-based approach to downregulate RNA polymerase II).
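The concentration comparison in this comment rests on a unit conversion from micromolar to ug/ml. A minimal sketch of that arithmetic follows; the molar mass of alpha-amanitin (about 918.97 g/mol) is an assumption supplied here and is not stated in the review, and the function name is hypothetical.

```python
# Assumed molar mass of alpha-amanitin; this value is not given in the review.
ALPHA_AMANITIN_G_PER_MOL = 918.97


def micromolar_to_ug_per_ml(conc_um, molar_mass_g_per_mol):
    """Convert a concentration in uM (umol/L) to ug/mL.

    umol/L * g/mol = ug/L; dividing by 1000 gives ug/mL.
    """
    return conc_um * molar_mass_g_per_mol / 1000.0


if __name__ == "__main__":
    dose = micromolar_to_ug_per_ml(10.0, ALPHA_AMANITIN_G_PER_MOL)
    print(f"10 uM alpha-amanitin is about {dose:.2f} ug/mL")
```

Under that assumed molar mass, 10 uM falls at the upper end of the 1 to 10 ug/ml range the Reviewer cites, and a 10-fold lower dose (1 uM) would sit just below its lower end.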

      The Reviewer raises the point that alpha-amanitin inhibits both RNA pol II and III. In the revised manuscript, we provide new data to further support that the observed effects arise from RNA pol II. We now include new data from cells treated with the transcription inhibitors flavopiridol (which inhibits RNA pol II elongation) and triptolide (which inhibits RNA pol I and II initiation). These transcription inhibitors also suppress nuclear blebbing in VPA-treated nuclei (Figure 2C) as well as three other nuclear blebbing perturbations in chromatin and lamins (Supplemental Figure 1A). These new experiments directly show that nuclear bleb suppression by transcription inhibitors can be observed without possible inhibition of RNA pol III by alpha-amanitin.

      __ A second concern I have is that the inhibition of RNA polymerase is global. Thus it is difficult to know for sure the biophysical function of the polymerase occurs immediately at the bleb, or instead is somehow affecting the overall chromatin state throughout the entire nucleus. I agree that figure 3 does provide some evidence that major mechanical and biophysical properties of the nuclei are not changed in response to the inhibition of the polymerase. However, micromanipulation experiments are done with isolated nuclei, which may be somehow mechanically altered already by isolation from cells. I feel that there still must be given some consideration in the discussion of the possibility that RNA polymerase activity outside of the bleb may be having some role in the stabilization of the chromatin and blebbing propensity.__

      We appreciate the Reviewer’s insightful comments and we have revised the manuscript to clarify that we do not attribute blebbing purely to local effects. Instead, we argue that global changes in chromatin motion driven by transcription could contribute to nuclear blebs.

      We did not intend to communicate that alterations to chromatin or its dynamics were necessarily only local. Indeed, we found that relative levels in RNAP Ser2 and Ser5 phosphorylation were different inside the blebs (Figure 6). Nonetheless, transcription was perturbed globally in our experiments, so we realized that blebbing could be driven by global changes (Figure 1). We hypothesize that global regulation of transcription can stimulate nuclear blebbing since transcription and its inhibition can, respectively, drive and suppress correlated chromatin motion throughout the entire nucleus (as previously observed by Zidovska et al. (PNAS 2013) and Shaban et al. (NAR 2018, Genome Biol. 2020), among others). We have revised the manuscript to clarify this point (Discussion section, page 15). We have also added new simulation snapshots showing global chromatin motions and how these motions are coupled to nuclear morphology (Figure 7C).

      In response to the concern that isolated nuclei exhibit different mechanical properties than nuclei inside of cells, we refer to our previously published micromanipulation measurements (Stephens et al. MBoC 2017). There, we found that nuclei within the cell and outside of the cell have quantitatively similar spring constants and qualitatively similar force-extension curves. Therefore, we are confident that the lack of change in nuclear stiffness measured by micromanipulation accurately reflects the mechanics of nuclei inside of cells across different perturbations.

      __ While I lack expertise to evaluate the basis of the model, I appreciate the model can show that motor activity can influence bulge. But it is not clear in the manuscript that RNA polymerase can generate these kinds of forces. The Liu citation is a model, and does not provide direct evidence that the RNA polymerase can generate force, or forces large enough to be meaningful. To me the model in this paper (Figure 7) felt as if it was only a possible hypothesis of why the RNA polymerase has an effect on blebbing, but I imagine there could be other hypotheses that would cause the same effect. The authors state (in the abstract) that RNA pol II can generate active forces, but I am concerned this is not sufficiently established. Since this motor/force activity of RNA polymerase is not experimentally demonstrated in this paper the authors should either do a better job of including evidence of this from the literature or consider removing this part of the manuscript.__

      RNA polymerase is capable of exerting forces in excess of 10 pN (e.g., see Wang et al. Science 1998; Herbert et al., Annu Rev Biochem 2008). The collective activity of many motors (10’s of thousands, e.g., see Zhao et al. Proc. Natl. Acad. Sci. 2014) may generate even larger forces. As discussed in our earlier modeling paper, this force scale is consistent with the motor strengths studied in our simulations (Liu et al. Phys. Rev. Lett. 2021); in the present work, we present simulation results for motors that generate 0.14 pN forces. Thus, transcription, in principle, could generate forces even larger than the ones we considered in the model.

      Additional experiments indicate that at larger length scales, RNA polymerase activity appears to drive coherent motions of chromatin throughout the cell nucleus (Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol 2020). It is these motions, driven by motors, that appear to drive the formation of nuclear bulges in our model (please see new panel Figure 7C).

      Therefore, the aim of the model is to build on established and new results to better understand how transcription could alter nuclear morphology. Our model is adapted from earlier models, which could reproduce observations of chromatin-based nuclear rigidity, (Stephens et al. MBoC 2017, Banigan et al. Biophys J 2017, Strom et al. eLife 2021), some aspects of nuclear morphology (Banigan et al. Biophys J 2017, Lionetti et al. Biophys J 2020), and possibly explain how nonequilibrium motor activity (such as RNA pol II) can drive coherent chromatin dynamics (Liu et al. PRL 2021), which have been observed in live-cell imaging experiments (e.g., Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol. 2020, among others). The precise form of the motor activity is not the focus of our model (or the previous motor model in Liu et al. PRL 2021). Instead, our simulation result indicates that the relatively small motor forces that generate coherent chromatin dynamics could explain the surprising observation that transcription is a critical component of nuclear blebbing.

      To address the Reviewer’s comment, we have added additional text to the Introduction and the Results sections to support the inclusion of motors to model the possible effects of transcription on chromatin dynamics and nuclear shape.

      In the Introduction (page 4), we now write:

      Simulations suggest that chromatin connectivity combined with the forces generated by polymerase motor activity (~10 pN per polymerase (Herbert et al. 2008)) could generate these dynamics (Liu et al., 2021).

      In the Results section (page 10), we write:

      We consider motors that generate sub-pN forces, well below the 10 pN forces that may be generated by individual RNA polymerases (Herbert et al. 2008).

      Additionally, we have updated Table 1 to include the simulated motor strength.__ __

      __ Minor Comments: 1. Did the authors do any analysis to see if the increased RNA transcription with VPA treatment (Figure 1B) has any spatial relationship to where the bleb occurs? Could an analysis of this be done similar to Figure 6 (with a bleb/body ratio)?__

      The Reviewer raises an interesting point about measuring RNA localization relative to the bleb. We measured RNA intensity in the bleb and the nuclear body for wild type cells only. We find that RNA levels are significantly decreased in the bleb (80% of body signal, p

      __ Is there anything known about lamin B1 KO cells as to whether or not they have increased transcription? Or could the authors do an analysis like they did with VPA treatment to check this?____ If they were to have increased transcription this would further support the authors' proposed mechanism of transcription itself (or RNA polymerase activity) driving blebbing).__

      In the revised manuscript, we show that several nuclear perturbations that are known to decrease nuclear stiffness and cause increased nuclear blebbing also rely on active transcription. Lamin B1 knockout or knockdown has been shown to result in changes in transcription. However, it was difficult to find data showing whether the overall level of transcription changes. Collaborators of ours have unpublished data indicating that twice as many genes are upregulated as downregulated upon lamin B1 knockdown, but this still does not assess the total level of transcription within the nucleus. Alternatively, increasing transcription via other means is fraught with off-target effects, which would require many additional complementary experiments. We thank the Reviewer for this interesting suggestion, but we believe this is beyond the scope of this manuscript, in which we have focused on showing that transcription inhibition suppresses bleb formation.

      __ Figure 1D, the VPA ser2 image appears much brighter than the untreated image. Yet the graph shows they are similar. Perhaps a more representative image should be used?__

      The image used reflects the data that Ser2 signal is brighter (by ~10%) in VPA-treated cells but is not significantly altered compared to wild type (unt), and thus it is an accurate reflection of the data.

      __ Can the authors comment if there is less DNA at the bleb site? In Figure 6 A this appears to be the case (based on the VPA image). If true, is the alpha-amanitin treatment rescuing this such that there is more DNA at the bleb (maybe causing the bleb to be smaller?).__

      We find that there is less DNA signal intensity per unit area in the nuclear bleb as compared to the nuclear body (bleb has ~60% the signal of the body; see teal dots/data in Figure 6B). This agrees with previously published work from our lab (Stephens et al. 2018 MBoC).

      Alpha-amanitin treatment does not rescue this effect. Decreased DNA enrichment in the bleb remains with alpha-amanitin treatment (p > 0.05, comparing across all 4 conditions in Figure 6B).

      __ What is the significance of bleb vs non-bleb nuclear rupture? Is there anything known in the literature as to how these ruptures may be different in terms of biophysics, impact to DNA, repair? It would be helpful to have some context, as well as to understand if non-bleb rupture is something that may have been previously missed in other contexts.__

      The Reviewer asks a valid and interesting question that this manuscript only begins to address. In general, we believe that ruptures occurring with blebs vs. without blebs may reflect aspects of the underlying mechanism(s) of blebbing and rupture, in the presence or absence of transcription. We offer a few further thoughts below.

      1) Non-bleb nuclear ruptures have been reported in a few papers by our group (Stephens et al., 2019 MBoC) and others (Chen et al., 2018 PNAS), but much is still unknown.

      2) Non-bleb nuclear rupture is part of normal nuclear behavior, as it accounts for ~20% of nuclear ruptures in wild type and perturbed cells (VPA and LMNB1-/-).

      3) Overall, we think that bleb-based and non-bleb-based ruptures may occur through different mechanisms. The simplest difference is that bleb-based nuclear ruptures follow the nucleus’ ability to form blebs, whereas non-bleb-based nuclear rupture occurs in cases where there is less bleb formation, suggesting that factors other than the ability to form blebs may also be important for rupture. In the current study, we observed that bleb-based nuclear ruptures (and bleb formation) require transcription. In another manuscript from our lab, currently under review, we show that bleb-based nuclear ruptures (and nuclear blebbing) can be suppressed by inhibiting actin contraction and increased by enhancing actin contraction (Pho et al., biorxiv 2022).

      Additionally, we note it was reported that non-bleb-based nuclear ruptures, at least some of which are driven by microtubule prodding, result in increased levels of DNA damage (Earle et al. Nat Mater 2020), as has been observed for bleb-based ruptures (Stephens et al., 2019 MBoC; Xia et al. J Cell Bio 2018). Thus, nuclear rupture in general is thought to lead to DNA damage. However, total levels of DNA damage due to rupture may be controlled by different cellular processes.

      In the revision, we have clarified our motivation for quantifying ruptures with and without blebs. We have also added a few remarks, drawn from the above comments, to the Discussion section (pages 11-14).

      Reviewer #1 (Significance (Required)):____ General assessment: This study is a careful analysis of how RNA polymerase inhibition reduces nuclear blebbing. The study demonstrates this very well, using a variety of approaches. However, some limitations are the overstatement of some conclusions (specifically that it is RNA polymerase II when the inhibitor may also affect RNA polymerase III; that the RNA polymerase activity is important at the bleb and involves motor activity). Advance: This paper is a significant advancement because it shows the role of transcription in the biophysics of the nuclear shape. To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field. Audience: I think the findings are of broad interest, including beyond the nuclear mechanics field. I think the audience would be the entire cell biology community. Expertise: My expertise is in cell mechanics, including forces at the nuclear LINC complex. While I do not work in the field of nuclear blebbing and rupture, I follow this field quite closely.

      We greatly appreciate the Reviewer’s statement that “To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.__” __We thank the Reviewer for their thoughtful comments and suggestions, which have helped to improve the manuscript. __

      __

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present data supporting the potential involvement of active transcription in the formation of nuclear blebs when the global deacetylase inhibitor valproic acid (VPA) has been applied to cells

      Reviewer #2’s greatest concern throughout the review was that we focused on the use of VPA as a model for generating increased nuclear blebbing and 24-hour treatment with alpha-amanitin as a transcription inhibitor. In the revised manuscript, we provide new data to show that nuclear blebbing generated by a variety of different nuclear perturbations (VPA, DZNep, LMNB1-/-, and LA KD Figure 2D __and __Supplemental Figure 1A) is reliant on active transcription in two different cell lines (MEF and HT1080, Figure 2 A and B). This is supported by use of four different transcription inhibition drugs, which work over varying time periods (24 hrs in alpha-amanitin, triptolide, or flavopiridol; actinomycin D for 1.5 hrs Figure 2C). We also timelapse imaged during drug treatment to show that transcription inhibitors for which we used 24-hour incubation times, can suppress nuclear blebs within 8 hours (Supplemental Figure 1B). __We also show that nuclear bleb formation and stability in wild type is transcription dependent (__Figure 5). We believe the new data added in our revised manuscript addresses the concerns of the Reviewer that the findings were specific to VPA and alpha-amanitin together only.

      __Reviewer #2 (Significance (Required)):____

      The authors present data supporting the potential involvement of active transcription in the formation of nuclear blebs when the global deacetylase inhibitor valproic acid (VPA) has been applied to cells. __

      While somewhat interesting, this is a rather specific condition that is further restricted by the limited use of experimental approaches. For example, the only deacetylase inhibitor used is VPA. Is this because VPA is the only one to trigger the effect? The authors should expand their approach to include additional inhibitors or, preferably, a directed knockdown tactic that targets the specific HDACs driving their phenomena.

      The Reviewer is concerned that we have used limited experimental approaches by focusing on VPA treatment to induce nuclear blebs and alpha-amanitin overnight treatment to suppress nuclear blebbing. VPA treatment is a well-established perturbation to induce nuclear blebbing via HDAC inhibition, and it is similar to a variety of other nuclear perturbations that also induce blebs (Stephens et al. MBoC 2018, 2019; Kalinin et al. MBoC 2021; Pho et al. biorxiv 2022).

      Nonetheless, to clearly address the Reviewer’s concerns we have provided new data which shows that four different nuclear perturbations are suppressed by transcription inhibition and that four different transcription inhibitors suppress nuclear blebbing. In addition to these perturbations, we also note that transcription inhibition affects bleb formation and stability in wild type cells. Below we outline the diverse experimental approaches that support the major conclusion of our manuscript.

      Our data show that transcription inhibition suppresses nuclear blebbing across:

      1. Multiple cell lines (MEF and HT1080, Figure 2, A and B) – original data
      2. Multiple transcription inhibitors (Figure 2C and Supplemental Figure 1):
         1. Alpha-amanitin (RNA pol II and III degradation) – original data
         2. Triptolide (RNA pol I and II initiation inhibition) – new data
         3. Flavopiridol (RNA pol II elongation inhibition) – new data
         4. Actinomycin D (DNA intercalation) – original data
      3. Multiple perturbations that cause nuclear blebbing (Figure 2D and Supplemental Figure 1):
         1. VPA histone deacetylase inhibitor, which increases euchromatin and chromatin decompaction; used because it is the most highly studied treatment by our lab (Stephens et al., 2017, 2018, 2019 MBoC; Pho et al., 2022 biorxiv) – original data
         2. DZNep histone methyltransferase inhibitor, which decreases heterochromatin and chromatin decompaction (Stephens et al., 2018, 2019 MBoC) – new data
         3. Lamin B1 null cells (LMNB1-/- or LB1-/-) (many previous works, including Stephens et al. MBoC 2018) – original data
         4. Lamin A constitutive knockdown cells (LA KD) (Vahabikashi et al., 2022 PNAS) – new data
      4. Nuclear bleb formation and stabilization in wild type cells is dependent on transcription in addition to VPA (Figure 5). – original data
      5. Time dependence of suppression of nuclear blebbing requested by Reviewers 2 & 3:
         1. Actinomycin D treatment of 1.5 hrs is sufficient to suppress nuclear blebs (Figure 2C) – original data
         2. Transcription inhibition with alpha-amanitin, triptolide, and flavopiridol all show an increased rate of nuclear bleb reabsorption in the first 8 hrs of treatment for both VPA and LMNB1-/- perturbations (Supplemental Figure 1B) – new data.
            1. This new data indicates that even formed blebs require active transcription to remain blebbed for long times
            2. This new data also shows that the effect of transcription inhibition on nuclear blebbing does not require 24 hours of treatment.

      __Moreover, the authors imply that VPA works through histone deacetylation yet do not provide direct evidence. It is equally likely that the application of VPA alters the acetylation pattern of a non-histone protein that eventually alters nuclear blebbing. __

      The Reviewer questions whether histone deacetylation due to VPA treatment is responsible for nuclear blebbing. As the Reviewer notes in their next point below, histone deacetylation (e.g., by VPA or TSA treatment) as a mechanism for nuclear blebbing was previously established by work from our lab (Stephens et al., 2018 and 2019 MBoC) and others (Kalinin et al. MBoC 2021). This was described and referenced in the original manuscript’s introduction.

      To summarize previous work, inhibition of histone deacetylation by VPA induces chromatin decompaction (Stypula-Cyrus et al. PLoS One 2013, Lleres et al. J Cell Bio 2009), increasing histone acetylation/euchromatin (Göttlicher et al. EMBO J 2001; Krämer et al. EMBO J 2003). In turn, this softens the nucleus (Stephens et al. MBoC 2017; Shimamoto et al. MBoC 2017), which succumbs to nuclear blebbing (Stephens et al., MBoC 2018). The softening and blebbing effects can also be induced by histone hyperacetylation via TSA or histone demethylation via DZNep (Stephens et al., MBoC 2018). These effects can be reversed by chromatin compaction via increased histone methylation/heterochromatin formation (Stephens et al. MBoC 2019).

      In the present work, we measured histone acetylation (H3K9ac) in both VPA and VPA+alpha-amanitin perturbations to ensure that alpha-amanitin does not simply reverse the increase in VPA-based histone acetylation and thereby decrease nuclear blebbing, which it does not (Figure 3, A and B).

      Altogether, inhibition of histone deacetylation by VPA as a mechanism for nuclear blebbing is established by the previous literature. The present work builds on those results to uncover a surprising new driver of nuclear blebbing: transcription. Therefore, we consider it unnecessary to provide further confirmatory measurements of VPA-treated cells beyond what is already provided in the manuscript. Finally, we point to the inclusion of new data from three other nuclear perturbations that cause nuclear blebbing that can be suppressed by transcription inhibition (Figure 2).

      __Regardless, the reported findings with VPA were previously reported (Stephens et al. 2018) and the influence of alpha amanitin only represents an incremental advancement in our understanding of nuclear blebs. __

      The finding that alpha-amanitin inhibits nuclear blebbing implies that a previously unknown mechanism/pathway, involving an essential genomic process, is critical to nuclear shape regulation. We therefore strongly disagree with the Reviewer that bleb inhibition upon alpha-amanitin treatment represents an incremental advance.

      Moreover, the existing literature generally argues that nuclear blebbing is caused by actin-based compression and confinement. It is widely believed that the cytoskeleton deforms the nucleus, which can herniate a nuclear bleb in softer nuclei. Here, we show that with transcription inhibition there are no overt changes to actin contraction (Supplemental Figure 2), actin confinement (Figure 3E), and nuclear mechanics (Figure 3G). However, levels of blebbing change anyway! This will be a new and surprising result to those who believe the current prevailing narrative from the literature. We have now shown for the first time that transcription is also needed to form and stabilize nuclear blebs; to our knowledge, this was almost entirely unknown until now.

      Further supporting our belief in the significance of our findings, Reviewer #1 and Reviewer #3 clearly state that our work is novel and important:

      Reviewer #1 “To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.”

      Reviewer #3 “This is an interesting study that shows, for the first time, that inhibition of transcription reduces the occurrence of nuclear blebs in cells that have been pre-treated with valproic acid.”

      To address the Reviewer’s concern, we have revised the manuscript to clarify that active transcription is required to form nuclear blebs across all of the perturbations now presented in this manuscript. Furthermore, we have clarified that transcription inhibition appears to suppress blebbing without altering other cellular components and properties (actin, nuclear stiffness) that are widely believed to control blebbing (see Results page 7, Results page 10, Discussion page 14).

      Adding to the concern is that actinomycin D does not have the same level of influence as alpha amanitin (Figure 2), which suggests the alpha amanitin is having a pleotropic impact on blebbing. To validate that the changes in blebbing in the presence of VPA are dependent upon active transcription, the authors should use the anchor-away technique to remove RNAP from the nucleus thereby avoiding any indirect effects of the drugs (i.e., alpha amanitin) in use. Further adding concern that it is an indirect outcome is the prolonged incubation period (16-24 hours) that is apparently needed to observe the changes (page 5 paragraph 4). If it is active transcription that is causing the change in blebbing, then this should be apparent in a much shorter time frame.

      The Reviewer is worried about possible differences between the transcription inhibitors actinomycin D and alpha-amanitin. To further address these concerns in the revised manuscript, we now present new data for VPA without transcription inhibitor and VPA with transcription inhibition by four different transcription inhibitors (Figure 2C). Inhibitors include alpha-amanitin (RNA pol II degradation), triptolide (transcription initiation inhibition), flavopiridol (transcription elongation inhibition), and actinomycin D (DNA intercalation). All VPA plus transcription inhibitor treatments result in a significant decrease in nuclear blebbing relative to VPA treatment alone (Figure 2C). Comparing the inhibitors with one another shows no significant difference in the degree of nuclear blebbing suppression (p > 0.05, Figure 2C).

      Furthermore, the Reviewer raises concerns about the time interval from the start of transcription inhibitor treatment to suppression of nuclear blebbing. We agree that considering this time interval is valuable. However, we need to consider that the time interval for each of the different transcription inhibitors to take effect is different (Bensaude 2011 Transcription). Alpha-amanitin inhibits transcription in 4-8 hours (10 µM, Nguyen et al., 1996 NAR), triptolide (1 µM, Chen et al. 2014 Genes Dev) and flavopiridol (0.5 µM, Chen et al., 2005 Blood) work in 2-4 hours, and actinomycin D works in about 1 hour (10 mg/mL, Lai et al. 2019 Methods). These times are now mentioned in the manuscript (Figure 2 legend and Methods section).

      It was not, however, known in advance how long it would take for transcription inhibition to have an effect on nuclear morphology. Therefore, the time to observe bleb suppression could have been longer than these treatment durations. As mentioned above, treatment with actinomycin D for 1.5 hours results in a similar decrease in nuclear blebbing as compared to the other inhibitors with 24-hour treatment (Figure 2C). To further address these concerns, we provide new data in the revised manuscript showing tracking of nuclear bleb reabsorption during the first 8 hours of treatment with alpha-amanitin, triptolide, and flavopiridol via live cell imaging. Nuclear bleb reabsorption for both VPA and LMNB1-/- perturbations rises from ~5% to 30% or greater during the first 8 hours of treatment with each of the transcription inhibitors (Supplemental Figure 1B), consistent with the time required to fully inhibit transcription. This supports our conclusion that transcription is essential to stabilizing nuclear blebs.

      __In addition to these issues, the authors rely on immunofluorescence signals to measure the levels of various factors including the Ser5 and Ser2 phosphorylation, which is capturing the total levels of these factors and not the DNA bound forms. If the changes in blebbing actually involve transcription initiation, then the authors should include measurements on the DNA-bound factors. __

      We measure Ser5 and Ser2 phosphorylation of RNA polymerase II to track the actively transcribing, DNA-bound population. These marks appear on DNA-bound RNAP. Ser5 and Ser7 of RNAP are phosphorylated during initiation and subsequently dephosphorylated during transcription elongation, while Ser2 phosphorylation is added at that time (Hsin and Manley 2012 Genes Dev). Ser2 phosphorylation is removed at transcription termination. Therefore, we expect immunofluorescence of these marks to report on DNA-bound RNAP.

      __As reported the authors conclude that there is no changes in Ser2 and Ser5 phosphorylation yet they report that total RNA levels rise (Figure 1). How is the disconnect between RNA levels and Ser2 and Ser5 phosphorylation occurring? __

      The Reviewer raises a question about how VPA treatment increases RNA levels but not levels of active RNA pol II (Ser2 and Ser5 phosphorylation). While this is an interesting question, without a dedicated investigation we can only speculate; this question is beyond the scope of this paper, which focuses on how transcription inhibition suppresses nuclear blebbing. The point of these data is to show that treatment with alpha-amanitin, alone or together with VPA, decreases both RNA levels and RNA pol II Ser2 and Ser5 phosphorylation, confirming transcription inhibition.

      __Comparably, they use H3K9ac immunofluorescence as a measure of euchromatin. While the authors might be gaining a view on the total levels of H3K9ac under these experimental conditions, it is not clear whether this is DNA associated or not. Minimally, the authors should perform ATAC-Seq to judge the changes in euchromatin. __

      The Reviewer questions the use of H3K9ac immunofluorescence as a measurement of euchromatin levels, particularly in VPA-treated cells. The relationship between VPA and chromatin decompaction / euchromatin levels has been previously established (e.g., Stypula-Cyrus et al. PLoS One 2013, Felisbino et al. J Cell Biochem 2014, Lleres et al. J Cell Bio 2009). New data in Figure 3B show that the heterochromatin marker H3K9me2,3 is also not altered by alpha-amanitin treatment. In the case of VPA + alpha-amanitin treatment, micromanipulation and nuclear height measurements provide further evidence that chromatin decompaction remains, since the chromatin-based force response is unchanged from VPA treatment alone (Figure 3, E and G).

      Again, we note that our manuscript focuses on the effects of transcription on nuclear blebbing and rupture, which were not previously reported and differ from the current understanding in the literature. Furthermore, ATAC-seq is a major undertaking that is simply not appropriate for further proving an auxiliary point about a previously established effect.

      In summary, the original manuscript addresses this point. The specific experiment requested by the Reviewer is not necessary and is far beyond the scope of this study.

      A final major concern is the lack of a correlation between the blebbing and nuclear ruptures (page 7 paragraph 3; Figure 4). If ruptures are not correlating with the blebbing, what is the relevance of the blebbing?

      The Reviewer is asking for a clarification of the importance of nuclear blebbing in relation to nuclear ruptures. We have revised the manuscript to add new text to the Figure 4 legend clarifying the measurements and to the Discussion section describing the importance of this data (Discussion pages 12-13 and page 14). We discuss this in more detail below.

      We would like to clarify that blebbing and nuclear rupture are not uncorrelated, as suggested by the Reviewer. We and others have shown that nuclear blebs are sites of high curvature that result in nuclear ruptures. In the present manuscript, timelapse imaging shows that nuclear bleb formation results in nuclear rupture within minutes in all imaged cases (Figure 5). This agrees with previously published data from our lab showing that bleb formation leads to rupture >95% of the time (Stephens et al., 2019 MBoC). Furthermore, stabilized nuclear blebs persist for hours (Supplemental Figure 1B) and undergo more ruptures, as shown in Figure 4D. Therefore, ruptures remain correlated with nuclear blebs in our study.

      What we have shown, however, is that the percentage of cells that undergo at least one nuclear rupture during the time lapse is not statistically significantly decreased from VPA-treated levels by the addition of alpha-amanitin (Figure 4B). This appears to be due to two factors: 1) a basal level of nuclear rupture (see wild type data in Figure 4) and 2) an increase in the level of non-bleb-based nuclear rupture. However, importantly, non-bleb-based ruptures appear to occur less frequently for cells that undergo nuclear ruptures. Of the cells that exhibit nuclear rupture, those with non-bleb-based ruptures on average undergo only a single rupture over a 3-hour timelapse whereas those undergoing bleb-based rupture undergo an average of > 2 ruptures over the same time (Figure 4D).

      Altogether, these data point to a correlation between blebbing and rupture, where blebbing can promote nuclear rupture but is not essential for rupture. Therefore, observations of blebs are important in that they correspond to increases in nuclear rupture and corresponding nuclear dysfunction, such as DNA damage. The observation of non-bleb-based rupture, while not entirely new (Chen et al. PNAS 2018, Stephens et al. MBoC 2019, Pho et al. bioRxiv 2022), is interesting because it may be driven by a different mechanism; transcription is not essential for nuclear ruptures in the absence of nuclear blebs but promotes rupture in the presence of blebs. These results add to our knowledge of the factors regulating nuclear integrity and shape, and we anticipate that they will be further investigated in future studies.

      Finally, beyond these findings, we speculate that blebbing itself may be harmful to cell nuclear function. Previous studies have observed that nuclear deformations can cause DNA damage (Shah et al. Curr Biol 2021), chromatin reorganization (Jacobson et al. BMC Biol 2018, Golloshi et al. EMBO J 2022), and alterations to mechanotransduction (reviewed in Kalukula et al. Nat Rev Mol Cell Biol 2022). The extent to which the changes associated with these “nuclear deformations” require blebbing, rupture, or both is under investigation by various labs. Furthermore, previous studies (Shimi et al. Genes Dev 2008; Pfleghaar et al. Nucleus 2015) along with the present study (RNA Pol Ser2 and Ser5; Figure 6) have shown that chromatin content and, possibly, functionality are different within the nuclear bleb. Data in another manuscript in preparation from our lab further suggest that there is limited exchange of biomolecular content between the nuclear body and bleb. Therefore, while we cannot conclusively claim that blebs are themselves deleterious to function, there is a growing body of suggestive evidence that this is the case.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is an interesting study that shows, for the first time, that inhibition of transcription reduces the occurrence of nuclear blebs in cells that have been pre-treated with valproic acid. The data that supports this is in Figure 2, collected in two different cell types (MEFs and HT1080 cells). The effect appears robust. New data is also provided that a marker of initiation of transcription but not transcriptional elongation is enriched in valproic acid-induced blebs.

      We thank Reviewer #3 for the positive comments that our study is “interesting” and “reproducible” and that our data show the effect of transcription on nuclear blebbing “for the first time”.

      This Reviewer asks for clarifications on 1) how transcription is a new mechanism for nuclear bleb formation and not part of the traditional view, 2) the generality of our conclusions (similar to Reviewer #2) since we report “on the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin”, and 3) the insight the modeling provides. We have provided new data and made changes to the manuscript to address all the Reviewer’s comments.

      __ Major comments

      1. The paper makes general claims about transcription and nuclear shape, when in reality, it is only reporting on the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin. This scenario under which the experiments were performed, for which there is no obvious physiological counterpart, ought not to be construed to challenge or contrast with the current understanding that the nucleus maintains its shape by resisting cytoskeletal forces. Cytoskeletal forces are well-known to establish nuclear shape; nuclear shape in this context, is generally taken to refer to the gross shape of the nucleus (e.g. elliptical, circular, etc.), and not small local blebs that may form due to F-actin based confinement or other mechanisms. Thus, this interpretation is overstated:

      "Surprisingly, we find that while nuclear stiffness largely controls nuclear rupture, it is not the sole determinant of nuclear shape. This contrasts with previous studies, which suggested that the nucleus maintains its shape by resisting cytoskeletal and/or other external antagonistic forces (Khatau et al., 2009; Le Berre et al., 2012; Hatch and Hetzer, 2016; Stephens et al., 2018; Earle et al., 2020)."

      __

      The Reviewer appears to be concerned with two issues in this comment. First, the Reviewer is concerned about our use of the word shape, which could be interpreted too generally, rather than as categorizing the blebbing and rupture phenomena that we observe in this study. We appreciate the Reviewer’s feedback and have made changes to this sentence as well as the paper in general to clarify that we are focused on nuclear blebs. Second, there is the question of how far our results modify our understanding of the role of nuclear stiffness in nuclear blebbing and rupture. We discuss this below.

      To address the Reviewer’s comment that the results are limited to “the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin” we provide new data and context for our results. The revised manuscript includes 1) new data using four transcription inhibitors and four nuclear blebbing perturbations and 2) original data showing that nuclear blebs are persistent rather than small and transient, and they alter gross nuclear shape. Our results are relevant to a wider range of blebbing/rupture and bleb/rupture suppression scenarios, as exemplified by the different nuclear perturbations, transcription inhibitors, cell types tested in our experiments, and long lifetimes for nuclear blebs. More specifically:

      1) The Reviewer notes that our original studies were done with VPA and alpha-amanitin, similar to Reviewer #2’s concerns. We now provide new data showing that 4 different transcription inhibitors can suppress nuclear blebbing across 2 chromatin and 2 lamin perturbations (Figure 2 and Supplemental Figure 1). Thus, our new data support the idea that transcription is broadly required for nuclear blebbing.

      2) The Reviewer states that blebs are small and transient, and that “shape” is meant to reflect the gross shape (e.g., circular). In fact, blebs are long-lived: our new data show that most (>95%) VPA and LMNB1-/- blebs remain at the end of an 8-hour timelapse (Supplemental Figure 1B). Furthermore, on average, nuclear blebs account for 15% of the nuclear size in VPA-treated cells (Figure 6E). While not measured in this paper, many studies have shown that nuclear blebs cause gross circularity to decrease significantly and that changes in circularity are associated with nuclear rupture (e.g., Stephens et al. MBoC 2018, Xia et al. JCB 2018). Most recently, we showed in another manuscript under review that nuclear blebs significantly decrease nuclear circularity (Pho et al., 2022 biorxiv).

      The Reviewer also argues that our data showing the importance of transcription in nuclear blebbing “ought not to be construed to challenge or contrast with the current understanding that the nucleus maintains its shape by resisting cytoskeletal forces.” We acknowledge that our results are not sufficient to rule out the broad assertion made by the Reviewer. However, our data shows for the first time that nuclear blebbing relies on transcriptional activity, while we measure no change in actin contraction or confinement or nuclear stiffness (respectively, Supplemental Figure 2 and Figure 3, C-E). Consequently, these results are a challenge to the current understanding, which must be updated by our results and future experiments. At the same time, we note that this manuscript’s Discussion section acknowledges that we have data in another preprint in which inhibition of actin contraction decreases nuclear blebbing to near 0% in wild type and perturbations (Pho et al., 2022 biorxiv). Together, these observations suggest a complicated picture in which multiple factors are jointly responsible for regulating nuclear blebbing and rupture.

      __ As an aside, the data in the paper does not appear to support the interpretation that "nuclear stiffness largely controls nuclear rupture". It is unclear what the authors mean by this statement.__

      We originally intended that comment to state the previous understanding in the literature, but we realize it was unclear. We appreciate the Reviewer’s feedback and have revised the text.

      __ 2. Further to point 2, treatment with alpha-amanitin does nothing to the occurrence of blebbing in normal cells. Thus, the data are specifically applicable to valproic acid-treated cells. As such, the broad interpretations related to nuclear shape and mechanics should be tempered.__

      The Reviewer is concerned that we cannot support the claim that this effect is broad and general; these concerns are also raised by Reviewer #2. We have provided new data and highlight original data to support that this effect is in fact broad and general, and moreover, that the data supports a role for transcription in nuclear blebbing.

      We specifically address the Reviewer’s statement: “treatment with alpha-amanitin does nothing to the occurrence of blebbing in normal cells”. In the original manuscript, we provided data that showed that wild type nuclear bleb formation and stability are suppressed upon transcription inhibition (Figure 5) even though the percentage of wild type nuclei exhibiting a bleb is not changed by alpha-amanitin treatment (Figure 2). We also provided data showing that the predominant type of nuclear rupture changes with alpha-amanitin treatment, including in wild type cells (blebbed vs. not, Figure 4C). Thus, while the effects of transcription inhibition are most easily visible in VPA-treated cells, they are also present in wild type cells in how blebs are formed and stabilized (Figure 5). We have revised the manuscript to better highlight this important point.

      In addition, we again emphasize that our results extend beyond VPA-induced blebs. Our revised manuscript now includes new data of 4 different perturbations (to chromatin histone modifications and lamins A and B) that induce nuclear blebs, which can be suppressed by 4 different transcription inhibitors (Figure 2 and Supplemental Figure 1). As previously noted by both Reviewers 1 and 3, this effect is reproducible in different cell lines. This new data directly addresses the concern that the effect is only applicable to VPA and alpha amanitin.

      Nonetheless, we agree with the Reviewer that we cannot support broader claims that nuclear mechanical properties are unaltered by transcription inhibitors across all scenarios, as we only measured this change in VPA-treated cells. Micromanipulation force experiments are detailed and time consuming, making it difficult to include data for multiple perturbations. We chose VPA because we have the most measurements of this perturbation which have remained consistent over the life of micromanipulation force measurements. Therefore, we have revised our statements on nuclear mechanics in the revised manuscript (page 14).

      __ The motor model for RNA pol II activity assumes that the motor 'repels' nearby chromatin units. It is not clear how this is related to the mechanism of motor action of RNA pol II on chromatin during transcription.__

      The point of the model is not to precisely reproduce the manner in which transcribing RNA pol II exerts forces on the chromatin fiber. Instead, we have developed a coarse-grained model to study how the collective activity of molecular motors might drive chromatin dynamics and consequently, changes in nuclear shape, either global or local.

      The model itself is based on our earlier models, which were used to recapitulate and understand how changes to chromatin mechanical properties governed nuclear rigidity (Stephens et al. MBoC 2017, Banigan et al. Biophys J 2017, Strom et al. eLife 2021; also see a similar model by Lionetti et al. Biophys J 2020) and how nonequilibrium activity due to molecular motors, such as RNA pol II, can drive coherent chromatin dynamics (Liu et al. PRL 2021), which have been observed in live-cell imaging experiments (e.g., Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol. 2020, among others). The current model therefore explores how the newly observed connection between transcription and nuclear blebbing could be explained by known phenomena.

      We note that the “repelling” motors used to model RNA pol II activity in the present work are in many ways qualitatively similar to the dipolar “extensile” motors used by other researchers to model motor-driven chromatin dynamics (e.g., see Saintillan et al. PNAS 2018). More generally, study of “active matter” over the last 20-30 years (and statistical physics over the last century) has shown that precise details of active molecular agents are often unimportant to the larger-scale behavior of the system (e.g., see Marchetti et al. Rev Mod Phys 2013). Thus, we view the repulsive motors as modeling the effective behavior of many RNA pol II within a sub-micron region of chromatin. Better establishing the differences between different choices of motor activities is the subject of a modeling paper in preparation.

      To address the Reviewer’s concern, we have more clearly stated the scientific foundations of the model, and we have revised our description of the model to clarify that we do not intend to model the behavior of individual RNA pol II by individual repulsive motors (see Results section, page 10).
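
      As a purely illustrative sketch of what a coarse-grained "repelling" motor can mean, and not the simulation code used in the manuscript, the snippet below computes the forces that motor beads exert on nearby chromatin beads. The function name `motor_forces`, the constant force magnitude `strength`, the interaction range `cutoff`, and the equal-and-opposite reaction on the motor bead are all assumptions made for this example.

```python
import math


def motor_forces(positions, motor_indices, strength, cutoff):
    """Forces from hypothetical 'repulsive motors' on a set of beads.

    positions     : list of (x, y, z) bead coordinates
    motor_indices : indices of beads that currently act as motors
    strength      : constant force magnitude per motor-bead pair (assumed)
    cutoff        : interaction range of a motor (assumed)

    Each motor pushes every other bead within `cutoff` directly away from
    itself and receives the equal and opposite reaction force.
    """
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for m in motor_indices:
        for j, pj in enumerate(positions):
            if j == m:
                continue
            # Vector from the motor to bead j and its length.
            d = [pj[k] - positions[m][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            if r == 0.0 or r > cutoff:
                continue
            for k in range(3):
                f = strength * d[k] / r
                forces[j][k] += f   # bead j is pushed away from the motor
                forces[m][k] -= f   # reaction on the motor bead
    return forces
```

      In a full simulation, such forces would be added to the polymer, lamina, and thermal forces at every time step; here they only illustrate that the motors represent an effective, local, nonequilibrium push on chromatin rather than the detailed action of a single RNA pol II.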

      __The motor model also does not seem to add conclusive insight to the manuscript, as the nuclear shapes predicted are not directly comparable to the experimental shapes which are flat and smooth with only an occasional, single, local bleb. __

      The Reviewer raises two related points with this comment: that bulges and blebs are not directly comparable, and therefore, that the model “does not seem to add conclusive insight to the manuscript.”

      We agree with the Reviewer that bulges in the simulations are not blebs as they are observed in the experiments. However, it seems likely to us that bulges are necessary precursors to bleb formation; it is difficult to envision how a large local nuclear protrusion could form without first bulging outward from the nuclear body. Furthermore, we disagree with the assertion that nuclei are generally flat and smooth, as qualitative and quantitative analysis of imaging data reveals that nuclei exhibit shape fluctuations and irregularities across multiple scales (see, for example, Chu et al. PNAS 2017, Patteson et al. JCB 2019, Stephens et al. MBoC 2019, Liu et al. PRL 2021).

      Nonetheless, the observation of bulges but not blebs is a shortcoming of the simulation model. We believe this shortcoming reflects a tradeoff made in developing this model; we chose to develop and study a model with relative simplicity compared to a real cell nucleus. A more complicated model might better capture some aspects of nuclear blebbing at the expense of additional complexity. For example, the current model does not allow lamin-lamin or chromatin-lamin bonds to rupture, either stochastically or due to high forces. This effect, which is likely present in vivo, might be necessary for generating more bleb-like structures in simulations. Developing and refining such a model is an active pursuit within our collaboration, but for the moment, it is beyond the present purpose of the model.

      Instead, the purpose of the model is to determine whether the observed effect of transcription inhibition on nuclear blebbing / localized shape deformations can be understood through known biophysical phenomena. Established models (to the extent that they exist) were insufficient because they typically relied on changes in nuclear mechanics, whereas our experiments show that transcription inhibition does not change nuclear mechanical rigidity. The current model demonstrates how motor activity within chromatin can alter the structure and dynamics of the lamina. The simulations are certainly not proof that transcription affects nuclear blebbing through the proposed mechanism. However, they are a first-of-their-kind demonstration of how nonequilibrium biophysical activity (such as that generated by transcription) within a biopolymer system (chromatin) can emergently alter the geometry of the confining boundary (the lamina). This new result provides a plausible interpretation for the experiments in the manuscript.

      In the revised manuscript, we have clarified our modeling approach and objectives in the Results and Discussion sections, and we have more clearly identified and discussed the limitations of the model (Results pages 10-11, Discussion page 15).

      The model offers 'proof of principle', but is not capable of ruling out alternative mechanisms (such as nuclear pressurization by confinement, chromatin decompaction, or changes to osmotic pressure). It may be more appropriate to include the model in the discussion as opposed to presenting it as a new result that can be reliably interpreted through comparisons with experiment.

      We respectfully disagree with the suggestion to include the model in the Discussion section instead of the Results. As discussed above, the model is new biophysics research and the simulations produced new scientific results, even if the overall interpretation remains open.

      However, we have some thoughts about the alternatives suggested by the Reviewer. This is discussed in detail below, but briefly: experimental data, rather than the model itself, suggests that the alternative mechanisms mentioned by the Reviewer do not explain the effects of transcription. After treatment with alpha-amanitin, we do not observe changes to actin-based confinement or contraction (Figure 3E, Supplemental Figure 2), and there are no changes to chromatin histone modifications or nuclear rigidity (Figure 3). We also are skeptical of osmotic pressure arguments since 1) fluid, ions, and small biomolecules should freely flow through nuclear pores to maintain osmotic pressure balance between the nucleus and the cytoplasm, especially on hours-long time scales, and 2) increasing the osmotic pressure by fragmenting chromatin has previously been observed to have either no effect or a suppressive effect on nuclear stiffness (Stephens et al. MBoC 2017, Belaghzal et al. Nat Genet 2021), which would potentially increase blebbing (the opposite of the effect suggested by the Reviewer). We have addressed this further in the revised Results section (page 10) and below.

      __ 4. The data in the paper is not strong enough to rule out the more conventional mechanism of nuclear pressurization, which could be caused by F-actin based confinement or chromatin decompaction, or changes to osmotic pressure. Immunostaining of myosin is not a reliable way to compare myosin activity across conditions. It is possible that the long treatment with alpha-amanitin (unto 24 h, Fig. 2) relieves the pressure in the nucleus without measurable changes in the already established cell shape and hence the nuclear shape (height changes in spread cells are small at best -- valproic acid appears to reduce height by ~0.5 microns in Figure 3E which is smaller than the optical resolution along the z-axis of a typical confocal microscope).__

      The Reviewer has proposed several alternative mechanisms and questioned the use of immunostaining and nuclear height measurements in the manuscript. We address each of these below.

      Specifically, the Reviewer is concerned that we cannot rule out the more conventionally believed mechanisms of 1) actin confinement, 2) actin contraction, 3) chromatin decompaction, and/or 4) osmotic pressure. We have revised the text to clarify that our data and data from others strongly support that these four “conventional” mechanisms are not responsible for transcription inhibition-based nuclear blebbing suppression (revisions on pages 7, 10, 14).

      1) Actin confinement, as measured by nuclear height does not change upon transcription inhibition (Figure 3, C-E). Thus, our data supports the idea that transcription inhibition suppresses nuclear blebbing through a different mechanism. The Reviewer objects to this measurement on the basis that even the 0.5 µm change observed for VPA-treated cells is below optical resolution. However, optical resolution is not relevant to this measurement because we are not resolving two objects; rather, we are measuring the size of one object, the nucleus.

      When two dots/objects are separated in the same frame or in different z slices, one needs to clearly distinguish the two Gaussian point spread functions of objects a distance X apart. That is resolution, and it is not the relevant limitation here. We measure the size of one object (the nucleus) using the full width at half maximum (FWHM), which can quantify changes in nuclear height at scales finer than the optical resolution. For example, the FWHM of a fluorescent bead can be observed to change by just tens of nm depending on the wavelength of the emitted light; shorter wavelengths give a smaller FWHM (from the Rayleigh criterion, θ = 1.22λ/D, where λ is the wavelength of the light and D is the aperture diameter). Our measurements are taken through a z-stack at 200 nm steps; thus, the 0.5 µm change from wild type to VPA-treated is 2.5 z steps (not smaller than one z step). Finally, we have additional data showing our ability to measure these differences many times over (Pho et al. 2022 biorxiv).
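The FWHM argument can be illustrated with a minimal sketch. This is not the analysis pipeline used in the manuscript; it assumes an idealized Gaussian intensity profile along z, sampled at the 200 nm step size quoted above, and the function names `fwhm` and `gaussian_profile` are hypothetical. It shows only that interpolating the half-maximum crossings of a single object's profile recovers a 0.5 µm change in width, even though each crossing falls between z slices.

```python
import math


def gaussian_profile(z, fwhm_true):
    """Synthetic intensity profile (assumption: Gaussian, centered at z = 0)."""
    sigma = fwhm_true / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in z]


def fwhm(z, intensity):
    """Full width at half maximum, with linear interpolation between samples."""
    peak = max(intensity)
    base = min(intensity)
    half = base + 0.5 * (peak - base)
    i_peak = intensity.index(peak)

    def crossing(step):
        # Walk away from the peak until the next sample drops below half max.
        i = i_peak
        while 0 <= i + step < len(intensity) and intensity[i + step] >= half:
            i += step
        j = i + step
        if not 0 <= j < len(intensity):
            raise ValueError("profile does not fall below half maximum")
        # Interpolate between the last sample at or above half max (i)
        # and the first sample below it (j).
        frac = (intensity[i] - half) / (intensity[i] - intensity[j])
        return z[i] + frac * (z[j] - z[i])

    return crossing(+1) - crossing(-1)


# z positions in micrometers, 0.2 um (200 nm) steps as in the text.
z = [0.2 * k for k in range(-50, 51)]

# Two hypothetical profiles whose true widths differ by 0.5 um.
width_a = fwhm(z, gaussian_profile(z, 5.0))
width_b = fwhm(z, gaussian_profile(z, 4.5))
print(f"measured widths: {width_a:.3f} um and {width_b:.3f} um")
print(f"measured difference: {width_a - width_b:.3f} um")
```

Real nuclear profiles are not Gaussian and carry noise, so the achievable precision in practice depends on signal quality; the sketch only separates the idea of measuring one object's width from the idea of resolving two objects.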

      Image left is from: https://en.wikipedia.org/wiki/Full_width_at_half_maximum

      Image right is a crop of Figure 3D from the manuscript.

      2) Actin contraction, as measured by γMLC2, does not change either (Supplemental Figure 2). However, we know that actin contraction is a major determinant of nuclear blebbing (Mistriotis et al., 2019 JCB and Pho et al., 2022 biorxiv). Therefore, our data support that transcription affects blebbing in some other way than actin contraction.

      The Reviewer disputes this finding by stating that “immunostaining of myosin is not a reliable way to compare myosin activity across conditions.” Published reports show that γMLC2 immunostaining is a reliable way to measure actin contractility changes (Wan et al. MBoC 2012; Ramachandran et al. Mol Vision 2011; Duan et al. Cell Cycle 2016; Nishimura et al. PLOS One 2020). We have another preprint showing that alterations to actin contraction as measured by immunostaining of phosphorylated myosin light chain 2 (γMLC2) determine nuclear blebbing, independent of changes in actin confinement (Pho et al., 2022 biorxiv). There, we clearly show that changes in γMLC2 immunostaining can measure changes in actin contraction due to well-established modulators. Similarly, the ROCK inhibitor Y27632 in Supplemental Figure 2 can be viewed as a positive control in that γMLC2 immunostaining is clearly decreased after treatment with the inhibitor.

      3) Chromatin decompaction via H3K9ac and chromatin-based nuclear rigidity are not rescued by transcription inhibition. New data also show that levels of the heterochromatin marker H3K9me2,3 do not change upon transcription inhibition (Figure 3B). The new data presented in this manuscript show that transcription inhibition also suppresses blebbing in DZNep-treated cells (Figure 2D), where chromatin compaction by heterochromatin formation is inhibited (Stephens et al. MBoC 2019). Together, these experiments suggest that transcription inhibition is not suppressing nuclear blebs through increases in heterochromatin-based chromatin compaction.

      Furthermore, the lack of change in the measurement of nuclear stiffness via micromanipulation (Figure 3G) provides a complementary metric suggesting that chromatin compaction is unchanged, at least in the case of VPA + alpha-amanitin.

      Altogether, these results are inconsistent with transcription inhibition suppressing blebs through alterations to chromatin compaction.

      4) Osmotic pressure is the least established, if established at all, of the four “traditional” mechanisms. The Reviewer proposes that transcription inhibitors, such as alpha-amanitin, could relieve osmotic pressure within the nucleus. We disagree with this explanation because it is implausible for the nucleus to maintain an osmotic pressure imbalance in VPA-treated cells over long periods of time. Fluid, ions, and small biomolecules likely can flow through nuclear pores to maintain osmotic balance between the nucleoplasm and cytoplasm, especially over the hours-long duration of VPA treatment. Furthermore, we are skeptical that VPA treatment, even with its chromatin-decompacting effects, significantly increases osmotic pressure because nuclear stiffness actually decreases after VPA treatment (Stephens et al. MBoC 2017, 2018, 2019; Krause et al. Phys Bio 2013; Shimamoto et al. MBoC 2017; Hobson et al. MBoC 2020). Increased osmotic pressure should cause the nucleus to be stiffer. Moreover, nuclei in VPA-treated cells consistently undergo blebbing and rupture, which would naturally relieve any pressure imbalance. Thus, the notion that the measurements after hours of VPA or VPA + alpha-amanitin treatment (Figures 2-5) are the result of a steady-state change in osmotic pressure is simply inconsistent with the experimental data.

      We note that in cases of acute osmotic shock, where the osmotic pressure balance of the nucleus may be altered, the nucleus changes in size (e.g., see Finan et al., 2009 Ann Biomed Eng), which we do not observe in our experiments. Our measurements of nuclear area (Figure 6C) and height (Figure 3E) show no change in nuclear size upon transcription inhibition (for more on the issue of height measurement, see the previous point).

      To further address concerns about overnight treatment causing off-target effects, we have provided new data from a shorter treatment duration in the manuscript. The new data shows that within 8 hours, blebs exhibit more reabsorption after alpha-amanitin, triptolide, and flavopiridol treatment in both VPA-treated and LMNB1-/- cells (Supplemental Figure 1B). Additionally, we note that actinomycin D decreased nuclear blebbing in 1.5 hours, and thus did not require overnight treatment.

      In summary, our original and new data clearly show that transcription contributes to nuclear blebbing. Transcription inhibition does not change other factors (such as actin-based confinement or contraction, changes in chromatin compaction, or osmotic pressure), that have been shown or may be thought to contribute to nuclear blebbing. The revised manuscript addresses this issue through the inclusion of new data, as discussed above.

      __

      Further to point 4, the data in Figure 4B and 4D both show a decrease in the mean of the % of ruptured nuclei and rupture frequency (please provide units for this frequency on the Y-axis). With more experiments, perhaps the data would have reached statistical significance?__

      The Reviewer is asking for clarification on the data included in Figure 4 B and D reporting the percentage of cells that display a nuclear rupture.

      We have revised the manuscript to clarify that Figure 4B is the percentage of all nuclei that show at least one nuclear rupture. The measurement unit, percent (listed as “[%]”), is shown on the y-axis. The revised manuscript also clarifies that Figure 4D reports, for the nuclei that rupture, the average number of times a nucleus ruptures during the 3-hour time-lapse.

      The Reviewer states that “with more experiments, perhaps the data would reach statistical significance?” To address this comment, we have altered the text to explain that the percentage of all nuclei that rupture at least once shows a decrease that is not statistically significant by t-test. The data in Figure 4B show that VPA causes 18.5 +/- 2.7 % rupture and VPA+alpha-amanitin causes 12.4 +/- 1.5 % rupture. Student’s t-test gives p = 0.08, which is not statistically significant (p > 0.05), for six biological replicates, each consisting of n = 100-300 cells. We feel the data speak for themselves without us doing more experiments with the sole purpose of getting a lower p value. The stronger data are in Figure 4D, which clearly shows fewer nuclear ruptures per nucleus. We appreciate the Reviewer’s perspective and have modified the text in the Results and Discussion sections to reflect these important points (pages 8 and 14).
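The reported comparison can be approximated from the summary values alone. The sketch below is not the authors' calculation. It assumes that the +/- values are standard errors of the mean across the six biological replicates (the response does not say whether they are SD or SEM) and that the test is a two-sided, equal-variance Student's t-test; the helper names are hypothetical. Under those assumptions the p-value lands between 0.05 and 0.10, in line with the quoted p = 0.08.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t| (steps is even)."""
    t_abs = abs(t_stat)
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central


def student_t_from_sem(mean1, sem1, mean2, sem2, n):
    """Equal-variance t-test for two groups of equal size n, given means and SEMs."""
    var1 = sem1 ** 2 * n  # assumption: SD = SEM * sqrt(n)
    var2 = sem2 ** 2 * n
    pooled_var = (var1 + var2) / 2
    se_diff = math.sqrt(pooled_var * 2 / n)
    t_stat = (mean1 - mean2) / se_diff
    dof = 2 * n - 2
    return t_stat, dof, two_sided_p(t_stat, dof)


# Values quoted in the response: VPA vs. VPA + alpha-amanitin, six replicates each.
t, df, p = student_t_from_sem(18.5, 2.7, 12.4, 1.5, n=6)
print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.3f}")
```

If the +/- values were standard deviations instead, the same difference in means would give a much smaller p-value, so the SEM reading is the one consistent with the reported result.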

      __ Minor comments.

      1. Confirmatory data, which has already been published in the same cell line in the past, could be moved if possible to supplemental information. Figure 1 seems to be a characterization of the efficacy of alpha-amanitin which is well-known, and therefore does not represent an original finding. It should perhaps be in supplemental information.__

      We understand the Reviewer’s point but would like to leave Figure 1 as a main text figure to provide a clearer story for all readers of our manuscript.

      __ 2. Did the counting method used to collect data in Figure 4B exclude nuclei that rupture multiple times? This should be specified in the manuscript.__

      No, Figure 4B is the percentage of nuclei that rupture; a nucleus that ruptures any number of times is counted once, as a single nucleus that ruptures. We have revised the Figure 4 legend to clarify this point.

      __ 3. This statement should be rephrased: "Since transcription is needed to form and stabilize nuclear blebs, at least some aspect of nuclear shape deformations appears to be non-mechanical" - deformation in the model in Figure 7 is clearly 'mechanical' - driven by motor force.__

      We appreciate the Reviewer’s feedback and have changed the text to “independent of the bulk mechanical strength of the nucleus”.

      __ 4. It is important to specify the times for which cells were treated with the various drugs in each figure (and not just in figure 2).__

      We appreciate the Reviewer’s feedback and have added this information to each figure legend.__

      __

      __

      Reviewer #3 (Significance (Required)):

      This paper reports new data that nuclear blebbing induced by treatment with valproic acid can be inhibited by co-treatment with alpha-amanitin. The data provided are reproducible across different cell lines. The data suggest that inhibition of transcription inhibits blebs which are induced by valproic acid treatment, but it does not inhibit blebs in cells untreated with valproic acid. Immunostaining reveals some enrichment of RNA pol II phosphorylated at Ser5 in valproic acid-induced blebs, suggesting an enhancement of transcription-initiation (but not transcriptional elongation) in the bleb. Alpha-amanitin treatment reduces bleb formation and bleb lifetime.

      While the data are clearly presented, and interesting in terms of relating transcription to blebbing, the proposed interpretation in terms of a new mechanism of blebbing is not strongly supported by the data or by the computational model. More definitive evidence is required to rule out that blebbing in valproic acid treated cells is not caused by a pressurization of the nucleus due to valproic acid treatment, which could be released by treatment with alpha-amanitin treatment for upto 24 h. The manuscript generalizes the findings to 'nuclear shape', and interprets them as suggestive of an alternative mechanism of establishment of nuclear shape; this generalization seems unsupported by the data.__

      Overall, the data provided is novel and interesting to cell biologists, provided more definitive evidence can be provided to rule out other models and to establish the new proposed model for nuclear blebbing. Else, the claims of an alternative mechanism for blebbing could be toned down, and the data on the relation between transcription and blebbing, which is the novel and interesting finding in this paper, could be presented in a more focused way.

      We appreciate that the Reviewer points out that “the data are clearly presented and interesting” and “reproducible across different cell lines.” The Reviewer’s main concerns appear to be with: 1) the effect of transcription inhibition on blebbing that is not induced by VPA, 2) alternatives or limitations to our proposed interpretation of the results, and 3) describing our results as applicable to “nuclear shape” in general.

      We have addressed each of these concerns in detail in the above response and the revised manuscript. To summarize:

      • We have included new data to show that four different transcription inhibitors combined with four different nuclear perturbations exhibit the same effects (Figure 2 and Supplemental Figure 1). We have also clarified in the revised manuscript that even wild type (“untreated”) nuclei exhibit changes to blebbing dynamics (decreased stability, increased reabsorption) after transcription inhibition (Figure 5). In addition, concerns about time intervals were addressed by time-lapse imaging showing that bleb reabsorption (return to normal shape) increases six-fold in the first 8 hours of transcription inhibitor treatment (Supplemental Figure 1B).
      • The original manuscript, new data, and previous data from the literature provide evidence that alternative mechanisms involving “pressurization” (discussed above), the actin cytoskeleton (Figure 3E and Supplemental Figure 2), and chromatin and nuclear rigidity (Figure 3) do not explain the observed effects of transcription inhibition. We discuss this in detail in the revised manuscript and the above response. Furthermore, we have revised our presentation and discussion of the simulation model to describe its relevance more clearly to the results, support its inclusion in the manuscript, and provide appropriate caveats on our computational findings.
      • We have revised the manuscript to clarify that our results primarily concern nuclear blebbing and rupture. The Reviewer is correct that the current investigation does not particularly focus on larger-scale shape such as circularity/ellipticity. In summary, our data clearly indicate that transcription contributes to nuclear blebbing and rupture. Previously suggested mechanisms of blebbing are generally inconsistent with the observed effect in combination with our other measurements. The model investigates a plausible new, complementary mechanism, which in itself represents an advance in biophysical modeling and ties the manuscript together.

      We thank the Reviewer for their thorough critique, which we have now addressed. We believe that the new experimental data and analysis and computational modeling in our manuscript significantly advance our overall understanding of nuclear blebbing, even as they raise new questions to be addressed by future work.

    1. Author Response

      Reviewer #1 (Public Review):

      The Introduction starts by setting up a straw-man argument, claiming that the assumption is that gene expression is set up as stable expression domains that undergo little or no subsequent change. I don't think that any current developmental biologist thinks this is true. The references used to support this claim are from the 1990s up to the early 2000s. There are numerous examples since then that show that developmental gene expression is dynamic as a rule.

      Our argument might seem like a strawman to the sector of developmental biologists who work in the field of pattern formation or who are aware of the latest advances in the field. However, a look at current publications on developmental enhancers reveals that the dominant model with which enhancer biologists interpret their data is still the French Flag model (specifically, the eve-stripe-2 model of enhancer function). We meant to address this audience, and attempted to clarify this from the very beginning by stating that “Much of our models of how enhancers work during development relies on the assumption that …”. Please note here that we are talking about “models of how enhancers work”, not models of pattern formation in general.

      The Introduction then continues as a rather detailed review of enhancers, Tribolium methodology, tools for identifying enhancers, and more. The Introduction cites 99 references, which seems excessive for what is essentially an experimental paper. Significant parts of the Introduction can be trimmed or removed. There is no need to mention all the tools available for Tribolium if they are not used in the described experiments. A thorough analysis of the advantages and disadvantages of different modes of ATAC-seq is also beyond the scope of the Introduction. The authors should explain why they chose the tools they chose without excessive background.

      In the revised manuscript, we shortened the discussion of Tribolium methodologies and imaging techniques. However, we think that the paragraph discussing ATAC-seq strategies is important to justify why we took the effort to cut the embryos and perform tissue-specific ATAC-seq analysis instead of performing whole-embryo ATAC-seq.

      Having said that, the Introduction actually overlooks a lot of significant work that is relevant to the subject of the paper. Specifically, the authors completely ignore all of the work on development in hemimetabolous insects such as Oncopeltus and Gryllus - the omission is glaring. There has been a lot of relevant work on dynamic gene expression patterns coming out of these species.

      You are right indeed. We apologize for that. We have now added citations to relevant work on those two insects to the manuscript.

      The experimental setup involves cutting embryos into three sections at two time points. The results then discuss differences in "space" and "time" but there is no discussion of the embryological meaning of these terms. What is happening at the two time points from a developmental perspective? What is the difference between the three sections? There is a lot of relevant development going on at these stages and important regional differences, which have been well-studied in Tribolium and in other insects but are not even mentioned.

      A good point. Correlating chromatin landscape changes with embryological events is an interesting question that needs further analysis and the application of ATAC-seq to further time points. We chose to leave this to future work (possibly using single-cell ATAC-seq). In this work, we restricted our analysis to the benefits of applying time- and tissue-specific ATAC-seq in predicting active enhancers. We added a note on this point in the Discussion.

      In the preliminary results of the ATAC-seq analysis, it is clear that there are significant differences between the sections, which should come as no surprise, but fairly minor differences between the same section at the two time points. This could be because the two time points are pretty close together at a stage when there is a lot of repetitive patterning going on. A possible interpretation, which the authors don't mention because it goes against their main thesis, is that maybe most of the processes that are taking place at this stage are not dynamic enough to show up at the temporal resolution they have applied. This is worth at least a mention.

      We agree with this observation. We would like to draw the reviewer’s attention to our statement “Together, our findings indicate that changes in chromatin accessibility in Tribolium at this developmental stage are primarily associated with space rather than time…”. Detailed analysis of the chromatin dynamics across time would require more time points, which is something we plan to do in future work.

      The authors link each accessible site to the nearest gene when looking at putative enhancer function. This is a risky assumption since there are many examples of enhancer sites that are far upstream or downstream of the target gene and often closer to an unrelated gene than to the target gene. The authors should at least acknowledge this problem with their functional annotation.

      The reviewer is correct in that, in particular for large eukaryotic genomes, enhancers are often located far away from their target genes. We have no comprehensive enhancer-target data that would enable us to perform a more accurate analysis. Furthermore, the assumption that at least for some of the enhancers the nearest genes will also be their targets, and hence, provide insight into the function of the enhancers themselves seems reasonable given the relatively compact organization of the Tribolium genome. In any case, the analysis was just presented as one of several sanity checks for our ATAC-seq data; for the sake of streamlining the manuscript we no longer include this analysis in the current version of the manuscript.

      In the Discussion, the authors claim that contrary to how it may seem, the question they are addressing is not a "fringe problem". Once again, I think this is a straw man. No active researcher thinks that the question of dynamic regulation of gene expression during development is a fringe problem. On the contrary, most researchers will accept that this is one of the most interesting and important questions in current developmental biology.

      This whole argument was removed from the Discussion in the revised manuscript.

      Perhaps the most significant problem with the manuscript is that it is all built around the premise of enhancer switching between dynamic enhancers and static enhancers. The authors find one site that is consistent with their prediction for a dynamic enhancer and one site - regulating a different gene - that is consistent with their prediction for a static enhancer and claim that they have provided support for their model. I think this claim is grossly exaggerated. They present data that can be seen as consistent with their model but are a long way from providing evidence for it.

      We actually thought we were cautious enough about this. Nowhere in our text did we mention that our data “support” the enhancer switching model. We stated quite early (in the abstract, actually) that:

      “We found our data consistent with a model in which the timing of gene expression during embryonic pattern formation is mediated by a balancing act between enhancers that induce rapid changes in gene expressions (that we call ‘dynamic enhancers’) and enhancers that stabilizes gene expressions (that we call ‘static enhancers’).”

      To make this message clearer, we added the following sentence to the abstract of the revised manuscript: “However, more data is needed for a strong support for this or any other alternative models.” And again at the end of the Introductions: “While these data are in line with our Enhancer Switching model, more data is needed as a strong support for the model.” Also, at the end of the Results section examining runB enhancer dynamics, we stated: “However, this merely shows that runB activity dynamics are consistent with our model, but is still far from strongly supporting the model (more on that in the Discussion).” Also for the Results section on enhancer hbA dynamics: “Again, this merely shows that hbA activity dynamics are consistent with our model, but is still far from strongly supporting it.”.

      Moreover, in the opening paragraph of the Discussion, we explicitly and quite openly addressed this point, and suggested what kinds of observations and experiments are needed in the future to qualify as “strong support” for the model. We even ran simulations of what kind of observation one should expect in enhancer deletion experiments if the model is correct (Figure 7).

      But it seems that discussing the enhancer switching model in detail gives the impression that it is of central importance to the paper. In our view, our experimental system is quite general and does not depend on that model; the point of mentioning it is that it is an example of how an alternative model of enhancer regulation could be relevant to the problem of dynamic gene expression. This wouldn’t be obvious without this or a similar model demonstrating it, even if the model is hypothetical. But since our presentation is evidently giving the impression that our claims are stronger than they really are, we altered our phrasing in the Introduction of the revised manuscript to make our point clearer:

      “Despite its potential inaccuracies, the Enhancer Switching model exemplifies the type of alternative frameworks we need to explore in order to elucidate the mechanisms driving the generation of gene expression waves during development. Consequently, an appropriate model system is required, allowing us to test not only the Enhancer Switching model but also any other prospective model that provides a satisfactory explanation for the initiation of gene expression waves at the enhancer level.”

      We hope that this addresses the reviewer’s quite legitimate concerns.

      Like the Introduction, the Discussion includes long paragraphs (lines 450-480) that are more suitable for a review/hypothesis paper. The data presented in this manuscript has little relevance to the question of kinematic vs. trigger waves, and therefore there is no real reason for the question to be discussed here.

      We have now significantly shortened the discussion.

      Reviewer #2 (Public Review):

      Open questions:

      What happens with the runB enhancer at later stages of embryogenesis? With what kind of dynamics do the anterior-most stripes fade and does that agree with the model? Do they show the same dynamics throughout segmentation? I think later stages need to be shown because the prediction from the model would be that the dynamics are repeated with each wave. I am not so sure about the prediction for ageing stripes – yet it would have been interesting to see the model prediction and the activity of the static enhancer.

      Yes, the dynamics repeat in the germband. This is shown in Supplementary Figure 8. The dynamics in the germband were shown by visualizing yellow mRNA and intronic probes. MS2 imaging could not be used because the embryo dives into the yolk for a while, after which it becomes difficult to capture the germband in the right orientation for imaging. We are currently working on using light sheet microscopy to image germband stages.

      I understand that the mRNA of the reporter gene yellow is more stable than the runt mRNA. This might interfere with the possibility to test your prediction for static enhancers: The criterion is that the stripes should increase in strength as the wave migrates towards the anterior. You show this for runB – but given that yellow has a more stable transcript – could this lead to a “false positive” increase in intensity with the slower migration and accumulation of transcripts? I would feel more comfortable with the statement that this is a static enhancer if you could exclude that the signal is blurred by an artifact based on different mRNA stability. What about re-running the simulation (with the parameters that have shown to well reflect endogenous runt mRNA levels) but increasing the parameter for the stability of the mRNA? Are static and dynamic enhancers still distinguishable? The claim of having found a static enhancer rests on this increase in signal, hence, other explanations need to be excluded carefully.

      Good questions. Note that runB reporter dynamics were examined not only by visualizing yellow mRNAs (which indeed seem to be more stable than endogenous run mRNA; see Supplementary Figure 10), but also using MS2 (with virtually zero mRNA stability; although stability was simulated in the shown movies to show virtual mRNA dynamics), and intronic yellow mRNA (showing de novo transcription; Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts). The expected dynamics of a static enhancer reporter are quite unique: expression progressively increases initially as it propagates from posterior to anterior, then progressively decreases as it slows down and stabilizes at the anterior, and eventually fades. This full range of dynamics is obvious in germband embryos stained for intronic yellow to show de novo transcription of the runB enhancer reporter (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).

      Running the simulation for the model using different degradation rates for the enhancer reporter made the static enhancer’s expression either less or more persistent, but gave the same overall result: the static enhancer has diminished expression at the very posterior, but high expression as its expression wave exits the growth zone/SAZ. This is consistent not only with yellow mRNA expression of runB, but with its intronic expression as well (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).

      What about the head domain of the runB enhancer (e.g. Fig. 6A lowest row): This seems to be different from endogenous expression in your work and in Choe et al. Is that aspect different from endogenous expression and can this be reconciled with your model?

      Yes, indeed this aspect cannot be explained by our model. We believe that head patterning in insects is regulated by a different regulatory network. This network might be (de)-activated by missing repressors in the selected DNA segment for runB enhancer. We mentioned this issue in the revised manuscript.

      The claim of similar dynamics of expression visualized by in situ and MS2 in vivo relies on comparing Fig. 6C with 6A. To compare these two panels, I would need to know to what stage in A the embryo in C should be compared. Actually, the stripe in 6C appears more crisp than the stripes in 6A.

      Were the enhancer dynamics tested in vivo at later stages as well? I would appreciate a clear statement on what stages can be visualized and where the technical boundaries are because this will influence any considerations by others using this system.

      One cannot be very precise about the timing of a process as dynamic in space and time as the one we are studying. We believe that Figure 6D shows clearly that runB activity dynamics are similar to endogenous run expression.

      How do the reported accessibility dynamics of runA enhancer correlate with the activity of the reporter: E.g. is the enhancer open in the middle body region but closed at the posterior part of the embryo? Or is it closed at the anterior – and if so: why is there a signal of the reporter in the head?

      You show that chromatin accessibility dynamics help in identifying active enhancers. Is this idea new or is it based on previous experience with Drosophila (e.g. PMID: 29539636 or works cited in https://doi.org/10.1002/bies.201900188)? Or in what respect is this novel?

      Our manuscript contributes to the growing body of evidence confirming that accessibility per se does not imply activity. Of course, this is not a new idea, but given the wide use of accessibility as a proxy for enhancer activity in the genomics community, we do feel it is important to reiterate the message. As the reviewer correctly indicates, several published findings point to a correlation between accessibility dynamics and enhancer activity. However, to our knowledge, this is the first example in Tribolium. It is important to point out that what “dynamic” means depends strongly on the experimental design. Even in Drosophila, not enough studies have been conducted to fully understand the relationship (e.g., ideally, this should be done on a continuous time scale and at the single-cell level). We acknowledge in the manuscript that this relationship has been observed before in other species (and have added the references suggested by the reviewer, for which we are very grateful), but still believe that our observations are highly significant to the Tribolium community.

      Reviewer #3 (Public Review):

      I have two major concerns: First, the claim about differential accessibility being related to enhancer activity is not really established from the presented data, in my view. This needs to be clarified. (I do believe in the claim to some extent, but not based on presented evidence.)

      We agree with the reviewer that more data and, more importantly, independent replication are necessary to confirm this finding. Please refer to our response to your comment regarding the statistical significance of the findings.

      Second, the evidence in support of the Enhancer Switching model for runt should be accompanied by identification of and spatiotemporal profiling of the “speed regulator”, if this is not established yet.

      Experiments supporting the role of Cad as a speed regulator for both pair-rule and gap genes have been published in El-Sherif et al PLOS Genetics 2014 and Zhu et al PNAS 2017. We added a comment stressing this fact.

      In addition to these two concerns, the simulations of the Enhancer Switching model need to be described, at least in the outline, in the Methods section.

      Done

    1. Author Response

      Reviewer #1 (Public Review):

      Specifically, the authors define "efficacy" (eta) of a ligand as the fractional change in binding free energy between the open and the closed states of the channel.

      We assume that the word in quotes is a typo; η is efficiency, not efficacy (now given the symbol λ). We now emphasize the distinction immediately after Eq. 2.

      1) One concern regards the clustering of the data sets in Fig. 5 into exactly 5 eta-classes. First, two clusters contain only two data points each. Second, the proposed "catch&hold LFER model" (Fig. 2) does not predict the existence of a discrete number of such eta-classes. How strong is the evidence that there are exactly 5 classes as opposed to a continuum of possible eta values.

      Statistical (x-means cluster) analysis indicates that the 23 agonists segregate into 5 η classes. Groups with only 2 members (plus the intercept) are less well defined (Fig 4) but are supported by the 5 mutational η classes (Fig. 7). (see above)

      2) The authors do not discuss the uniqueness of the proposed model.

      see above. Ln 405 Induced fits are common.

      In fact, it seems to me that the existence of eta-classes might be explained just as well by an alternative model which assumes a single gating mechanism for the receptor,

      We are not sure what a “single gating mechanism” means. Does non-single refer to i) the 2-stage induced fits (catch-hold LFER)? … η classes make this conclusion unavoidable. ii) our conjecture that there are 5 different C versus O binding site structural pairs…? Energy derives from structure, so the 5 energy ratios indicate 5 structural pairs. iii) multiple steps inside gating (ϕ)? … So far there have not been any alternative explanations for the organized map of ϕ. iv) catch itself? … Evidence for this induced fit is given in Fig 2 and 7 SI, and on Ln 528-547 we discuss the implications of kon to C versus O. Ln 405 Local ‘induced fit’ rearrangements in enzymes are common. We think the evidence is strong for the bottom scheme in Fig 2A.

      but distinct patterns of ligand-protein interactions for the different agonists.

      η classes derive from distinct interactions for different agonists, but what these are and whether the ‘contact number’ idea is useful are uncertain (see above).

      The pore opening-associated increase in agonist affinity is typically caused by a tightening of the substrate binding site (often called clamshell closure) …

      Ln 379-386 In the Discussion we now relate catch-hold to induced fit

      Ln 455, 461-463, 471-474 Fig 2SI and the induced fit to clamshell closure

      Reviewer #2 (Public Review):

      This is an interesting manuscript with a worthwhile approach to receptor mechanisms. The paper contains an impressive amount of new data. These single molecule concentration response curves have been compiled with care and the authors deserve great credit for obtaining these data.

      Ln 233 η can be estimated from a CRC built from whole-cell currents…

      Ln 150 …or indeed any method that estimates KdC and KdO (for example binding assays, or perhaps in silico simulations of AC and AO structures)
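
      As a minimal sketch of the calculation implied here, assuming the definition used in earlier work on agonist efficiency, η = 1 − log(KdC)/log(KdO), with KdC and KdO the dissociation equilibrium constants to the closed and open states in molar units. The function name and the example constants are hypothetical and are not taken from the manuscript.

```python
import math


def efficiency(kd_closed_molar: float, kd_open_molar: float) -> float:
    """Fractional change in binding free energy between closed and open states.

    Assumes eta = 1 - log(KdC) / log(KdO), with both constants in molar units.
    """
    if kd_closed_molar <= 0 or kd_open_molar <= 0:
        raise ValueError("dissociation constants must be positive")
    if kd_open_molar == 1.0:
        raise ValueError("log(KdO) is zero, so the ratio is undefined")
    return 1.0 - math.log(kd_closed_molar) / math.log(kd_open_molar)


# Made-up constants for illustration only: KdC = 100 uM, KdO = 10 nM.
eta = efficiency(1e-4, 1e-8)
print(f"eta = {eta:.3f}")
```

      Because only the ratio of the two logarithms enters, any method that yields both constants (single-channel CRCs, whole-cell CRCs, binding assays) could feed the same function.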

      I judge the main result to be that there are different values of the recently-proposed agonist-related quantity "efficiency".

      Ln 21, 26-27, 535-547 OK, but to us the most interesting insight is that in AChRs binding IS gating.

      These values are clustered into 5 quite closely spaced groups. The authors propose that these groups are the same whether considering mutations in the binding site or different agonists.

      see above

      It was unclear to me in several places, what new data and what old data are included in each figure. Therefore readers may have difficulty judging the claimed advance. This difficulty is not helped by the discussion, which includes some previous findings as "results".

      see above.

      A further weakness is that it is unclear how general or how specific these concepts are. The authors assert that they are, by definition, completely universal. However, we do not have reference to previous work or current data on any other receptor than the muscle nicotinic. I could not square the concept that "every receptor works like this" with the evident lack of desire to demonstrate this for any other receptor.

      Ln 132-136 There are reasons to think that receptors in general work according to Figure 1A. A thermalized ligand (for instance TriMA, MW 60) has the momentum of only ~3 water molecules. A momentum sensor would have terrible signal/noise.

      Reviewer #3 (Public Review):

      This work attempts to introduce a new attribute of the receptor- efficiency, a fraction of an agonist binding energy consumed by conformational transition of the receptor from resting to active (open) states. Furthermore, the authors use an impressive set of experimental data (single channel recordings with 23 agonists and 53 mutations) to measure the efficiency for each agonist and mutant receptor. All the estimated efficiencies fall into a few groups and inside each of the efficiency groups there is a strong correlation between agonist affinity and receptor opening efficacy.

      The main finding in this study is that estimated efficiencies fall into 5 groups.

      see above.

      There is no clear description of the method how the efficiencies were allocated into different groups. Most importantly, it is not clear if the method used takes into account the uncertainty of the efficiency estimate. The study does not show any statistical metrics of the efficiency estimates as well as any other calculated variable such as dissociation equilibrium constants to resting or open states. Surely, the uncertainty of the efficiency should matter especially considering how near the efficiency group values are (eg. difference about 10% between 0.51 and 0.56 or 0.41 and 0.45).

      see above

      All the tested agonists fell into groups according to the efficiency value attributed to them. It is difficult to see why some of the agonists belong to the same group. For example, it is not obvious at all why such agonists as epibatidine, decamethonium and TMP are in the same group. The question, I guess, arises if this grouping based on efficiency has any predictability value. Furthermore, if a series of mutations with the same agonist fall into different groups, the prediction power of this approach is very limited if one attempts to design a new agonist or look for a new mutation.

      see above and Ln 548-561 (last para of text). Efficiency is a relatively new idea. This report is one of only a few on the subject. More experiments with different receptors by more labs using other approaches are needed to ascertain whether η is general.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript will interest cognitive scientists, neuroimaging researchers, and neuroscientists interested in the systems-level organization of brain activity. The authors describe four brain states that are present across a wide range of cognitive tasks and determine that the relative distribution of the brain states shows both commonalities and differences across task conditions.

      The authors characterized the low-dimensional latent space that has been shown to capture the major features of intrinsic brain activity using four states obtained with a Hidden Markov Model. They related the four states to previously-described functional gradients in the brain and examined the relative contribution of each state under different cognitive conditions. They showed that states related to the measured behavior for each condition differed, but that a common state appears to reflect disengagement across conditions. The authors bring together a state-of-the-art analysis of systems-level brain dynamics and cognitive neuroscience, bridging a gap that has long needed to be bridged.

      The strongest aspect of the study is its rigor. The authors use appropriate null models and examine multiple datasets (not used in the original analysis) to demonstrate that their findings replicate. Their thorough analysis convincingly supports their assertion that common states are present across a variety of conditions, but that different states may predict behavioural measures for different conditions. However, the authors could have better situated their work within the existing literature. It is not that a more exhaustive literature review is needed; it is that some of their results are unsurprising given the work reported in other manuscripts; some of their work reinforces or is reinforced by prior studies; and some of their work is not compared to similar findings obtained with other analysis approaches. While space is not unlimited, some of these gaps are important enough that they are worth addressing:

      We appreciate the reviewer’s thorough read of our manuscript and positive comments on its rigor and implications. We agree that the original version of the manuscript insufficiently situated this work in the existing literature. We have made extensive revisions to better place our findings in the context of prior work. These changes are described in detail below.

      1) The authors' own prior work on functional connectivity signatures of attention is not discussed in comparison to the latest work. Neither is work from other groups showing signatures of arousal that change over time, particularly in resting state scans. Attention and arousal are not the same things, but they are intertwined, and both have been linked to large-scale changes in brain activity that should be captured in the HMM latent states. The authors should discuss how the current work fits with existing studies.

      Thank you for raising this point. We agree that the relationship between low-dimensional latent states and predefined activity and functional connectivity signatures is an important and interesting question in both attention research and more general contexts. Here, we did not empirically relate the brain states examined in this study to the functional connectivity signatures previously investigated in our lab (e.g., Rosenberg et al., 2016; Song et al., 2021a) because the research question and methodological complexities deserve separate attention that goes beyond the scope of this paper. Therefore, we conceptually addressed the reviewer’s question on how functional connectivity signatures of attention are related to the brain states that were observed here. Next, we asked how arousal relates to the brain states by indirectly predicting arousal levels of each brain state based on its activity patterns’ spatial resemblance to the predefined arousal network template (Goodale et al., 2021).

      Latent states and dynamic functional connectivity

      Previous work suggested that, on medium time scales (~20-60 seconds), changes in functional connectivity signatures of sustained attention (Rosenberg et al., 2020) and narrative engagement (Song et al., 2021a) predicted changes in attentional states. How do these attention-related functional connectivity dynamics relate to latent state dynamics, measured on a shorter time scale (1 second)?

      Theoretically, there are reasons to think that these measures are related but not redundant. Both HMM and dynamic functional connectivity provide summary measures of the whole-brain functional interactions that evolve over time. Whereas HMM identifies recurring low-dimensional brain states, dynamic functional connectivity used in our and others’ prior studies captures high-dimensional dynamical patterns. Furthermore, while the mixture Gaussian function utilized to infer emission probability in our HMM infers the states from both the BOLD activity patterns and their interactions, functional connectivity considers only pairwise interactions between regions of interest. Thus, on the theoretical ground that brain states can be characterized at multiple scales and with different methods (Greene et al., 2023), we can hypothesize that both measures could (and perhaps should be able to) capture brain-wide latent state changes. For example, if we were to apply k-means clustering methods to the sliding window-based dynamic functional connectivity as in Allen et al. (2014), the resulting clusters could arguably be similar to the latent states derived from the HMM.

      However, there are practical reasons why the correspondence between our prior dynamic functional connectivity models and current HMM states is difficult to test directly. A time point-by-time point matching of the HMM state sequence and dynamic functional connectivity is not feasible because, in our prior work, dynamic functional connectivity was measured in a sliding time window (~20-60 seconds), whereas the HMM state identification is conducted at every TR (1 second). An alternative would be to concatenate all time points that were categorized as each HMM state to compute representative functional connectivity of that state. This “splicing and concatenating” method, however, disrupts continuous BOLD-signal time series and has not previously been validated for use with our dynamic connectome-based predictive models. In addition, the difference in time series lengths across states would make comparisons of the four states’ functional connectomes unfair.
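
      The “splicing and concatenating” alternative described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' pipeline: the function names, the two-region setup, and the numbers are all hypothetical, and a real analysis would operate on a full parcel-by-parcel connectome.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / sqrt(sxx * syy)


def statewise_connectivity(roi_a, roi_b, states):
    """Correlate two ROI time series using only the TRs assigned to each state.

    Returns {state: (number_of_TRs, correlation)}. The spliced samples are no
    longer contiguous in time, which is the caveat raised in the text.
    """
    result = {}
    for state in sorted(set(states)):
        idx = [t for t, label in enumerate(states) if label == state]
        a = [roi_a[t] for t in idx]
        b = [roi_b[t] for t in idx]
        result[state] = (len(idx), pearson_r(a, b))
    return result


# Toy data: one state label per TR and two made-up ROI signals.
states = ["A", "A", "B", "A", "B", "B", "A", "A"]
roi_a = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 4.0, 5.0]
roi_b = [1.1, 1.9, 2.0, 3.2, 1.0, 0.2, 3.9, 5.1]
fc = statewise_connectivity(roi_a, roi_b, states)
for state, (n_trs, r) in fc.items():
    print(state, n_trs, round(r, 3))
```

      Even in this toy case the two states contribute different numbers of TRs (5 versus 3), so their correlation estimates rest on unequal amounts of data, which is the comparability concern noted above.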

      One main focus of our manuscript was to relate brain dynamics (HMM state dynamics) to a static manifold (functional connectivity gradients). We agree that a direct link between two measures of brain dynamics, HMM and dynamic functional connectivity, is an important research question. However, because several intricacies would need to be addressed to answer this question, we felt that it was beyond the scope of our paper. We are eager, however, to explore these comparisons in future work, which can more thoroughly address the caveats associated with comparing models of sustained attention, narrative engagement, and arousal defined using different input features and methods.

      Arousal, attention, and latent neural state dynamics

      Next, the reviewer posed an important question about the relationship between arousal, attention, and latent states. The current study was designed to assess the relationship between attention and latent state dynamics. However, previous neuroimaging work showed that low-dimensional brain dynamics reflect fluctuations in arousal (Raut et al., 2021; Shine et al., 2016; Zhang et al., 2023). Behavioral studies showed that attention and arousal hold a non-linear relationship: for example, mind-wandering states are associated with lower arousal and externally distracted states with higher arousal, although both states indicate low attention (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016).

      To address the reviewer’s suggestion, we wanted to test if our brain states reflected changes in arousal, but we did not collect relevant behavioral or physiological measures. Therefore, to indirectly test for relationships, we predicted levels of arousal in brain states by applying the “arousal network template” defined by Dr. Catie Chang’s group (Chang et al., 2016; Falahpour et al., 2018; Goodale et al., 2021). The arousal network template was created from resting-state fMRI data to predict arousal levels indicated by eye monitoring and electrophysiological signals. In the original study, the arousal level at each time point was predicted by the correlation between the BOLD activity pattern of each TR and the arousal template. The more similar the whole-brain activation pattern was to the arousal network template, the more aroused the participant was predicted to be at that moment. This activity pattern-based model was generalized to fMRI data during tasks (Goodale et al., 2021).

      We correlated the arousal template with the activity patterns of the four brain states that were inferred by the HMM. The DMN state was positively correlated with the arousal template (r=0.264) and the SM state was negatively correlated with the arousal template (r=-0.303) (Author response image 1). These values were not tested for significance because they were single observations. While speculative, this may suggest that participants are in a high arousal state during the DMN state and a low arousal state during the SM state. Together with our results relating brain states to attention, it is possible that the SM state is a common state indicating low arousal and low attention. On the other hand, the DMN state, a signature of a highly aroused state, may benefit gradCPT task performance but not necessarily engagement with a sitcom episode. However, because this was a single observation and we did not collect a physiological measure of arousal to validate this indirect prediction result, we did not include the result in the manuscript. We hope to more directly test this question in future work with behavioral and physiological measures of arousal.
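
      The template-matching step described above amounts to a spatial Pearson correlation between each state's activity pattern and the template. A minimal sketch, assuming both are given as equal-length vectors over the same parcels; the function names, state labels, and values below are invented for illustration and do not reproduce the reported r values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / sqrt(sxx * syy)


def template_similarity(state_patterns, template):
    """Correlate each state's parcel-wise activity pattern with the template."""
    return {name: pearson_r(pattern, template)
            for name, pattern in state_patterns.items()}


# Hypothetical 5-parcel template and two hypothetical state patterns.
template = [0.8, -0.2, 0.5, -0.9, 0.1]
patterns = {
    "state_a": [0.7, -0.1, 0.4, -0.8, 0.2],   # resembles the template
    "state_b": [-0.6, 0.3, -0.5, 0.9, 0.0],   # roughly its inverse
}
scores = template_similarity(patterns, template)
for name, r in scores.items():
    print(name, round(r, 3))
```

      With only one correlation per state, there is no sampling distribution to test against, which is why the values above were reported descriptively.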

      Author response image 1.

      Changes made to the manuscript

      Importantly, we agree with the reviewer that a theoretical discussion about the relationships between functional connectivity, latent states, gradients, as well as attention and arousal was a critical omission from the original Discussion. We edited the Discussion to highlight past literature on these topics and encourage future work to investigate these relationships.

      [Manuscript, page 11] “Previous studies showed that large-scale neural dynamics that evolve over tens of seconds capture meaningful variance in arousal (Raut et al., 2021; Zhang et al., 2023) and attentional states (Rosenberg et al., 2020; Yamashita et al., 2021). We asked whether latent neural state dynamics reflect ongoing changes in attention in both task and naturalistic contexts.”

      [Manuscript, page 17] “Previous work showed that time-resolved whole-brain functional connectivity (i.e., paired interactions of more than a hundred parcels) predicts changes in attention during task performance (Rosenberg et al., 2020) as well as movie-watching and story-listening (Song et al., 2021a). Future work could investigate whether functional connectivity and the HMM capture the same underlying “brain states” to bridge the results from the two literatures. Furthermore, though the current study provided evidence of neural state dynamics reflecting attention, the same neural states may, in part, reflect fluctuations in arousal (Chang et al., 2016; Zhang et al., 2023). Complementing behavioral studies that demonstrated a nonlinear relationship between attention and arousal (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016), future studies collecting behavioral and physiological measures of arousal can assess the extent to which attention explains neural state dynamics beyond what can be explained by arousal fluctuations.”

      2) The 'base state' has been described in a number of prior papers (for one early example, see https://pubmed.ncbi.nlm.nih.gov/27008543). The idea that it might serve as a hub or intermediary for other states has been raised in other studies, and discussion of the similarity or differences between those studies and this one would provide better context for the interpretation of the current work. One of the intriguing findings of the current study is that the incidence of this base state increases during sitcom watching, the strongest evidence to date that it has a cognitive role and is not merely a configuration of activity that the brain must pass through when making a transition.

      We greatly appreciate the reviewer’s suggestion of prior papers. We were not aware of previous findings of the base state at the time of writing the manuscript, so it was reassuring to see consistent findings. In the Discussion, we highlighted the findings of Chen et al. (2016) and Saggar et al. (2022). Both studies highlighted the role of the base state as a “hub”-like transition state. However, as the reviewer noted, these studies did not address the functional relevance of this state to cognitive states because both were based on resting-state fMRI.

      In our revised Discussion, we write that our work replicates previous findings of the base state that consistently acted as a transitional hub state in macroscopic brain dynamics. We also note that our study expands this line of work by characterizing what functional roles the base state plays in multiple contexts: The base state indicated high attentional engagement and exhibited the highest occurrence proportion as well as longest dwell times during naturalistic movie watching. The base state’s functional involvement was comparatively minor during controlled tasks.
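
      For readers unfamiliar with these summary measures, occurrence proportion and dwell time can both be read off a decoded state sequence. The sketch below is a generic illustration, assuming one state label per TR and a TR of 1 second; the function name and the toy sequence are hypothetical.

```python
from itertools import groupby


def occupancy_and_dwell(states, tr_seconds=1.0):
    """Return (occurrence proportion, mean dwell time in seconds) per state.

    Occurrence proportion: share of TRs labelled with the state.
    Dwell time: length of each uninterrupted run of the state.
    """
    n = len(states)
    if n == 0:
        raise ValueError("empty state sequence")
    counts, dwell_times = {}, {}
    for state, run in groupby(states):
        run_length = sum(1 for _ in run)
        counts[state] = counts.get(state, 0) + run_length
        dwell_times.setdefault(state, []).append(run_length * tr_seconds)
    occupancy = {s: c / n for s, c in counts.items()}
    mean_dwell = {s: sum(d) / len(d) for s, d in dwell_times.items()}
    return occupancy, mean_dwell


# Toy sequence of decoded states, one label per TR.
sequence = ["base", "base", "base", "DMN", "base", "base", "SM", "SM"]
occupancy, mean_dwell = occupancy_and_dwell(sequence)
print(occupancy)
print(mean_dwell)
```

      In this toy sequence the base state occupies 5 of 8 TRs across two visits, giving an occurrence proportion of 0.625 and a mean dwell time of 2.5 s.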

      [Manuscript, page 17-18] “Past resting-state fMRI studies have reported the existence of the base state. Chen et al. (2016) used the HMM to detect a state that had “less apparent activation or deactivation patterns in known networks compared with other states”. This state had the highest occurrence probability among the inferred latent states, was consistently detected by the model, and was most likely to transition to and from other states, all of which mirror our findings here. The authors interpret this state as an “intermediate transient state that appears when the brain is switching between other more reproducible brain states”. The observation of the base state was not confined to studies using HMMs. Saggar et al. (2022) used topological data analysis to represent a low-dimensional manifold of resting-state whole-brain dynamics as a graph, where each node corresponds to brain activity patterns of a cluster of time points. Topologically focal “hub” nodes were represented uniformly by all functional networks, meaning that no characteristic activation above or below the mean was detected, similar to what we observe with the base state. The transition probability from other states to the hub state was the highest, demonstrating its role as a putative transition state.

      However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      3) The link between latent states and functional connectivity gradients should be considered in the context of prior work showing that the spatiotemporal patterns of intrinsic activity that account for most of the structure in resting state fMRI also sweep across functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/33549755/). In fact, the spatiotemporal dynamics may give rise to the functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/35902649/). HMM states bear a marked resemblance to the high-activity phases of these patterns and are likely to be closely linked to them. The spatiotemporal patterns are typically obtained during rest, but they have been reported during task performance (https://pubmed.ncbi.nlm.nih.gov/30753928/) which further suggests a link to the current work. Similar patterns have been observed in anesthetized animals, which also reinforces the conclusion of the current work that the states are fundamental aspects of the brain's functional organization.

      We appreciate the comments that relate spatiotemporal patterns, functional connectivity gradients, and the latent states derived from the HMM. Our work was also inspired by the papers that the reviewer suggested, especially Bolt et al. (2022), which compared the results of numerous dimensionality reduction and clustering algorithms and suggested three spatiotemporal patterns that seemed to be commonly supported across algorithms. We originally cited these studies throughout the manuscript, but did not discuss them comprehensively. We have revised the Discussion to situate our findings within past work that used resting-state fMRI to study low-dimensional latent brain states.

      [Manuscript, page 15-16] “This perspective is supported by previous work that has used different methods to capture recurring low-dimensional states from spontaneous fMRI activity during rest. For example, to extract time-averaged latent states, early resting-state analyses identified task-positive and task-negative networks using seed-based correlation (Fox et al., 2005). Dimensionality reduction algorithms such as independent component analysis (Smith et al., 2009) extracted latent components that explain the largest variance in fMRI time series. Other lines of work used time-resolved analyses to capture latent state dynamics. For example, variants of clustering algorithms, such as co-activation patterns (Liu et al., 2018; Liu and Duyn, 2013), k-means clustering (Allen et al., 2014), and HMM (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017), characterized fMRI time series as recurrences of and transitions between a small number of states. Time-lag analysis was used to identify quasiperiodic spatiotemporal patterns of propagating brain activity (Abbas et al., 2019; Yousefi and Keilholz, 2021). A recent study extensively compared these different algorithms and showed that they all report qualitatively similar latent states or components when applied to fMRI data (Bolt et al., 2022). While these studies used different algorithms to probe data-specific brain states, this work and ours report common latent axes that follow a long-standing theory of large-scale human functional systems (Mesulam, 1998). Neural dynamics span principal axes that dissociate unimodal to transmodal and sensory to motor information processing systems.”

      Reviewer #2 (Public Review):

      In this study, Song and colleagues applied a Hidden Markov Model to whole-brain fMRI data from the unique SONG dataset and a grad-CPT task, and in doing so observed robust transitions between low-dimensional states that they then attributed to specific psychological features extracted from the different tasks.

      The methods used appeared to be sound and robust to parameter choices. Whenever choices were made regarding specific parameters, the authors demonstrated that their approach was robust to different values, and also replicated their main findings on a separate dataset.

      I was mildly concerned that similarities in some of the algorithms used may have rendered some of the inter-measure results as somewhat inevitable (a hypothesis that could be tested using appropriate null models).

      This work is quite integrative, linking together a number of previous studies into a framework that allows for interesting follow-up questions.

      Overall, I found the work to be robust, interesting, and integrative, with a wide-ranging citation list and exciting implications for future work.

      We appreciate the reviewer’s comments on the study’s robustness and future implications. Our work was highly motivated by the reviewer’s prior work.

      Reviewer #3 (Public Review):

      My general assessment of the paper is that the analyses done after they find the model are exemplary and show some interesting results. However, the method they use to find the number of states (Calinski-Harabasz score instead of log-likelihood), the model they use generally (HMM), and the fact that they don't show how they find the number of states on HCP, with the Schaefer atlas, and do not report their R^2 on a test set are a little concerning. I don't think this per se impedes their results, but it is something that they can improve. They argue that the states they find align with long-standing ideas about the functional organization of the brain and align with other research, but they can improve their selection for their model.

      We appreciate the reviewer’s thorough read of the paper, evaluation of our analyses linking brain states to behavior as “exemplary”, and important questions about the modeling approach. We have included detailed responses below and updated the manuscript accordingly.
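
      For context on the model-selection criterion the reviewer mentions, the Calinski-Harabasz score is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom: CH = [B / (k − 1)] / [W / (n − k)]. The sketch below implements that general definition on made-up 2-D points. It is not the authors' HMM pipeline, and the function name and data are hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for a hard assignment of points to clusters."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need more than 1 cluster and fewer clusters than points")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between, within = 0.0, 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster dispersion: size-weighted squared centroid offset.
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Within-cluster dispersion: squared distances to the own centroid.
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    if within == 0:
        raise ValueError("within-cluster dispersion is zero")
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated toy groups, scored under a sensible and a scrambled labelling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_score = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])
bad_score = calinski_harabasz(points, [0, 1, 0, 1, 0, 1])
print(good_score > bad_score)
```

      A higher score indicates tighter, better-separated clusters. Unlike a held-out log-likelihood, it does not assess how well the fitted model predicts unseen data, which is the distinction the reviewer draws.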

      Strengths:

      • Use multiple datasets, multiple ROIs, and multiple analyses to validate their results

      • Figures are convincing in the sense that patterns clearly synchronize between participants

      • Authors select the number of states using the optimal model fit (although this turns out to be a little more questionable due to what they quantify as 'optimal model fit')

      We address this concern on page 30-31 of this response letter.

      • Replication with Schaefer atlas makes results more convincing

      • The analyses around the fact that the base state acts as a flexible hub are well done and well explained

      • Their comparison of synchrony is well-done and comparing it to resting-state, which does not have any significant synchrony among participants is obvious, but still good to compare against.

      • Their results with respect to similar narrative engagement being correlated with similar neural state dynamics are well done and interesting.

      • Their results on event boundaries are compelling and well done. However, I do not find their Chang et al. results convincing (Figure 4B). It could just be that the different medium explains the differences in DMN response, but to me, it seems like these are altogether different patterns that cannot be fully explained by their method/results.

      We entirely agree with the reviewer that the Chang et al. (2021) data are different in many ways from our own SONG dataset. Whereas data from Chang et al. (2021) were collected while participants listened to an audio-only narrative, participants in the SONG sample watched and listened to audiovisual stimuli. They were scanned at different universities in different countries with different protocols by different research groups for different purposes. That is, there are numerous reasons why we would expect the model should not generalize. Thus, we found it compelling and surprising that, despite all of these differences between the datasets, the model trained on the SONG dataset generalized to the data from Chang et al. (2021). The results highlighted a robust increase in the DMN state occurrence and a decrease in the base state occurrence after the narrative event boundaries, irrespective of whether the stimulus was an audiovisual sitcom episode or a narrated story. This external model validation was a way that we tested the robustness of our own model and the relationship between neural state dynamics and cognitive dynamics.

      • Their result that, when there is no event, 50% of transitions into the DMN state come from the base state is interesting and strong. However, it is unclear if this is just for the sitcom or also for Chang et al.'s data.

      We apologize for the lack of clarity. We show the statistical results of the two sitcom episodes as well as Chang et al.’s (2021) data in Figure 4—figure supplement 2 in our original manuscript. Here, we provide the exact values of the base-to-DMN state transition probability, and how they differ across moments after event boundaries compared to non-event boundaries.

      For sitcom episode 1, the probability of base-to-DMN state transition was 44.6 ± 18.8% at event boundaries versus 62.0 ± 10.4% at non-event boundaries (FDR-p = 0.0013). For sitcom episode 2, the probability of base-to-DMN state transition was 44.1 ± 18.0% at event boundaries versus 62.2 ± 7.6% at non-event boundaries (FDR-p = 0.0006). For the Chang et al. (2021) dataset, the probability of base-to-DMN state transition was 33.3 ± 15.9% at event boundaries versus 58.1 ± 6.4% at non-event boundaries (FDR-p < 0.0001). Thus, our result, “At non-event boundaries, the DMN state was most likely to transition from the base state, accounting for more than 50% of the transitions to the DMN state” (pg 11, line 24-25), holds true for both the internal and external datasets.
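The quantity reported above can be illustrated with a minimal sketch. It assumes a decoded state sequence per participant and a hypothetical window of a few TRs after each event boundary; the window length, the function names, and the toy sequence are illustrative assumptions, not the procedure used in the manuscript.

```python
def base_to_dmn_fraction(states, time_points, dmn="DMN", base="base"):
    """Fraction of transitions into the DMN state that come from the base state.

    Only time points listed in `time_points` are examined, and only true
    transitions (previous state differs from the current state) are counted.
    """
    into_dmn = 0
    from_base = 0
    for t in time_points:
        if t == 0:
            continue  # no previous state to transition from
        prev, curr = states[t - 1], states[t]
        if curr == dmn and prev != dmn:
            into_dmn += 1
            if prev == base:
                from_base += 1
    return from_base / into_dmn if into_dmn else float("nan")


def split_by_boundaries(n_timepoints, boundaries, window):
    """Return (indices within `window` TRs after a boundary, all other indices)."""
    near = set()
    for b in boundaries:
        near.update(range(b, min(b + window, n_timepoints)))
    far = [t for t in range(n_timepoints) if t not in near]
    return sorted(near), far


# Toy sequence (made up for illustration only).
states = ["base", "DMN", "DMN", "DAN", "DMN", "base", "base", "DMN", "SM", "DMN"]
near, far = split_by_boundaries(len(states), boundaries=[3], window=3)
p_event = base_to_dmn_fraction(states, near)
p_nonevent = base_to_dmn_fraction(states, far)
```

In a real analysis the two fractions would be computed per participant and then compared statistically across participants.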

      • The involvement of the base state as being highly engaged during the comedy sitcom and the movie are interesting results that warrant further study into the base state theory they pose in this work.

      • It is good that they make sure SM states are not just because of head motion (P 12).

      • Their comparison between functional gradient and neural states is good, and their results are generally well-supported, intuitive, and interesting enough to warrant further research into them. Their findings on the context-specificity of their DMN and DAN state are interesting and relate well to the antagonistic relationship in resting-state data.

      Weaknesses:

      • Authors should train the model on part of the data and validate on another

      Thank you for raising this issue. To the best of our knowledge, past work that applied the HMM to the fMRI data has conducted training and inference on the same data, including initial work that implemented HMM on the resting-state fMRI (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017) as well as more recent work that applied HMMs to the task or movie-watching fMRI (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). That is, the parameters—emission probability, transition probability, and initial probability—were estimated from the entire dataset and the latent state sequence was inferred using the Viterbi algorithm on the same dataset.

      However, we were also aware of the potential problem this may have. Therefore, in our recent work asking a different research question in another fMRI dataset (Song et al., 2021b), we trained an HMM on a subset of the dataset (moments when participants were watching movie clips in the original temporal order) and inferred latent state sequence of the fMRI time series in another subset of the dataset (moments when participants were watching movie clips in a scrambled temporal order). To the best of our knowledge, this was the first paper that used different segments of the data to fit and infer states from the HMM.

      In the current study, we wanted to capture brain states that underlie brain activity across contexts. Thus, we presented the same-dataset training and inference procedure as our primary result. However, for every main result, we also showed results where we separated the data used for model fitting and state inference. That is, we fit the HMM on the SONG dataset, primarily report the inference results on the SONG dataset, but also report inference on the external datasets that were not included in model fitting. The datasets used were the Human Connectome Project dataset (Van Essen et al., 2013), Chang et al. (2021) audio-listening dataset, Rosenberg et al. (2016) gradCPT dataset, and Chen et al. (2017) Sherlock dataset.

      However, to further address the reviewer’s concern about whether the HMM fit is reliable when applied to held-out data, we computed the reliability of the HMM inference by conducting cross-validation and split-half reliability analyses.

      (1) Cross-validation

      To separate the dataset used for HMM training and inference, we conducted cross-validation on the SONG dataset (N=27) by training the model with the data from 26 participants and inferring the latent state sequence of the held-out participant.

      First, we compared the robustness of the model training by comparing the mean activity patterns of the four latent states fitted at the group level (N=27) with the mean activity patterns of the four states fitted across cross-validation folds. Pearson’s correlations between the group-level vs. cross-validated latent states’ mean activity patterns were r = 0.991 ± 0.010, with a range from 0.963 to 0.999.

      Second, we compared the robustness of model inference by comparing the latent state sequences that were inferred at the group level vs. from held-out participants in a cross-validation scheme. All fMRI conditions had mean similarity higher than 90%: Rest 1: 92.74 ± 5.02%, Rest 2: 92.74 ± 4.83%, GradCPT face: 92.97 ± 6.41%, GradCPT scene: 93.27 ± 5.76%, Sitcom ep1: 93.31 ± 3.92%, Sitcom ep2: 93.13 ± 4.36%, Documentary: 92.42 ± 4.72%.
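The leave-one-participant-out scheme and the sequence comparison described above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the HMM fitting step is omitted, and state labels are assumed to have already been matched between the group-level and fold-level models (HMM state labels are otherwise arbitrary).

```python
def leave_one_out_folds(participants):
    """Yield (training participants, held-out participant) for each fold."""
    for i, held_out in enumerate(participants):
        train = participants[:i] + participants[i + 1:]
        yield train, held_out


def sequence_similarity(seq_a, seq_b):
    """Percentage of time points at which two state sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# 27 participants give 27 folds, each trained on 26 participants.
folds = list(leave_one_out_folds(list(range(27))))

# Toy comparison of a group-level and a cross-validated sequence.
similarity = sequence_similarity([0, 1, 2, 3], [0, 1, 2, 0])
```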

      Third, with the latent state sequences inferred from cross-validation, we replicated the analysis of Figure 3 to test for synchrony of the latent state sequences across participants. The cross-validated results were highly similar to manuscript Figure 3, which was generated from the group-level analysis. Mean synchrony of latent state sequences was as follows: Rest 1: 25.90 ± 3.81%, Rest 2: 25.75 ± 4.19%, GradCPT face: 27.17 ± 3.86%, GradCPT scene: 28.11 ± 3.89%, Sitcom ep1: 40.69 ± 3.86%, Sitcom ep2: 40.53 ± 3.13%, Documentary: 30.13 ± 3.41%.

      Author response image 2.
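One simple way to quantify synchrony of state sequences is the mean pairwise agreement across participants. The sketch below assumes that definition; the exact synchrony measure used for Figure 3 is defined in the manuscript and may differ.

```python
from itertools import combinations


def mean_pairwise_synchrony(sequences):
    """Mean percentage of time points at which pairs of participants share a state."""
    pair_scores = []
    for seq_a, seq_b in combinations(sequences, 2):
        same = sum(a == b for a, b in zip(seq_a, seq_b))
        pair_scores.append(100.0 * same / len(seq_a))
    return sum(pair_scores) / len(pair_scores)


# Three toy participants, four time points each.
toy_sequences = [[0, 1, 2, 3], [0, 1, 2, 0], [0, 1, 1, 1]]
synchrony = mean_pairwise_synchrony(toy_sequences)
```

Under this definition, four equally likely states that are independent across participants would agree at about 25% of time points, which gives a rough reference for the resting-state values.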

      (2) Split-half reliability

      To test the internal robustness of the model, we randomly assigned SONG dataset participants to two groups and fit the HMM separately in each. Similarities (Pearson’s correlation) between the two groups’ activation patterns were DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837. The similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.

      Author response image 3.

      We further validated the split-half reliability of the model using the HCP dataset, which contains data from a larger sample (N=119). Similarities (Pearson’s correlation) between the two groups’ activation patterns were DMN: 0.998, DAN: 0.997, SM: 0.993, base: 0.923. The similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.
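The split-half procedure can be sketched as a random partition of participants followed by a Pearson correlation between matched states' mean activity patterns. The function names and the seed are illustrative assumptions, and the HMM fit in each half is omitted.

```python
import math
import random


def random_split_half(participants, seed=0):
    """Randomly partition participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


group_a, group_b = random_split_half(list(range(27)))

# Toy activation patterns for one state estimated in each half.
r_same = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```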

      Together the cross-validation and split-half reliability results demonstrate that the HMM results reported in the manuscript are reliable and robust to the way we conducted the analysis. The result of the split-half reliability analysis is added in the Results.

      [Manuscript, page 3-4] “Neural state inference was robust to the choice of 𝐾 (Figure 1—figure supplement 1) and the fMRI preprocessing pipeline (Figure 1—figure supplement 5) and consistent when conducted on two groups of randomly split-half participants (Pearson’s correlations between the two groups’ latent state activation patterns: DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837).”

      • Comparison with just PCA/functional gradients is weak in establishing whether HMMs are good models of the timeseries. Especially given that the HMM does not explain a lot of variance in the signal (~0.5 R^2 for only 27 brain regions) for PCA. I think they don't report their own R^2 of the timeseries

      We agree with the reviewer that the PCA that we conducted to compare with the explained variance of the functional gradients was not directly comparable because PCA and gradients utilize different algorithms to reduce dimensionality. To make more meaningful comparisons, we removed the data-specific PCA results and replaced them with data-specific functional gradients (derived from the SONG dataset). This allows us to directly compare SONG-specific functional gradients with predefined gradients (derived from the resting-state HCP dataset from Margulies et al. [2016]). We found that the degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086). Thus, the predefined gradients explain as much variance in the SONG data time series as SONG-specific gradients do. This supports our argument that the low-dimensional manifold is largely shared across contexts, and that the common HMM latent states may tile the predefined gradients.

      These analyses and results were added to the Results, Methods, and Figure 1—figure supplement 8. Here, we only attach changes to the Results section for simplicity, but please see the revised manuscript for further changes.

      [Manuscript, page 5-6] “We hypothesized that the spatial gradients reported by Margulies et al. (2016) act as a low-dimensional manifold over which large-scale dynamics operate (Bolt et al., 2022; Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020), such that traversals within this manifold explain large variance in neural dynamics and, consequently, cognition and behavior (Figure 1C). To test this idea, we situated the mean activity values of the four latent states along the gradients defined by Margulies et al. (2016) (see Methods). The brain states tiled the two-dimensional gradient space with the base state at the center (Figure 1D; Figure 1—figure supplement 7). The Euclidean distances between these four states were maximized in the two-dimensional gradient space, compared to a chance where the four states were inferred from circular-shifted time series (p < 0.001). For the SONG dataset, the DMN and SM states fell at more extreme positions of the primary gradient than expected by chance (both FDR-p values = 0.004; DAN and SM states, FDR-p values = 0.171). For the HCP dataset, the DMN and DAN states fell at more extreme positions on the primary gradient (both FDR-p values = 0.004; SM and base states, FDR-p values = 0.076). No state was consistently found at the extremes of the secondary gradient (all FDR-p values > 0.021).

      We asked whether the predefined gradients explain as much variance in neural dynamics as latent subspace optimized for the SONG dataset. To do so, we applied the same nonlinear dimensionality reduction algorithm to the SONG dataset’s ROI time series. Of note, the SONG dataset includes 18.95% rest, 15.07% task, and 65.98% movie-watching data whereas the data used by Margulies et al. (2016) was 100% rest. Despite these differences, the SONG-specific gradients closely resembled the predefined gradients, with significant Pearson’s correlations observed for the first (r = 0.876) and second (r = 0.877) gradient embeddings (Figure 1—figure supplement 8). Gradients identified with the HCP data also recapitulated Margulies et al.’s (2016) first (r = 0.880) and second (r = 0.871) gradients. We restricted our analysis to the first two gradients because the two gradients together explained roughly 50% of the entire variance of functional brain connectome (SONG: 46.94%, HCP: 52.08%), and the explained variance dropped drastically from the third gradients (more than 1/3 drop compared to second gradients). The degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086; Figure 1—figure supplement 8). Thus, the low-dimensional manifold captured by Margulies et al. (2016) gradients is highly replicable, explaining brain activity dynamics as well as data-specific gradients, and is largely shared across contexts and datasets. This suggests that the state space of whole-brain dynamics closely recapitulates low-dimensional gradients of the static functional brain connectome.”

      The reviewer also pointed out that the PCA-gradient comparison was weak in establishing whether HMMs are good models of the time series. However, we would like to point out that the purpose of the comparison was not to validate the performance of the HMM. Instead, we wanted to test whether the gradients introduced by Margulies et al. (2016) could act as a generalizable low-dimensional manifold of brain state dynamics. To argue that the predefined gradients are a shared manifold, these gradients should explain SONG data fMRI time series as much as the principal components derived directly from the SONG data. Our results showed comparable r², both in predefined gradient vs. data-specific PC comparisons and predefined gradient vs. data-specific gradient comparisons, which supported our argument that the predefined gradients could be the shared embedding space across contexts and datasets.

      The reviewer pointed out that the r² of ~0.5 is not explaining enough variance in the fMRI signal. However, we respectfully disagree with this point because there is no established criterion for what constitutes a high or low r² for this type of analysis. Of note, previous literature that also applied PCA to fMRI time series (Author response image 4A and 4B) (Lynn et al., 2021; Shine et al., 2019) also found that the cumulative explained variance of top 5 principal components is around 50%. Author response image 4C shows cumulative variances to which gradients explain the functional connectome of the resting-state fMRI data (Margulies et al., 2016).

      Author response image 4.

      Finally, the reviewer pointed out that the r² of the HMM-derived latent sequence to the fMRI time series should be reported. However, there is no standardized way of measuring the explained variance of the HMM inference. There is no report of explained variance in the traditional HMM-fMRI papers (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017). Rather than r², the HMM computes the log likelihood of the model fit. However, because log likelihood values are dependent on the number of data points, studies do not report log likelihood values nor do they use these metrics to interpret the goodness of model fit.

      To ask whether the goodness of the HMM fit was significant above chance, we compared the log likelihood of the HMM to the log likelihood distribution of the null HMM fits. First, we extracted the log likelihood of the HMM fit with the real fMRI time series. We iterated this 1,000 times when calculating null HMMs using the circular-shifted fMRI time series. The log likelihood of the real model was significantly higher than the chance distribution, with a z-value of 2182.5 (p < 0.001). This indicates that the HMM explained a large variance in our fMRI time series data, significantly above chance.
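The logic of a circular-shift null comparison can be sketched as follows. Two things here are assumptions for illustration: each ROI time series is shifted by an independent random offset, and the mean absolute inter-ROI correlation stands in for the HMM log likelihood (fitting an HMM is omitted to keep the example self-contained). The data are simulated.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def circular_shift(series, offset):
    """Rotate a time series by `offset` samples, wrapping around the end."""
    offset %= len(series)
    return series[-offset:] + series[:-offset]


def mean_abs_correlation(rois):
    """Stand-in fit statistic: mean absolute correlation over ROI pairs."""
    pairs = [(i, j) for i in range(len(rois)) for j in range(i + 1, len(rois))]
    return sum(abs(pearson_r(rois[i], rois[j])) for i, j in pairs) / len(pairs)


def null_z_value(rois, statistic, n_null=200, seed=0):
    """z-value of the observed statistic relative to a circular-shift null."""
    rng = random.Random(seed)
    observed = statistic(rois)
    null = []
    for _ in range(n_null):
        # Shift every ROI independently to break cross-ROI structure
        # while preserving each ROI's own autocorrelation.
        shifted = [circular_shift(r, rng.randrange(1, len(r))) for r in rois]
        null.append(statistic(shifted))
    mean_null = sum(null) / n_null
    sd_null = math.sqrt(sum((v - mean_null) ** 2 for v in null) / (n_null - 1))
    return (observed - mean_null) / sd_null


# Simulated data: four ROIs sharing one white-noise signal plus private noise.
rng = random.Random(1)
shared = [rng.gauss(0, 1) for _ in range(200)]
rois = [[s + 0.5 * rng.gauss(0, 1) for s in shared] for _ in range(4)]
z = null_z_value(rois, mean_abs_correlation)
```

Replacing `mean_abs_correlation` with a function that fits an HMM and returns its log likelihood would give the kind of comparison described above.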

      • Authors do not specify whether they also did cross-validation for the HCP dataset to find 4 clusters

      We apologize for the lack of clarity. When we computed the Calinski-Harabasz score with the HCP dataset, three was chosen as the most optimal number of states (Author response image 5A). When we set K as 3, the HMM inferred the DMN, DAN, and SM states (Author response image 5C). The base state was included when K was set to 4 (Author response image 5B). The activation pattern similarities of the DMN, DAN, and SM states were r = 0.981, 0.984, 0.911 respectively.

      Author response image 5.
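For reference, the Calinski-Harabasz score is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below computes it for labeled points; how the score was applied to the HMM output (which observations and which labels) is described in the manuscript and is not reproduced here.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster dispersion, W the within-cluster dispersion,
    n the number of points, and k the number of clusters.
    """
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    grand_mean = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand_mean))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated toy clusters in two dimensions.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
score = calinski_harabasz(toy_points, toy_labels)
```

Because the score rewards compact, well-separated clusters, comparing it across candidate values of K favors solutions whose states are far apart, which is the property the reviewer raises.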

      We did not use K = 3 for the HCP data replication because we were not trying to test whether these four states would be the optimal set of states in every dataset. Although the Calinski-Harabasz score chose K = 3 because it showed the best clustering performance, this does not mean that the base state is not meaningful to this dataset. Likewise, the latent states that are inferred when we increase/decrease the number of states are also meaningful states. For example, in Figure 1—figure supplement 1, we show an example of the SONG dataset’s latent states when we set K to 7. The seven latent states included the DAN, SM, and base states, the DMN state was subdivided into DMN-A and DMN-B states, and the FPN state and DMN+VIS state were included. Setting a higher number of states like K = 7 would mean that we are capturing brain state dynamics in a higher dimension than when using K = 4. Because we are utilizing a higher number of states, a model set to K = 7 would inevitably capture a larger variance of fMRI time series than a model set to K = 4.

      The purpose of latent state replication with the HCP dataset was to validate the generalizability of the DMN, DAN, SM, and base states. Before characterizing these latent states’ relevance to cognition, we needed to verify that these latent states were not simply overfit to the SONG dataset. The fact that the HMM revealed a similar set of latent states when applied to the HCP dataset suggested that the states were not merely specific to SONG data.

      To make our points clearer in the manuscript, we emphasized that we are not arguing for the four states to be the exclusive states. We made edits to Discussion as follows.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • One of their main contributions is the base state but the correlation between the base state in their Song dataset and the HCP dataset is only 0.399

      This is a good point. However, there is precedent for lower spatial pattern correlation of the base state compared to other states in the literature.

      Compared to the DMN, DAN, and SM states, the base state did not show characteristic activation or deactivation of functional networks. Most of the functional networks showed activity levels close to the mean (z = 0). With this flattened activation pattern, relatively low activation pattern similarity was observed between the SONG base state and the HCP base state.

      In Figure 1—figure supplement 6, we write, “The DMN, DAN, and SM states showed similar mean activity patterns. We refrained from making interpretations about the base state’s activity patterns because the mean activity of most of the parcels was close to z = 0”.

      A similar finding has been reported in a previous work by Chen et al. (2016) that discovered the base state with HMM. State 9 (S9) of their results is comparable to our base state. They report that even though the spatial correlation coefficient of the brain state from the split-half reliability analysis was the lowest for S9 due to its low degrees of activation or deactivation, S9 was stably inferred by the HMM. The following is a direct quote from their paper:

      “To the best of our knowledge, a state similar to S9 has not been presented in previous literature. We hypothesize that S9 is the “ground” state of the brain, in which brain activity (or deactivity) is similar for the entire cortex (no apparent activation or deactivation as shown in Fig. 4). Note that different groups of subjects have different spatial patterns for state S9 (Fig. 3A). Therefore, S9 has the lowest reproducible spatial pattern (Fig. 3B). However, its temporal characteristics allowed us to distinguish it consistently from other states.” (Chen et al., 2016)

      Thus, we believe our data and prior results support the existence of the “base state”.

      • Figure 1B: Parcellation is quite big but there seems to be a gradient within regions

      This is a function of the visualization software. Mean activity (z) is the same for all voxels within a parcel. To visualize the 3D contours of the brain, we chose an option in the nilearn python function that smooths the mean activity values based on the surface reconstructed anatomy.

      In the original manuscript, our Methods write, “The brain surfaces were visualized with nilearn.plotting.plot_surf_stat_map. The parcel boundaries in Figure 1B are smoothed from the volume-to-surface reconstruction.”

      • Figure 1D: Why are the DMNs further apart between SONG and HCP than the other states

      To address this question, we first tested whether the position of the DMN states in the gradient space is significantly different for the SONG and HCP datasets. We generated surrogate HMM states from the circular-shifted fMRI time series and positioned the four latent states and the null DMN states in the 2-dimensional gradient space (Author response image 6).

      Author response image 6.

      We next tested whether the Euclidean distance between the SONG dataset’s DMN state and the HCP dataset’s DMN state is larger than would be expected by chance (Author response image 7). To do so, we took the difference between the DMN state positions and compared it to the 1,000 differences generated from the surrogate latent states. The DMN states of the SONG and HCP datasets did not significantly differ in the Gradient 1 dimension (two-tailed test, p = 0.794). However, as the reviewer noted, the positions differed significantly in the Gradient 2 dimension (p = 0.047). The DMN state leaned more towards the Visual gradient in the SONG dataset, whereas it leaned more towards the Somatosensory-Motor gradient in the HCP dataset.

      Author response image 7.
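A two-tailed test against surrogate values can be sketched as below. The formula is an assumption: it counts surrogate values at least as far from the null mean as the observed value and applies a +1 correction, which is a common convention but is not stated in the text above.

```python
def two_tailed_p(observed, null_values):
    """Two-tailed p-value of `observed` relative to a list of surrogate values."""
    center = sum(null_values) / len(null_values)
    observed_deviation = abs(observed - center)
    n_extreme = sum(abs(v - center) >= observed_deviation for v in null_values)
    # The +1 terms keep the p-value above zero for a finite number of surrogates.
    return (1 + n_extreme) / (1 + len(null_values))


# Toy surrogate differences (illustrative values only).
toy_null = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_far = two_tailed_p(3.0, toy_null)
p_near = two_tailed_p(1.0, toy_null)
```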

      Though we cannot claim an exact reason for this across-dataset difference, we note a distinctive difference between the SONG and HCP datasets. Both datasets largely included resting-state, controlled tasks, and movie watching. The SONG dataset included 18.95% of rest, 15.07% of task, and 65.98% of movie watching. The task only contained the gradCPT, i.e., sustained attention task. On the other hand, the HCP dataset included 52.71% of rest, 24.35% of task, and 22.94% of movie watching. There were 7 different tasks included in the HCP dataset. It is possible that different proportions of rest, task, and movie watching, and different cognitive demands involved with each dataset may have created data-specific latent states.

      • Page 5 paragraph starting at L25: Their hypothesis that functional gradients explain large variance in neural dynamics needs to be explained more, is non-trivial especially because their R^2 scores are so low (Fig 1. Supplement 8) for PCA

      We address this concern on page 21-23 of this response letter.

      • Generally, I do not find the PCA analysis convincing and believe they should also compare to something like ICA or a different model of dynamics. They do not explain their reasoning behind assuming an HMM, which is an extremely simplified idea of brain dynamics meaning they only change based on the previous state.

      We appreciate this perspective. We replaced the Margulies et al. (2016) gradient vs. SONG-specific PCA comparison with a more direct Margulies et al. (2016) gradient vs. SONG-specific gradient comparison as described on page 21-23 of this response letter.

      More broadly, we elected to use HMM because of recent work showing correspondence between low-dimensional HMM states and behavior (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). We also found the model’s assumption—a mixture Gaussian emission probability and first-order Markovian transition probability—to be the most suited to analyzing the fMRI time series data. We do not intend to claim that other data-reduction techniques would not also capture low-dimensional, behaviorally relevant changes in brain activity. Instead, our primary focus was identifying a set of latent states that generalize (i.e., recur) across multiple contexts and understanding how those states reflect cognitive and attentional states.

      Although a comparison of possible data-reduction algorithms is out of the scope of the current work, an exhaustive comparison of different models can be found in Bolt et al. (2022). The authors compared dozens of latent brain state algorithms spanning zero-lag analysis (e.g., principal component analysis, principal component analysis with Varimax rotation, Laplacian eigenmaps, spatial independent component analysis, temporal independent component analysis, hidden Markov model, seed-based correlation analysis, and co-activation patterns) to time-lag analysis (e.g., quasi-periodic pattern and lag projections). Bolt et al. (2022) writes “a range of empirical phenomena, including functional connectivity gradients, the task-positive/task-negative anticorrelation pattern, the global signal, time-lag propagation patterns, the quasiperiodic pattern and the functional connectome network structure, are manifestations of the three spatiotemporal patterns.” That is, many previous findings that used different methods essentially describe the same recurring latent states. A similar argument was made in previous papers (Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020).

      We agree that the HMM is a simplified idea of brain dynamics. We do not argue that four states can fully explain the complexity and flexibility of cognition. Instead, we hoped to show that there are different dimensionalities at which brain systems can operate, and that they may have different consequences for cognition. We “simplified” neural dynamics to a discrete sequence of a small number of states. However, what is fascinating is that these overly “simplified” brain state dynamics can explain certain cognitive and attentional dynamics, such as event segmentation and sustained attention fluctuations. We highlight this point in the Discussion.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • For the 25-ROI replication it seems like they again do not try multiple K values for the number of states to validate that 4 states are in fact the correct number.

      In the manuscript, we do not argue that the four will be the optimal number of states in any dataset. (We actually predict that this may differ depending on the amount of data, participant population, tasks, etc.) Instead, we claim that the four identified in the SONG dataset are not specific (i.e., overfit) to that sample, but rather recur in independent datasets as well. More broadly we argue that the complexity and flexibility of human cognition stem from the fact that computation occurs at multiple dimensions and that the low-dimensional states observed here are robustly related to cognitive and attentional states. To prevent misunderstanding of our results, we emphasized in the Discussion that we are not arguing for a fixed number of states. A paragraph included in our response to the previous comment (page 16 in the manuscript) illustrates this point.

      • Fig 2B: Colorbar goes from -0.05 to 0.05 but values are up to 0.87

      We apologize for the confusion. The current version of the figure is correct. The figure legend states, “The values indicate transition probabilities, such that values in each row sums to 1. The colors indicate differences from the mean of the null distribution where the HMMs were conducted on the circular-shifted time series.”

      We recognize that this complicates the interpretation of the figure. However, after much consideration, we decided that it was valuable to show both the actual transition probabilities (values) and their difference from the mean of null HMMs (colors). The values demonstrate the Markovian property of latent state dynamics, with a high probability of remaining in the same state at consecutive moments and a low probability of transitioning to a different state. The colors indicate that the base state is a transitional hub state by illustrating that the DMN, DAN, and SM states are more likely to transition to the base state than would be expected by chance.
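The two quantities shown in the figure (row-normalized transition probabilities and their difference from the mean of the null models) can be sketched as follows. Here the matrix is estimated by counting transitions in a decoded sequence, which is a simplifying assumption; an HMM estimates its transition matrix as a model parameter. The null matrix in the example is made up.

```python
def transition_matrix(sequence, n_states):
    """Row-normalized transition probabilities counted from a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, curr in zip(sequence, sequence[1:]):
        counts[prev][curr] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def difference_from_null(observed, null_matrices):
    """Element-wise difference between observed and mean null transition matrices."""
    n = len(observed)
    n_null = len(null_matrices)
    return [
        [observed[i][j] - sum(m[i][j] for m in null_matrices) / n_null for j in range(n)]
        for i in range(n)
    ]


# Toy two-state sequence; the diagonal holds the probability of staying in a state.
observed = transition_matrix([0, 0, 1, 0, 0, 0, 1, 1], n_states=2)
toy_null = [[[0.5, 0.5], [0.5, 0.5]]]
colors = difference_from_null(observed, toy_null)
```

In the figure's terms, `observed` corresponds to the printed values and `colors` to the color scale.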

      • P 16 L4 near-critical, authors need to be more specific in their terminology here especially since they talk about dynamic systems, where near-criticality has a specific definition. It is unclear which definition they are looking for here.

      We agree that our explanation was vague. Because we do not have evidence for this speculative proposal, we removed the mention of near-criticality. Instead, we focus on our observation as the base state being the transitional hub state within a metastable system.

      [Manuscript, page 17-18] “However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      • P16 L13-L17 unnecessary

      We prefer to have the last paragraph as a summary of the implications of this paper. However, if the length of this paper becomes a problem as we work towards publication with the editors, we are happy to remove these lines.

      • I think this paper is solid, but my main issue is with using an HMM, never explaining why, not showing inference results on test data, not reporting an R^2 score for it, and not comparing it to other models. Secondly, they use the Calinski-Harabasz score to determine the number of states, but not the log-likelihood of the fit. This clearly creates a bias in what types of states you will find, namely states that are far away from each other, which likely also leads to the functional gradient and PCA results they have, where they specifically talk about how their states are far away from each other in the functional gradient space and correlated to (orthogonal) components. It is completely unclear to me why they used this measure because it also seems to be one of many scores you could use with respect to clustering (with potentially different results), and even odd in the presence of a log-likelihood fit to the data and with the model they use (which does not perform clustering).

      (1) Showing inference results on test data

      We address this concern on page 19-21 of this response letter.

      (2) Not reporting an R^2 score

      We address this concern on page 21-23 of this response letter.

      (3) Not comparing the HMM model to other models

      We address this concern on page 27-28 of this response letter.

      (4) The use of the Calinski-Harabasz score to determine the number of states rather than the log-likelihood of the model fit

      To our knowledge, the log-likelihood of the model fit is not used to select the number of states in the HMM literature. This is because the log-likelihood tends to increase monotonically as the number of states increases. Baker et al. (2014) illustrate this problem, writing:

      “In theory, it should be possible to pick the optimal number of states by selecting the model with the greatest (negative) free energy. In practice however, we observe that the free energy increases monotonically up to K = 15 states, suggesting that the Bayes-optimal model may require an even higher number of states.”

      The following figure shows the log-likelihood estimated from the SONG dataset. Consistent with the findings of Baker et al. (2014), the log-likelihood increased monotonically as the number of states increased (Author response image 8, right). Measures like AIC or BIC, which account for the number of parameters, have the same issue of monotonic increase.

      Author response image 8.

      Because there is “no straightforward data-driven approach to model order selection” (Baker et al., 2014), past work has used different approaches to decide on the number of states. For example, Vidaurre et al. (2018) iterated over a range of the number of states to repeat the same HMM training and inference procedures 5 times using the same hyperparameters. They selected the number of states that showed the highest consistency across iterations. Gao et al. (2021) tested the clustering performance of the model output using the Calinski-Harabasz score. The number of states that showed the highest within-cluster cohesion compared to the across-cluster separation was selected as the number of states. Chang et al. (2021) applied HMM to voxels of the ventromedial prefrontal cortex using a similar clustering algorithm, writing: “To determine the number of states for the HMM estimation procedure, we identified the number of states that maximized the average within-state spatial similarity relative to the average between-state similarity”. In our previous paper (Song et al., 2021b), we reported both the reliability and clustering performance measures to decide on the number of states.

      In the current manuscript, the model consistency criterion from Vidaurre et al. (2018) was ineffective because the HMM inference was extremely robust (i.e., always inferring the exact same sequence) due to a large number of data points. Thus, we used the Calinski-Harabasz score as our criterion for the number of states selected.
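
      For readers unfamiliar with the criterion, the following is a minimal sketch of the Calinski-Harabasz score and of how it could be used to compare candidate numbers of states. It is an illustration under stated assumptions, not the authors' implementation: the feature vectors, the candidate labelings, and the helper `select_n_states` are hypothetical, and in practice each labeling would be the state sequence inferred by an HMM fit with that number of states.

```python
from typing import Dict, List, Sequence


def calinski_harabasz(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Ratio of between-cluster to within-cluster dispersion.

    CH = (B / (k - 1)) / (W / (n - k)), where B is the size-weighted squared
    distance of cluster centroids from the overall mean and W is the summed
    squared distance of points from their own cluster centroid.
    """
    n, dim = len(points), len(points[0])
    clusters: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    if within == 0.0:
        # Choice made for this sketch: perfectly tight clusters score as infinite.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_n_states(
    points: Sequence[Sequence[float]], labelings: Dict[int, Sequence[int]]
) -> int:
    """Return the candidate number of states whose labeling scores highest."""
    return max(labelings, key=lambda k: calinski_harabasz(points, labelings[k]))


# Hypothetical two-dimensional features with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labelings = {
    2: [0, 0, 0, 1, 1, 1],  # matches the visible grouping
    3: [0, 0, 1, 1, 2, 2],  # splits the groups arbitrarily
}
best_k = select_n_states(points, labelings)
score_k2 = calinski_harabasz(points, labelings[2])
```

      Because the score rewards compact, well-separated clusters, it favors solutions whose states lie far apart in feature space, which is the property the reviewer's comment refers to.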

      We agree with the reviewer that the selection of the number of states is critical to any study that implements HMM. However, the field lacks a consensus on how to decide on the number of states in the HMM, and the Calinski-Harabasz score has been validated in previous studies. Most importantly, the latent states’ relationships with behavioral and cognitive measures give strong evidence that the latent states are indeed meaningful states. Again, we are not arguing that the optimal set of states in any dataset will be four nor are we arguing that these four states will always be the optimal states. Instead, the manuscript proposes that a small number of latent states explains meaningful variance in cognitive dynamics.

      • Grammatical error: P24 L29 rendering seems to have gone wrong

      Our intention was correct here. To avoid confusion, we changed “(number of participantsC2 iterations)” to “($_{N}C_{2}$ iterations, where N = number of participants)” (page 26 in the manuscript).

      Questions:

      • Comment on subject differences, it seems like they potentially found group dynamics based on stimuli, but interesting to see individual differences in large-scale dynamics, and do they believe the states they find mostly explain global linear dynamics?

      We agree with the reviewer that whether low-dimensional latent state dynamics explain individual differences—above and beyond what could be explained by the high-dimensional, temporally static neural signatures of individuals (e.g., Finn et al., 2015)—is an important research question. However, because the SONG dataset was collected in a single lab, with a focus on covering diverse contexts (rest, task, and movie watching) over 2 sessions, we were only able to collect 27 participants. Due to this small sample size, we focused on investigating group-level, shared temporal dynamics and across-condition differences, rather than on investigating individual differences.

      Past work has studied individual differences (e.g., behavioral traits like well-being, intelligence, and personality) using the HMM (Vidaurre et al., 2017). In the lab, we are working on a project that investigates latent state dynamics in relation to individual differences in clinical symptoms using the Healthy Brain Network dataset (Ji et al., 2022, presented at SfN; Alexander et al., 2017).

      Finally, the reviewer raises an interesting question about whether the latent state sequence that was derived here mostly explains global linear dynamics as opposed to nonlinear dynamics. We have two responses: one methodological and one theoretical. First, methodologically, we defined the emission probabilities as a linear mixture of Gaussian distributions for each input dimension with the state-specific mean (mean fMRI activity patterns of the networks) and variance (functional covariance across networks). Therefore, states are modeled with an assumption of linearity of feature combinations. Second, theoretically, recent work argues in favor of the nonlinearity of large-scale neural dynamics, especially as tasks get richer and more complex (Cunningham and Yu, 2014; Gao et al., 2021). However, whether low-dimensional latent states should be modeled nonlinearly (that is, whether linear algorithms are insufficient for capturing latent states compared to nonlinear algorithms) is still unknown. We agree with the reviewer that the assumption of linearity is an interesting topic in systems neuroscience. However, together with prior work showing that numerous algorithms, either linear or nonlinear, recapitulated a common set of latent states, we argue that the HMM provides a strong low-dimensional model of large-scale neural activity and interaction.

      • P19 L40 why did the authors interpolate incorrect or no-responses for the gradCPT runs? It seems more logical to correct their results for these responses or to throw them out since interpolation can induce huge biases in these cases because the data is likely not missing at completely random.

      Interpolating the RTs of the trials without responses (omission errors and incorrect trials) is a standardized protocol for analyzing gradCPT data (Esterman et al., 2013; Fortenbaugh et al., 2018, 2015; Jayakumar et al., 2023; Rosenberg et al., 2013; Terashima et al., 2021; Yamashita et al., 2021). This analysis choice rests on the assumption that sustained attention is a continuous attentional state; the RT, a proxy for the attentional state in the gradCPT literature, is a noisy measure of a smoothed, continuous attentional state. Thus, the RTs of the trials without responses are interpolated and the RT time courses are smoothed by convolving with a Gaussian kernel.
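
      The two steps described above can be sketched as follows. This is a minimal illustration, not the published gradCPT pipeline: linear interpolation, the kernel width `sigma`, the truncation at three standard deviations, and the renormalization at the edges of the run are all assumptions made for the example, and the RT values are invented.

```python
import math
from typing import List, Optional, Sequence


def interpolate_missing(rts: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between neighboring valid RTs.

    Missing trials at the start or end take the nearest valid value.
    """
    valid = [i for i, v in enumerate(rts) if v is not None]
    if not valid:
        raise ValueError("no valid RTs to interpolate from")
    filled: List[float] = []
    for i, value in enumerate(rts):
        if value is not None:
            filled.append(float(value))
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled.append(float(rts[right]))
        elif right is None:
            filled.append(float(rts[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(rts[left] + weight * (rts[right] - rts[left]))
    return filled


def gaussian_smooth(values: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a truncated Gaussian kernel, renormalized near the edges."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed: List[float] = []
    for i in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                weighted_sum += w * values[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Hypothetical RTs in seconds; None marks trials without a usable response.
raw_rts = [0.62, None, 0.70, 0.66, None, None, 0.81, 0.75]
filled_rts = interpolate_missing(raw_rts)
rt_course = gaussian_smooth(filled_rts, sigma=2.0)
```

      The reviewer's concern applies to the first function: if responses are not missing at random, the interpolated values inherit whatever bias determines which trials are missing.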

      References

      Abbas A, Belloy M, Kashyap A, Billings J, Nezafati M, Schumacher EH, Keilholz S. 2019. Quasiperiodic patterns contribute to functional connectivity in the brain. Neuroimage 191:193–204.

      Alexander LM, Escalera J, Ai L, Andreotti C, Febre K, Mangone A, Vega-Potler N, Langer N, Alexander A, Kovacs M, Litke S, O’Hagan B, Andersen J, Bronstein B, Bui A, Bushey M, Butler H, Castagna V, Camacho N, Chan E, Citera D, Clucas J, Cohen S, Dufek S, Eaves M, Fradera B, Gardner J, Grant-Villegas N, Green G, Gregory C, Hart E, Harris S, Horton M, Kahn D, Kabotyanski K, Karmel B, Kelly SP, Kleinman K, Koo B, Kramer E, Lennon E, Lord C, Mantello G, Margolis A, Merikangas KR, Milham J, Minniti G, Neuhaus R, Levine A, Osman Y, Parra LC, Pugh KR, Racanello A, Restrepo A, Saltzman T, Septimus B, Tobe R, Waltz R, Williams A, Yeo A, Castellanos FX, Klein A, Paus T, Leventhal BL, Craddock RC, Koplewicz HS, Milham MP. 2017. Data Descriptor: An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci Data 4:1–26.

      Allen EA, Damaraju E, Plis SM, Erhardt EB, Eichele T, Calhoun VD. 2014. Tracking whole-brain connectivity dynamics in the resting state. Cereb Cortex 24:663–676.

      Baker AP, Brookes MJ, Rezek IA, Smith SM, Behrens T, Probert Smith PJ, Woolrich M. 2014. Fast transient networks in spontaneous human brain activity. Elife 3:e01867.

      Bolt T, Nomi JS, Bzdok D, Salas JA, Chang C, Yeo BTT, Uddin LQ, Keilholz SD. 2022. A Parsimonious Description of Global Functional Brain Organization in Three Spatiotemporal Patterns. Nat Neurosci 25:1093–1103.

      Brown JA, Lee AJ, Pasquini L, Seeley WW. 2021. A dynamic gradient architecture generates brain activity states. Neuroimage 261:119526.

      Chang C, Leopold DA, Schölvinck ML, Mandelkow H, Picchioni D, Liu X, Ye FQ, Turchi JN, Duyn JH. 2016. Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113:4518–4523.

      Chang CHC, Lazaridi C, Yeshurun Y, Norman KA, Hasson U. 2021. Relating the past with the present: Information integration and segregation during ongoing narrative processing. J Cogn Neurosci 33:1–23.

      Chang LJ, Jolly E, Cheong JH, Rapuano K, Greenstein N, Chen P-HA, Manning JR. 2021. Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Sci Adv 7:eabf7129.

      Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. 2017. Shared memories reveal shared structure in neural activity across individuals. Nat Neurosci 20:115–125.

      Chen S, Langley J, Chen X, Hu X. 2016. Spatiotemporal Modeling of Brain Dynamics Using Resting-State Functional Magnetic Resonance Imaging with Gaussian Hidden Markov Model. Brain Connect 6:326–334.

      Cocchi L, Gollo LL, Zalesky A, Breakspear M. 2017. Criticality in the brain: A synthesis of neurobiology, models and cognition. Prog Neurobiol 158:132–152.

      Cornblath EJ, Ashourvan A, Kim JZ, Betzel RF, Ciric R, Adebimpe A, Baum GL, He X, Ruparel K, Moore TM, Gur RC, Gur RE, Shinohara RT, Roalf DR, Satterthwaite TD, Bassett DS. 2020. Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands. Commun Biol 3:261.

      Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17:1500–1509.

      Deco G, Kringelbach ML, Jirsa VK, Ritter P. 2017. The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Sci Rep 7:3095.

      Esterman M, Noonan SK, Rosenberg M, Degutis J. 2013. In the zone or zoning out? Tracking behavioral and neural fluctuations during sustained attention. Cereb Cortex 23:2712–2723.

      Esterman M, Rothlein D. 2019. Models of sustained attention. Curr Opin Psychol 29:174–180.

      Fagerholm ED, Lorenz R, Scott G, Dinov M, Hellyer PJ, Mirzaei N, Leeson C, Carmichael DW, Sharp DJ, Shew WL, Leech R. 2015. Cascades and cognitive state: Focused attention incurs subcritical dynamics. J Neurosci 35:4626–4634.

      Falahpour M, Chang C, Wong CW, Liu TT. 2018. Template-based prediction of vigilance fluctuations in resting-state fMRI. Neuroimage 174:317–327.

      Finn ES, Shen X, Scheinost D, Rosenberg MD, Huang J, Chun MM, Papademetris X, Constable RT. 2015. Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity. Nat Neurosci 18:1664–1671.

      Fortenbaugh FC, Degutis J, Germine L, Wilmer JB, Grosso M, Russo K, Esterman M. 2015. Sustained attention across the life span in a sample of 10,000: Dissociating ability and strategy. Psychol Sci 26:1497–1510.

      Fortenbaugh FC, Rothlein D, McGlinchey R, DeGutis J, Esterman M. 2018. Tracking behavioral and neural fluctuations during sustained attention: A robust replication and extension. Neuroimage 171:148–164.

      Fox MD, Snyder AZ, Vincent JL, Corbetta M, Van Essen DC, Raichle ME. 2005. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc Natl Acad Sci U S A 102:9673–9678.

      Gao S, Mishne G, Scheinost D. 2021. Nonlinear manifold learning in functional magnetic resonance imaging uncovers a low-dimensional space of brain dynamics. Hum Brain Mapp 42:4510–4524.

      Goodale SE, Ahmed N, Zhao C, de Zwart JA, Özbay PS, Picchioni D, Duyn J, Englot DJ, Morgan VL, Chang C. 2021. fMRI-based detection of alertness predicts behavioral response variability. Elife 10:1–20.

      Greene AS, Horien C, Barson D, Scheinost D, Constable RT. 2023. Why is everyone talking about brain state? Trends Neurosci.

      Greene DJ, Marek S, Gordon EM, Siegel JS, Gratton C, Laumann TO, Gilmore AW, Berg JJ, Nguyen AL, Dierker D, Van AN, Ortega M, Newbold DJ, Hampton JM, Nielsen AN, McDermott KB, Roland JL, Norris SA, Nelson SM, Snyder AZ, Schlaggar BL, Petersen SE, Dosenbach NUF. 2020. Integrative and Network-Specific Connectivity of the Basal Ganglia and Thalamus Defined in Individuals. Neuron 105:742-758.e6.

      Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, Medaglia JD, Vettel JM, Miller MB, Grafton ST, Bassett DS. 2015. Controllability of structural brain networks. Nat Commun 6:8414.

      Jayakumar M, Balusu C, Aly M. 2023. Attentional fluctuations and the temporal organization of memory. Cognition 235:105408.

      Ji E, Lee JE, Hong SJ, Shim W. 2022. Idiosyncrasy of latent neural state dynamic in ASD during movie watching. Poster presented at the Society for Neuroscience 2022 Annual Meeting.

      Karapanagiotidis T, Vidaurre D, Quinn AJ, Vatansever D, Poerio GL, Turnbull A, Ho NSP, Leech R, Bernhardt BC, Jefferies E, Margulies DS, Nichols TE, Woolrich MW, Smallwood J. 2020. The psychological correlates of distinct neural states occurring during wakeful rest. Sci Rep 10:1–11.

      Liu X, Duyn JH. 2013. Time-varying functional network information extracted from brief instances of spontaneous brain activity. Proc Natl Acad Sci U S A 110:4392–4397.

      Liu X, Zhang N, Chang C, Duyn JH. 2018. Co-activation patterns in resting-state fMRI signals. Neuroimage 180:485–494.

      Lynn CW, Cornblath EJ, Papadopoulos L, Bertolero MA, Bassett DS. 2021. Broken detailed balance and entropy production in the human brain. Proc Natl Acad Sci 118:e2109889118.

      Margulies DS, Ghosh SS, Goulas A, Falkiewicz M, Huntenburg JM, Langs G, Bezgin G, Eickhoff SB, Castellanos FX, Petrides M, Jefferies E, Smallwood J. 2016. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc Natl Acad Sci U S A 113:12574–12579.

      Mesulam MM. 1998. From sensation to cognition. Brain 121:1013–1052.

      Munn BR, Müller EJ, Wainstein G, Shine JM. 2021. The ascending arousal system shapes neural dynamics to mediate awareness of cognitive states. Nat Commun 12:1–9.

      Raut RV, Snyder AZ, Mitra A, Yellin D, Fujii N, Malach R, Raichle ME. 2021. Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci Adv 7.

      Rosenberg M, Noonan S, DeGutis J, Esterman M. 2013. Sustaining visual attention in the face of distraction: A novel gradual-onset continuous performance task. Attention, Perception, Psychophys 75:426–439.

      Rosenberg MD, Finn ES, Scheinost D, Papademetris X, Shen X, Constable RT, Chun MM. 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19:165–171.

      Rosenberg MD, Scheinost D, Greene AS, Avery EW, Kwon YH, Finn ES, Ramani R, Qiu M, Todd Constable R, Chun MM. 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117:3797–3807.

      Saggar M, Shine JM, Liégeois R, Dosenbach NUF, Fair D. 2022. Precision dynamical mapping using topological data analysis reveals a hub-like transition state at rest. Nat Commun 13.

      Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo X-N, Holmes AJ, Eickhoff SB, Yeo BTT. 2018. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb Cortex 28:3095–3114.

      Shine JM. 2019. Neuromodulatory Influences on Integration and Segregation in the Brain. Trends Cogn Sci 23:572–583.

      Shine JM, Bissett PG, Bell PT, Koyejo O, Balsters JH, Gorgolewski KJ, Moodie CA, Poldrack RA. 2016. The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task Performance. Neuron 92:544–554.

      Shine JM, Breakspear M, Bell PT, Ehgoetz Martens K, Shine R, Koyejo O, Sporns O, Poldrack RA. 2019. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat Neurosci 22:289–296.

      Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. 2009. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci 106:13040–13045.

      Song H, Finn ES, Rosenberg MD. 2021a. Neural signatures of attentional engagement during narratives and its consequences for event memory. Proc Natl Acad Sci 118:e2021905118.

      Song H, Park B-Y, Park H, Shim WM. 2021b. Cognitive and Neural State Dynamics of Narrative Comprehension. J Neurosci 41:8972–8990.

      Taghia J, Cai W, Ryali S, Kochalka J, Nicholas J, Chen T, Menon V. 2018. Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nat Commun 9:2505.

      Terashima H, Kihara K, Kawahara JI, Kondo HM. 2021. Common principles underlie the fluctuation of auditory and visual sustained attention. Q J Exp Psychol 74:705–715.

      Tian Y, Margulies DS, Breakspear M, Zalesky A. 2020. Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nat Neurosci 23:1421–1432.

      Turnbull A, Karapanagiotidis T, Wang HT, Bernhardt BC, Leech R, Margulies D, Schooler J, Jefferies E, Smallwood J. 2020. Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Sci Rep 10:1–10.

      Unsworth N, Robison MK. 2018. Tracking arousal state and mind wandering with pupillometry. Cogn Affect Behav Neurosci 18:638–664.

      Unsworth N, Robison MK. 2016. Pupillary correlates of lapses of sustained attention. Cogn Affect Behav Neurosci 16:601–615.

      van der Meer JN, Breakspear M, Chang LJ, Sonkusare S, Cocchi L. 2020. Movie viewing elicits rich and reliable brain state dynamics. Nat Commun 11:1–14.

      Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K. 2013. The WU-Minn Human Connectome Project: An overview. Neuroimage 80:62–79.

      Vidaurre D, Abeysuriya R, Becker R, Quinn AJ, Alfaro-Almagro F, Smith SM, Woolrich MW. 2018. Discovering dynamic brain networks from big data in rest and task. Neuroimage, Brain Connectivity Dynamics 180:646–656.

      Vidaurre D, Smith SM, Woolrich MW. 2017. Brain network dynamics are hierarchically organized in time. Proc Natl Acad Sci U S A 114:12827–12832.

      Yamashita A, Rothlein D, Kucyi A, Valera EM, Esterman M. 2021. Brain state-based detection of attentional fluctuations and their modulation. Neuroimage 236:118072.

      Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zöllei L, Polimeni JR, Fisch B, Liu H, Buckner RL. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol 106:1125–1165.

      Yousefi B, Keilholz S. 2021. Propagating patterns of intrinsic activity along macroscale gradients coordinate functional connections across the whole brain. Neuroimage 231:117827.

      Zhang S, Goodale SE, Gold BP, Morgan VL, Englot DJ, Chang C. 2023. Vigilance associates with the low-dimensional structure of fMRI data. Neuroimage 267.

    1. Author Response:

      The following is the authors' response to the current reviews.

      We appreciate the thoughtful critiques of the reviewers. While we agree that performing additional experiments and analyses probing the sensitivity of the technique would be useful for future studies, we are unable to perform additional experiments as our lab has closed. We share this technique as a starting point for further investigation, but it may need to be modified for success in other contexts. We have provided details of the scenarios (life stage, feeding, day, number of ticks) where we successfully sequenced B. burgdorferi from ticks, as well as one where we did not (unfed nymphs) as a starting point. We will clarify in proofing that our qPCR experiments show that we capture the vast majority of B. burgdorferi flaB mRNA from our input samples, suggesting that we are likely capturing the majority of the B. burgdorferi.

      In this work, we were most interested in using RNA-seq to perform differential expression analysis between annotated mRNAs across our timepoints. We have provided the number of genes detected in each sample (92% of annotated transcripts on average) as well as the median number of reads covering each gene (604 on average) in the supplemental file containing sequencing statistics. This coverage is highly reproducible across replicates, with an average Pearson correlation of 0.99 between gene expression levels (as Transcripts Per Million) between any two replicates. These data and the fact that many of the gene expression changes we observed align with previous observations of others give us confidence in our differential expression analysis. For those interested in tRNAs or sRNAs, we think that it would be best to modify the protocol to focus specifically on capturing those sequences in the library preparation. We encourage others interested in other aspects of our data to download it and explore it.
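
      As an illustration of the reproducibility metric mentioned above, the following sketch converts read counts to Transcripts Per Million (TPM) and computes the Pearson correlation between two replicates. It is not the authors' pipeline: the counts and gene lengths are invented, and whether expression values were log-transformed before correlation is not stated in this response, so no transformation is applied here.

```python
import math
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Length-normalize counts, then scale so the values sum to one million."""
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical gene lengths (bp) and read counts for two replicates.
gene_lengths = [900, 1500, 600, 2400, 1200]
replicate_a = [450, 1200, 90, 3000, 610]
replicate_b = [470, 1150, 100, 2900, 640]

tpm_a = tpm(replicate_a, gene_lengths)
tpm_b = tpm(replicate_b, gene_lengths)
replicate_r = pearson(tpm_a, tpm_b)
```

      Repeating the last step over every pair of replicates and averaging would give a summary comparable in form to the figure quoted in the response.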

      We will correct remaining wording issues in proofing.

      —————

      The following is the authors' response to the original reviews.

      Dear Reviewing Editor,

      We thank you and the reviewers for the thoughtful comments on our manuscript, and we are excited to submit a revised version of our manuscript “Longitudinal map of transcriptome changes in the Lyme pathogen Borrelia burgdorferi during tick-borne transmission.” In response to the reviews, we have made the following changes to our manuscript:

      1. We updated the text for increased clarity around experimental details, including statistical analyses.

      2. We added additional details about the mapping of non-Bb reads as well as more information about Bb read coverage.

      3. We compared our differentially expressed genes to 4 previous studies of global transcriptional changes in different tick feeding contexts.

      4. We updated the discussion to address these comparisons as well as caveats of our study more directly.

      Please see our responses to individual comments below.

      Reviewer #1 (Public Review):

      In this study, Sapiro et al sought to develop technology for a transcriptomic analysis of B. burgdorferi directly from infected ticks. The methodology has exciting implications to better understand pathogen RNA profiles during specific infection timepoints, even beyond the Lyme spirochete. The authors demonstrate successful sequencing of the B. burgdorferi transcriptome from ticks and perform mass spectrometry to identify possible tick proteins that interact with B. burgdorferi. This technology and first dataset will be useful for the field. The study is limited in that no transcripts/proteins are followed-up by additional experiments and no biological interactions/infectious-processes are investigated.

      Critiques and Questions:

      We thank the reviewer for these thoughtful critiques and helping us improve our manuscript.

      This study largely develops a method and is a resource article. This should be more directly stated in the abstract/introduction.

      We edited the abstract and introduction to more directly state that we are sharing a new method and a resource for future investigations. (Lines 29-32; 101-103)

      Details of the infection experiment are currently unclear and more information in the results section is warranted. State the species of tick and life-stage (larval vs nymphal ticks) used for experiments. For RNA-seq, are mice infected and ticks naïve, or are ticks infected and transmitting Borrelia to uninfected mice?

      We updated the results section to more clearly state the tick species and life stage and to make it more clear that infected ticks are transmitting Bb to naïve mice. (Lines 113-115)

      What is the limit of detection for this protocol? Experimental data should be provided about the number of B. burgdorferi required to perform this approach.

      We performed this protocol on pools of 6 (for later feeding stages) to 14 (for early stages) infected nymphs. Published studies (PMID: 7485694, PMID: 11682544) suggest that one day after attachment, there may be a few thousand Bb per tick, suggesting what we’ve measured here may come from on the order of 10^4 Bb. We were not able to capture consistent data from Bb from unfed ticks, which may be due to lower numbers or to an altered transcriptional state caused by lack of nutrients in the unfed tick. We updated the discussion to reflect some of these limitations and uncertainties. (Lines 461-465)

      More information regarding RNA-seq coverage is required. Line 147-148 "read coverage was sufficient"; what defines sufficient? Browser images of RNA-seq data across different genes would be useful to visualize the read coverage per gene. What is the distribution of reads among tRNAs, mRNAs, UTRs, and sRNAs?

      As we were interested in differential expression analysis, we defined sufficient as the number of reads needed per gene to determine statistically significant expression changes across days, which with DESeq2 is typically 10 reads. We reworded this section for clarity and added additional information about the median number of reads per gene, which is also useful in thinking about differential expression analysis. (Lines 163-170) As we chose to focus on differential expression analysis here, we believe these are the most relevant metrics to cover.
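
      The two coverage metrics named in this response can be computed from a per-gene read count vector as in the sketch below. The 10-read threshold follows the text above; the function name and the example counts are hypothetical and do not come from the dataset.

```python
from statistics import median
from typing import Dict, Sequence


def coverage_summary(
    reads_per_gene: Sequence[int], min_reads: int = 10
) -> Dict[str, float]:
    """Fraction of genes with at least `min_reads` reads, and the median count."""
    detected = sum(1 for reads in reads_per_gene if reads >= min_reads)
    return {
        "fraction_detected": detected / len(reads_per_gene),
        "median_reads": float(median(reads_per_gene)),
    }


# Hypothetical read counts for five genes in one sample.
sample_counts = [0, 5, 10, 20, 1000]
summary = coverage_summary(sample_counts)
```

      Reporting both numbers per sample makes it possible to check that a sample with a lower overall mapping rate still has comparable per-gene coverage.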

      My lab group was excited about the data generated from this paper. Therefore, we downloaded the raw RNA-seq data from GEO and ran it through our RNA-seq computational pipeline. Our QC analysis revealed that day 4 samples have a different GC% pattern and that a high percentage of E. coli sequences were detected. This should be further investigated and addressed in the paper: Are other bacteria being enriched by this method? Why would this be unique to day 4 samples? Does this affect data interpretation?

      We appreciate the interest in our data and pointing out this anomaly. We found that the day 4 samples do have a high percentage of reads that mapped to a bacterial species, Pseudomonas fulva, rather than ticks as we expected. (The reads that map to E. coli also map to P. fulva.) We have updated the results to include this information (Lines 156-165). We believe this is likely due to contamination from collecting ticks after they have fallen off mice in cages on day 4, rather than pulling ticks off the mice as in days 1-3. Unfortunately, as our lab has shut down, we cannot investigate the source further. We do think the high percentage of P. fulva reads suggests that other bacteria can be enriched with the anti-Bb antibody we used. We’ve updated the discussion to highlight this caveat. (Lines 459-460)

      While the presence of these bacterial reads did lower our overall Bb mapping rate and necessitate deeper sequencing for the day 4 samples, the Bb sequencing coverage of these samples is on par with samples from the other days in terms of percentage of genes with at least 10 reads and median number of reads per gene. Fewer than 0.0002% of the reads that map to Bb genes in any day 4 sample also map to P. fulva. We found that this small fraction of reads is dispersed across 334 genes in which an average of 0.05% (maximally 2.3%) of day 4 reads also map to P. fulva. Therefore, these bacterial reads do not change our interpretation of the results comparing gene expression across days, including day 4.

      Comprehensive data comparisons of this study and others are warranted. While the authors note examples of known differentially expressed genes (like lines 235-241), how does this global study compare to other global approaches? Are new expression patterns emerging with this RNA-seq approach compared to other methods? What differences emerged from day 1 to day 4 ticks compared to differences observed in unfed to fed ticks or fed ticks to DMC experiments? Directly compare to the following studies (PMID: 11830671; PMID: 25425211; PMID: 36649080).

      We added comparisons of our list of DE genes to those noted to change between “unfed tick” and “fed tick” culture conditions (PMID: 11830671 and 12654782), as well as fed nymph to DMC (PMID: 25425211 and 36649080) (Lines 231-252, Figure S4). These comparisons pointed us to two main findings: that global changes to Bb in different culture conditions generally agreed with the most dramatic changes we saw in our data, and that the timing of expression increases during feeding may relate to whether genes are more highly expressed in fed ticks or in mammalian conditions. Overall, the majority of our DE genes have been identified in at least one of these studies or in the other studies we compared to outlining RpoS, Rrp1, and RelBbu regulons. As many of these studies were asking slightly different questions and using different conditions and vastly different technology, we would expect some differences to arise from different contexts and some to be purely technical. The genes that were not seen in these previous studies tended to follow the same functional patterns we saw overall, heavily skewing towards genes of unknown function, outer surface proteins, and a handful of genes related to other functions. With the current state of the functional annotation of the genome, it is difficult to assess whether these amount to new expression patterns in and of themselves, so we focused on the overall trends in our data rather than those that were different from other studies.

      Details about the categorization of gene functions should be further described. The authors use functional analysis from Drecktrah et al., 2015, but that study also lacks details of how that annotation file was generated. Here, the authors seem to have supplemented the Drecktrah et al., 2015 list with bacteriophage and lipoprotein predictions, which are the same categories on which they focus their findings. Have they introduced a bias to these functional groups? While it can be noted that many lipoproteins are upregulated (or comment on specific gene classes), there are even more "unknown" proteins upregulated. I argue that not much can be inferred from functional analysis given the current annotation of the B. burgdorferi genome.

      We strongly agree that the current annotation of the Bb genome makes it difficult to perform meaningful global functional analysis, but we feel it is useful to get a general overview of gene functions. We described our approach for classifying genes into functional categories in the methods, in which we relied on previously published papers to make our best estimate of gene category (noted for each gene in Table S4). Due to the lack of annotations for many genes, we focused on the relatively well-defined category of lipoproteins, as these are overrepresented as a group in our upregulated genes, as well as phage genes, which are not necessarily overrepresented, but are still interesting to us. We hope that others will look at the data (particularly in Table S4, but also Table S3, or download the raw data and do their own analysis) with their own interests and biases and dig more into genes that we did not highlight specifically. We provide this data as a resource with the hope that some of the genes of unknown function that we see change here will be the subject of future functional studies so that this is less of a problem in the future.

      Reviewer #1 (Recommendations For The Authors):

      In general, the paper is well written and digestible for a broad audience. However, some of the figure graphics are unnecessary and take away from the data. Please label tick species and tick life-stage in Figure 1 drawings. The legend of Figure 1 requires citations. The Figure 4B graphic is unnecessary and the colors are confusing as they are too similar to the color palette of Figure 4A, where the colors have meaning. The Figure 5A graphic is unnecessary and takes away from the data embedded within it.

      We more clearly labeled the species in Figure 1 and added citations to the legend. We have simplified Figures 4A and 5A for clarity.

      Clarify lines 220-259 and Figure 3. What days are being compared? Downregulated genes should also be commented on.

      We considered our set of differentially expressed genes as those that changed two-fold (multiple hypothesis adjusted p-value < 0.05) in any of the three comparisons shown in Figure 2 (day 1 to day 2, day 1 to day 3, day 1 to day 4). We clarified this at multiple points in the results (e.g., Line 273). We commented on downregulated genes throughout, although as there were fewer genes and the magnitude of change was smaller, we focused more on upregulated genes.

      Line 327-329, state numbers not percentages. How many Bb proteins were actually detected?

      We updated this section to include numbers (Lines 371-374). In concordance with our sequencing data, we found (and were looking for) mainly tick proteins in this experiment.

      Data availability: B. burgdorferi and tick oligo sequences used for DASH should be provided in a supplemental table.

      We added a supplemental table of these sequences (Table S9). Please note they have been previously published in Dynerman et al. 2020 and Ring et al. 2022.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is overall well written and easy to follow. The data are compelling and support the conclusions. The discussion of this work is however highly insufficient and needs to be thoroughly edited:

      - Statistical analysis: The authors mention that DESeq2 was used. Please provide information on the type and the stringency of the tests used for differential gene expression analysis, including any additional potential correction for p-values (Bonferroni). The authors mention that genes with fold changes >2 were used for analysis, yet there is no information on the p-value cut off or if the genes with fold changes >2 were statistically significant. Please provide detail and rationale for the analysis.

      We clarified in the results and methods (Lines 200, 642-644) that we required an adjusted p-value < 0.05 from DESeq2’s Wald test with Benjamini-Hochberg correction along with a two-fold change when determining our genes of interest. As small fold changes showed statistically significant differences, we chose to set a fold change cutoff in most of our analysis to help us focus on the most highly expressed genes, like other studies we compared our data to. We included all of the DESeq2 results in Table S3 so that others may explore the data with different cutoffs if desired.
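      To make the combined cutoff concrete, here is a minimal sketch of Benjamini-Hochberg adjustment followed by the two filters described above (adjusted p-value < 0.05 and at least a two-fold change, i.e. |log2 fold change| >= 1). It is not the authors' pipeline: the study used DESeq2, which additionally performs steps such as independent filtering that are not reproduced here. The function names, gene names, and example values are hypothetical, and whether the fold-change bound is inclusive is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Filter (gene, log2_fold_change, raw_pvalue) tuples by both cutoffs."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if q < padj_cutoff and abs(lfc) >= min_abs_log2fc
    ]


# Hypothetical results for one comparison (e.g. day 1 vs day 4).
example = [
    ("geneA", 2.0, 0.001),   # significant and more than two-fold: kept
    ("geneB", 0.3, 0.001),   # significant but small change: dropped
    ("geneC", -1.5, 0.5),    # large change but not significant: dropped
]
selected = select_de_genes(example)
```

      Readers who want different thresholds can apply the same kind of filter to the full DESeq2 output in Table S3 by changing `padj_cutoff` and `min_abs_log2fc`.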

      - The field has been generating data on gene expression in ticks for decades. Yet, many of these studies are not referenced here. There is no discussion of how the data described here compares to what is known in the literature. For example, Venn diagrams or tables could be included for comparison with the data described lines 208-216. Extensive description and comparison of the data to the literature should be added in the discussion, and similarities/discrepancies should be discussed appropriately.

      We added additional comparisons to four different papers looking at global gene expression in Bb in the fed tick or tick-like culture conditions (Lines 231-252, Figure S4). This information as well as comparisons to transcriptional regulons (Figure S3) is available in Table S4. In addition to discussing some examples in the results, we added more information in the discussion regarding these comparisons (Lines 420-425). The majority of the genes that we see change over feeding have been previously noted to change expression during the enzootic cycle or be regulated by transcriptional programs active during this timeframe, and we have more clearly stated that. We focused on similarities here as these papers all ask slightly different questions in different contexts and use different technology which could all account for the many differences in individual genes between all of them and our work.

      - There is no discussion of the caveats of the study: for example, the authors are using an anti-OspA antibody, which could induce bias. The authors provide in-vitro pull down data supporting that this should not be an issue, but the pull down is performed from BSK-grown bacteria. This caveat should be discussed.

      We’ve added a paragraph to the discussion including this caveat and others (Lines 453-463).

      - Timing of RNA extraction: There is over 1h of delay between initial tick collection and RNA fixation. The effects of time on gene expression should be discussed.

      Although we were able to show that this timeframe did not affect cultured Bb gene expression, we added this to the discussion.

      - Gene expression is compared to Day 1. This introduces analysis bias as it does not allow identification of transcripts that first change upon initial feeding. This caveat should also be discussed.

      We added this caveat – that we may miss gene expression changes in the first 24 hours of feeding – to the discussion.

      - This study is performed with 1 strain of B. burgdorferi on one tick species. Please provide perspective on the impact of these findings on Lyme disease causing spirochetes and their vectors broadly.

      We believe this method could be easily adaptable to study gene expression in other spirochete/vector pairs to determine similarities and differences and we added a comment to the discussion.

      - The discussion should also include insights on how to build on this work and include additional areas of method development to increase the recovery of B. burgdorferi from ticks or other organisms and facilitate future transcriptomic studies.

      We added a few ideas to the discussion noting that this protocol could be modified for use in other timeframes, with other antibodies, or in other organisms. We also highlight the recent advent of TBDCapSeq by Grassmann et al. that may be used in conjunction with this type of protocol.

      Minor comments:

      - Consider re-wording the description of the methods and findings to the third person for coherence.

      The majority of the methods are now written in third person.

      - Over 90% of the reads did not map to B. burgdorferi: please provide additional information on what these reads mapped to (tick or mouse), and if the data reflects what is known in the literature

      We have updated the results and discussion with information about the reads that do not map to Bb (Lines 156-166). The majority of reads mapped to the tick genome, which is what we expected. While a large number of reads in our day 4 samples unexpectedly mapped to Pseudomonas fulva, we do not believe this affects the interpretation of our data as we were still able to get broad genome coverage of Bb in these samples.

      - Please be more clear in the result section on the life stage of the ticks used for these studies.

      We have updated the results to clarify throughout.

      - Indicate how many total reads were generated for each sample

      This information is present in Table S1.

      - Provide statistical analyses for Figures 1C and D.

      We added t tests to determine statistical differences for these panels.

      Reviewing Editor (Recommendations for The Authors):

      1. It is important to mention in the abstract (line 27) that 'upregulated genes' is in comparison to day 1. This is also true in the introduction (lines 92-93).

      We updated the results and introduction to state more clearly that day 1 is our baseline measurement.

      2. It is also important to discuss in the manuscript that because your 'controls' are day 1 samples, initial transcriptome changes in response to the tick environment might be missed.

      This has been added in the discussion as a caveat (Lines 460-463).

      3. As someone who does not work with Bb, I would like to have seen a clearer description of what the feeding event looks like. Although there is some text in the introduction that touches on that ('prolonged nature of I. scapularis feeding'), I would like to see something even clearer. Maybe stating that feeding may take from x-y days would clarify that for the non-specialist.

      We updated the results to more clearly state that the tick falls off of the mice by around 4 days after feeding, our last time point (Lines 113-115). Additional details of tick feeding are also in the Figure 1 legend.

      4. In Fig. 3 linear DNA molecules seem to be drawn to scale. Is that also the case for plasmids? This could be clarified in the legend.

      The genome is drawn approximately to scale. We noted this and updated the legend with more information about how linear and circular plasmid names denote their size.

      5. Figure 5C: Colors are a bit confusing here. The legend indicates that they refer to fold changes, but the scale in the panel shows expression levels, not fold changes. Please clarify. Also, is this really TPM or RPKM? If comparisons of relative levels between different genes are made, number of reads should be normalized by gene length.

      The heatmap in Figure 4C does show expression levels, and we updated the legend to more clearly state this. The highlighted gene names are meant to show which genes change two-fold during this time (those present in panel A). The data are presented as TPM (transcripts per million), which, like RPKM, is normalized by gene length (PMID: 20022975).
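      For readers unfamiliar with the unit, the following minimal sketch shows how TPM is conventionally computed: read counts are first divided by gene length, and the resulting rates are then scaled so they sum to one million per sample. This is an illustration of the standard definition, not the code used in the study; the function name `tpm`, the gene names, and the example values are hypothetical.

```python
def tpm(read_counts, gene_lengths_bp):
    """Compute transcripts per million for one sample.

    read_counts: dict mapping gene name -> mapped read count.
    gene_lengths_bp: dict mapping gene name -> gene length in base pairs.
    """
    # Step 1: length-normalize, giving reads per kilobase for each gene.
    rates = {g: read_counts[g] / (gene_lengths_bp[g] / 1000) for g in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads mapped to any gene")
    # Step 2: scale so the values for the sample sum to one million.
    return {g: rate / total * 1_000_000 for g, rate in rates.items()}


# Two hypothetical genes with equal read counts but different lengths.
values = tpm({"short_gene": 100, "long_gene": 100},
             {"short_gene": 1000, "long_gene": 2000})
```

      Because the length correction happens before scaling, the shorter gene ends up with twice the TPM of the longer one here, which is why relative levels of different genes can be compared within a sample.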

    1. Author Response:

      The following is the authors' response to the original reviews.

      We have now incorporated the changes recommended by the reviewers to improve the interpretations and clarity of the manuscript. We are grateful for their thoughtful comments and suggestions, which have significantly strengthened the manuscript.

      Reviewer #1 (Public Review):

      Park et al demonstrate that cells on either side of a BM-BM linkage strengthen their adhesion to that matrix using a positive feedback mechanism involving a discoidin domain receptor (DDR-2) and integrin (INA-1 + PAT-3). In response to its extracellular ligand (Collagen IV/EMB-9), DDR-2 is endocytosed and initiates signaling that in turn stabilizes integrin at the membrane. DDR-2 signaling operates via Ras/LET-60. This work's strength lies in its excellent in vivo imaging, especially of endogenously tagged proteins. For example, tagged DDR-2:mNG could be seen relocating from seam cell membranes to endosomes. I also think a second strength of this system is the ability to chart the development of BM-BM linkage over time based on the stages of worm larval development. This allows the authors to show DDR signaling is needed to establish linkage, rather than maintain it. It likely is relevant to many types of cells that use integrin to adhere to BM and left me pondering a number of interesting questions.

      We thank the reviewer for highlighting the strengths and impact of our work in expanding our understanding of tissue linkages and how DDR and integrins might work in other contexts.

      For example: (1) Does DDR-2 activation require integrin? Perhaps integrin gets the process started and DDR-2 positively reinforces that (conversely is DDR-2 at the top of a linear pathway)?

      DDR activation by receptor clustering upon exposure to its ligand collagen is well documented (Juskaite et al., 2017 eLife PMID: 28590245). Clustered DDR is rapidly internalized into endocytic vesicles, where full activation of tyrosine kinase activity is thought to occur (Fu et al., 2013 J Biol Chem PMID: 23335507). Supporting this model, we found that concentrated type IV collagen is required for vesicular DDR-2 localization in the utse and seam cells at the utse-seam connection. Whether DDR-2 activation requires integrin has not been fully established. However, one study using mouse and human cell lines showed that DDR1 activation occurs independent of integrin (Vogel et al., 2000 J Biol Chem PMID: 10681566), consistent with the latter possibility raised by the reviewer that DDR-2 is upstream of integrin.

      To test these hypotheses, we require an experimental condition where loss or near complete loss of INA-1 integrin is achieved by the mid-to-late L4 larval stage, when DDR-2 is activated by collagen and taken into endocytic vesicles. Currently, we can only partially deplete INA-1 by RNAi (Figure 5—figure supplement 2E), and strong loss of function mutations in ina-1 result in early larval arrest and lethality (Baum and Garriga, 1997 Neuron PMID: 9247263). To overcome these obstacles, we are adapting the new FLP-ON::TIR1 system developed for precise spatiotemporal protein degradation in worms (Xiao et al., 2023 Genetics PMID: 36722258). We hope to achieve a near complete knockdown of ina-1 with this timed depletion strategy. In the future, we will use this system to block DDR-2 and integrin function specifically in the utse or seam cells, to complement our current dominant negative mis-expression approach.

      (2) In ddr-2(qy64) mutants, projections seem to form from the central portion of the utse cell. Does this reveal a second function for DDR-2, regulating perhaps the cytoskeleton?

      We thank the reviewer for their observation and agree with their interpretation. We think it is important to comment on this and have stated in the results text, lines 208-212: “In addition, membrane projections emanating from the central body of the utse were detected in ddr-2(qy64) animals. These projections were first observed at the mid L4 stage and persisted to young adulthood (Figure 2C). These observations suggest that DDR-2 functions around the mid L4 to late L4 stages to promote utse-seam attachment, and that DDR-2 may also regulate utse morphology.”

      And (3) can you use the forward genetic tools available in C. elegans to find new genes connecting DDR-2 and integrin?

      This is an excellent suggestion. We found that loss of ddr-2 strongly enhanced the uterine prolapse (Rup) defect caused by RNAi mediated depletion of integrin. To find new genes connecting DDR-2 and integrin, a targeted screen for the Rup phenotype could be performed in an integrin reduction of function condition. As we cannot work with null or strong loss-of-function ina-1 alleles (described above), the screen could be conducted with either timed depletion of INA-1 with candidate RNAi treatments, or combinatorial ina-1 RNAi with candidate RNAi treatments.

      I do see two areas where the manuscript could be improved. First, the authors rely on imprecise genetic methods to reach their conclusions (i.e. systemic RNAi, or expression of dominant negative constructs.) I think their conclusion would be stronger if they used tissue specific degradation to block ddr-2 function specifically in the utse or seam cells. Methods to do this are now regularly used in C. elegans and the authors have already developed the necessary tissue-specific promoters.

      We agree with the reviewer that tissue specific degradation of DDR-2 in the utse and seam cells will complement and strengthen our evidence for the site of action of DDR-2. As described earlier, we are currently adapting the FLP-ON::TIR1 tissue degradation system to perform these experiments and will provide our findings in a follow-up manuscript.

      Second, the manuscript is presented in the introduction as a study on formation and function of BM-BM linkage. The authors start the discussion in a similar manner. But their results are about adhesion between cells and BM. In fact they show the BM-BM linkage forms normally in ddr-2 mutants. Thus it seems like what they have really uncovered is an adhesion mechanism that works in parallel to the BM-BM linkage. Since ddr-2 appears to function equally in both utse + seam cells (based on their dominant negative data), there are likely three layers of adhesion (utse-BM, BM-BM, BM-seam) and if any of those break down, you get a partially penetrant rupture phenotype.

      The reviewer raises an important and interesting point, and we agree that we did not articulate the organization of the utse-seam tissue connection clearly. The utse-seam connection comprises the utse and seam BMs, each ~50 nm thick, and a connecting matrix bridging the two BMs, which is ~100 nm thick (Vogel and Hedgecock, 2001 Development PMID: 11222143). Type IV collagen builds up to high levels within the connecting matrix and links the utse and seam BMs, and its concentration is required for DDR-2 vesiculation. An important point we did not highlight is that type IV collagen is approximately 400 nm long (Timpl et al., 1981 Eur J Biochem PMID: 6274634). Thus, collagen molecules within the connecting matrix could span the entire length of the utse-seam connection and project into the utse and seam BMs to interact with cell surface receptors. Consistent with this possibility, we found that buildup of type IV collagen that spans the utse-seam BM-BM linkage correlated with the timing of DDR-2 activation/vesiculation within utse and seam cells. In addition, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between endothelial BM and epithelial (podocyte) BM, showed that type IV collagen, which spans the BMs, projects into the endothelial and podocyte BMs (Suleiman et al., 2013 eLife PMID: 24137544). We carefully considered these points to generate the schematics in Figure 1A and Figure 8, but failed to articulate this point in the manuscript. We are grateful to the reviewer for bringing up our error and have now stated these details in the text to address the reviewer’s concern as outlined below.

      In the introduction (lines 93-96): “A BM-BM tissue connection between the large, multinucleated uterine utse cell and epidermal seam cells stabilizes the uterus during egg laying. The utse-seam connection is formed by BMs of the utse and the seam cells, each ~50 nm thick, which are bridged by an ~100 nm connecting matrix (Vogel and Hedgecock 2001, Morrissey, Keeley et al. 2014, Gianakas, Keeley et al. 2023).”

      In the discussion (lines 507-520): “We also found that internalization of DDR-2 at the utse-seam connection correlated with the assembly of type IV collagen at the BM-BM linkage and was dependent on type IV collagen deposition. Type IV collagen is ~400 nm in length and the utse-seam connecting matrix spans ~100 nm, while the utse and seam BMs are each ~50 nm thick (Timpl, Wiedemann et al. 1981, Vogel and Hedgecock 2001). Thus, collagen molecules in the connecting matrix could project into the utse and seam BMs to interact with DDR-2 on cell surfaces. Consistent with this possibility, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between podocytes and endothelial cells, showed type IV collagen within the GBM projecting into the podocyte and endothelial BMs (Suleiman, Zhang et al. 2013). As DDR-2 is activated by ligand-induced clustering of the receptor (Juskaite, Corcoran et al. 2017, Corcoran, Juskaite et al. 2019), it suggests that the BM-BM linking type IV collagen network, which is specifically assembled at high levels, clusters and activates DDR-2 in the utse and seam cells to coordinate cell-matrix adhesion at the tissue linkage site.”

      These concerns do not undercut the significance of this work, which identifies an interesting mechanism cells use to strengthen adhesion during BM linkage formation. In fact, I am excited to read future papers detailing the connection between DDR-2 and integrin. But before undertaking those experiments the authors should be certain which cells require DDR-2 activity, and that should not be determined based solely on mis expression of a dominant negative.

      We thank the reviewer for recognizing the significance of our work and reiterate that we will use tissue-specific degradation for site of action experiments in future studies on the biology of the utse-seam tissue linkage.

      Reviewer #2 (Public Review):

      This paper explores the mechanisms by which cells in tissues use the extracellular matrix (ECM) to reinforce and establish connections. This is a mechanistic and quantitative paper that uses imaging and genetics to establish that the Type IV collagen receptor DDR-2 (discoidin domain receptor 2) signals through Ras to strengthen an adhesion between two cell types in C. elegans. This connection needs to be strong and robust to withstand the pressure of the numerous eggs that pass through the uterus. The major strengths of this paper are in crisply designed and clear genetic experiments, beautiful imaging, and well supported conclusions. I find very few weaknesses, although perhaps the evidence that DDR-2 promotes utse-seam linkage through regulation of MMPs could be stronger. This work is impactful because it shows how cells in vivo make and strengthen a connection between tissues through ECM interactions involving collaboration between discoidin and integrin.

      We appreciate the reviewer’s assessment of the impact of our work in detailing a mechanism for how cells increase their adhesion to the ECM to establish connections between adjacent tissues. We have softened the interpretation of our MMP localization data to address the reviewer’s concern (detailed below).

      Reviewer #1 (Recommendations For The Authors):

      Regarding Figure 1D, is it possible to show when the BM forms on the cartoons more clearly (something like the 3rd section of Fig 3A)? I can see it in the timeline but it's hard to follow in the diagrams.

      We agree with the reviewer that we could show when the BM-BM connecting matrix forms more clearly in Figure 1D. Hemicentin and fibulin, the earliest components of the connecting matrix, are detected at very low levels at the utse-seam connection during the mid-L4 stage and are more prominently localized by the mid-to-late L4 stage (Gianakas et al., 2023 J Cell Biol PMID: 36282214). For this reason, we only show the connecting matrix in yellow from the mid-to-late L4 stages onward. We have now made the BM-BM connection more prominent in the figure 1D cartoons with boxed outlines (similar to Figure 3A as the reviewer suggested). We also added a label for the time window when the BM-BM connection forms.

      Regarding the RNAi induced prolapse phenotype, looking at 2B, it appears that between 5% and 10% of animals have uterine prolapse when fed control RNAi. Is this correct? It seems very high. This prolapse in control animals was not observed in other RNAi experiments such as Figure 5C.

      We thank the reviewer for pointing this out. For Figure 2B, the control used was wild-type N2 animals fed with OP50 E. coli bacteria, rather than HT115 bacteria carrying the L4440 empty vector (control RNAi). This is because the main comparisons were to five ddr-1 and ddr-2 mutant strains. We did notice a slightly higher baseline uterine prolapse frequency (5% on average, detailed in Figure 2—Source data 1) in wild-type animals fed OP50 bacteria, compared to HT115 bacteria fed animals (approximately 1-2% on average). It is possible this could be linked to the nutritional differences in the two bacterial strains. However, we are confident of our data in Figure 2B as we carried out 3 independent trials, and the uterine prolapse frequencies in ddr-1 mutant animals matched the baseline in wild-type animals, while the frequencies for ddr-2 mutants were all increased over the baseline in all trials (as detailed in Figure 2—Source data 1).

      Relating to the point above, in reading the methods to try to understand how they did the RNAi, I noticed that they measure prolapse continually over five days. I didn't realize it takes a long time to occur. I think they should explain this in the text and in the figures. Reading the manuscript I thought prolapse occurred as soon as mutant animals began laying eggs. In the text they should explain this when they first assay the phenotype (page 7), and for figures the Y axis on the graphs could say "% uterine prolapse after 5 days."

      We thank the reviewer for their suggestions. We did not articulate clearly that the utse-seam connection is able to withstand some mechanical stress, even when key components are lost. It’s only over time and repeated use that the connection breaks down. This is likely because a number of components contribute to the connection and, as we have shown previously, there is feedback, such that when one component, such as collagen, is reduced, hemicentin levels increase at the BM-BM connection. Since ruptures arising from utse-seam detachments typically occur sometime after the onset of egg-laying, we screened the entire egg-laying period (days two to five post-L1) as described in Gianakas et al. 2023. We have now incorporated these points in the text and figures as follows:

      In the introduction, we clarified that the utse-seam BM-BM connection breaks down over time by adding (lines 99-105): “Hemicentin promotes the recruitment of type IV collagen, which accumulates at high levels at the BM-BM tissue connection and strengthens the adhesion, allowing it to resist the strong mechanical forces of egg-laying. The utse-seam connection is robust, with each component of the tissue-spanning matrix contributing to the BM-BM connection (Gianakas, Keeley et al. 2023). This likely accounts for the ability of the utse-seam connection to initially resist mechanical forces after loss of any one of these components, delaying the uterine prolapse phenotype until sometime after the initiation of egg-laying.”

      We expanded the results text when we first describe the Rup phenotype (lines 183-184): “We first screened for the Rup phenotype caused by uterine prolapse, observing animals every day during the egg-laying period, from its onset (48 h post-L1) to end (120 h) (Methods)”.

      We provided more detail in the Methods section (lines 784-790): “Uterine prolapse frequency was assessed as described previously (Gianakas et al 2023). Briefly, synchronized L1 larvae were plated (~20 animals per plate) and after 24 h, the exact number of worms on each plate was recorded. Plates were then visually screened for ruptured worms (uterine prolapse) every 24 h during egg-laying (between 48 h to 120 h post-L1). We chose to examine the entire egg-laying period as ruptures arising from utse-seam detachments do not usually occur at the onset of egg-laying, but after cycles of egg-laying that place repeated mechanical stress on the utse-seam connection (Gianakas et al 2023).”

      Finally, we modified the Y-axes of graphs in Figure 2B and 5C and the respective figure legends as suggested by the reviewer.
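      As a rough illustration of the scoring scheme quoted from the Methods above, the sketch below pools ruptures seen at each daily check into a single percentage per plate. It is not the authors' analysis code. The function name `prolapse_frequency` and the example numbers are hypothetical, and the sketch assumes each ruptured animal is counted once, at the check where it is first seen.

```python
def prolapse_frequency(worms_on_plate, new_ruptures_by_check):
    """Percent of animals with uterine prolapse over the egg-laying window.

    worms_on_plate: exact number of worms recorded 24 h after plating.
    new_ruptures_by_check: newly ruptured animals seen at each daily check
        (e.g. 48 h, 72 h, 96 h and 120 h post-L1).
    """
    if worms_on_plate <= 0:
        raise ValueError("plate must contain at least one worm")
    ruptured = sum(new_ruptures_by_check)
    if ruptured > worms_on_plate:
        raise ValueError("more ruptures than worms on the plate")
    return 100 * ruptured / worms_on_plate


# Hypothetical plate: 20 worms, one rupture first seen at the 72 h check.
frequency = prolapse_frequency(20, [0, 1, 0, 0])
```

      Keeping the per-check counts as a list, rather than only the pooled total, would also allow the kind of time course the reviewer asks about to be plotted later.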

      Then I went back and compared to the previous publication (Gianakas, 2023). I would be interested to see a time course of how many animals prolapse after 1 day, 2 days, etc. Is this consistent with their data on hemicentin?

      We agree with the reviewer that a time course of uterine prolapse would be interesting as we saw ruptures occur throughout the egg-laying period. However, for the hemicentin knockdown experiments in Gianakas et al. 2023 as well as the experiments in this study, we recorded only the pooled number of animals with ruptures at the end of the experimental window. In future studies we will also record the uterine prolapse frequencies on each day to generate time courses that will provide more insight into the function of proteins at the utse-seam connection.

      Lines 183-184: I'm not sure what it means to say "trended towards displaying a significant Rup phenotype?" Since the difference was not statistically significant, it would be better to say something like "increased but not statistically significant."

      We agree with the reviewer and have now modified this sentence (lines 190-193): “Animals carrying the ddr-2(ok574) allele, which deletes a portion of the intracellular kinase domain (Unsoeld, Park et al. 2013), also showed an increased frequency of the Rup phenotype compared to wild-type animals, although this difference was not statistically significant (Figure 2A and B)”.

      Line 186: 'penetrant' needs a qualifier to indicate the magnitude of the proportion of individuals with the phenotype.

      As we provide the Rup frequency numbers in Figure 2—Source data 1, we modified the sentence as follows (lines 193-195): “We further generated a full-length ddr-2 deletion allele, ddr-2(qy64), and confirmed that complete loss of ddr-2 led to a significant uterine prolapse defect (Figure 2A and B).”

      Lines 206-208; could the mounting/imaging procedure (which I assume requires squeezing the worm between agarose pad and coverslip) alter the occurrence of prolapse? I would think prolapse would occur more frequently under these conditions as compared to worms laying eggs on a plate.

      The reviewer brings up an important concern. The mounting and imaging procedure does require placing the worm between an agarose pad and a coverslip. However, this did not alter the occurrence of uterine prolapse in this experiment. We were careful to perform the same procedure on both wild-type and ddr-2(qy64) animals to control for this. As detailed in the manuscript, none of the eight wild-type animals we mounted underwent uterine prolapse after recovery off the coverslip, and among the ddr-2(qy64) mutants we mounted, only the ones that exhibited utse-seam detachments went on to rupture later.

      We articulated these points more clearly by modifying lines 214-216 as follows: “Wild-type and ddr-2(qy64) animals were mounted and imaged at the L4 larval stage for utse-seam attachment defects, recovered, and tracked to the 72-hour adult stage, where they were examined for the Rup phenotype.”

      In seam cells you can see that DDR-2:mNG is present at membranes from early to mid L4, which makes sense. But I cannot see it on the membrane at any time point in the utse. Perhaps it is obscured by the yellow dotted line. Should it be visible on utse membranes before it is endocytosed?

      The reviewer raises an interesting question. We think it is likely that DDR-2 is initially on the membrane of the utse like it is on the seam cells. However, we have not observed this, possibly due to the complex shape and thin membrane extensions of the utse. We are unable even to detect clear membrane enrichment of membrane markers in the utse (for example, compare the utse and seam membrane markers in Figure 3B). Thus, we refrained from speculating on DDR-2 utse membrane localization in the manuscript, and instead focused on the pattern of vesicular DDR-2 peaking at the late L4 stage, which was clearly visible in both the utse and seam cells.

      Sup Fig 3A - please show quantification of seam cells not contacting utse at the same Y-axis scale as for regions that do contact utse.

      We have modified the Y-axis scale for the quantification of the seam region not contacting the utse.

      Figure 4A - I don't see a difference between WT and ok574 - what am I missing?

      In the representative ok574 animal shown, a portion of the utse arm on the top right is detached from the seam. To make this phenotype clearer, we have recropped the image panels, readjusted the brightness and contrast of the utse and the seam, and redrawn the outline of the detachment to make this clearer.

      Figure 4C+D, and lines 296-298: I'd bet that both are needed to recruit DDR-2 to membranes. But him-4 has a more severe phenotype because the RNAi knockdown is much more effective (perhaps b/c they are using the newer t444t vector).

      We agree with the reviewer that the him-4 knockdown phenotype is likely more severe than emb-9 knockdown. Type IV collagen at the utse-seam connection is very stable compared to hemicentin (Gianakas et al 2023, J Cell Biol PMID: 36282214, see Fig. 5C), which could explain the lower knockdown efficiency.

      We modified our interpretation of the data in the text as follows (lines 308-312): “In addition, we did not detect DDR-2 at the cell surface, suggesting that hemicentin has a role in recruiting DDR-2 to the site of utse-seam attachment. It is possible that collagen could also function in DDR-2 recruitment, but we could not assess this definitively due to the lower knockdown efficiency of emb-9 RNAi (Figure 4—figure supplement 1A).”

      Reviewer #2 (Recommendations For The Authors):

      Line 218 DDR-2 (typo)

      We have corrected this typo.

      Evidence (line 344-348) may not be strong enough to say whether or not DDR-2 promotes utse-seam linkage through regulation of MMPs.

      We agree with the reviewer and have softened our conclusions as follows (lines 356-363): “The C. elegans genome harbors six MMP genes, named zinc metalloproteinase 1-6 (zmp-1-6) (Altincicek, Fischer et al. 2010). We examined four available reporters of ZMP localization (ZMP-1::GFP, ZMP-2::GFP, ZMP-3::GFP, and ZMP-4::GFP) (Kelley, Chi et al. 2019). Only ZMP-4 was detected at the utse-seam connection and its localization was not altered by knockdown of ddr-2 (Figure 5—figure supplement 1F). These observations suggest that DDR-2 does not promote utse-seam linkage through regulation of MMPs, although we cannot rule out roles for DDR-2 in promoting the expression or localization of ZMP-5 or ZMP-6.”

      The authors show the critical period is in late L4, however, is the signaling needed later too? For example, is the linkage strengthening moderated by DDR-2 important as more eggs accumulate?

      The reviewer raises an interesting question. We observed that the vesicular localization of DDR-2 sharply declined before the onset of egg-laying. By young adulthood, very few punctate structures of DDR-2 were observed in the seam cells, and none in the utse (Figure 3B). Furthermore, the frequency of utse-seam detachments in ddr-2 mutant animals peaked by the late L4 stage and did not increase after this time, suggesting DDR-2 function is no longer required after the late L4 stage (Figure 2D). Thus, we believe that DDR-2 signaling strengthens tissue linkage only during the early formation of the utse-seam connection between the mid and late L4 stage.

      We incorporated these points in the discussion (lines 477-485): “Through analysis of genetic mutations in the C. elegans receptor tyrosine kinase (RTK) DDR-2, an ortholog to the two vertebrate DDR receptors (DDR1 & DDR2) (Unsoeld, Park et al. 2013), we discovered that loss of ddr-2 results in utse-seam detachment beginning at the mid L4 stage. The frequency of detachments in ddr-2 mutant animals peaked around the late L4 stage and did not increase after this time. This correlated with the levels of DDR-2::mNG at the utse-seam connection, which peaked at the late L4 stage and then sharply declined by adulthood. Together, these findings suggest that DDR-2 promotes utse-seam attachment in the early formation of the tissue connection between the mid and late L4 stage.”

      Fig. 3B is the fluorescence quantification normalized to the area?

      Yes, it is. We used mean fluorescence intensity for all fluorescence quantifications to normalize for the area where the signal was measured. We added a line in Methods to emphasize this (lines 739-740): “We measured mean fluorescence intensity for all quantifications in order to account for linescan area.”
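
      The effect of this normalization can be illustrated with a minimal sketch. The pixel values and linescan lengths below are hypothetical and chosen only to show that a mean, unlike a sum, does not scale with the measured area; this is not the analysis code used in the study.

```python
# Hypothetical linescans: identical per-pixel signal, different measured areas.
short_linescan = [100.0] * 50     # 50 pixels (illustrative values)
long_linescan = [100.0] * 200     # 200 pixels (illustrative values)

def integrated_intensity(pixels):
    """Sum of pixel values; grows with the size of the measured region."""
    return sum(pixels)

def mean_intensity(pixels):
    """Integrated intensity divided by pixel count; independent of region size."""
    return sum(pixels) / len(pixels)

# The sums differ fourfold even though the underlying signal is the same,
# whereas the means are identical.
print(integrated_intensity(short_linescan), integrated_intensity(long_linescan))
print(mean_intensity(short_linescan), mean_intensity(long_linescan))
```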

      Fig. 4B a statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers might be a nice addition.

      We believe the reviewer is referring to Figure 3—figure supplement 1B. We have now added the statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers.

      We want to sincerely thank the two reviewers for their thoughtful comments and suggestions. The changes we have made in response to these comments have substantially improved the manuscript.

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the mechanism of amyloid nucleation in a cellular system using their novel ratiometric measurements and uncover interesting insights regarding the role of polyglutamine length and the sequence features of glutamine-rich regions on amyloid formation. Overall, the problem is significant and being able to assess nucleation in cells is of considerable relevance. The data, as presented and analyzed, are currently still incomplete. The specific claims would be stronger if based on in vitro measurements that avoid the intricacies of specific cellular systems and that are more suitable for assessing sequence-intrinsic properties.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in a constructive public dialogue.

      Reviewer #1 (Public Review):

      The authors take on the challenge of defining the core nucleus for amyloid formation by polyglutamine tracts. This rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids. Using their unique assay, deployed in yeast, the authors attempt to infer the size of the nucleus that templates amyloid formation by polyQ. Further, through a series of sequence titrations, all studied using a single type of assay, the authors converge on an assertion stating that a single polyQ molecule is the nucleus for amyloid formation, that 12-residues make up the core of the nucleus, that it takes ca. 60 Qs in a row to unmask this nucleation potential, and that polyQ amyloid formation belongs to the same universality class as self-poisoned crystallization, which is the hallmark of crystallization from polymer melts formed by large, high molecular weight synthetic polymers. Unfortunately, the authors have decided to lean in hard on their assertions without a critical assessment of whether their findings stand up to scrutiny. If their findings are truly an intrinsic property of polyQ molecules, then their findings should be reconstituted in vitro. Unfortunately, careful and rigorous experiments in vitro show that there is a threshold concentration for forming fibrillar solids. This threshold concentration depends on the flanking sequence context, on temperature, and on solution conditions. The existence of a threshold concentration defies the expectation of a monomer nucleus. The findings disagree with in vitro data presented by Crick et al., and ignored by the authors. Please see: https://doi.org/10.1073/pnas.1320626110. These reports present data from very different assays, the importance of which was underscored first by Regina Murphy and colleagues. The work of Crick et al. provides a detailed thermodynamic framework - see the SI Appendix. This framework dovetails with theory and simulations of Zhang and Muthukumar, which explains exactly how a system like polyQ might work (https://doi.org/10.1063/1.3050295). The picture one paints is radically different from what the authors converge upon. One is inclined to lean toward data that are gleaned using multiple methods in vitro because the test tube does not have all the confounding effects of a cellular milieu, especially when it comes to focusing on sequence-intrinsic conformational transitions of a protein. In addition to concerns about the limitations of the DAmFRET method, which, based on the work of the authors in their collaborative paper by Posey et al., are being stretched to the limit, there is the real possibility that the cellular milieu, unique to the system being studied, is enabling transitions that are not necessarily intrinsic to the sequence alone. A nod in this direction is the work of Marc Diamond, which showed that having stabilized the amyloid form of Tau through coacervation, there is a large barrier that limits the loss of amyloid-like structure for Tau. There may well be something similar going on with the polyQ system. If the authors could show that their data are achievable in vitro without anything but physiological buffers one would have more confidence in a model that appears to contradict basic physical principles of how homopolymers self-assemble. Absent such additional evidence, numerous statements seem to be too strong. There are also several claims that are difficult to understand or appreciate.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of e.g. 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
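
      The scale comparison above can be checked with a short calculation. This is a back-of-the-envelope sketch: the 1.5 ml volume is the example quoted in the text, while the specific cell volumes (1 fL and 100 fL) are assumptions used only to bracket the "femtoliter volume" mentioned above.

```python
# Back-of-the-envelope comparison of reaction volumes (all values in liters).
IN_VITRO_VOLUME_L = 1.5e-3      # 1.5 ml, the example volume quoted in the text
CELL_VOLUME_SMALL_L = 1e-15     # assumed lower bound: 1 femtoliter
CELL_VOLUME_LARGE_L = 1e-13     # assumed upper bound: 100 femtoliters

def fold_difference(large_volume_l, small_volume_l):
    """Return how many times larger the first volume is than the second."""
    return large_volume_l / small_volume_l

ratio_small_cell = fold_difference(IN_VITRO_VOLUME_L, CELL_VOLUME_SMALL_L)
ratio_large_cell = fold_difference(IN_VITRO_VOLUME_L, CELL_VOLUME_LARGE_L)

# Both ratios exceed one billion (1e9), whichever assumed cell volume is used.
print(f"{ratio_small_cell:.1e}-fold, {ratio_large_cell:.1e}-fold")
```

      Under either assumed cell volume the test tube is more than a billion times larger, so the argument does not depend on the exact cell size chosen.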

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.
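
      One way to see how a non-integer reaction order could arise from a mixture of nucleus sizes is the toy model sketched below. It assumes, purely for illustration, that the total nucleation rate is the sum of a monomeric pathway (rate proportional to concentration) and a dimeric pathway (rate proportional to concentration squared); the rate constants are hypothetical, and this is not the kinetic analysis performed by Sinnige et al.

```python
# Toy model (an assumption for illustration): rate = k1*c + k2*c**2,
# i.e. independent monomeric (order 1) and dimeric (order 2) nucleation pathways.

def apparent_order(k1, k2, c):
    """Apparent reaction order, d ln(rate) / d ln(c), for the toy model."""
    monomer_flux = k1 * c
    dimer_flux = k2 * c ** 2
    return (monomer_flux + 2.0 * dimer_flux) / (monomer_flux + dimer_flux)

def dimer_flux_fraction(order):
    """Fraction of nucleation events on the dimeric pathway, given an apparent order.

    In this toy model the apparent order equals 1 + (dimeric fraction).
    """
    return order - 1.0

# Hypothetical rate constants chosen so that 60% of events are dimeric at c = 1.
print(apparent_order(k1=0.4, k2=0.6, c=1.0))
print(dimer_flux_fraction(1.6))
```

      Within this simplified picture, an apparent order between 1.5 and 2 corresponds to most nucleation events proceeding through the dimeric pathway.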

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by protein crystallography. Folded proteins form crystals. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). To extrapolate these familiar examples to our present finding with polyQ, one need only appreciate the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with the ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim and in fact opposes the definition of a nucleus. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (see our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We apologize to the Pappu group for neglecting to cite Crick et al. 2013 in the current preprint. Contrary to the reviewer’s assessment, however, we find that the conclusions of this valuable study do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained above, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is inherently faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of our Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and likely akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). That Crick et al. also observed the formation of a relatively labile amyloid phase when the reactions were started with 50 µM peptide is unsurprising in light of the aforementioned kinetic advantage that large reaction volumes can confer to labile polymorphs, and that high concentrations (in this case, orders of magnitude higher than the likely physiological concentration of polyQ (Wild et al., 2015)) can favor the formation of labile amyloid polymorphs (Ohhashi et al., 2010). Indeed, a contemporaneous study by the Wetzel group using very similar peptide constructs and polyQ lengths -- but beginning with lower concentrations -- found that the relevant saturating concentrations for amyloid lie below their limit of detection of 100 nM (Sahoo et al., 2014).

      Rebuttals to other critiques

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. These fractions are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we will modify the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Reviewer #2 (Public Review):

      Numerous neurodegenerative diseases are thought to be driven by the aggregation of proteins into insoluble filaments known as "amyloids". Despite decades of research, the mechanism by which proteins convert from the soluble to insoluble state is poorly understood. In particular, the initial nucleation step has proven especially elusive to both experiments and simulation. This is because the critical nucleus is thermodynamically unstable, and therefore, occurs too infrequently to directly observe. Furthermore, after nucleation much faster processes like growth and secondary nucleation dominate the kinetics, which makes it difficult to isolate the effects of the initial nucleation event. In this work Kandola et al. attempt to surmount these obstacles using individual yeast cells as microscopic reaction vessels. The large number of cells, and their small size, provides the statistics to separate the cells into pre- and post-nucleation populations, allowing them to obtain nucleation rates under physiological conditions. By systematically introducing mutations into the amyloid-forming polyglutamine core of huntingtin protein, they deduce the probable structure of the amyloid nucleus. This work shows that, despite the complexity of the cellular environment, the seemingly random effects of mutations can be understood with a relatively simple physical model. Furthermore, their model shows how amyloid nucleation and growth differ in significant ways, which provides testable hypotheses for probing how different steps in the aggregation pathway may lead to neurotoxicity.

      In this study Kandola et al. probe the nucleation barrier by observing a bimodal distribution of cells that contain aggregates; the cells containing aggregates have had a stochastic fluctuation allowing the proteins to surmount the barrier, while those without aggregates have yet to have a fluctuation of suitable size. The authors confirm this interpretation with the selective manipulation of the PIN gene, which provides an amyloid template that allows the system to skip the nucleation event.

      In simple systems lacking internal degrees of freedom (i.e., colloids or rigid molecules) the nucleation barrier comes from a significant entropic cost that comes from bringing molecules together. In large aggregates this entropic cost is balanced by attractive interactions between the particles, but small clusters are unable to form the extensive network of stabilizing contacts present in the larger aggregates. Therefore, the initial steps in nucleation incur an entropic cost without compensating attractive interactions (this imbalance can be described as a surface tension). When internal degrees of freedom are present, such as the conformational states of a polypeptide chain, there is an additional contribution to the barrier coming from the loss of conformational entropy required to adopt the aggregation-prone state(s). In such systems the clustering and conformational processes do not necessarily coincide, and a major challenge studying nucleation is to separate out these two contributions to the free energy barrier. Surprisingly, Kandola et al. find that the critical nucleus occurs within a single molecule. This means that the largest contribution to the barrier comes from the conformational entropy cost of adopting the beta-sheet state. Once this state is attained, additional molecules can be recruited with a much lower free energy barrier.

      There are several caveats that come with this result. First, the height of the nucleation barrier(s) comes from the relative strength of the entropic costs compared to the binding affinities. This balance determines how large a nascent nucleus must grow before it can form interactions comparable to a mature aggregate. In amyloid nuclei the first three beta strands form immature contacts consisting of either side chain or backbone contacts, whereas the fourth strand is the first that is able to form both kinds of contacts (as in a mature fibril). This study used relatively long polypeptides of 60 amino acids. This is greater than the 20-40 amino acids found in amyloid-forming molecules like ABeta or IAPP. As a result, Kandola et al.'s molecules are able to fold enough times to create four beta strands and generate mature contacts intramolecularly. The authors make the plausible claim that these intramolecular folds explain the well-known length threshold (L~35) observed in polyQ diseases. The intramolecular folds reduce the importance of clustering multiple molecules together and increase the importance of the conformational states. Similarly, manipulating the sequence or molecular concentrations will be expected to manipulate the relative magnitude of the binding affinities and the clustering entropy, which will shift the relative heights of the entropic barriers.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The authors make an important point that the structure of the nucleus does not necessarily resemble that of the mature fibril. They find that the critical nucleus has a serpentine structure that is required by the need to form four beta strands to get the first mature contacts. However, this structure comes at a cost because residues in the hairpins cannot form strong backbone or zipper interactions. Mature fibrils offer a beta sheet template that allows incoming molecules to form mature contacts immediately. Thus, it is expected that the role of the serpentine nucleus is to template a more extended beta sheet structure that is found in mature fibrils.

      A second caveat of this work is the striking homogeneity of the nucleus structure they describe. This homogeneity is likely to be somewhat illusory. Homopolymers, like polyglutamine, have a discrete translational symmetry, which implies that the hairpins needed to form multiple beta sheets can occur at many places along the sequence. The asparagine residues introduced by the authors place limitations on where the hairpins can occur, and should be expected to increase structural homogeneity. Furthermore, the authors demonstrate that polyglutamine chains close to the minimum length of ~35 will have strict limitations on where the folds must occur in order to attain the required four beta strands.

      We are unsure how to interpret the above statements as a caveat. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      A novel result of this work is the observation of multiple concentration regimes in the nucleation rate. Specifically, they report a plateau-like regime at intermediate concentrations in which the nucleation rate is insensitive to protein concentration. The authors attribute this effect to the "self-poisoning" phenomenon observed in growth of some crystals. This is a valid comparison because the homogeneity observed in NMR and crystallography structures of mature fibrils resembles that of a one-dimensional crystal. Furthermore, the typical elongation rate of amyloid fibrils (on the order of one molecule per second) is many orders of magnitude slower than the molecular collision rate (by factors of 10^6 or more), implying that the search for the beta-sheet state is very slow. This slow conformational search implies the presence of deep kinetic traps that would be prone to poisoning phenomena. However, the observation of poisoning during nucleation is striking, particularly in consideration of the expected disorder and concentration sensitivity of the nucleus. Kandola et al.'s structural model of an ordered, intramolecular nucleus explains why the internal states responsible for poisoning are relevant in nucleation.

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this could prove confusing to some readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      To achieve these results the authors used a novel approach involving a systematic series of simple sequences. This is significant because, while individual experiments showed seemingly random behavior, the randomness resolved into clear trends with the systematic approach. These trends provided clues to build a model and guide further experiments.

      Reviewer #3 (Public Review):

      Kandola et al. explore the important and difficult question regarding the initiating event that triggers (nucleates) amyloid fibril growth in glutamine-rich domains. The researchers use a fluorescence technique that they developed, DAmFRET, in a yeast system where they can manipulate the expression level over several orders of magnitude, and they can control the length of the polyglutamine domain as well as the insertion of interfering non-glutamine residues. Using flow cytometry, they can interrogate each of these yeast 'reactors' to test for self-assembly, as detected by FRET.

      In the introduction, the authors provide a fairly thorough yet succinct review of the relevant literature into the mechanisms of polyglutamine-mediated aggregation over the last two decades. The presentation as well as the illustrations in Figure 1A and 1B are difficult to understand, and unfortunately, there is no clear description of the experimental technique that would allow the reader to connect the hypothetical illustrations to the measurement outcomes. The authors do not explain what the FRET signal specifically indicates or what its intensity is correlated to. FRET measures distance between donor and acceptor, but can it be reliably taken as an indicator of a specific beta-sheet conformation and of amyloid? Does the signal increase with both nucleation and with elongation, and is the signal intensity the same if, e.g., there were 5 aggregates of 10 monomers each versus 50 monomeric nuclei? Is there a reason why the AmFRET signal intensity decreases at longer Q even though the number of cells with positive signal increases? Does the number of positive cells increase with time? The authors state later that 'non-amyloid containing cells lacked AmFRET altogether', but this seems to be a tautology - isn't the lack of AmFRET taken as a proof of lack of amyloid? Overall, a clearer description of the experimental method and what is actually measured (and validation of the quantitative interpretation of the FRET signal) would greatly assist the reader in understanding and interpreting the data.

      We believe the difficulty in understanding the illustrations in Figure 1A and 1B is inherent to the subject. We agree that elaborating on how DAmFRET works would help the reader, and will add a few sentences to this end. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high-AmFRET population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.
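
      The dilution argument above can be illustrated with a minimal sketch. It assumes that residue count is an adequate proxy for mass and volume, and it uses a hypothetical fluorophore length of 226 residues and no linker; both values are illustrative assumptions, not measurements from the study.

```python
def fluorophore_mass_fraction(q_length, fluorophore_residues=226, linker_residues=0):
    """Approximate fraction of a fusion protein contributed by the fluorophore.

    Residue counts stand in for mass (an assumption). The default fluorophore
    length of 226 residues is an illustrative, hypothetical value.
    """
    if q_length < 0 or fluorophore_residues <= 0 or linker_residues < 0:
        raise ValueError("lengths must be non-negative and the fluorophore non-empty")
    total = fluorophore_residues + linker_residues + q_length
    return fluorophore_residues / total


if __name__ == "__main__":
    # Longer polyQ tracts lower the fluorophore fraction of each subunit.
    for q in (40, 60, 80, 120):
        print(f"Q{q}: fluorophore fraction = {fluorophore_mass_fraction(q):.3f}")
```

      Under these assumptions the fraction falls monotonically with tract length, which is the direction of the effect described for the plateau AmFRET values.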

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We will revise the tautological statement by removing “non-amyloid containing”.

      The authors demonstrate that their assay shows that the fraction of cells with AmFRET signal increases strongly with an increase in polyQ length, with a threshold around 50-60 glutamines. This roughly correlates with the Q-length dependence of disease. The experiments in which asparagine or other amino acids are inserted at variable positions in the glutamine repeat are creative and thorough, and the data along with the simulations provide compelling support for the proposed Q zipper model. The experiments shown in Figure 5 are strongly supportive of a model where formation of the beta-sheet nucleus is within a monomer. This is a potentially important result, as there are conflicting data in the literature as to whether the nucleus in polyQ is a monomer.

      We thank the reviewer for these comments. We wish to clarify one important point, however, concerning the correlation of our data with the pathological length threshold. As we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      I did not find the argument, that their data shows the Q zipper grows in two dimensions, compelling; there are more direct experimental methods to answer this question. I was also confused by the section that Q zippers poison themselves. It would be easier for the reader to follow if the authors first presented their results without interpretation. The data seem more consistent with an argument that, at high concentrations, non-structured polyQ oligomers form which interfere with elongation into structured amyloid assemblies - but such oligomers would not be zippers.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      Although some speculation or hypothesizing is perfectly appropriate in the discussion, overall the authors stretch this beyond what can be supported by the results. A couple of examples: The conclusion that toxicity arises from 'self-poisoned polymer crystals' is not warranted, as there is no relevant data presented in this manuscript. The authors refer to findings 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', but I cannot recall any evidence for this statement in the results section.

      We restricted any mention of toxicity to the introduction and a section in the discussion that is not worded as conclusive. Nevertheless, we will soften the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of the statements.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Bibliography

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author Response

      Reviewer #1 (Public Review):

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the specific longitudinal paths explored in our study at the end of the Introduction section (line 149~160). We also added a figure of the proposed path model (Figure 1) and refer to it in the Method section (line 232~239).

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We corrected and toned down the causal language used throughout our manuscript. Per your suggestion, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data, possible sample selection bias, and the use of non-randomized, observational data. These corrections can be found in line 518~538.

      • I like that the assessment of CP and self-reported PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      We believe that Reviewer #1’s question about the correlations between the PLEs derived from the PQ-BC (total score and distress score) and from the CBCL (parent-rated PLEs) may refer to a prior version of our manuscript that was submitted to a different journal. We obtained Pearson’s correlation coefficients between these PLE measures (baseline year: r = 0.095~0.0989, p<0.0001; 1-year follow-up: r = 0.1322~0.1327, p<0.0001; 2-year follow-up: r = 0.1569~0.1632, p<0.0001) and added this information to the Method section for PLEs (line 198~201).

      • What was the correlation between CP and EA PGSs?

      We also added the Pearson’s correlation between the two PGSs (r =0.4331, p<0.0001) in the Methods section for PGS (line 214~215).
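
      For readers less familiar with the statistic, the following is a minimal sketch of how a Pearson correlation coefficient of this kind is computed. The function name and the toy scores are hypothetical and are not the study's data or code.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical polygenic scores for five individuals (illustration only).
    cp_pgs = [0.1, -0.4, 1.2, 0.3, -0.9]
    ea_pgs = [0.0, -0.1, 0.8, 0.6, -1.1]
    print(f"r = {pearson_r(cp_pgs, ea_pgs):.4f}")
```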

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We thank the reviewer for the feedback to clarify that educational attainment (EA) is not only a genetic marker of cognitive ability but also that of socioeconomic outcomes. Per your suggestion, we included the associations of EA PGS with multiple biological and socioeconomic outcomes found in prior studies (e.g., Abdellaoui et al., 2022) in the Introduction (line 131~142).

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses; in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We conducted this adjustment considering that PLEs often precede the onset of schizophrenia. In addition, prior studies found that schizophrenia PGS is significantly associated with cognitive intelligence within psychosis patients (Shafee et al., 2018) and individuals at-risk of psychosis (He et al., 2021), and that significantly distressing psychotic-like experiences had a greater positive correlation with schizophrenia PGS than with PGS for psychotic-like experiences (Karcher et al., 2021).

      For these reasons, we thought it necessary to assess whether the effects of cognitive phenotypes PGS (i.e., CP PGS and EA PGS) in the linear mixed model remain significant after adjusting for schizophrenia PGS. We believe our results from the linear mixed model showed the sensitivity and specificity of the association between cognitive phenotypes PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      For all analysis results presented in our study, False Discovery Rate (FDR) correction for multiple testing compared p-values of nine key study variables: PGS (cognitive performance or educational attainment), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below 125% of the poverty line, positive parenting behavior, and positive school environment. An exception was the sensitivity analysis that included schizophrenia PGS in the linear mixed model for adjustment: with another PGS variable added, FDR correction compared p-values of ten key variables. Overall, the effects of FDR correction on the results were limited; i.e., the majority of associations between the key variables and the outcomes, which were deemed highly significant, remained unchanged after the FDR correction.
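
      As an illustration of what an FDR correction across a family of nine tests involves, the sketch below implements the Benjamini-Hochberg step-up adjustment. The response does not state which FDR procedure was used, so the choice of Benjamini-Hochberg is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the response only says "FDR correction"; Benjamini-Hochberg
    is used here as the most common procedure, not as the confirmed method.
    """
    m = len(p_values)
    if any(p < 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Nine hypothetical p-values, one per key study variable.
    raw = [0.0001, 0.0004, 0.002, 0.01, 0.03, 0.04, 0.20, 0.45, 0.80]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.4f} -> adjusted = {q:.4f}")
```

      When most raw p-values are very small, as in this toy input, the adjusted values stay below common thresholds, which is consistent with a correction that leaves highly significant associations unchanged.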

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the constructive feedback and guidance, which has significantly contributed to the improvement of our manuscript. As addressed in the preceding sections, we have implemented the necessary corrections and clarifications in response to the reviewer's suggestions. We remain open to making further amendments as needed, and thus invite any additional comments should any aspect of our revisions be deemed inadequate or inappropriate.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors and psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors and psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      We sincerely thank the reviewer for recognizing these methodological strengths of our study. The reviewer’s positive comments are highly supportive and encouraging for us.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We sincerely thank the reviewer for raising concerns about our proposal that cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, it has been proposed that cognitive ability can be a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role of cognitive ability was proposed in several prior studies on the cognitive model of schizophrenia/psychosis. Per your suggestion, we included further justification in the Introduction section of our study (line 104~107). Specifically, we highlighted that cognitive ability has been theoretically proposed as a potential mediator of genetic and environmental influences on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:https://doi.org/10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      Thank you for your comment. We corrected the term ‘cognitive capacity’ to ‘cognitive phenotypes’ throughout our manuscript. We also added in the Introduction (line 141~143) that we will collectively refer to these two PGSs of focus as ‘cognitive phenotypes PGSs’, which is similar to the terms used in prior research (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019).

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      Thank you for mentioning this point. We were unable to use the largest GWASs for cognitive intelligence, educational attainment, and schizophrenia because our study began before the GWASs by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We revised the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS (line 206~208).

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Consistent with the description of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES link to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~ -0.0162). We think such confusion might have been caused by the difference between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      We represented PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables). To the best of our knowledge, prior studies suggest that these variables are less likely to share a common factor, and have assessed them as composite indices. For instance, Judd et al. (2020) and Martin et al. (2015) analyzed the genetic influences of educational attainment and ADHD as composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors rather than a common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. These are the reasons why we used components for these constructs instead of generating latent factors (as is done in the standard SEM method). In contrast, we represented general intelligence as a common factor that determines the underlying covariance pattern of fluid and crystallized intelligence, based on the classical g theory of intelligence. We added this explanation in line 269~285.

      Moreover, during estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method (line 266~268).
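
      As a rough illustration of what a composite indicator is (a weighted sum of standardized observed variables), the following minimal sketch may help. It is not the IGSCA estimator: the weights here are fixed by hand, whereas IGSCA estimates them from the data, and the variable names, sample size, and weights are all hypothetical.

```python
import numpy as np

def composite_score(X, weights):
    """Weighted sum of z-scored columns of X, rescaled to unit variance.

    X       : (n_samples, n_indicators) observed variables
    weights : (n_indicators,) weights; hand-picked here, NOT estimated by IGSCA
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Standardize each observed variable so that weights are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    score = Z @ w
    # Rescale the composite itself to unit variance.
    return score / score.std(ddof=1)

# Hypothetical example: three observed indicators of one construct.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = np.array([0.5, 0.3, 0.2])
score = composite_score(X, w)
```

      A latent common factor, by contrast, is modeled as an unobserved cause of the covariance among its indicators, which is the representation the response describes for general intelligence.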

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement. We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS is linked to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) use a PGS of cognitive intelligence (reporting those results in their supplementary materials) as well as of educational attainment, we decided to continue citing this reference. These corrections can be found in line 131~141.

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis (line 375). The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more accessible and digestible format. We are ready and willing to implement any necessary modifications to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possible concerns for selection bias and attrition in the data, and some methodological concerns.

      We added as study limitations (line 518~538) that the impact of our findings for understanding cognitive and psychiatric development during later childhood may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the difficulty of interpreting results from an observational study as causal (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As described above, we made the necessary corrections and clarifications for the points raised by the reviewer. We are willing to make additional revisions, so please feel free to comment if you feel that our corrections are insufficient or inappropriate.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their positive statements on the significance of our work.

      2. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This paper contains a set of highly valuable information on the physicochemical parameters of betaine lipids - which are synthesized in microalgae and some other lower eukaryotic organisms.

      The authors, using advanced biophysical techniques - neutron diffraction and small-angle scattering (SANS) as well as molecular dynamics (MD) simulations - established key physicochemical parameters of synthetic betaine lipid DP-DGTS, and compared it with those of the DPPC phospholipid. They "show that DP-DGTS bilayers are thicker, more rigid, and mutually more repulsive than DPPC bilayers". These are important findings.

      The authors also analyzed the phylogenetic tree of the appearance and disappearance of DGTS biosynthesis enzymes, which - together with the observed "different properties and hydration response of PC and DGTS" led them to explain "the diversity of betaine lipids observed in marine organisms and for their disappearance in seed plants". The authors tentatively suggest "A physicochemical cause of betaine lipid evolutionary loss in seed plants" (Title with "?")

      We put a question mark because our work suggests that the difference in sensitivity to hydration between DGTS and PC bilayers could explain the disappearance of betaine lipids in seed plants, due to the dry stage of the seed. In our hands, we never managed to obtain a 35S-BTA1 overexpressing plant that produces seeds. However, we do not have formal evidence for this. We propose to change the title to: “The possible role of lipid bilayer properties in the evolutionary disappearance of betaine lipids in seed plants.”

      My major concerns with this suggestion are:

      • In thylakoid membranes (TMs) the only phospholipid, PG, plays key roles in PSII and PSI functions (Wada and Murata 2007 Photosynth Res, Hagio et al. Plant Physiol 2000, Domonkos et al. 2004 Plant Physiol); it is difficult to explain how these roles would be taken over by betaine lipids. In fact, data of Huang et al. (https://www.sciencedirect.com/science/article/pii/S2211926418309366) indicate that betaine lipids constitute the major compounds of non-plastidial membranes and that a compensation mechanism operates, according to which "by the increase of PG in thylakoid membranes, suggesting a transfer of P from non-plastidial membranes to chloroplasts that would maintain a stable lipid composition of thylakoid membranes".
      • Although neutron diffraction and SANS data, as well as MD simulations, might indicate important differences, the behavior of TMs (e.g. stacking interactions, overall structure and structural dynamics of TMs, protein embedding conditions / membrane thickness etc.) is more dominantly determined by protein-protein interactions, mainly because these membranes contain only small areas occupied by the bilayer phase. Similar arguments hold true for the inner mitochondrial membranes (IMMs). I suggest taking these severe limitations into account when extrapolating the data and trying to reach general conclusions. In general, I suggest a more cautious interpretation of the data.

      We fully agree with the reviewer’s comments. We indeed wrote in the introduction: “In algae, under phosphate starvation, a situation commonly met in the environment, betaine lipids replace phospholipids in extraplastidic membranes. Because betaine lipids are localized in these membranes [11, 12] and share a common structural fragment with the main extraplastidic phospholipid phosphatidylcholine (PC) (Figure 1A and B), it can be speculated that these two lipid classes are interchangeable, but this was never demonstrated.”

      Plastidial membranes are mainly composed of the non-phosphorus glycerolipids MGDG, DGDG and SQDG. It is well known that under phosphate starvation, in plants and algae, the main phospholipid present in thylakoid membranes, PG, is replaced by SQDG because they are both anionic and bilayer-forming lipids (Hölzl G, Dörmann P. Chloroplast Lipids and Their Biosynthesis. Annu Rev Plant Biol. 2019 Apr 29;70:51-81. doi: 10.1146/annurev-arplant-050718-100202; Endo K, Kobayashi K, Wada H. Sulfoquinovosyldiacylglycerol has an Essential Role in Thermosynechococcus elongatus BP-1 Under Phosphate-Deficient Conditions. Plant Cell Physiol. 2016 Dec;57(12):2461-2471; Van Mooy BA, Rocap G, Fredricks HF, Evans CT, Devol AH. Sulfolipids dramatically decrease phosphorus demand by picocyanobacteria in oligotrophic marine environments. Proc Natl Acad Sci U S A. 2006 Jun 6;103(23):8607-12; Kobayashi K, Fujii S, Sato M, Toyooka K, Wada H. Specific role of phosphatidylglycerol and functional overlaps with other thylakoid lipids in Arabidopsis chloroplast biogenesis. Plant Cell Rep. 2015 Apr;34(4):631-42). We recently showed by the same kind of neutron diffraction approaches that PG and SQDG share similar physicochemical properties that can explain their conserved replacement by each other in plastidial membranes (Bolik S, Albrieux C, Schneck E, Demé B, Jouhet J. Sulfoquinovosyldiacylglycerol and phosphatidylglycerol bilayers share biophysical properties and are good mutual substitutes in photosynthetic membranes. Biochim Biophys Acta Biomembr. 2022 Dec 1;1864(12):184037). However, nothing is known about mitochondrial membranes and DGTS localization. Because PC is a major lipid component of mitochondria in plants and fungi and PC is absent in Chlamydomonas reinhardtii, mitochondrial membranes could contain DGTS, at least in Chlamydomonas.

      To clarify this statement, we added in the introduction the sentences: “Betaine lipid synthesis is located in the ER [13,14] and betaine lipids are expected to be absent in photosynthetic membranes [12]. Therefore, this PC-betaine lipid replacement is not expected to occur in photosynthetic membranes. However, it might occur at the surface of the chloroplast envelope where PC might be present [15–17]. Nothing is known about the composition of mitochondrial membranes in algae but because PC is a major lipid component in plant and fungal mitochondria, this replacement might also occur in mitochondria.” In the discussion, we replaced “cellular membrane” with “extraplastidial membrane”.

      A minor point - just to avoid possible misunderstanding: betaine can be present in large quantities in many photosynthetic organisms. A short statement on betaine would help.

      To avoid any confusion with betaine as a soluble molecule and betaine lipid, we added this sentence in the introduction: “The presence of betaine lipids is not linked to the synthesis of betaine, a soluble compound present in almost every organism including most animals, plants, and microorganisms, acting as protectant against osmotic stress [22].”

      **Referee cross-commenting**

      I agree with the evaluation of Reviewer #2 - while keeping mine

      Reviewer #1 (Significance (Required)):

      The physico-chemical properties of betaine lipids have not been established. These lipids - under P starvation of microalgae - accumulate in large quantities. Thus, their detailed characterization and comparison to (otherwise similar) phospholipids are of high importance and advance our knowledge about the roles of these lipids and the organization and structural / functional plasticity of biological membranes.

      As outlined above, I suggest a more cautious interpretation of the data and conclusions regarding e.g. the energy-converting membranes.

      I think the audience is relatively broad: (i) basic research of lipid models and (ii) methodology as well as calling the attention of membrane biologists to the scarcely studied betaine lipids.

      My field is the biophysics of photosynthesis - the stability and plasticity of the oxygenic photosynthetic machinery at different levels of complexity; the area closest to this topic is the polymorphic lipid phase behavior of plant TMs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript nicely presents the effect of phosphate depletion on how betaine lipids function as effective replacements in a water-rich environment. The mix of computational and wet lab experiments provides details on membrane structure and general effects when phospholipids are changed to betaine lipids. I found this manuscript easy to read and understand and is worthy of publishing. However, I do have a few minor comments below to improve the manuscript.

      Minor Comments:

      1. Phases in PC lipids with saturated tails: The authors present a gel to liquid crystalline phase change for DPPC at 40°C. However, this is at the ripple-liquid crystalline phase transition and the gel doesn't occur until about 34-35°C. This should be noted in the manuscript.

      We have completed the sentence in the first Results section as follows: “The DSC data show a sharp phase transition at 40.2 ± 0.1°C for DPPC corresponding to the transition between the ripple phase and the fluid phase, which is consistent with earlier reports on DPPC large unilamellar vesicles [25].”

      Page 4: I am confused by the following phrase: "indicating either weak cooperativity between lipid bilayers or that phase co-existence is not a thermodynamic disadvantage, while this phenomenon is not observed for DPPC bilayers." What is meant by phase co-existence is not a thermodynamic disadvantage? Could this also be due to some frustration in phase coexistence and the presence of a ripple phase that is kinetically inhibited, so that a sharp transition is not observed?

      We did not observe a ripple phase in DP-DGTS, as it is defined in DPPC bilayers, by DSC, neutron diffraction or SANS experiments. We do not know whether it exists in DP-DGTS bilayers. What we observe in neutron diffraction is a coexistence of gel phase and fluid phase domains in oriented multilayer films of DP-DGTS over a wide range of humidity, whereas for DPPC we observe only a gel phase or a fluid phase. Because the thicknesses of the DP-DGTS bilayers are not so different between the gel phase and the fluid phase, we suppose that the free energy difference between the two phases is very small over a wide osmotic pressure range and that this could explain the broad phase transition.

      To further clarify our point, we have reworded the sentence in the following way: “As seen in Figure 2A , by increasing the humidity, DPPC molecules transit from the gel to the fluid phase via a ripple phase through a narrow window of osmotic pressures as previously reported [30,31]. In contrast, DP-DGTS bilayers show a phase coexistence that can be observed over a wide P-range and without the appearance of a third phase that could be attributed to a distinct ripple phase (Figure 2B) before forming a single fluid phase at high humidity (i.e., at low P). Based on DSC and neutron diffraction as two independent techniques, we can safely conclude that the phase transition for DP-DGTS is broad. This observation indicates that the free energy difference between the two phases is very small over a wide osmotic pressure range and may be connected to the shapes of the pressure-distance relations in the two phases, which are discussed further below.” We also added in the legend of figure 4 (SANS experiment): “No ripple phase Pb was detected for DP-DGTS bilayers.”

      DOI for computational methods: The DOI listed computational files (https://doi.org/10.18419/darus-2360) does not work.

      Unfortunately, we did not request publication of the URL upon submission of the manuscript, and we thank the reviewer for carefully checking this. DaRUS is a peer-reviewed repository that ensures high-quality data sets according to the FAIR principles, and its peer review is still ongoing. The provided link will work only once the manuscript is published. In the meantime, we provide a temporary link for review:

      https://darus.uni-stuttgart.de/privateurl.xhtml?token=cbfac341-0e4a-4403-8f73-87bce31ca805

      Reviewer #2 (Significance (Required)):

      This work has broad significance and would be of general interest to those in fields from membrane biophysics to plant biology and evolution. The work nicely touches on all these topics, and I find that it fills a gap in the details of the structure of these betaine lipids and its relation to evolution in terrestrial vs. marine plants.

    1. Because here’s something else that’s weird but true: in the day-to-day trenches of adult life, there is actually no such thing as atheism. There is no such thing as not worshipping. Everybody worships. The only choice we get is what to worship.

      I find this to be true because, in reality, even people who do not believe in worshipping anything spiritual, supernatural, or anything of the sort still believe in or worship things that keep them going. For instance, someone may find himself or herself in a critical situation with no certainty of how to get out of it, and may wish to get out of it without having anyone in mind, simply believing; if the wish comes to pass, the act of wishing alone was a prayer. Just as Wallace mentions, atheism does not exist, because people find themselves worshipping various things: whatever we humans dedicate so much of our time to that we believe we cannot do without it (find it worthy) is actually a form of worship. This includes money, spending a lot of time with TV, social media, and the like. We give reverence to these things, which makes us prisoners within ourselves. Therefore, as humans, we should learn to be conscious of what is real and important, so that we can control how we think and make choices.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports new findings about the role of the glutamate transporter EAAC1 in controlling neural activity in the striatum. The significance is two-fold - it addresses gaps in knowledge about the functional significance of EAAC1, as well as provides a potential explanation for how EAAC1 mutations contribute to striatal hyperexcitability and OCD-associated behaviors. The manuscript is clearly presented, and the well-designed experiments are rigorously performed and analyzed. The main results showing that EAAC1 deletion increases the dendritic arbor of MSN D1 neurons and increases excitatory synaptic connectivity, as well as reduces D1-to-D1 mediated IPSCs are convincing. These results clearly demonstrate that EAAC1 deletion can alter excitatory and inhibitory synaptic function. Modelling the potential consequences for these changes on D1 MSN neural activity, and the behavior changes are interesting. Minor weaknesses include incomplete support for the conclusions about how EAAC1 regulates GABAergic transmission.

      We would like to take this opportunity to thank the reviewer. New sets of pharmacology experiments now address the minor concern about supporting the conclusions about the regulation of GABAergic transmission by EAAC1. The revised manuscript also includes new behavioral assays that allow us to examine in more depth the cell- and region-specificity of the effects of EAAC1.

      Reviewer #2 (Public Review):

      The manuscript by Petroccione et al., examines the modulatory role of the neuronal glutamate transporter EAAC1 on glutamatergic and GABAergic synaptic strength at D1- and D2-containing medium spiny neurons within the dorsolateral striatum. They find that pharmacological and genetic disruption of EAAC1 function increases glutamatergic synaptic strength specifically at D1-MSNs. They show that this is due to a structural change in release sites, not release probability. They also show that EAAC1 is critical in maintaining lateral inhibition specifically between D1-MSNs. Taken together, the authors conclude that EAAC1 functions to constrain D1-MSN excitation. Using a computational modeling technique, they posit that EAAC1's modulatory role at glutamatergic and GABAergic inputs onto D1-MSNs ultimately manifests as a reduction of gain of the input-output firing relationship and increases the offset. They go on to show that EAAC1 deletion leads to enhanced switching behavior in a probabilistic operant task. They speculate that this is due to a dysregulated E/I balance at D1-MSNs in the DLS. Overall, this is a very interesting study focused on an understudied glutamate transporter. Generally, the study is done in a very thorough and methodical manner and the manuscript is well written.

      We thank the reviewer for the thorough analysis and insightful comments on the manuscript. Our point-by-point responses to the concerns raised on the initial submission of this work are reported below:

      Major Comments/Concerns:

      Regional/Local manipulations in behavior study: The manuscript would be greatly improved if they provided data linking the ex vivo electrophysiological findings within the DLS with the behavior. Although they are using a DLS-dependent task, they are nonetheless, using a constitutive EAAC1 KO mouse. Thus, they cannot make a strong conclusion that the behavioral deficits are due to the EAAC1 dysfunction in the DLS (despite the strong expression levels in the DLS).

      Corrected - We concur with the reviewer. To address this concern, we performed new experiments to assess the cell- and regional-specificity of the effects of EAAC1 on task-switching behaviors.

      First, we repeated the behavioral assays described in Fig. 8 in two mouse lines (D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f) lacking EAAC1 expression in D1- or D2-MSNs, respectively (Supp. Fig. 8-1). As in the case of EAAC1+/+ and EAAC1-/- mice, when the switch time was short (<15 s), D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f mice collected a similar number of rewards (Supp. Fig. 8-1K, L) and performed a similar number of lever presses (Supp. Fig. 8-1M, N). As the switch time increased (30-75 s), D1Cre/+:EAAC1f/f mice collected more rewards than A2ACre/+:EAAC1f/f mice, at low and high reward probabilities (Supp. Fig. 8-1L, N). Overall, the task switching behavior of D1Cre/+:EAAC1f/f mice was similar to that of EAAC1-/- mice, whereas that of A2ACre/+:EAAC1f/f mice was similar to that of EAAC1+/+ mice (cf. Supp. Fig. 8 and Supp. Fig. 8-1). This suggests that loss of expression of EAAC1 from D1-MSNs is sufficient to reproduce the task switching behavior of EAAC1-/- mice. Because EAAC1 limits excitation onto D1-MSNs (Fig. 2, 3) and lateral inhibition between D1-MSNs (Fig. 4-6), these findings suggest that increased excitation onto D1-MSNs and reciprocal inhibition among D1-MSNs limit execution of reward-based behaviors with task-switching intervals >30s.

      Second, as noted by the reviewer, another potential limitation of the experiments performed on constitutive EAAC1-/- mice is that, on their own, they do not allow us to say whether the behavioral effects are due to changes in E/I onto D1-MSNs within a specific domain of the striatum like the DLS. Although the DLS is recruited during task-switching, reward-based flexibility in executive control relies on neuronal activity in the VMS (Wallis 2007; Gu et al. 2008). Therefore, we asked whether limiting excitation in D1-MSNs and strengthening D1-D1 lateral inhibition via EAAC1 in the VMS could also alter reward-based task-switching behaviors. To address this question, we repeated the task switching test in EAAC1f/f mice that received stereotaxic injections of a Cre-dependent viral construct (AAV-D1Cre) that we used to remove EAAC1 expression from D1-MSNs in the DLS or VMS, respectively (Supp. Fig. 8-2). The results showed that the task switching behaviors of EAAC1f/f mice receiving AAV-D1Cre injections in the DLS or VMS were similar to each other and to those of EAAC1-/- mice, while being statistically different from those of EAAC1+/+ mice. This finding is important, as it suggests that: (i) the DLS and VMS are both recruited for the execution of task switching behaviors; (ii) the modulation of E/I onto D1-MSNs by EAAC1 may not be limited to the DLS but could extend to the VMS.

      Third, we performed further tests to examine the regional-specificity of the effects of EAAC1 in D1-MSNs. D1 receptor expressing cells are present not only throughout the striatum, but also in the substantia nigra (pars compacta and reticulata; SN) and ventral tegmental area (VTA) (Cadet et al. 2010; Savasta, Dubois, and Scatton 1986; Boyson, McGonigle, and Molinoff 1986; Wamsley et al. 1989). To determine whether lack of EAAC1 in D1-expressing cells in the SN/VTA could also contribute to increased compulsivity, we repeated the task switching behavioral assays in EAAC1f/f mice that received injections of AAV-D1Cre in the SN/VTA (Supp Fig. 8-3). The task switching behavior of these mice was similar to that of EAAC1+/+, not EAAC1-/-, mice, suggesting that altering EAAC1 expression in D1-MSNs of the DLS/VMS, but not the SN/VTA, is implicated in the control of task switching of reward-based behaviors in mice.

      The results of these new sets of experiments are included in the revised version of the manuscript and their implications are reported in the Discussion section of the paper.

      Statistics used in the study: There are some missing details regarding the precise stats used for the different comparisons. I am particularly concerned that the electrophysiology studies that were a priori designed as a 2-factor analysis did not have 2-way ANOVAs performed, but rather a series of t-tests. For example, in Figure 3b, the two factors are 1) cell type and 2) genotype. Was a 2-way ANOVA performed? It is hard for me to tell from the text.

      Corrected - We apologize for any potential confusion. The statistical analyses for the experiments included in this work comprise paired and unpaired t-tests, one-way ANOVA, two-way ANOVA, and repeated-measures ANOVA followed by post hoc t-test comparisons (reported in the text). To ensure both accuracy and readability, we report the results of the statistical comparisons in the main text of the manuscript, and also provide a fully detailed statistical analysis of all datasets in the data repository for this manuscript, deposited on Open Science Framework. We revised the methods section to clarify the use of different statistical tests and values reported in the manuscript.
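
      To make the reviewer's distinction concrete, the sketch below computes a balanced two-way ANOVA (factor A, factor B, and their interaction) directly from sums of squares. It is only an illustration on synthetic data and is not the authors' analysis code: the 2 x 2 layout mirrors the cell type x genotype design mentioned for Figure 3b, but the group size, values, and function name are assumptions.

```python
import numpy as np

def two_way_anova_balanced(data):
    """Balanced two-way ANOVA with replication.

    data : array of shape (a, b, n) = (levels of factor A, levels of factor B,
           replicates per cell). Returns sums of squares and F statistics.
    """
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = data.mean(axis=(0, 2))   # marginal means of factor B
    mean_ab = data.mean(axis=2)       # cell means

    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum(
        (mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2
    )
    ss_err = np.sum((data - mean_ab[:, :, None]) ** 2)

    ms_err = ss_err / (a * b * (n - 1))
    return {
        "ss_a": ss_a, "ss_b": ss_b, "ss_ab": ss_ab, "ss_err": ss_err,
        "F_a": ss_a / (a - 1) / ms_err,
        "F_b": ss_b / (b - 1) / ms_err,
        "F_ab": ss_ab / ((a - 1) * (b - 1)) / ms_err,
    }

# Synthetic data only: 2 cell types x 2 genotypes x 8 hypothetical cells.
rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=2.0, size=(2, 2, 8))
data[0, 1] += 3.0  # synthetic effect confined to one cell-type/genotype group
result = two_way_anova_balanced(data)
```

      Unlike a series of t-tests, this partitions the total variance once and tests the interaction term explicitly, which is what a cell-type-specific genotype effect corresponds to. P-values can be obtained from the F distribution with the matching degrees of freedom, for example with scipy.stats.f.sf.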

      Moderate Concerns:

      Control mice: I am moderately concerned that littermates were not used for controls for the EAAC1 KO, but rather C57Bl/6NJ presumably ordered from a vendor. It has been shown that issues like transit and rearing conditions can have long term effects on behavior. Were the control mice reared in house? How long was the acclimation time before use?

      Corrected - Sorry for the potential confusion. The EAAC1-/- mice are bred in house and have been backcrossed with C57BL/6J for more than 10 generations. We perform backcrossing regularly and routinely in our animal colony. The C57BL/6J mice are also bred in house. They are replaced every 10 generations to avoid genetic drift. Therefore, there is no concern about transit from vendors and rearing affecting the results of our experiments. This information has been added to the Methods section of the paper.

      OCD framework: I generally find the OCD framework unnecessary, particularly in the Introduction. Compulsive behaviors are not restricted to OCD. Indeed, the link between the behavioral observations and OCD phenotype seems a bit tenuous. In addition, studying the mechanisms of behavioral flexibility in and of itself is interesting. I do not think such a strong link needs to be made to OCD throughout the entirety of the paper. The authors should consider tempering this language or restricting it to the discussion and end of the abstract.

      Corrected - We concur with the reviewer and have revised the manuscript accordingly. At the end of the Abstract, we refer only to behavioral flexibility. We have toned down our emphasis on OCD in the Introduction, broadening the genetic link between the gene encoding EAAC1 (SLC1A1) and neuropsychiatric diseases like OCD, ADHD and ASD. This is now limited to a single sentence. We also revised the Discussion section because we agree with the reviewer that compulsive behaviors are not limited to OCD.

    1. Author Response

      Reviewer #2 (Public Review):

      1) The authors in reality do not analyze oscillations themselves in this manuscript but only the power of signals filtered at determined frequency bands. This is particularly misleading when the authors talk about "spindles". Spindles are classically defined as a thalamocortical phenomenon, not recorded from hippocampus LFPs. Thus, the fact that you filter the signal in the same frequency range matching cortical spindles does not mean you are analyzing spindles. The terminology, therefore, is misleading. I would recommend that the authors change spindles to "beta", which at least has been reported in the hippocampus, although in very particular behavioral circumstances. However, one must note that the presence of power in such bands does not guarantee one is recording from these oscillations. For example, the "fast gamma" band might be related to what is defined as fast gamma nested in theta, but it might also be related to ripples in sleep recordings. The increase of "spindle" power in sleep here is probably related to 1/f components arising from the large irregular activity of slow wave sleep local field potentials. The authors should avoid these conceptual confusions in the manuscript, or show that these band power time courses are in fact matching the oscillations they refer to (for example, their spindle band is in fact reflecting increased spindle occurrence).

      We thank the reviewer for allowing us to clarify this subject. We completely agree with concerns raised in the comments. To avoid any confusion, we have replaced throughout the manuscript the word ‘spindle’ with ‘beta’.

      2) The shuffling procedure to control for the occupancy difference between awake and sleep does not seem to be sufficient. From what I understand, this shuffling is not controlling for the autocorrelation of each band, which would be the main source of bias to be accounted for in this instance. Thus, time shifts for each band would be more appropriate. Further, the controls for trial durations should be created using consecutive windows. If you randomly sample sleep bins from distant time points you are not effectively controlling for the difference in duration between trial types. Finally, it is not clear from the text if the UMAP is recomputed for each duration-matched control. This would be a rigorous control as it would remove the potential bias arising from the imbalance between awake and sleep data points, which could bias the subspace to be more detailed for the LFP sleep features. It is very likely the results will hold after these controls, given it is not surprising that sleep is a more diverse state than awake, but it would be good practice to have more rigorous controls to formalize these conclusions.

      We are grateful to the reviewer for suggesting an alternative analysis. Following this suggestion, we created surrogate datasets by time-shifting each band and obtained their respective UMAP projections (see modified Figure 2D). Additionally, as suggested, for the duration-matched controls we selected consecutive windows rather than random points (Figure 2 – figure supplement 1C). UMAP projections were obtained for each duration-matched control and occupancy was computed. The text in the Methods section has been modified to describe the analysis. As expected, the results were identical.
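
      As an illustration of the time-shift control discussed above, the following is a minimal sketch, not the authors' code. It assumes the band-power features are stored as an array of shape (bands, time bins) and that each band is shifted circularly by an independent random lag; the function name `time_shift_surrogate` and the `min_shift` parameter are hypothetical. A circular shift keeps each band's autocorrelation intact while breaking the temporal alignment between bands.

```python
import numpy as np

def time_shift_surrogate(band_power, min_shift, rng):
    """Return a surrogate in which every band is circularly shifted on its own.

    band_power : array of shape (n_bands, n_bins), one power time series per band.
    min_shift  : smallest allowed lag in bins (hypothetical parameter), chosen so
                 that the shifted series is clearly misaligned with the original.
    rng        : numpy.random.Generator used to draw the lags.
    """
    n_bands, n_bins = band_power.shape
    if not 0 < min_shift <= n_bins // 2:
        raise ValueError("min_shift must be in (0, n_bins // 2]")
    surrogate = np.empty_like(band_power)
    for i in range(n_bands):
        # Independent lag per band; the upper bound of integers() is exclusive.
        shift = rng.integers(min_shift, n_bins - min_shift + 1)
        surrogate[i] = np.roll(band_power[i], shift)
    return surrogate

# Toy usage: 4 bands, 10 time bins.
rng = np.random.default_rng(0)
toy = np.arange(40, dtype=float).reshape(4, 10)
shifted = time_shift_surrogate(toy, min_shift=2, rng=rng)
```

      Each surrogate could then be passed through the same embedding step as the real data, so that occupancy is compared against a baseline that preserves single-band statistics.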

      3) Lots of the observations made from the state space approach presented in this manuscript lack any physiological interpretation. For example, Figure 4F suggests a shift in the state space from Sleep1 to Sleep2. The authors comment there is a change in density but they do not make an effort to explain what the change means in terms of brain dynamics. It seems that the spectral patterns are shifting away from the Delta X Spindle region (concluding this by looking at Fig4B) which could be potentially interesting if analyzed in depth. What is the state space revealing about the brain here? It would be important to interpret the changes revealed by this method otherwise what are we learning about the brain from these analyses? This is similar to the results presented in Figure 5, which are merely descriptions of what is seen in the correlation matrix space. It seems potentially interesting that non-REM seems to be split into two clusters in the UMAP space. What does it mean for REM that delta band power in pyramidal and lm layers is anti-correlated to the power within the mid to fast gamma range? What do the transition probabilities shown in Figures 6B and C suggest about hippocampal functioning? The authors just state there are "changes" but they don't characterize these systematically in terms of biology. Overall, the abstract multivariate representation of the neural data shown here could potentially reveal novel dynamics across the awake-sleep cycle, but in the current form of this manuscript, the observations never leave the abstract level.

      We thank the reviewer for allowing us to clarify this aspect of the manuscript. We have now edited the main text to include considerations on the biological relevance of the findings of Figures 4, 5 and 6.

      Additions to Figure 4: In particular, non-REM states in sleep2 tended to concentrate in a region of increased power in the delta and beta bands, which could be the result of increased interactions with cortical activity modulated in the same range. It is also likely that such an effect was induced by the exposure to relevant behavioral experience. In fact, changes in density of individual oscillations after learning have been reported using traditional analytical methods and are thought to support memory consolidation (Bakker et al., 2015; Eschenko et al., 2008, 2006). Nevertheless, while traditional methods provide information about individual components, the novel approach used here provides additional information about the combinatorial shift in the dynamics of network oscillations after learning or exploration. Thus, it provides the basis for identifying how coordinated activity among different oscillations supports memory consolidation processes, such as those occurring during non-REM sleep after exploration, which cannot be elucidated using traditional analytical methods.

      Additions to Figure 5: Gamma segregation and delta decoupling offer a picture of hippocampal REM sleep as being more akin to awake locomotion (with the major difference of a stronger medium gamma presence) while also suggesting a substantial independence from cortical slow oscillations. On the other hand, the across-scale coherence of non-REM sleep is consistent with this sleep stage being dominated by brain-wide collective fluctuations engaging oscillations at every range. Distinct cross-frequency coupling among various individual pairs of oscillations, such as theta-gamma and delta-gamma, has already been reported (Bandarabadi et al., 2019; Clemens et al., 2009; Hammer et al., 2021; Scheffzük et al., 2011). However, computing cross-frequency coupling on the state space provides additional information on how multiple oscillations, obtained from distinct CA1 hippocampal layers (stratum pyramidale, stratum radiatum and stratum lacunosum moleculare), are coupled with each other during distinct states of sleep and wakefulness. Furthermore, projecting the correlation matrices onto a 2D plane provides a compact tool for visualizing the cross-frequency interactions among various hippocampal oscillations. Altogether, this approach reveals the complex nature of coupling dynamics occurring in the hippocampus during distinct behavioral states.

      Additions to Figure 6: We found that transitions occurring from REM-to-REM sleep and non-REM-to-non-REM sleep (intra-state transitions) are more vulnerable to plasticity after exploration as compared to inter-state transitions (such as non-REM-to-REM, REM-to-intermediate etc.) (Fig 6E, F). These changes in intra-state transitions were observed to be beyond randomness (Fig S9 E, F), indicating a specificity in plastic changes in state transitions after exploration. In particular, while the average REM period duration is unaltered after exploration (Fig 4G), REM temporal structure is reorganized. In fact, increased probability of REM-to-REM transitions indicates a significant prolongation of REM bout duration. Similarly, the increase in non-REM-to-non-REM transition probability reflects an increased duration of non-REM bouts. Therefore, environment exploration was accompanied by an increased separation between REM and non-REM periods, possibly as a response to increased computational demands. More generally, the network state space allows us to characterize the state transitions in the hippocampus and how they are affected by novel experience or learning. By observing the state transition patterns, this analytical framework allows us to detect and identify state-specific changes in the hippocampal oscillatory dynamics, beyond the possibilities offered by more traditional univariate and bivariate methods. We next investigated how fast the network flows on the state space and assessed whether the speed is uniform or exhibits specific region-dependent characteristics.

      Reviewer #3 (Public Review):

      1) My primary concern is to provide clear evidence that this approach will provide key insights of high physiological significance, especially for readers who may think the traditional approaches are advantageous (for example due to their simplicity). I think the authors' findings of distinct sleep state signatures or altered organization of the NLG3-KO mouse could serve this purpose. However, right now the physiological significance of these results is unclear. For example, do these sleep state signatures predict later behavior performance, or is altered organization related to other functional impairments in the disease model? Do neurons with distinct sleep state signatures form distinct ensembles and code for related information?

      We are thankful to the reviewer for raising a very interesting line of questioning regarding sleep signatures and distinct ensembles. In this study, we show that sleep state signatures can predict how individual cells may participate in information processing during open field exploration. However, further analysis exploring the recruitment of neuronal ensembles is in preparation for another manuscript and is beyond the scope of this article.

      We have further modified the description of the results (as also suggested by other reviewers) to highlight the key advantages of this approach over traditional methods.

      Regarding functional impairment: as described in the manuscript, the altered organization in the animal model of autism could possibly be due to alterations in cellular and synaptic mechanisms such as those described in previous reports (Modi et al 2019, Foldy et al 2013).

      2) For cells with different mean firing rates during exploration: is that because they are putative fast-spiking interneurons and pyramidal cells? From the reported mean firing rates, I think some of these cells are interneurons. Since mean firing rates are well known to vary with cell type, this should be addressed. For example, the sleep state signatures may be distinct for different putative pyramidal cells and interneurons. This would be somewhat expected considering prior work that has shown different cell types have different oscillatory coupling characteristics. I think it would be more interesting to determine if pyramidal cells had distinct sleep state signatures and, if so, whether pyramidal cells from the same sleep state signature have similar properties, like they code for similar things or commonly fire together in an ensemble. The number of cells in Fig. 8 may be limited for this analysis. The authors could use the hc-11 data in addition, which was also tested in this work.

      We thank the reviewer for suggesting this additional analysis to better describe the data. To this end, we have added an additional figure in the supplementary data (analysis of the hc-11 dataset: Figure 8 – figure supplement 3) to demonstrate that interneurons and pyramidal cells have distinct sleep signatures. These findings are in agreement with the dataset presented in Figure 8D, E.

      As shown in the manuscript, the spatial firing (sparsity) has large variability for cells having similar network signatures (Fig 8E). Thus, additional parameters besides oscillations may be involved in what cells encode. Different network state spaces need to be explored in future studies to understand this phenomenon in detail.

      We agree that investigating neuronal ensembles and the state space is an interesting direction to follow. In another study (in preparation), we are investigating in detail the recruitment of neuronal ensembles by the oscillatory state space. Thus, those findings are beyond the scope of this introductory article.

      3) Example traces are needed to show how LFPs change over the state-space. Example traces should be included for key parts of the state-space in Figures 2 and 3.

      We thank the reviewer for this key insight on data representation. Example traces of how LFP varies on the state space have been added (see Figure 4 – figure supplement 1).

      4) What is the primary rationale for 200ms time bins? Is this time scale sufficient to capture the slow dynamics of delta rhythm (1-5Hz) with a maximum of 1s duration?

      The time scale of binning depends on the scale of investigation. We also replicated the results with different time bins (such as 50 ms and 1 second) and the results are identical. For delta rhythms, with 200 ms time bins, the dynamics will be captured across multiple bins. Additionally, the binned power time series are also smoothed before obtaining projections.
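
      To make the binning argument concrete, here is a minimal sketch, not taken from the manuscript. It assumes an instantaneous band-power trace sampled at `fs` Hz, averages it in 200 ms bins, and smooths the binned series with a moving average; the smoothing method, the function name `binned_power` and the `smooth_bins` parameter are assumptions, since the response does not state how smoothing was done. With 200 ms bins, one cycle of a 1 Hz delta rhythm spans five consecutive bins, so a slow rhythm is followed across bins rather than within a single bin.

```python
import numpy as np

def binned_power(power, fs, bin_s=0.2, smooth_bins=5):
    """Average a power trace in fixed-width bins, then smooth across bins.

    power       : 1-D array of instantaneous band power.
    fs          : sampling rate in Hz.
    bin_s       : bin width in seconds (0.2 s matches the 200 ms bins discussed).
    smooth_bins : moving-average width in bins (hypothetical choice).
    """
    samples_per_bin = int(round(fs * bin_s))
    n_bins = power.size // samples_per_bin
    # Drop the incomplete tail so the trace reshapes cleanly into bins.
    trimmed = power[: n_bins * samples_per_bin]
    binned = trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)
    kernel = np.ones(smooth_bins) / smooth_bins
    # mode="same" keeps one value per bin; edge bins are attenuated by zero padding.
    return np.convolve(binned, kernel, mode="same")

# Toy usage: 10 s of constant power sampled at 1 kHz gives 50 bins.
smoothed = binned_power(np.ones(10_000), fs=1000)
```

      Changing `bin_s` to 0.05 or 1.0 would correspond to the alternative bin widths mentioned in the response.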

      5) Since oscillatory frequency and power are highly associated with running speed, how does speed vary over the state space? Is the relationship between speed and state-space similar to the results of previous studies for theta (Slawinska and Kasicki, Brain Res 1998; Maurer et al, Hippocampus 2005) and gamma oscillations (Ahmed and Mehta J. Neurosci 2012; Kemere et al PLOS ONE 2013), or does it provide novel insights?

      We thank the reviewer for highlighting this crucial link between oscillation and locomotion. While various articles have focused on individual oscillations, the combinatorial effects of multiple oscillations from multiple brain areas in regulating the speed of the animal during exploration are definitely worth exploring with this novel approach. This set of results will be introduced in another study, currently in preparation.

      6) The separation of 9 states (Fig. 6ABC) seems arbitrary, where state 1 (bin 1) is never visited. I suggest plotting the density distribution of the data in Fig. 2A or Fig. 6A to better determine how many states there are within the state space. For example, five peaks in such a density plot might suggest five states. Alternatively, clustering methods could be useful to determine the number of states.

      We thank the reviewer for this useful suggestion. We agree that additional clustering methods can be used to identify non-canonical sleep states. These are currently being explored in our lab and will be part of future studies. As for this dataset, the density plots are available in Figure 4E, which shows how many states are in each part of the state space.

      7) The results in Fig. 4G are very interesting and suggest more variation of sub-states during non REM periods in sleep1 than in sleep2. What might explain this difference? Was it associated with more frequent ripple events occurring in sleep2?

      The reviewer is right in looking for the source of the decrease in state variability in sleep2. Considering the distribution of relative frequency power in the state space, the higher concentration in sleep2 corresponds to higher content in the slower delta and spindle frequency bands, rather than the higher frequencies of SWRs. This result can be interpreted in the light of enhanced cortical activity (which is known to heavily recruit those bands) and possibly of enhanced cortical-hippocampal communication following relevant behavioral experience. It is also necessary to mention that with our recording setup we cannot completely rule out the effects of volume conduction, and thus we cannot exclude that the increase in the delta and spindle bands in the hippocampus was a spurious effect of purely cortical frequency modulations.

      8) The state transition results in Fig. 6 are confusing because they include two fundamentally different timescales: fast transitions between oscillatory states and slow dynamics of sleep states. I recommend clarifying the description in the results and the figure caption. Furthermore, how can an animal transition between the same sleep state (Fig. 6EF)? Would they both be in a single sleep state?

      The transitions capture the fast oscillatory scales (as they are investigated over a timeframe of 1 second). The sleep stages (REM, non-REM etc.) are used as labels from which the states originate on the state space. This allows us to characterize fast oscillatory dynamics in various sleep stages.

      Regarding same-state transitions: An increase in same-state transition probability corresponds to a prolongation of that particular state, thereby altering the temporal structure of a given sleep state.
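
      The link between same-state transition probability and bout duration can be illustrated with a small sketch. It is not the authors' analysis: it assumes the labelled state sequence behaves like a first-order Markov chain, in which case the number of consecutive bins spent in a state is geometrically distributed with mean 1 / (1 - p), where p is the same-state transition probability. The function names are hypothetical.

```python
import numpy as np

def transition_matrix(labels, n_states):
    """Estimate row-normalised transition probabilities from an integer label sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left (or never visited) stay at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def expected_bout_length(p_stay):
    """Mean bout length in bins under the first-order Markov assumption."""
    if p_stay >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - p_stay)

# Toy usage: 0 = non-REM, 1 = REM (labels are illustrative only).
labels = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
P = transition_matrix(labels, n_states=2)
nrem_bout = expected_bout_length(P[0, 0])
```

      Under this assumption, raising the same-state probability from 0.8 to 0.9 doubles the expected bout length from 5 to 10 bins, which is why a modest change in intra-state transition probability can reflect a marked prolongation of bouts.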

    1. When we don’t think certain messages meet our needs, stimuli that would normally get our attention may be completely lost. Imagine you are in the grocery store and you hear someone say your name. You turn around, only to hear that person say, “Finally! I said your name three times. I thought you forgot who I was!” A few seconds before, when you were focused on figuring out which kind of orange juice to get, you were attending to the various pulp options to the point that you tuned other stimuli out, even something as familiar as the sound of someone calling your name.

      This happens with my boyfriend and I all of the time. He will be playing a video game or on his phone, and when I try to get his attention, this happens. I also thought it was because he was tuning me out on purpose or something. I also heard that humans are not meant to focus their attention on multiple things at once, so this makes sense. I think that this concept is super interesting and now I know why people do this.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity:

      Summary: the paper suggested a new approach to study in vivo possible interaction between glioblastoma cells and glioblastoma associated macrophages. By using single cells transcriptome profiling and in vitro and in vivo functional experiments the authors also suggested LGALS1 as possible key factor in the suppression of the immune system and a new target for immune modulation in glioma patients. The experimental plan is well described, and the results are beautifully presented using images, clear drawings, and videos.

      Major comments: none

      Minor comments:

      • The number of zebrafish embryos analyzed after the xenograft is highly variable (e.g. 3-18; 4-22 in Figure 6). These numbers can be reported in the results section (not only in the legends) and the authors may comment on them in the discussion. The reproducibility of the xenotransplant experiments is always challenging as it is quite difficult to inject the same number of cells in every embryo and to have the same survival rate of injected cells and of transplanted embryos. For these reasons the volume of each xenograft can vary significantly in different embryos and in different experimental sessions. Accordingly, the number of macrophages associated with the tumor can vary and the statistical analysis can be deeply influenced by the number of replicates for each experimental group (a group with 3 embryos is very different in terms of quality and quantity of information from a group of 18 embryos). It could be useful for the reader who has no experience in this technique to be aware of the advantages and disadvantages of the procedure, including the possible influence of the temperature (34°C instead of 37°C) on the embryo survival and the replication rate of glioma cells or macrophage behavior. Commenting on these aspects does not weaken the power and the relevance of the model but unveils the critical aspects that every scientist has to evaluate before planning these kinds of experiments.

      Response: We agree with Reviewer #1 that the zebrafish avatar model is challenging, and it is difficult to obtain reproducible tumor sizes and survival rates. To be even more transparent about this, we have added a few sentences about the variable n number in the Results section and a critical comment about it in the Discussion section.

      • An aspect that could be interesting to address, to further validate the avatar model, is to monitor the level of pro-inflammatory cytokines (Tumor Necrosis Factor and Interleukin 1, 6, and 8) that are expressed at basal level in the early developing zebrafish embryos. Do their expression level increase after the xenotransplantation? Can the zebrafish cytokines affect the behavior of glioma associated macrophages (i.e. macrophages polarization)?

      Response: This is an interesting point, indeed. We have injected murine melanoma (B16) cells into Tg(mpeg1:mCherry-F); Tg(TNFa:eGFP-F) embryos, a TNFa reporter line. Some (but not all) macrophages expressed TNFa and their expression decreased over time, which is consistent with previous reports (Póvoa et al, 2021). We further observed that TNFa-expressing macrophages mostly had a round, “tumor-attacking” phenotype. This is in line with our hypothesis that the tumor induces a phenotype switch in GAMs. Of note, we did not see TNFa expression in the rest of the brain tissue. We would be happy to add this data if deemed useful.

      We did not investigate other cytokines in the developing zebrafish, but we believe this is not essential for the following reasons: We are mainly interested in the differences between the patient-derived GBM stem cell cultures (GSCCs), and since they are all used in the same avatar model, we expect that if zebrafish cytokines had an effect on GAMs and their polarization, this effect would be consistent in all avatars, and can thus be ignored when comparing different GSCCs. More importantly, our findings in the zebrafish avatar model were consistent with those in the in vitro model. We observed the same phenotype switch in the co-culture model, indicating that the key interaction is between tumor cells and macrophages.

      Significance:

      Strengths and limitations. The manuscript is the result of a well-orchestrated effort to dissect a biological problem by complementary approaches and provide new data with high-impact translational value. The image processing pipeline developed by the authors is a step forward in the in vivo analysis of cell interactions in living embryos. The identification of LGALS1 as a potential target for immune modulation can support the development of new therapeutic strategies implementing chemo- or immunotherapy protocols. The described zebrafish avatar can represent a new tool for personalized drug testing, recapitulating in an in vivo model the heterogeneity of GBM found in patients.

      Audience: All scientists interested in cell biology, cancer cell biology, imaging techniques, translational medicine, in vivo models for cancer research, precision medicine.

      Reviewer expertise: applied developmental biology

      Reviewer #2

      Evidence, reproducibility and clarity:

      Finotto et al aim to address the polarisation of macrophages within GBM in their study. To do this, they have developed two different models. The first model is an in-vitro co-culture model of patient derived GSC lines and human monocyte derived macrophages. This model was used for single cell sequencing to understand the transcriptomic changes of macrophages upon contact to GBM cells. The second model is a zebrafish xenograft model. Here GFP labeled GBM cells were transplanted into the larval zebrafish ventricle. These experiments were done in the transgenic mpeg zebrafish which allowed to monitor responses of macrophages in vivo.

      In my opinion both models are not sophisticated enough to draw solid conclusions on macrophage polarisation in GBM. The in vitro model is highly artificial and is far from the complex situation in GBM. Within GBM the GAM population represents a heterogenous mix of resident microglia and infiltrating macrophages. These are influenced by the heterogeneous environment (which consists of tumour cells but also other host cells) and show diverse transcriptomic adaptations as shown in rodent models as well as sequencing studies of patient derived tumour samples. Studying monocyte derived macrophages in vitro does not provide any reliable insight.

      Response: We understand the reviewer’s concern about the complexity of our in vitro model. However, these simple models are needed to gain more insight into the complex in vivo situation. Others have demonstrated their usefulness in the past (C. Jayakrishnan et al, 2019; Zhou et al, 2022; Hubert et al, 2016; Chen et al, 2020; Coniglio et al, 2016; Li et al, 2022). Moreover, it may be advantageous to look at only two different cell types and unravel their reciprocal interaction, without the influence of other cell types, making it too complex to draw conclusions. We acknowledge that GAMs are a heterogeneous mix of both microglia and bone marrow-derived macrophages. Considering that bone marrow-derived macrophages have been shown to play an important role in tumor progression and are by far the most abundant immune cell population in GBM tumors (which even increases in recurrent GBM) (Pombo Antunes et al, 2021; Abdelfattah et al, 2022), we chose to focus initially on bone marrow-derived macrophages. Notably, it has already been reported that microglia were associated with significantly better survival, suggesting that they are anti-tumorigenic, whereas macrophages were associated with worse survival, suggesting that they are pro-tumorigenic (Pombo Antunes et al, 2021; Abdelfattah et al, 2022). This justifies our approach to focus on this cell type. Furthermore, although this model may be rather simplistic, it allowed us to screen different GSCCs side by side in a standardized way, through which we found an apparent phenotype switch within the macrophages, even without the complex interplay with other cell types. Because the results obtained using the in vitro model were also confirmed in GBM patient material and KO experiments in the zebrafish avatar model, our work shows that reliable and important insights can be derived. This, combined with its simplicity, makes our co-culture model an exceptionally relevant model that is scalable, screenable and allows us to study the effect of perturbations. Finally, the immunosuppressive role of the target we identified using this model, LGALS1, has been previously demonstrated by others (Verschuere et al, 2014; Van Woensel et al, 2017; Chen et al, 2019), which proves our approach is valid.

      Although the zebrafish can be a great model to understand the progression of tumours and the role of immune cells, I don't think that the model developed by the authors is suitable to address their questions. Transplantation of GBM cells into the ventricle of larval zebrafish doesn't seem to be the right approach here. The poor survival of the transplanted cells is a clear indication of that. Many other groups have reported growth and proliferation of human cancer cells in the larval zebrafish. Direct transplantation into the brain parenchyma would be the better approach here. The brain parenchyma would provide the right environment for the GBM cells including a resident microglial population. This would also allow to study the complex mix of microglia and infiltrating macrophages in the context of GBM.

      Response: The reviewer does not specify which articles have reported growth and proliferation of human cancer cells in zebrafish larvae. Most research groups reporting this either did not follow tumor growth/proliferation over time or used immortalized cell lines (Vargas-Patron et al, 2019; Pan et al, 2020; Pudelko et al, 2018; Breznik et al, 2017; Vittori et al, 2017; Hamilton et al, 2016), which obviously have a much higher proliferation rate than the patient-derived cell lines used in this work. Second, although the number of patient-derived tumor cells decreases over time, we observed a clear invasive and migratory behavior, indicating that the human tumor cells reside well in the zebrafish microenvironment. Furthermore, it is important to note that the zebrafish avatars are grown at 34°C, a temperature that is suboptimal for tumor cell growth. The tumor cells still proliferate, albeit at a lower rate than at 37°C.

      To our knowledge, there is only one publication that reports the growth of patient-derived GBM tumors over time (Almstedt et al, 2022). However, here, zebrafish embryos were grown at 33°C. Also, prior to injection, patient-derived GBM cells were resuspended in medium containing polyvinylpyrrolidone, a polymer that enhances extracellular matrix deposition and cell proliferation. Furthermore, the authors observed substantial differences in proliferative capacity, ranging from growth to decline of signal, and represented only two patient-derived cell lines with growing tumors. Similar to our findings, another article has demonstrated that injected patient-derived GBM tumor cells progressively underwent mitotic arrest, while maintaining an invasive and aggressive growth pattern (Rampazzo et al, 2013).

      Although the tumor cells are injected into the hindbrain ventricle, they end up in the brain parenchyma, as evidenced by the presence of the typical brain vasculature of the zebrafish embryo. Notably, in Tg(mpeg1:mCherryF)ump2 zebrafish embryos, both macrophages and microglia are labeled with mCherry, meaning that we have studied both cell types in our zebrafish avatar model. Therefore, we consider the reviewer’s comment to be unfounded.

      Reviewer #3

      Evidence, reproducibility and clarity:

      In this study, Finotto and colleagues developed patient-derived Glioblastoma (GBM) stem cell cultures from 7 patients. These GBM stem cell cultures were either co-cultured in vitro with human macrophages combined with single-cell RNA sequencing or injected into the orthotopic zebrafish xenograft to study live GBM-macrophage/microglia interactions. The authors aimed at studying tumor heterogeneity and GBM-associated macrophages (GAMs), which often exhibit immunosuppressive features that promote tumor progression. Their analyses revealed substantial heterogeneity across GBM patients in GBM-induced macrophage polarization and the ability to attract and activate GAMs - features that correlated with patient survival. The authors also show 3 distinct macrophage subclusters (MC1-3), highlighting that the simple M1/M2 polarization phenotype is too reductive and there are no clear "markers". The authors associate these profiles with morphology and macrophage behaviour. Differential gene expression analysis, immunohistochemistry on original tumor samples, and knock-out experiments in zebrafish subsequently identified / confirmed LGALS1 as a primary regulator of immunosuppression.

      Cheng et al. (DOI: 10.1002/ijc.32102) had previously shown the immunosuppression effect of LGALS1 - but this work shows as a proof of concept that the authors' approach is a valuable and interesting approach to find immune regulators.

      Response: We fully agree with Reviewer #3. In fact, the immunosuppressive role of LGALS1 has already been described by several research groups (Van Woensel et al, 2017; Verschuere et al, 2014), which indeed proves that our approach is valid. The reference cited by the reviewer was already included in the manuscript, along with other references.

      Major comments:

      In general, claims are supported by data - very carefully presented and well characterized data with numbers, stats. It is an interesting descriptive study that illustrates the complexity and diversity of glioblastoma and the induced TME. I just have a few comments or clarifications that I would like to have elucidated:

      • I did not understand why not single cell sequence the original tumor - without in vitro passaging and have the original patient population of MACs/microglia and monocytes sequenced? In other words why sequence the in vitro system-with its inherent caveats of in vitro culturing and not the original tumor? Can you please clarify.

      Response: We agree with Reviewer #3 that our in vitro model does indeed have caveats inherent to patient-derived cell culture models. However, we chose this model to specifically focus on the reciprocal interaction between GBM tumor cells and macrophages in a way that also allows us to investigate how perturbations affect these interactions. This is not possible when using original tumors (e.g. we cannot make KO cells, as we did for LGALS1, and study the effects of genes of interest). (See also the response to the comment of Reviewer #2)

      We do have scRNAseq data from one original tumor sample (LBT123) that is currently being analyzed. Unfortunately, scRNAseq is not available for the other tumor samples. Also, for some of the patients, there is no original material left to use for sequencing. For LBT123, we will compare the scRNAseq data from the original tumor with the in vitro data from the co-culture model.

      • Mac signatures - out of curiosity- authors could not find TNFa and IFN signatures in any population?

      Response: Our analyses did not reveal TNF or IFN as cluster signature genes. However, we did find that TNF expression was slightly higher in MC2, the pro-inflammatory macrophages, although still at low levels. We did not find IFN expression in the macrophage subclusters, but we did find low expression of some IFN receptors. We found a gradient for IFNGR1 with the highest expression in MC3, followed by MC1 and the lowest expression in MC2. IFNGR2 was expressed at slightly higher levels in MC1 compared to the other subclusters. IFNAR1 and IFNAR2 were expressed at comparable low levels in all subclusters. Finally, IFNLR1 expression was higher in MC3 compared to the other two macrophage subclusters. Considering the overall low expression of IFN receptors, we believe that the differences in expression are rather negligible. Furthermore, it has been previously shown that IFN exerts its anti-tumor effect primarily through the responsiveness of endothelial cells and not of myeloid cells, such as macrophages (Kammertoens et al, 2017). Since vascular cells were not present in the co-culture model, low IFN receptor expression is not surprising. We are happy to investigate this in more detail and include it if deemed useful.

      • Figure 8: please show controls side by side with the KO

      Response: We thank Reviewer #3 for this comment. We are not quite sure which panel the reviewer is referring to. If it is panel F, we agree with Reviewer #3 and have changed the order of the bars in the revised version. If it is panel E, the corresponding control images are shown in Figure 5I. Since we believe that these images should not be repeated, we have added a figure reference to Figure 5I in the figure legend of Figure 8, in addition to the figure reference already provided in the text. Furthermore, images of all embryos are presented side by side in Figure S8D-E.

      • Figure 5: if each pair of images were separated and had the legend on top, it would be easier to read and follow.

      Response: We appreciate the comment that the figure should be intuitively easy to read and follow. However, we have chosen a compromise between overview and visibility of details (e.g. morphological features of GAMs). Since this figure already has the maximum width, the images would become smaller if they needed to be separated. Reducing the size would compromise the visibility of important details.

      Significance:

      It is a very interesting study, carefully designed and performed, that highlights the heterogeneity of glioblastoma and how GBM can modulate the macrophage population into 3 different subsets. This study constitutes a proof of concept of the combination of an in vitro approach and an in vivo approach to find new players and treatments in glioblastoma. I believe that it would be important and interesting to have the original tumor sequenced to compare it with the in vitro platform and to understand how the in vitro selection impacts the tumor biology, and even whether it changes the heterogeneity and differential composition of the tumor and macrophage profiles.

      References:

      Abdelfattah N, Kumar P, Wang C, Leu JS, Flynn WF, Gao R, Baskin DS, Pichumani K, Ijare OB, Wood SL, et al (2022) Single-cell analysis of human glioma and immune cells identifies S100A4 as an immunotherapy target. Nat Commun 13

      Almstedt E, Rosen E, Gloger M, Stockgard R, Hekmati N, Koltowska K, Krona C & Nelander S (2022) Real-time evaluation of glioblastoma growth in patient-specific zebrafish xenografts. Neuro Oncol 24: 726–738

      Breznik B, Motaln H, Vittori M, Rotter A & Turnšek TL (2017) Mesenchymal stem cells differentially affect the invasion of distinct glioblastoma cell lines. Oncotarget 8: 25482–25499

      Jayakrishnan P, H. Venkat E, M. Ramachandran G, K. Kesavapisharady K, N. Nair S, Bharathan B, Radhakrishnan N & Gopala S (2019) In vitro neurosphere formation correlates with poor survival in glioma. IUBMB Life 71: 244–253

      Chen JWE, Lumibao J, Leary S, Sarkaria JN, Steelman AJ, Gaskins HR & Harley BAC (2020) Crosstalk between microglia and patient-derived glioblastoma cells inhibit invasion in a three-dimensional gelatin hydrogel model. J Neuroinflammation 17

      Chen Q, Han B, Meng X, Duan C, Yang C, Wu Z, Magafurov D, Zhao S, Safin S, Jiang C, et al (2019) Immunogenomic analysis reveals LGALS1 contributes to the immune heterogeneity and immunosuppression in glioma. Int J Cancer 145: 517–530

      Coniglio S, Miller I, Symons M & Segall JE (2016) Coculture assays to study macrophage and microglia stimulation of glioblastoma invasion. Journal of Visualized Experiments 2016

      Hamilton L, Astell KR, Velikova G & Sieger D (2016) A zebrafish live imaging model reveals differential responses of microglia toward glioblastoma cells in vivo. Zebrafish 13: 523–534

      Hubert CG, Rivera M, Spangler LC, Wu Q, Mack SC, Prager BC, Couce M, McLendon RE, Sloan AE & Rich JN (2016) A three-dimensional organoid culture system derived from human glioblastomas recapitulates the hypoxic gradients and cancer stem cell heterogeneity of tumors found in vivo. Cancer Res 76: 2465–2477

      Kammertoens T, Friese C, Arina A, Idel C, Briesemeister D, Rothe M, Ivanov A, Szymborska A, Patone G, Kunz S, et al (2017) Tumour ischaemia by interferon-γ resembles physiological blood vessel regression. Nature 545: 98–102

      Li H, Yan X & Ou S (2022) Correlation of the prognostic value of FNDC4 in glioblastoma with macrophage polarization. Cancer Cell Int 22

      Pan H, Xue W, Zhao W & Schachner M (2020) Expression and function of chondroitin 4-sulfate and chondroitin 6-sulfate in human glioma. FASEB Journal 34: 2853–2868

      Pombo Antunes AR, Scheyltjens I, Lodi F, Messiaen J, Antoranz A, Duerinck J, Kancheva D, Martens L, De Vlaminck K, Van Hove H, et al (2021) Single-cell profiling of myeloid cells in glioblastoma across species and disease stage reveals macrophage competition and specialization. Nat Neurosci 24: 595–610

      Póvoa V, Rebelo de Almeida C, Maia-Gil M, Sobral D, Domingues M, Martinez-Lopez M, de Almeida Fuzeta M, Silva C, Grosso AR & Fior R (2021) Innate immune evasion revealed in a colorectal zebrafish xenograft model. Nat Commun 12

      Pudelko L, Edwards S, Balan M, Nyqvist D, Al-Saadi J, Dittmer J, Almlöf I, Helleday T & Bräutigam L (2018) An orthotopic glioblastoma animal model suitable for high-throughput screenings. Neuro Oncol 127: 415

      Rampazzo E, Persano L, Pistollato F, Moro E, Frasson C, Porazzi P, Della Puppa A, Bresolin S, Battilana G, Indraccolo S, et al (2013) Wnt activation promotes neuronal differentiation of glioblastoma. Cell Death Dis 4

      Van Woensel M, Mathivet T, Wauthoz N, Rosière R, Garg AD, Agostinis P, Mathieu V, Kiss R, Lefranc F, Boon L, et al (2017) Sensitization of glioblastoma tumor micro-environment to chemo- and immunotherapy by Galectin-1 intranasal knock-down strategy. Sci Rep 7: 1–14

      Vargas-Patron LA, Agudelo-Dueñas N, Madrid-Wolff J, Venegas JA, González JM, Forero-Shelton M & Akle V (2019) Xenotransplantation of human glioblastoma in zebrafish larvae: in vivo imaging and proliferation assessment. Biol Open 8

      Verschuere T, Toelen J, Maes W, Poirier F, Boon L, Tousseyn T, Mathivet T, Gerhardt H, Mathieu V, Kiss R, et al (2014) Glioma-derived galectin-1 regulates innate and adaptive antitumor immunity. Int J Cancer 134: 873–884

      Vittori M, Breznik B, Hrovat K, Kenig S & Lah TT (2017) RECQ1 helicase silencing decreases the tumour growth rate of U87 glioblastoma cell xenografts in zebrafish embryos. Genes (Basel) 8

      Zhou F, Shi Q, Fan X, Yu R, Wu Z, Wang B, Tian W, Yu T, Pan M, You Y, et al (2022) Diverse macrophages constituted the glioma microenvironment and influenced by PTEN status. Front Immunol 13

    1. Author Response

      Reviewer #1 (Public Review):

      The paper describes a robotic system that can be used for prolonged recording of forced activity in crawling Drosophila larvae. This is mostly intended to be a proof of principle description of a tool potentially useful for the community. The system - whose value lies completely in its reproducibility and adoption - is only superficially described in the paper, but a more detailed description is made available through Github, along with the software used for the collection and analysis of data.

      There is good, convincing evidence this can work as some sort of "larval conveyor belt", used to artificially prolong food crawling behaviour in the animals. More could be said about the ecological implications of the assay (for instance: how relevant is it to an animal's natural behaviour? Does the system introduce artifactual distortions in the analysis, driven by the fact that animals crawl greater distances than they would normally crawl in nature? Will this extensive activity affect their development to pupation or adulthood?).

      In addition to all our code being available on GitHub, we have added substantially to Materials and Methods in the manuscript (1-1.5 pages), detailing the analysis pipeline more thoroughly.

      We agree that a more thorough comparison of ecological vs. laboratory conditions was warranted here, and have addressed this in new Discussion section material (6th paragraph especially). The developmental effect due to prolonged locomotion is a very good point – with only a single animal measured for more than 24 hours, we do not yet know whether instar molting or pupation is delayed, but this could certainly be a concern in longer experiments moving forward.

      Reviewer #3 (Public Review):

      "Continuous, long-term crawling behavior characterized by a robotic transport system" by Yu et al. presents their new robotic device to track, reposition, and feed Drosophila larvae as they crawl on an arena. By using a water droplet (or if necessary, suction) to transport larvae from the edge of the arena to the middle, long behavior trajectories can be recorded without losing larvae from the arena or camera field of view. The picker robot is also able to dispense small amounts of apple juice at precise locations to keep larvae alive for extended periods although the food was not sufficient to trigger molting and the development to the next instar stage.

      The approach is interesting, but the authors could provide more details on why the approach is necessary for non-expert readers. For example, what are the advantages of using the robot picker compared to simply confining larvae in a closed arena? It's not obvious (to me) that being picked back to the center of the arena is a smaller perturbation compared to running into a chamber wall and changing direction.

      Thank you for this suggestion, it’s a very good point. We have expanded our Introduction considerably, and directly address this issue (4th paragraph in particular). We do quantify the perturbation due to robot pick-ups and drop-offs (Fig. 3D), but that only addresses the short term. We prefer not to use a closed arena for three reasons: (1) in a gradient navigation experiment, reaching the edge would effectively end “navigation” and we would be unable to study that behavior over longer times, (2) larvae can crawl up the sides of walls and will be lost to the tracker (they do this all the time in the Petri dishes they are raised in), and (3) larvae often do not bounce off walls and resume crawling, they tend to dwell near edges they find. To this last point, we have added a new Supplemental figure (Figure 1 – supplement 1) illustrating this effect with a representative example.

      The first paragraph of the introduction emphasizes the multiple time scales that are relevant for behavior from rapid stimulus response up to developmental times. This is to set the context of the authors' contribution but I'm not sure it's a fair representation of the state of the art. For example, the authors state that high-bandwidth measurement over long times is prohibitive and cite three Drosophila papers, but there are home-cage monitoring systems that allow continuous recording of mouse behavior over long times with high resolution. At the other end of the spectrum, there have been some long-term behaviour experiments done on worm behaviour with reasonably high time resolution (e.g Stern et al. 10.1016/j.cell.2017.10.041).

      This is absolutely correct, the context needed to be much broader than our own prior larva results. We have overhauled that section and written a wider introduction that includes the C. elegans paper you mentioned, and also brings in other model systems like adult flies, mice, and rats. We frame our own work as (1) in a new animal, for long term measurements; (2) investigating non-confined free locomotion over a long time scale.

      The authors train a neural network to segment and track the larvae, however, little information is given on the training process and I don't think it would be possible to reproduce the model based on the description. More details of the network, hyperparameters, and training data would be required to evaluate it.

      Definitely! We have added a new section to Materials and Methods (1-1.5 pages in length), detailing our analysis pipeline, with sections for position tracking, postural analysis, and behavioral classification.

      The authors also state several times that larval identity is maintained throughout the recording, but this isn't quantified. It's not clear whether identity is maintained across collisions of two or more animals by the tracking algorithm or whether these collisions simply don't happen in their data because density is low.

      This has also been addressed and clarified in the same new part of the Materials and Methods section. We quantify collision rates and give the accuracy of maintaining identity after collisions.

      The environment is nominally isotropic, but once larvae have been crawling on the surface for hours, including periodic feeding, there will likely be multiple gradients the larvae may sense. This may not be observable in the data, but should perhaps be mentioned in the text.

      This is certainly true. Other than the single animal 30-hour experiment described in the manuscript, there is no food introduced to the larvae during our 6-hour experiments. Looking ahead, the presence of food remnants in the arena could become a serious confounding factor in nominally isotropic experiments, as the reviewer points out. We have added substantially to the Discussion section to discuss various limitations of the design and experiments, and directly talk about the odor/taste stimuli being introduced by food (second to last paragraph in Discussion).

      The authors show that the picking action results in a small but detectable increase in speed. The degree of perturbation overall depends on the picking frequency so some quantification of the inter-pick time interval would help to interpret whether this perturbation is relevant for a particular experiment. Is there a difference in excitation when larvae are picked successfully on the first try compared to when multiple tries or suction are required?

      We have now quantified the pick-up frequency and added it directly to the Materials and Methods section (0.87 pick-ups per hour per animal, i.e. a mean interval of roughly 1.15 hours between pick-ups for a given animal). We do not have a sufficient amount of data to determine whether there is a statistically significant difference in behavior for multiple pickup attempts. This can also be confounded because an unsuccessful pickup is sometimes one that does not touch the larva at all (and so would presumably not introduce additional perturbations).

      From the reconstructed trajectory in Figure 4, this interval looks very long compared to the speed increase after picking. When reconstructing the trajectory, how are the segments joined? Is it simply by resetting the xy position, or also by rotating to match the previous direction of travel? (I'm guessing the larva can rotate during transport?)

      We have updated the Figure 4 caption to make it clear that the segments are only joined translationally, by resetting the xy position.
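
      A translation-only join of this kind can be sketched as follows. This is a minimal illustration, not the code used for Figure 4: the function name `stitch_segments`, the list-of-tuples data layout, and the choice to drop the duplicated joining point are all assumptions made for the example.

```python
def stitch_segments(segments):
    """Join trajectory segments end to end by translation only.

    Each segment is a list of (x, y) positions recorded between two robot
    pick-ups (hypothetical layout). Every segment after the first is shifted
    so that its first point coincides with the last point of the path
    stitched so far. No rotation is applied.
    """
    stitched = []
    for segment in segments:
        if not segment:
            continue  # nothing recorded in this segment
        if not stitched:
            stitched.extend(segment)
            continue
        # Offset that moves this segment's start onto the current end point.
        dx = stitched[-1][0] - segment[0][0]
        dy = stitched[-1][1] - segment[0][1]
        # The shifted first point duplicates the current end point, so skip it.
        stitched.extend((x + dx, y + dy) for x, y in segment[1:])
    return stitched
```

      Because only an xy offset is applied, any change in heading that occurs during transport is carried into the stitched path unchanged.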

      The authors present a simple model in Figure 6 to illustrate the differences between individuals that can be hidden when looking at population distributions. However, the differences they show in the simulation don't seem relevant to the differences they observe in the experiments. Specifically, Fig. 6A and B show a contrast between individuals with similar mean speeds compared to individuals with different (but still unimodal) mean speeds. In contrast, the experimental data in Fig. 6D shows individual distributions that are quite similar but that are bimodal. So, there is indeed a difference between the individual distributions that is obscured in the population distribution, but is there evidence of larval personality types (line 444)? Similarly, the sentence beginning line 381 doesn't seem right either.

      We are really glad this was brought up so that we could clarify better in the text, as it’s an important point. We have edited the text in the Results subsection related to Figure 6 and the Figure 6 caption to clear things up. The individual distributions in 6D are not bimodal, there are 38 traces shown that are all essentially unimodal. In addition to stating this directly in the text, we have quantified this by adding the average BC for individuals in both isotropic and thermal gradient contexts (they are essentially the same, i.e. equally unimodal in both cases).
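
      For readers unfamiliar with this kind of unimodality measure, the sketch below assumes that "BC" refers to Sarle's bimodality coefficient; the response does not spell the abbreviation out, so this is an assumption. The function name is hypothetical, and the example uses plain (uncorrected) moment estimators for skewness and kurtosis for brevity, which may differ from the estimators used in the manuscript. Values above 5/9 (about 0.555) are conventionally taken to suggest bimodality.

```python
def bimodality_coefficient(values):
    """Sarle's bimodality coefficient with the finite-sample correction term.

    BC = (skewness**2 + 1) / (excess_kurtosis + 3(n-1)^2 / ((n-2)(n-3)))
    Skewness and excess kurtosis are computed from uncorrected central
    moments (an assumption made to keep the example short).
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skewness ** 2 + 1.0) / (excess_kurtosis + correction)
```

      Applied per animal to its speed samples and then averaged, a statistic of this form gives a single number for comparing how unimodal individual distributions are across conditions.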

    1. Author Response

      Reviewer #1 Public Review:

      1) “…The authors make reasonable assertions, but all of these need to be validated by electrophysiological studies before they can be treated as fact. Instead, they should be treated as predictions. For example, in the conclusions from the model section, that endbulb size does not strictly predict synaptic efficacy should be modified from an assertion to a prediction.”

      The reviewer makes an important point. We realize that, despite describing the data as the output of a model, we needed to be clearer that the model output is in fact a set of predictions to be tested experimentally. In the reorganization of the results, we collect the model output explicitly in a section named “Model Predictions”, and list five classes of predictions that describe explorations of bushy cells. The fifth set of predictions was previously a separate section but should now be better appreciated as conveying hypotheses since it is incorporated into this newly named section. Please note that the hypotheses are constrained to varying extents by the high-resolution structural data we present, such as the estimation of synaptic weights from the counts of synapses. The compartmental models for each bushy cell also are constrained by the structural data and published biophysical and electrophysiological properties of the cells. The pipeline to create the models is described in its own section now using that terminology: “A pipeline for translating high-resolution neuron segmentation into compartmental models consistent with in vitro and in vivo data.”, which we hope conveys the notion that the modeling framework is indeed a template that can be applied to future experimental data. We explicitly make this latter point in the new Discussion section “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      Reviewer #2 Public Review:

      1) “… While this is technically impressive (in regards to both the structure and modelling) there are significant weaknesses because this integration makes massive assumptions and lacks a means of validation; for example, by checking that the results of the structural modelling recapitulate the single-cell physiology of the neuron(s) under study. This would require the integration of in vivo recorded data, which would not be possible (unless combined with a third high throughput method such as calcium imaging) and is well beyond the present study.”

      We appreciate the support for our approach, and we now make explicit in the manuscript that the output of the models should be interpreted as predictions for eventual experimental testing. We also consider in the Discussion some experimental procedures that might be used to test the predictions. Ca2+ imaging is currently too slow a reporter for the rapid synaptic events and integration time constant of bushy cells, as the reviewer knows, and we think (and present in the Discussion, section 2) that focal optical stimulation combined with simultaneous recording from fast voltage sensors is a potential avenue to achieve this goal.

      2) The authors need to be more open about the limitations of their observations and their interpretations and focus on the key conclusions that they can glean from this impressive data set.

      As indicated in response to a similar comment from Reviewer 1, we have collected and discussed the primary limitations in a new section within the Discussion, entitled “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      3) The manuscript would be considerably improved by re-writing to focus the science on the most important results and provide clear declarations of limitations in interpretation.

      We have extensively re-organized and re-written the text to highlight the key structural observations (Figures 1-3, 7-8), the pipeline from structure to model (Figure 4) and interleave structural observations with the outputs of the model (Figures 5-6, 8). The latter are explicitly detailed in a new section called “Model Predictions”. These predictions are organized into five classes. We think that this new organization will improve communication of the key results, and further highlights the key discoveries from structural analysis and predicted functional mechanisms as explored in the compartmental models.

      Reviewer #3 Public Review:

      1) The authors extract here from the longer introductory commentary a one-sentence summary of the strengths of the manuscript, and thereafter focus on the weaknesses, since this document emphasizes our response to those critiques. To quote reviewer #3: “The strengths of this paper are that the authors obtained unprecedented high-resolution 3-D images of the AN-bushy cell circuit, and they implemented a biophysical model to simulate the neural processing of AN inputs based on these structural data. … The biophysical modeling, although lacking comparison with in vivo physiological data due to the chosen species (mice), is also solid and well documented.”

      We appreciate that the reviewer acknowledges the attention to detail that entered into the nanoscale imaging, cell reconstructions, building the modeling pipeline and constructing the compartmental models.

      2) Despite the high quality of the data, the paper is marred by the species they chose: there are very few published in vivo single-unit results from mouse bushy cells, so it is hard to evaluate how well the model predictions fit the real-world data, and how the structural findings address the “fundamental questions” in physiology. … No rationale (e.g. use of molecular tools or in vitro physiology) is given why the authors focus on the mouse. It seems that the analyses provided here could as well have been done on a species with good low-frequency hearing, which may have provided a much more interesting case for understanding the spectacular temporal transformation performed by bushy cells.

      We now report our reasons, in the first paragraph of the Results, for selecting the mouse. One reason for choosing mouse was that biophysical properties of bushy cells, which were important parameters to constrain the compartmental models, were collected from mice. These data are collected from dissociated cells and from brain slices, and these experiments continue to be more tractable in mice. The second reason is that mice are used in nanoscale and light microscopy connectomic studies because their neurons, cell groups and entire brain are smaller, so that a given volume of imaged brain will contain more cellular elements. These other connectomic studies provide a template for eventual comparisons among brain regions. Our overall goal is to image the entire cochlear nucleus, and the size of the mouse brain makes this goal tractable given current technology. Indeed, we are currently analyzing an image volume of the more rostral ventral cochlear nucleus that is about 5x larger than this image volume and collected with a much better signal to noise ratio. The third reason for choosing mouse was so that the current project could be augmented by genetic tools to further classify cochlear nucleus (CN) neurons and their extrinsic inputs, and potentially manipulate neural circuits in future studies. For example, the atoh7 (math5) and hhip gene products are markers for subsets of bushy cells, suggesting the presence of molecular subtypes of this cell class (Jing et al. 2023).

      3) If we look at data from other animals such as cats and gerbils, it is true that high-frequency (globular) bushy cells show envelope phase locking, but compared to ANs they are at best only moderately enhanced (gerbils: Frisina et al. 1990: Fig 7 and 10; cats: Joris and Yin 1998 Fig 4); the most prominent enhancement is actually to the temporal fine structures of low-frequency bushy cells (cells tuned to < 1 kHz), which mice lack. Furthermore, the temporal modulation transfer function (tMTF, i.e. the vector strengths vs modulation frequency plots in Fig 7O of the paper) of (globular) bushy cells are mostly low-pass filtered, with a cutoff frequency close to 1 kHz, and the highest vector strength rarely surpasses 0.9 (cats: Rhode 1994 Fig 9, 16, Rhode 2008 Fig 8G, Joris and Yin 1998 Fig 7; and there's one report from mice: Kopp-Scheinpflug et al 2003 Fig 8). Thus, the band-pass tMTFs tuned to 100-200 Hz with vector strengths > 0.9 or 0.95 in this paper (Fig 7O, Fig 8M) do not really match known physiology (in non-mouse species). Again, we know very little about in vivo physiology of mouse (globular) bushy cells and there is of course a possibility that responses in mice may be closer to the predictions of this paper.

      We agree that there are (unfortunately) few studies in mouse that can be compared with our simulations. With regard to the tMTFs, we can make a couple of points. First, we note that the stimuli used for all the panels except P2 in Figure 6 (previous Figure 7) were at 15 dB SPL, which is the level where maximal envelope phase-locking occurs in the low-threshold ANF inputs. This choice was based on previous experimental work that examined the intensity dependence for SAM stimuli in the auditory nerve (Smith and Brachman, 1980; Joris and Yin, 1992; Cooper et al, 1993; Dreyer and Delgutte, 2006, Figure 2B, Figure 3). Second, Figure 6, Supplemental Figure 1 confirms the behavior of the auditory nerve model used for input to the bushy cells (Rudnicki and Hemmert (2017) implementation), replicating Zilany et al., 2009, Figure 13D. These results show that phase-locking decreases at higher intensities as expected from the experimental work. Relevant to this topic, the lone report of responses to SAM stimuli in mice (Kopp-Scheinpflug et al. 2003) used 100% SAM at CF at 80 dB SPL. At this high intensity, it is expected that the envelope phase locking at CF will be less than at lower intensities because of rate saturation in the high and medium spontaneous rate ANFs (Carney, JARO 2019; Joris and Yin, 1998). In guinea pig, envelope phase locking is greater in low-SR fibers at 80 dB SPL than in medium and high SR fibers, but it is still lower than at its peak at about 50 dB SPL (Cooper et al., 1993). All of these experimental observations therefore lead to the prediction that the SAM envelope locking in Kopp-Scheinpflug et al. (2003) should be lower than in our simulations.

      In addition, Kopp-Scheinpflug et al. (2003) did not report from which VCN cell populations the cells were recorded. If the recorded cells were a heterogeneous mixture of bushy and multipolar cells, then their data are not directly comparable to our model predictions. The stimulus intensity also needs to be considered for comparison with the work of Rhode (1994), whose lowest stimulus level is 30 dB SPL (Figure 9), and who also used a different stimulus, 200% SAM, and with the work of Frisina et al. (1990), who used 50 dB SPL. Interestingly, Figure 14D in Rhode (1994) shows a synchrony coefficient ranging from 0.5 to 0.9 at 30 dB SPL at 300 Hz modulation, which is similar to what we predict in Figure 6P2. We also remind the reviewer that our simulations did not include the effects of feed-back inhibition at CF (Caspary and Palombi, 1994; Campagnola and Manis, 2014; Xie and Manis, 2014, Keine et al. eLife 2016), which may affect phase synchrony in complex ways (Gai and Carney, 2008). One important feedback pathway arises from the tuberculoventral cells of the DCN (Wickesberg and Oertel, 1991; Campagnola and Manis, 2014), but the envelope synchrony behavior of those cells is not known.

      Thus, we now emphasize in the revised manuscript (in the Discussion) considerations of stimulus intensity used across published studies, citing the works above, the relatively high vector strengths at low modulation frequency, and that these simulation results are currently predictive. The simulations are also limited in that we used only one configuration of ANF inputs (low-threshold, high SR). This ANF SR category was selected to be consistent with the suggestion by Liberman (1991) that the globular BCs receive input principally from the low-threshold high-SR fibers. Mixtures of input SR classes would be expected to change the envelope representation at higher intensities. Finally, the parameter space is quite large (intensity x frequency x [ANF distributions] x inhibition) and is better explored in a separate study once we are able to provide better or additional constraints to the modeling framework. Also, to put the selection of SAM stimuli in context, we indicate that mice can encode temporal fine structure, although only as low as 1 kHz, but at similar VS to larger rodents such as guinea pig (Taberner and Liberman 2005; Palmer and Russell 1986).
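
      As a reminder of what the vector strength (VS) values discussed above measure, the following sketch computes the standard vector strength of a spike train relative to a modulation frequency. The function name and the argument layout (spike times in seconds, frequency in Hz) are assumptions for illustration; this is not taken from the simulation code.

```python
import math


def vector_strength(spike_times, modulation_freq):
    """Vector strength of spike times relative to a periodic modulation.

    Each spike is mapped to a unit vector at phase 2*pi*f*t; VS is the
    length of the mean vector (1 = perfect phase locking, 0 = none).
    """
    if not spike_times:
        raise ValueError("at least one spike is required")
    phases = [2.0 * math.pi * modulation_freq * t for t in spike_times]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    return math.hypot(cos_sum, sin_sum) / len(phases)
```

      Spikes that always fall at the same phase of the envelope give a VS of 1, whereas spikes spread evenly over the modulation cycle give a VS near 0. A tMTF is then the VS plotted against modulation frequency.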

      Reviewer 4: Public comments

      1) The authors have collected an impressive array of physiological data and provided some beautiful 3D images of SBCs with dendrites. These are clearly strengths. The computational models for mechanisms of SBC responses, however, are made to fit what may be inadequate anatomical data. Instead of conclusions, perhaps they need to reword their discussions to refer to the anatomy as hypothetical substrates.

      It is true that the SBEM image volumes have strengths and limitations. We now collect these considerations in the second section of the Discussion, “Toward a complete computational model for globular bushy cells: strengths and limitations”. One limitation of this volume is that we do not have sufficient resolution to categorize synaptic vesicles by shape and must infer their excitatory or inhibitory nature. Note that tracing inputs to a source neuron, such as tracing the endbulbs to parent auditory nerve fibers, solves this problem, but the smaller terminals remain problematic in this regard. The goal is to not only assign excitatory or inhibitory phenotype, but also a cell type of origin, so that actual spike patterns, evoked by sound, can be provided as inputs to the model. The compartmental model is detailed, and amenable to mapping this information from other experiments as it becomes available. Nanoscale imaging does provide detailed structural information in terms of surface areas, volumes and process diameters that is important in constraining the compartmental models, and that is not attainable by standard light microscopy approaches. These points are now made in the Results and in the Discussion, as mentioned earlier in this paragraph. And, as indicated in the responses to other reviewers, we highlight the model outputs as predictions to be tested experimentally.

    1. Author Response

      Reviewer #1 (Public Review):

      This work reports an important demonstration of how to predict the mutational pathways to antimicrobial resistance (AMR) emergence, particularly in the enzyme DHFR (dihydrofolate reductase). Epistasis, or non-additive effects of mutations due to their background dependence, is a major confounding factor in the predictability of protein evolution, including proteins that confer antimicrobial resistance. In the first approach, they used Rosetta to predict the mutant DHFR-drug binding affinity and the resulting selection coefficient, which then became inputs to a population genetics model. In the second approach, they use the observed clinical/environmental frequency of the variants to estimate the selection coefficient. Overall, this work is a compelling demonstration that a mechanistic model of the fitness landscape could recapitulate AMR evolution; however, considering that the number of mutations and pathways is small, a more compelling description of the robustness of the results and/or limitations of the model is needed.

      Major strengths:

      1) This is a compelling multi-disciplinary work that combines a mechanistic fitness landscape of DHFR (previously articulated in literature and cited by the authors), Rosetta to determine the biophysical effects of mutations, and a population genetics model.

      2) The study takes advantage of extensive data on the clinical/environmental prevalence of DHFR mutations.

      3) Provides a careful review of the surrounding literature.

      Major weakness:

      1) Considering that the number of mutations and pathways being recapitulated is rather small, I would suggest a more detailed description of the robustness of the results. For example:

      a) Please report the P-value for the correlation of the predicted DDG_{binding, theory} and DDG_{binding, experimental}.

      We thank the reviewer for the suggestion. We agree the available experimental dataset is small, limiting the statistical power of the Pearson correlation test to determine how well Flex ddG predicts binding free energy change. However, as highlighted in the manuscript, two earlier studies by Aldeghi et al. 2018 & 2019 considered much larger datasets and found a correlation in a similar range to the one we found here. Furthermore, as suggested by the Reviewer, we carried out a one-sided t-test with the alternative hypothesis that the correlation is greater than 0 and found a p-value of 0.040, suggesting the correlation we observed is significant. We have added this test and p-value to the Results section.
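
As an illustration of the kind of one-sided test described in this response, the sketch below computes a Pearson correlation, the corresponding t statistic (with n - 2 degrees of freedom), and an exact one-sided permutation p-value. The permutation test is a stand-in chosen here because it needs only the Python standard library; it is not the t-test the authors report. The ddG values are hypothetical and are not taken from the manuscript.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom (needs |r| < 1)."""
    return r * sqrt((n - 2) / (1 - r * r))


def one_sided_permutation_p(x, y):
    """Exact one-sided p-value: fraction of orderings of y with r >= the observed r."""
    r_obs = pearson_r(x, y)
    count = total = 0
    for perm in permutations(y):
        total += 1
        if pearson_r(x, perm) >= r_obs - 1e-12:  # small tolerance for rounding error
            count += 1
    return count / total


# Hypothetical ddG values (kcal/mol); NOT the data from the manuscript.
predicted = [0.2, 1.1, -0.4, 0.9, 1.8, 0.5]
experimental = [0.1, 0.7, 0.3, 1.5, 1.2, -0.2]

r = pearson_r(predicted, experimental)
t = t_statistic(r, len(predicted))
p = one_sided_permutation_p(predicted, experimental)
print(f"r = {r:.3f}, t = {t:.3f} (df = {len(predicted) - 2}), permutation p = {p:.4f}")
```

With so few points, the exact permutation test enumerates every ordering, which is feasible only for small samples. For larger datasets the t statistic would normally be compared against Student's t distribution instead.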

      If interested in showing the correct assignment of mutational effects, perhaps use a contingency matrix to derive a P-value.

      As suggested by the Reviewer, we used a contingency matrix known as a confusion matrix to determine how accurate Flex ddG is at classifying mutations as stabilising or destabilising. This gave an accuracy of 0.89, sensitivity of 0.83 and a specificity of 1. The p-value associated with this contingency table was 0.14, despite the high accuracy, sensitivity and specificity. This is likely due to the small sample size making it difficult to determine significance. This analysis has been included in the Results section.
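
To make the reported metrics concrete, the sketch below derives accuracy, sensitivity and specificity from a 2x2 confusion matrix. The counts (5 true positives, 1 false negative, 3 true negatives, 0 false positives) are an assumption: they are one set of nine classifications that reproduces the reported 0.89, 0.83 and 1, but the response does not state the actual counts. The p-value shown is a one-sided binomial test of accuracy against the no-information rate (the frequency of the majority class). That choice of test is also an assumption, since the response does not say which test was used.

```python
from math import comb


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) for a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def binomial_upper_tail(successes, trials, prob):
    """P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob**k * (1 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts chosen to match the reported metrics (assumption).
tp, fn, tn, fp = 5, 1, 3, 0
total = tp + fn + tn + fp

accuracy, sensitivity, specificity = confusion_metrics(tp, fn, tn, fp)

# No-information rate: accuracy obtained by always predicting the majority class.
no_information_rate = max(tp + fn, tn + fp) / total
p_value = binomial_upper_tail(tp + tn, total, no_information_rate)

print(f"accuracy = {accuracy:.2f}, sensitivity = {sensitivity:.2f}, "
      f"specificity = {specificity:.2f}, p = {p_value:.2f}")
```

Under these assumed counts, a classifier that is right 8 times out of 9 is still hard to distinguish from always guessing the majority class (right 6 times out of 9), which illustrates how a small sample can give high accuracy without statistical significance.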

      b) Although the DDG_binding calculation in Rosetta seems to converge (Appendix figures 3 and 4), I do not think the DDG values before equilibration should be included in the final DDG estimate. In practice, there is a "burn in" number of runs where the force field optimizes the calculation to account for potential clashes in the structure, etc. This is particularly important since the starting structures are modeled from homology. Consequently, the distributions of DDG that include the equilibration runs are multimodal (Appendix figure 2), which means that calculating an average may be inappropriate.

      Each Flex ddG prediction is independent (see Figure 1 of Barlow et al. 2018 for a summary of the Flex ddG method), i.e. the distribution of values does not represent a MCMC process in which there is a burn-in in order to equilibrate. The structures of both the wild-type and mutant are equilibrated in each run using the backrub algorithm. The reason so many runs are required is because each prediction is from a distribution of possible ddG values associated with that specific mutation and the authors of Flex ddG suggest running 35 runs or more and taking the average of the distribution. Therefore, in order to get an accurate prediction, enough simulations must be run per mutation to adequately characterise the distribution so that the average converges to a constant value.
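
The convergence argument can be sketched as follows: if each run is an independent draw from a fixed distribution, the running mean settles as more runs are added, even when that distribution is multimodal. The bimodal distribution, its parameters, and the function names below are hypothetical and are not taken from Flex ddG or from the manuscript; only the figure of 35 runs comes from the response above.

```python
import random


def running_mean(values):
    """Mean of the first k values, for k = 1..len(values)."""
    means, total = [], 0.0
    for k, value in enumerate(values, start=1):
        total += value
        means.append(total / k)
    return means


def draw_ddg(rng):
    """One independent draw from a hypothetical bimodal ddG distribution (kcal/mol)."""
    centre = -1.0 if rng.random() < 0.5 else 2.0
    return rng.gauss(centre, 0.3)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
runs = [draw_ddg(rng) for _ in range(250)]
means = running_mean(runs)

# Inspect how the estimate changes as more independent runs are averaged.
for k in (10, 35, 100, 250):
    print(f"mean after {k:3d} runs: {means[k - 1]:.3f}")
```

Because the draws are independent rather than successive states of a Markov chain, no early runs need to be discarded. The only question is whether enough runs have been averaged for the mean to stabilise.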

      2) The geographical areas over which the mutational pathways are independently estimated are not isolated, allowing for the potential that an AMR variant in one region arose due to "migration" from another area. For example, the S58R-S117N is the most frequent double mutant of PvDHFR in geographically proximate Southern/Southeastern Asia (Fig. 4). To a certain extent, similar mutational patterns occur for PfDHFR in Southern/Southeastern Asia (Fig. 3). Although accounting for mutant migration in the model may be beyond the scope of the study, a clear argument for the validity of the "isolated island" assumption is needed.

      The Reviewer is correct that some variants in one region may have arisen due to “migration” from another area. This would impact the method for inferring mutational pathways from regional isolate frequency data but not when considering the worldwide population. If this occurred, we would expect to see a multiple mutant appearing in a region without the precursor (single, double etc) mutations, even in the case of large sample size. However, this does not seem to have been an issue for the pathways we have been predicting here. If it were the case that a variant migrated, and the precursor mutations could not be found in that region, we could look to mutations from neighbouring regions to infer the pathway, under the assumption of migration.

      We have added some discussion on this between lines 517-523:

      “When inferring pathways at a regional level, it is possible we may encounter instances where genotypes with multiple mutations are observed in a specific region, but the precursor mutations in the pathway are absent. This could happen either due to insufficient sampling of the region or due to "migration" of the variant from a neighbouring region. To infer pathways in the former case more samples would be required, whereas in the latter case we can look to the data from neighbouring regions where the variant is present and use the frequency data of the precursor mutations.”

    1. Author Response

      Reviewer #2 (Public Review):

      1) Analytical approaches are in the current form preliminary and not enough to draw firm biological conclusions. While the datasets are large (which is highly appreciated), they represent a relatively early stage of ENS development and possible differences between vagal and sacral-derived populations could partially be attributed to difference in maturity. Maturity will surely not explain the whole difference observed but needs to be factored into the interpretation. As scRNA-seq datasets from the mature chicken ENS are lacking (as well as detailed IHC-based neural classification system) the inference made in the paper between molecular classes and functional types are premature.

      We appreciate this comment and think it is an excellent suggestion that we definitely plan to do. This made us realize that we failed to clarify in the text why we chose this particular time point for our study, which is two-fold.

      First, we are particularly interested in how neural crest cells choose their prospective fates. E10 is a time when the post-umbilical gut has been completely populated by both vagal and sacral neural crest cells for 2 days so cells are in the process of differentiation but there still exists a large precursor pool. For this reason, we can capture both precursors and some differentiated neuronal subtypes. We have clarified this point in the revised manuscript and now focus much more on the precursor population to identify both genes that are common to vagal and sacral neural crest cells as well as those that are distinct. This enables us to formulate testable hypotheses for the potential role of particular transcription factors in allocation of cell fate. Of particular interest, we find that at E10, the sacral neuronal precursor pool is largely depleted whereas the vagal crest has a substantial neuronal precursor pool. Thus, we believe this is the perfect time point for initial analysis.

      Second and perhaps even more important, in the US, chick embryos are not considered vertebrates until after E10. Thus, E10 represents the last timepoint we can raise embryos without animal approvals which are not currently in hand. We completely agree that performing experiments at later timepoints will be incredibly valuable and therefore are now applying for approvals. But realistically, these take several months and thus would delay publication of our datasets (already delayed due to Covid restrictions) for at least another year. Therefore, we propose to publish the mature dataset as a Research Advance that would focus on differences between mature neuronal subtypes between preumbilical vagal, post-umbilical vagal and sacral datasets that would nicely complement the current work. Instead, we have refocused this paper on the precursor to differentiated neuron transition.

      I should mention that this refocusing seems particularly important given that our original aim was to explore differences between vagal and sacral neural crest contributions to the gut. However, the single cell data reveals strong overlap between sacral and vagal neural crest contributions to the postumbilical gut, suggesting a strong environmental influence on cell fate decisions.

      Specific concerns:

      1) Analysis of scRNA-sequenced sacral- versus vagal-derived ENS reveals clusters consistent with a non-ENS identity (endothelial, muscle, vascular and more). Previous studies in mouse using the neural crest tracing line Wnt1-Cre have not demonstrated such diverse progenies of neural crest from any region. An exception is a small population of mesenchymal-like cells (Ling and Sauka-Spengler, Nat Cell Biol. 2019; Zeisel et al., Cell 2018; Morarach et al., 2021; Soldatov et al., Science 2019). Therefore, the claimed broad potential of 6 of 13 neural crest giving rise to diverse gut cell populations warrants more validating experiments.

      We thank the reviewer for this comment. We clarify that hematopoietic clusters have dropped out upon reanalysis. The other clusters we believe are real based on gene markers used in previous studies to identify cell types such as neural crest-derived melanocytes like Mlana, Dct, and Mitf.

      2) Several earlier studies have revealed that parts of the ENS are derived from neural crest cells that attach to nerve bundles, obtain a Schwann cell precursor-like identity and thereafter migrate into the gut (Uesaka et al. J Neurosci 2015 and Espinosa-Medina et al, PNAS 2017). The current work in chicken needs to be interpreted in the light of these findings and the publications should be discussed in relevant sections of the introduction and discussion.

      Thank you for this suggestion. We agree and indeed our data cannot differentiate between SCPs, which are neural crest-derived, versus early migrating neural crest cells. We have added this point to the discussion and also discuss these papers in more detail.

      3) The analysis indicates the presence of melanocytes. It is not clear why they are part of the GI-tract preparations. Could they correspond to another cell type, with partially overlapping gene expression profile as melanocytes?

      We have assigned these as melanocytes based on expression of Mlana, Mitf, and Dct as highly upregulated genes. These have been used in previous studies to identify neural crest-derived melanocytes in the heart (Chen et al., 2021).

      4) As evident, the sacral- and vagal-derived ENS are not clonally related. To decipher differentiation paths and relations between clusters, individual analysis of the different datasets are needed. With only one UMAP representing the merged datasets combined with little information on markers, it is hard to evaluate the soundness of the conclusions regarding cell-identities of clusters and lineage differentiation.

      This is an excellent suggestion and we apologize for not including this previously. We have now added individual pre-umbilical vagal, post-umbilical vagal and sacral neural crest datasets as well as trajectory analysis for each.

      5) E10 is a relatively early stage in chicken ENS development. Around E7, the intestines do not contain differentiated neurons even. The relative high expression of Hes5 (marking mature enteric glia in the mouse; Morarach et al., 2021) in the vagal neural crest population might be explained by the more mature state of vagal versus sacral ENS. As also outlined below, Th/Dbh are known to be transiently expressed in the developing ENS why they could indicate the relative immaturity of sacral neural crest rather than differential neural identities. These issues need to be taken into account when interpreting biology from scRNA-seq data.

      We completely agree. We now clarify that we are particularly interested in how neural crest cells choose their prospective fates. We chose the E10 time point because this reflects a time point when the post-umbilical gut has been completely populated by both vagal and sacral neural crest cells for 2 days so cells are in the process of differentiation but there still exists a large precursor pool. For this reason, we can capture both precursors and some differentiated neuronal subtypes. Notably, the sacral derived precursors seem to be glial in flavor whereas neuronal precursors appear to be absent. We have clarified this point in the revised manuscript.

      6) Unlike the guinea pig, and to some extent pig and murine ENS, the physiology of chicken enteric neurons has not been well characterized yet. Therefore, it is highly advisable to refrain from a nomenclature of clusters designating functions. Several key molecular markers are known to differ between murine, guinea pig, rat and human systems. IPANs are a good example where differential expression is seen (SST in human but not mice; CGRP labels some IPANs in mouse, but not in guinea pig, where Tac1 instead is expressed). IPANs are not defined in the chicken very well, and molecular markers found in other species may not be valid. Adrenergic and noradrenergic neurons have not been validated in the ENS (although TH and Dbh have been observed, especially in the submucosal ENS). Cholinergic neurons are also mentioned in the text, but do not appear in the figures as a defined group.

      Another reason to refrain from functional nomenclature is that a rather early stage is analysed in the present study, without possibilities to compare with scRNA-seq data from the mature chicken ENS (which was performed in Morarach et al, 2021 for the mouse). Recent data suggest that considerable differentiation may occur even in postmitotic neurons, and several markers are known to display a transient expression pattern (TH, DBH and NOS1; Baetge and Gershon 1990; Bergner et al., 2014; Morarach et al., 2021) why caution should be taken to infer neuronal identities to clusters.

      This is an excellent point and we thank the reviewer for this valuable input. Accordingly, we have now renamed the clusters based on prominent gene expression rather than neuronal or precursor subtype. Indeed we struggled with finding appropriate names making this comment all the more useful.

      7) The immunohistochemical analysis (Figure 5,6) is an essential complementary addition and validation of scRNA-seq. However, it is very difficult to discern staining when magenta and red are combined to display coexpression.

      Good point. This has been changed to be more readily discernible and higher magnification views have been added.

      8) To give more information to the field and body of evidence for claims made, quantifications relating to the analysis in Figures 5 and 6 are warranted as well as an expanded set of marker genes that align with the scRNA-seq results.

      Good point. We have added additional markers as suggested. In terms of quantitation, we can include numbers of labeled cells in a particular region but this may give a false impression of degree of contribution since we are using different viruses for vagal vs sacral that may have different titers making it a bit like comparing apples and oranges. We now emphasize that our labeling approach does not mark the entire population and that the degree of labeling can be variable.

      9) Correlations between genes and functions/neuron class are in many cases wrong (including Grm3, Gad1, Nts, Gfra3, Myo9d, Cck and more).

      Good point. We have toned this down.

      10) Attempts to subcluster neuronal populations are needed (Figure 7). However, to understand the biology, it is important to address which cells are sacral versus vagal-derived. Additionally, related to previous comment, as the vagal and sacral neurons are not clonally related, it would be important to make separate analysis of neurons relating to each region.

      Good point. We have added additional analysis to address this important point in what is now Fig 6 and in particular validated sacral contributions to glial cells (new Fig 8).

    1. Author Response

      Reviewer #1 (Public Review):

      In the current work, the authors aimed to investigate the genetic and non-genetic factors that impact structural asymmetry.

      A major strength is the number of data samples included in the study to assess brain structural asymmetry. A consequence of the inclusion of many samples is then also the sample size.

      We thank the reviewer for their supportive and insightful comments that have helped improve our paper.

      Comment #1: Given that the authors also work with longitudinal data, it would be nice to be able to appreciate the individual effects across time points, this is now a little unclear.

      Our lifespan analysis incorporated both single and repeat measures over time in the trajectory estimation, and hence these will be an intermediate estimate of cross-sectional and longitudinal trajectories. We have clarified this in the Methods (see 1). A comprehensive analysis of the individual-specific asymmetry change effects in the current paper is thus hindered by many properties of the data, including that many participants contribute a single measure, that participants vary in their number of repeat-measures (1-6 timepoints), that the number of repeat-measures is dependent on age, and that the degree of asymmetry change differs between cortical metrics, clusters, and along the age variable. Most importantly, the average degree of asymmetry change is small; Fig. 3 indicates thickness asymmetry typically corresponds to a ~0.1 - 0.2mm difference, such that changes therein will be smaller and thus likely unclear at the individual level. Nevertheless, we have modified the average plots in Figures 2 and 3 to allow better visualization of the individual hemispheric measures across timepoints, as well as an appreciation of the density of our longitudinal data.

      1 – (line 646) “GAMMs incorporate both single and repeat measures over time to capture nonlinearity of the mean level trajectories across persons, resulting in population estimates that are intermediate between cross-sectional and longitudinal trajectories”

      Comment #2: A possible less well-developed approach is the genetic basis, as this was stated as the main question, here the investigations are not that deep and may only touch upon the question.

      We agree the previous formulation of our Abstract did convey this impression, and have thus made the following important amendment:

      (Abstract) “Cortical asymmetry is a ubiquitous feature of brain organization that is subtly altered in some neurodevelopmental disorders, yet we lack knowledge of how its development proceeds across life in health. Achieving consensus on the precise cortical asymmetries in humans is necessary to uncover the developmental timing of asymmetry and extent to which it arises through genetic or later influences in childhood.”

      Our paper aims to serve as a critical reference for the normative childhood development and lifespan change of cortical asymmetry. We performed heritability analyses as they are informative regarding development and shed light on the timing of influences shaping cortical asymmetry (also possibly prior to age ~4 at which our sample starts). Similarly, genetic correlation analysis sheds light on whether the replicable interregional correlations are underpinned by genetic differences, indicative of coordinated genetic development of asymmetries. We apologize the rationale behind these analyses was not well-specified, and have clarified this (see response #4). Thus, we respectfully disagree the genetic aspect represented the main research question, but rather lends support to our developmental perspective.

      Given the density of analyses already included and that these are well-specified within the context of our overarching question, we do not see how adding more genetic analyses will be beneficial for our paper. However, we agree with the Reviewer’s subsequent comment (#8) that the genetic correlations in HCP data should also have been reported, and now incorporate these (see response #8).

      Comment #3: Moreover, the association with cognition, handedness, sex, and ICV is somewhat interesting yet seems also a bit minimal to fully grasp its implications.

      In the asymmetry field it has been commonplace to assume these factors are strongly related to asymmetry, particularly sex. Here, despite optimizing the delineation of asymmetries, associations with factors purportedly related to it were all very small. We believe this is an important message that may help reorient the field away from entrenched views; unless we show it is not the case, researchers may think the effects of these factors are larger than they are. Further, because questions pertaining to sex and handedness differences will certainly arise for many, we chose to address them by quantifying the average effects in big data, because our lifespan trajectory analysis was not well-suited to assessing e.g. sex differences in asymmetry trajectories (i.e. 3-way non-linear interactions; sex × age × hemisphere). We have strengthened the reasoning for this analysis in the Introduction (see 1):

      1 – (line 118) “Therefore, as a final step, we reasoned that combining an optimal delineation of population-level cortical asymmetries with big data would optimize detection and quantification of the effects of factors commonly assumed important for asymmetry, namely general cognitive ability, handedness and sex.”

      Contrary to approaches that often place emphasis on p-values (e.g. pheWAS), our targeted approach using variables long considered important for asymmetry enabled transparent reporting of the effect sizes and directions. We hope the Reviewer agrees we have taken care in this regard, and are careful to communicate the found effects are small. The small effects seem typical of structural brain associations in big data, as may be expected when relating complex phenotypes to any single structural measure. For these reasons, we opt not to extend the analysis beyond our initial targeted approach, arguing instead that the size of the effects is reason enough to report them.

      Despite being small, however, we argue they are not negligible (see 2-4). Of note, though it may appear so in Fig. 7, the p-value for the cognitive association was far from just surviving Bonferroni correction (it would survive >13,000 comparisons at our alpha level [⍺=.01], whereas we corrected for our 136). Note we did not accept a 5% false positive rate. We have clarified this in the Results (see 5):

      2 – (line 485) “Other factors commonly espoused to be important for asymmetry were associated with only small average effects in adults. For example, we found one region – SMG/perisylvian – wherein higher leftward areal asymmetry related to subtly higher cognitive ability. Since interhemispheric anatomy here is likely related to brain torque 2,3, this may agree with work suggesting torque relates to cognitive outcomes 4,5. Interestingly, that ~94% of humans exhibit leftward asymmetry in this region (Figure 1G) suggests tightly regulated genetic-developmental programs control its lateralized direction in humans (see Figure 6). This result may therefore suggest disruptions in areal lateralization early in life are associated with cognitive deficits detectable in later life as small effects in big data 6. While speculative, this may also agree with evidence that differences in general cognitive ability that show high lifespan stability 6 relate primarily to areal phenotypes formed early in life 7–9.”

      3 – (line 461) “We also found areal asymmetry in anterior insula is, to our knowledge, the most heritable asymmetry yet reported with genomic methods 10–14, with common SNPs explaining ~19% variance. This is notably higher than in our recent report (< 5%) 14, illustrating a benefit of our approach. As we reported recently 14, we confirm asymmetry here associates with handedness.”

      4 - (line 495) “Consistent with our recent analysis in UKB 14, we confirmed leftward areal asymmetry of anterior insula, and leftward somatosensory thickness asymmetry is subtly reduced in left-handers. Sha et al. 14 reported shared genetic influences upon handedness and asymmetry in anterior insula and other more focal regions. Anterior insula lies within a left-lateralized functional language network 15, and its structural asymmetry may relate to language lateralization 16–18 in which left-handers show increased atypicality 19–21. Since asymmetry here emerges early in utero 22 and is by far the most heritable (Figure 6), we agree with others 16 that this ontogenetically foundational region of cortex may be fruitful for understanding genetic-developmental mechanisms influencing laterality 23,24. Less leftward somatosensory thickness asymmetry in left-handers also echoes our recent report 14 and fits a scenario whereby thickness asymmetries may be partly shaped through use-dependent plasticity and detectable through group-level hemispheric specializations of function. Still, the small effects show cortical asymmetry cannot predict individual handedness. Associations with other factors typically assumed important were similarly small, and mostly compatible with the ENIGMA report 25 and elsewhere 26,27.”

      5 - (line 3221) “Although small, we note this association was far from only just surviving correction at our predefined alpha level (⍺ = .01; corrected for 136 tests; Methods).”

      6 - (line 348) “we … uncover novel and confirm previously-reported associations with factors purportedly related to asymmetry – all with small effects”
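
The multiple-comparison argument in the response to Comment #3 can be illustrated with a short calculation. The alpha level (.01) and the number of tests (136) are taken from the response. The p-value used below is hypothetical, picked only to be consistent with the statement that the association would survive more than 13,000 comparisons, since the actual p-value is not given here.

```python
from math import floor


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def max_comparisons_survived(alpha, p_value):
    """Largest number of tests m for which p_value <= alpha / m still holds."""
    return floor(alpha / p_value)


ALPHA = 0.01             # predefined alpha level stated in the response
N_TESTS = 136            # number of tests corrected for, as stated in the response
p_hypothetical = 7.5e-7  # hypothetical p-value, NOT reported in the response

threshold = bonferroni_threshold(ALPHA, N_TESTS)
survives = p_hypothetical <= threshold
limit = max_comparisons_survived(ALPHA, p_hypothetical)

print(f"per-test threshold for {N_TESTS} tests: {threshold:.2e}")
print(f"survives correction: {survives}; would survive up to {limit} comparisons")
```

The gap between the 136 tests actually corrected for and the much larger number the association could withstand is what supports the claim that it was far from only just surviving correction.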

      Thus, in quantifying effects we could not include in our lifespan analysis we preempt the questions likely to arise for many researchers, provide a sobering account of the effect sizes of factors typically assumed important for asymmetry, and find results that fit the developmental framework we lay out in the paper. We therefore opt to keep these together with the lifespan and heritability results in the current paper.

      Comment #4: To some extent, the aim of the study could still be written with more clarity. However, the authors have in part achieved their aims - assuming it is found a consensus on the brain asymmetry patterns in humans as is stated in the abstract.

      Alongside the amendment to the Abstract that better clarifies our aims (response #2), we have restated the aims in the Introduction:

      1 - (line 121) Here, we first aimed to delineate population-level cortical areal and thickness asymmetries using vertex-wise analyses and their overlap in 7 international datasets. With a view to gaining insight into cortical asymmetry development, we then aimed to trace a series of lifespan and genetic analyses. Specifically, we chart the developmental and lifespan trajectories of cortical asymmetry for the first time longitudinally across the lifespan. Next, we examine phenotypic interregional asymmetry correlations, under the assumption correlations indicate coordinated development of left-right asymmetries through genes or lifespan influences. To shed light on the extent to which differences in asymmetry are genetic, we test heritability of asymmetry using genome-wide single nucleotide polymorphism (SNP) and extended twin data, and examine whether or not phenotypic associations are underpinned by genetic correlations suggestive of coordinated development through genes. Finally, we screen our set of robust, population-level asymmetries for association with general cognitive ability and factors purportedly related to asymmetry in UK Biobank (UKB). 28

      Comment #5: Overall the results support the conclusions, yet the strong interpretation of early life factors in particular is not empirically investigated as far as I gather.

      The reviewer is correct that we do not have data on neonates to directly support interpretations of prenatal factors. We have therefore tempered strong interpretations pertaining to prenatal accounts accordingly, have added text at the start of the Discussion to address this (see 1), and qualified all discussion of prenatal factors:

      1 – (line 366) “Tracing their lifespan development, we show the trajectories of areal asymmetry primarily suggest this form of asymmetry is developmentally stable at least from age ~4, maintained throughout life, and formed early on – possibly in utero 13,29,30 (while we cannot extrapolate to ages before our sample begins, we note this agrees with findings in neonates 29,30). One interpretation of lifespan stability combined with low heritability may be stochastic early-life developmental influences determine individual differences in areal asymmetry more than later developmental change, but work linking prenatal and childhood trajectories is needed to affirm this”

      2 – (Abstract) “Results suggest areal asymmetry is developmentally stable and arises early in life through genetic but mainly subject-specific stochastic effects”

      We have also added argumentation regarding a just-published study suggesting the average pattern of neonatal areal asymmetry is largely similar to adults 1. In addition, we reiterate what our data can and cannot say about the developmental timing of asymmetry in several places in the Discussion (see 3 & 5). In other places, we have removed reference to prenatal factors (see 4). Still, while we agree we previously used the terms “prenatal” and “early life factors” interchangeably, we note the latter often encompasses periods of early childhood covered here and is not necessarily restricted to factors present at birth 2,3. Thus, we have amended the Discussion to qualify the age-range the interpretation pertains to (see 5), and then retain the conclusion as follows (see 6).

      3 - (line 383) “For areal asymmetry, adult-like patterns of lateralization were strongly established before age ~4, indicating areal asymmetry traces back further and does not primarily emerge through later cortical expansion 33. Rather, the lifespan trajectories predominantly show stability from childhood to old age, as asymmetry was maintained through periods of developmental expansion and aging-related change that were region-specific and bilateral. This may align with evidence indicating areal asymmetry may be primarily determined in utero 29,30, including evidence suggesting little change in areal asymmetry from birth to 2 years 29,33,34, and little difference between maps derived from neonates and adults 29,30. It may also fit with the principle that the primary microstructural basis of cortical area 8 – the number of and spacing between cortical minicolumns – is determined in prenatal life 8,9, and agree with work suggesting asymmetry at this microstructural level may underlie hemispheric differences in surface area 35. The developmental trajectories agree with studies indicating areal asymmetry is established and strongly directional early in life 29,36. That change in surface area later in development follows embryonic gene expression gradients may also agree with a prenatal account for areal asymmetry 9”

      4 - (line 439) “The strongest relationships all pertained to asymmetries that were proximal in cortex but opposite in direction. Several of these were underpinned by high asymmetry-asymmetry SNP-based genetic correlations, illustrating some lateralizations in surface area exhibit coordinated genetic development.”

      5 - (line 481) “Regardless, these results support a differentiation between early-life (i.e. before age ~4) and later developmental factors in shaping areal and thickness asymmetry, respectively.”

      6 - (Conclusion) “Developmental and lifespan trajectories, interregional correlations and heritability analyses converge upon a differentiation between early-life and later-developmental factors underlying the formation of areal and thickness asymmetries, respectively. By revealing hitherto unknown principles of developmental stability and change underlying diverse aspects of cortical asymmetry, we here advance knowledge of normal human brain development.”

      Overall this is a nice and thorough work on asymmetry that may inform further work on brain asymmetry, its genetic basis, development, environmentally induced change, and link to behavioural variation.

    1. Author Response

      Reviewer #1 (Public Review):

      Bacterial carboxysomes are compartments that enable the efficient fixation of carbon dioxide in certain types of bacteria. A focus of the current work is on two protein components that provide spatial regulation over carboxysomes. The McdA system is an ATPase that drives the positioning of carboxysomes. The McdB system is essential for maintaining carboxysome homeostasis, although how this role is achieved is unclear. Previous studies, by the lead author's lab, showed that the McdB system is a driver of phase separation in vitro and in cells. They proposed a putative connection between McdB phase separation and carboxysome homeostasis. The central premise of the current work is as follows: In order to understand if and how phase separation of McdB impacts carboxysome homeostasis, it is important to know how the driving forces for phase separation are encoded in the sequence and architecture of McdB. This is the central focus of the current work. The picture that emerges is of a protein that forms hexamers, each of which appears to be a trimer of dimers. The domains that drive dimerization and trimerization appear to be essential for driving phase separation under the conditions interrogated by the authors. The N-terminal disordered region regulates the driving forces for phase separation - referred to as the solubility of McdB by the authors. To converge upon the molecular dissections, the authors use a combination of computational and biophysical methods. The work highlights the connection between oligomerization via specific interactions and emergent phase behavior that presumably derives from the concentration (and solution condition) dependent networking transitions of oligomerized McdB molecules.

      Having failed to obtain specific structural resolution for the full-length McdB as a monomer or oligomer, the authors leverage a combination of computational tools, the primary one being iTASSER. This, in conjunction with disorder predictors, is used to identify / predict the domain structure of McdB. The domain structure predictions are tested using a limited proteolysis approach and, for the most part, the predictions stand up to scrutiny affirming the PONDR predictions. SEC-MALS data are used to pin down the oligomerization states of McdB and the consensus that emerges, through the investigations that are targeted toward a series of deletion constructs, is the picture summarized above.

      Is the characterization of the oligomerization landscape complete and likely perfect? Quite possibly, the answer is no. Deletion constructs pose numerous challenges because they delete interactions and inevitably impose a modularity to the interpretation of the totality of the data.

      This is a good point and always a possibility with truncations – the protein McdB may not be as modular in nature as it seems in our tripartite model. But the deletion constructs were more so intended to be tools for identifying key regions of oligomerization and condensate formation as others have done, and for this, they were indeed useful. Additionally, we were able to strategically aim our substitution mutations based on data from the deletion constructs. These substitutions provided data consistent with the deletions, but in the context of the full-length protein (see Fig. 5 vs. Figs. 2, 4). However, we ultimately agree with the reviewer that this is always a possibility with truncations, and we have therefore mentioned this caveat in the discussion.

      Line 415 “Truncated proteins have been useful in the study of biomolecular condensates. But it is important to note that using truncation data alone to dissect modes of condensate formation can lead to erroneous models since entire regions of the protein are missing. However, data from our truncation and substitution mutants were entirely congruent. For example, deletion of the CTD or substitutions to this region caused destabilization of the hexamer to a dimer, and deletion of the IDR or substitutions to this region caused solubilization of condensates without affecting hexamer formation.”

      Accordingly, we are led to believe that the N-terminal IDR plays no role whatsoever in the oligomerization.

      Our updated data still strongly support this interpretation. Both truncation of the IDR (Fig. 2) and the six-Q-substitution mutant in the IDR (Fig. 5) form a monodispersed hexamer in solution via SEC-MALS, as does wild-type McdB.

      Close scrutiny, driven by the puzzling choice of nomenclature and the Lys to Gln titrations in the N-terminal IDR, raises certain unresolved issues. First, the central dimerization domain is referred to as being Q-rich. This does not square with the compositional biases of this region. If anything, it is Q/L-rich or just L-rich. This in fact makes more sense because the region does have the architecture of canonical Leu-zippers, which do often feature Gln residues. However, there is nothing about the sequence features that mandates the designation of being Q-rich nor are there any meaningful connections to proteins with Q-rich or polyQ tracts. This aspect of the analysis and discussion is a serious and erroneous distraction.

      We changed the language here, and no longer refer to the central region as “Q-rich”. However, we would like to note that the second half of the McdB central domain is indeed enriched in glutamines (14/53 = 26.4%) to a comparable extent as the region of FUS, which has been shown to help drive condensate formation via glutamine H-bonding (14/44 = 31.8%; Murthy et al 2019). We were simply proposing that, at a molecular level, there was some insight to be gained from this comparison. We agree, however, that there is no functionally meaningful comparison between McdB and polyQ-tract proteins, as we may have previously alluded to in our discussion, and that text has been removed.
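
      The glutamine-enrichment comparison quoted above is simple arithmetic, and a minimal sketch makes it easy to recheck. The function name is hypothetical; only the counts (14 of 53 residues for McdB, 14 of 44 for the FUS region) come from the response itself.

```python
def residue_fraction(count: int, length: int) -> float:
    """Fraction of a region made up of one residue type."""
    if length <= 0:
        raise ValueError("length must be positive")
    return count / length


# Counts as quoted in the author response; nothing else is assumed.
mcdb_q = residue_fraction(14, 53)  # second half of the McdB central domain
fus_q = residue_fraction(14, 44)   # FUS region cited from Murthy et al. 2019

print(f"McdB: {mcdb_q:.1%}, FUS: {fus_q:.1%}")  # McdB: 26.4%, FUS: 31.8%
```

      Both values match the percentages given in the response, so the two regions differ by roughly five percentage points in glutamine content.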

      Back to the middle region that drives dimerization, the missing piece of the puzzle is the orientation of the dimers. One presumes these are canonical, antiparallel dimers. However, this issue is not addressed even though it is directly relevant to the topic of how the trimer of dimers is assembled.

      Indeed, we were unable to resolve the orientation issue, despite much effort. The story we present is not a complete and final model of McdB structure, nor of its molecular modes of oligomerization or condensate formation. However, we now provide a discussion section “McdB homologs have polyampholytic properties between their N- and C-termini” that highlights this issue. We also mention the remaining dimer orientation issue at the end of the results section “Se7942 McdB forms a trimer-of-dimers hexamer”. Nevertheless, we believe the data presented still provide useful initial models, which, for example, allowed us to create a series of substitutions that tune McdB condensate solubility and verify that they do not affect oligomerization. We would like to further add that for other condensate-forming proteins in bacteria, like the PopZ protein we mention in the text, there remains no detailed structural model beyond the resolution we provide here for McdB, despite PopZ being first identified in 2008. Over 40 publications on PopZ have progressively provided useful and more detailed models that are only now being used to develop PopZ as a tool for condensate technologies that are furthering our understanding of the biological implications of condensate formation across all cell types. The intention with our current report is therefore not to generate a finalized molecular model of this entirely unstudied class of McdB proteins, but instead to generate useful insight into McdB biochemistry that can advance our understanding of this class of protein’s function in vivo. To this end, we now add in vivo data based on these initial models where we specifically link cellular phenotypes to McdB condensate solubility (Fig. 8). Of course, there are several follow-up studies that come from the current report, but we believe that speaks to the value of the presented research in advancing this field.

      If the trimer is such that all binding sites are fully satisfied (with the binding sites presumably being on the C-terminal pseudo-IDR), then the hexamer should be a network terminating structure, which it does not seem to be based on the data. Instead, we find that only the full-length protein can undergo phase separation (albeit at rather high concentrations) in the absence of crowder. We also find that the driving forces for phase separation are pH dependent, with pH values above 8.5 being sufficient to dissolve condensates. Substitution of Lys to Gln in the N-terminal IDR leads to a graded weakening of the driving forces for phase separation. The totality of these data suggest a more complex interplay of the regions than is being advocated by the authors.

      Thank you and we agree. As we discuss above in response #4 and below in response #7, we have changed the focus and tone of our report to say that, while the models we have generated are useful, we are aware they are incomplete at a molecular level. Furthermore, as we describe in response #6, we have added several new McdB mutants to investigate more deeply the role of the CTD, but this region was not amenable to mutagenesis as these mutants affected McdB oligomerization. Lastly, while network forming interactions are certainly important for condensate formation as the reviewer describes, so are solvent interactions. We have added new text and data related to Figs. 3, 4 that address these issues.

      Almost certainly, there are complementary electrostatic interactions among the N-terminal IDR and C-terminal pseudo IDR that are important and responsible for the networking transition that drives phase separation, even if these interactions do not contribute to hexamer formation. The net charge per residue of the 18-residue N-terminal IDR is +0.22 and the NCPR of the remainder is ≈ -0.1. To understand how the N-terminal IDR is essential, in the context of the full-length protein, to enable phase separation (in the absence of crowder), it is imperative that a model be constructed for the topology of the hexamer. It is also likely that the oligomer does not have a fixed stoichiometry.
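
      For readers unfamiliar with the metric, net charge per residue (NCPR) is conventionally the number of basic residues minus the number of acidic residues, divided by the segment length. The sketch below is illustrative only: the 18-residue sequence is hypothetical (it is not the McdB IDR), chosen to carry six basic and two acidic residues so that it reproduces the quoted value of about +0.22. Histidine and the chain termini are ignored, which is a simplifying assumption.

```python
def ncpr(sequence: str) -> float:
    """Net charge per residue: (K + R - D - E) / length.

    Histidine and terminal charges are ignored (simplifying assumption).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return (positive - negative) / len(seq)


# Hypothetical 18-residue segment, NOT the real McdB IDR:
# six basic residues (4 K, 2 R) and two acidic residues (1 D, 1 E).
toy_idr = "MSKRTKAEQLKSRPKDGS"
print(len(toy_idr), round(ncpr(toy_idr), 2))  # 18 0.22

# Replacing two lysines with glutamine lowers the net charge from +4 to +2.
toy_2q = toy_idr.replace("K", "Q", 2)
print(round(ncpr(toy_2q), 2))  # 0.11
```

      Under this definition each Lys to Gln substitution in an 18-residue segment lowers the NCPR by 1/18, which is one way to read the graded series of substitutions discussed in the review.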

      We agree and thank the reviewer for these comments. We have added several new substitution mutants aimed at addressing this (Figs. 5, S6). However, the C-terminus was not amenable to substitutions as the trimer-of-dimers was significantly destabilized in these mutants (Figs. 5, S7). Therefore, in this report we were unable to determine specifically how the basic residues in the IDR contribute to condensate formation. However, with the addition of new data in Fig. 8, we think we adequately show that the IDR mutants can be used to investigate McdB condensate formation in vivo, and that follow-up studies will be aimed at investigating these details. We have also added a new discussion section “McdB homologs have polyampholytic properties between their N- and C-termini” that highlights this very likely possibility suggested by the reviewer.

      Therefore, the central weakness of the current work is that it is too preliminary. A set of interesting findings are emerging but by fixating on Lys to Gln titrations within the N-terminal IDR and referring to these titrations as impacting solubility, a premature modular and confused picture emerges from the narrative that leaves too many questions unanswered.

      The work itself is very important given the growing interest in bacterial condensates. However, given that the focus is on understanding the molecular interactions that govern McdB phase behavior - a necessary pre-requisite in the authors' minds for understanding if and how phase separation impacts carboxysome homeostasis - it becomes imperative that the model that emerges be reasonably robust and complete. At this juncture, the model raises far too many questions.

      We agree that our previous report was focused mainly on the molecular basis of McdB condensate biochemistry, and in that report we left the model short. In this revised version, we have added several pieces of new data that strengthen the model (Figs. 3-5), although it is still incomplete. However, in this revised version, we have also shifted the focus from a complete biochemical understanding of McdB condensates to a study that links McdB condensate formation in vitro to phenotypes in vivo. In this regard, we have added the in vivo data in Fig. 8 and somewhat changed the focus in the text.

      The MoRF analysis is a distraction from the central focus.

      The MoRF analysis has been removed.

      The problem, as I see it, is that the authors have gone down the wrong road in terms of how they have interpreted the preliminary set of results. Further, the methods used do not have the resolution to answer all the questions that need to be answered. Another issue is that a lot of standard tropes are erected and they become a distraction. For example, it is simply not true that in a protein featuring folded domains and IDRs it almost always is the case that the IDR is the driver of phase transitions. This depends on the context, the sequence details of the IDRs, and whether the interactions that contribute to the driving forces for phase separation are localized within the IDR or distributed throughout the sequence. In McdB it appears to be the latter, and much of the nuance is lost through the use of specific types of deletion constructs.

      Thank you. We have removed much of this and changed the diction on how our current model of McdB condensate formation fits into the literature in the discussion.

      Overall, the work represents a good beginning but the data do not permit a clear denouement that allows one to connect the molecular and mesoscales to fully describe McdB phase behavior. Significantly more work needs to be done for such a picture to emerge.

      Reviewer #2 (Public Review):

      In this work, Basalla et al. study the biochemical properties of the carboxysome positioning protein, McdB. Using in vitro experiments, the authors characterize McdB oligomeric states and the domains driving and modulating its phase separation. Based on bioinformatics analysis, the authors identify a putative binding recognition motif between McdB and its two-component system counterpart McdA. As McdAB-like systems emerge as spatial regulators of bacterial compartments, the data presented here may be of general interest. The study is well executed and provides exciting hypotheses to be tested in vivo.

      The authors found that McdB from S. elongatus PCC 7942 consists of three domains: an N-terminal 18 aa disordered region, a Q-rich helical domain, and a helical C-terminal domain (CTD). Analyzing these domains, the authors present three key results: (i) The Q-rich domains form dimers, and the CTD drives the formation of trimers of dimers (ii) Phase separation is pH sensitive, driven by the Q-rich domain, and modulated by basic residues in the IDR, (iii) The IDR contains a putative recognition motif that binds McdA. While these three sets of results are rich in data, they are disjointed. Relating the three datasets (oligomeric states of the protein, its phase separation behavior, and its ability to bind McdA) is required to provide a complete picture of the molecular mechanism driving McdB condensation.

      Specific comments:

      1) The main limitation of this manuscript is the lack of integration between the three areas of results. In particular: how do the IDR basic residues disrupt phase separation? Is that through interference with either the dimer or trimer interface? Does the McdB IDR regulate phase separation behavior when bound to McdA? Or, in other words, is the MoRF acting both as a binding interface and as a solubility regulator, and if so, can both functions be achieved simultaneously? It seems like the MoRF includes at least three basic residues.

      Indeed, we were unable to fully resolve the specific molecular interactions that give rise to condensates versus those that give rise to oligomers, and how these two modes of self-association contribute to one another. One limitation was that, as shown in our new data, the CTD was not amenable to mutagenesis, as it caused destabilization of the trimer-of-dimers (Fig. 5, Fig. S7). Therefore, we could not dissect how the CTD contributes to oligomerization versus driving condensates. However, we did include in vivo data showing how the IDR mutations allowed us to specifically link phenotypes to McdB condensate solubility (Fig. 8). As we discuss above in responses #4, #6, and #7, we changed the focus of the revised manuscript from the molecular basis of McdB condensate formation to linking McdB condensate formation in vitro and its functionality in vivo. To this end, we think the IDR mutation set has been useful, and follow-up studies will be done to further the molecular model of McdB condensate formation. Reviewers 1 and 3 deemed the MoRF section a distraction. Therefore, MoRF analysis and discussions of McdA interactions with this potential MoRF have been removed.

      Finally, what is the effective concentration of McdB in cells, and how does that translate to the in vitro studies?

      In our previous version, we used McdB concentrations between 50-100 µM. We do not know the in vivo concentration of McdB. We have tried several antibodies against McdB, and a few were good enough to detect the presence of McdB, but not quantifiably. We therefore believe in vivo McdB levels are low (sub-micromolar), and definitely lower than the range we previously used in our in vitro studies. In our revised manuscript, we include a titration of McdB at lower concentrations, and see condensates at McdB concentrations lower than 2 µM.

      2) How general are the conclusions made here to other McdBs? The authors have published nice work surveying the commonalities and differences between homologous McdB proteins. Can you comment on the applicability of your findings to other McdB proteins?

      This is a great point, which we have added to a new discussion section titled “McdB homologs have polyampholytic properties between their N- and C-termini”.

      Additional issues:

      3) Using SEC and SEC-MALS, the authors demonstrated that the Q-rich domain forms a stable dimer and that the full-length protein forms hexamers, suggesting trimers of dimers assembly. The authors also suggest that the CTD is responsible for forming those trimers of dimers based on SEC-MALS measurements. However, Figure 2D shows that while the full length runs at 6.6x the monomer, the Q-rich+CTD runs at 5.4x the monomer. First, I could not find SEC-MALS of the full-length protein, and it is not clear whether SEC-MALS was used for all or a fraction of the constructs discussed in Figure 2D. Second, could it be that the Q-rich domain+CTD is an ensemble of hexamers and dimers? Perhaps the IDR is playing a secondary role in stabilizing the hexamer?
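
      The reviewer's "ensemble of hexamers and dimers" question can be made concrete with a rough two-species estimate. The sketch below assumes, purely for illustration, that the 5.4x figure can be read as a weight-averaged mass in monomer units, as light scattering would report; the values in Figure 2D may instead reflect apparent size from elution volume, in which case this arithmetic does not apply. The function name is hypothetical.

```python
def hexamer_weight_fraction(apparent: float,
                            small: float = 2.0,
                            large: float = 6.0) -> float:
    """Weight fraction of the larger species in a two-species mixture.

    Solves  apparent = w * large + (1 - w) * small  for w, where all
    masses are expressed in monomer units (dimer = 2, hexamer = 6).
    """
    if not small <= apparent <= large:
        raise ValueError("apparent mass lies outside the two-species range")
    return (apparent - small) / (large - small)


# 5.4x monomer, read as a weight-averaged mass (an assumption).
w_hex = hexamer_weight_fraction(5.4)
print(round(w_hex, 2))  # 0.85
```

      On this reading, about 85% of the protein mass would be hexameric and 15% dimeric. The 6.6x value for the full-length protein lies above 6 and cannot be produced by any hexamer/dimer mixture, which suggests these ratios reflect apparent size rather than absolute mass.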

      We have repeated the SEC-MALS experiments and included the full-length protein (Fig. 2). Furthermore, we have included SEC-MALS for some of the key substitution mutants (Figs. 5, S7). With the additional findings, our conclusions remain the same as in our previous version of the manuscript.

      4) The analysis of the phase separation results needs additional quantification. The authors show that at 100 µM protein with 10% PEG the full-length phase separates as well as IDR+Q-rich. Lines 176-178: "The CTD, on the other hand, has no effect on the Q-rich domain condensates; Q-rich+CTD condensates formed at the same protein concentration and with identical droplet morphologies at the Q-rich domain alone." It is hard to draw this conclusion solely based on the data presented in Figure 3. An alternative interpretation might be that Q-rich+CTD reduces csat. I suggest the authors include turbidity assays (as shown for pH effect) to quantitatively determine csat for these different constructs and perhaps perform FRAP to determine the mobility of these different constructs. In addition, how long after the addition of PEG were these droplets imaged?

      We now include an additional figure where we characterize condensates for full-length McdB (Fig. 3), including FRAP as suggested by the reviewer. We also include additional experiments for the truncations as requested (Fig. 4), and relate the truncation data to the model we propose for the full-length protein. All condensate samples were incubated for 30 mins prior to imaging unless otherwise stated, which we have added to the methods section “Microscopy of protein condensates”.

      5) Solubility assays shown in Figures 4A, B, D, and 5C are missing error bars. Without replicates, it is difficult to assess, for example, the effect of KCl.

      We have included replicates and error bars. Apologies for the omission.

      Also, please indicate the physiological ranges of KCl and pH in Figure 6. The phase separation sensitivity to pH is intriguing. By changing basic residues to glutamines, the authors conclude that the positive charge of the IDR modulates solubility. The Q-rich domain, however, is negatively charged. Can the authors comment on the role of acidic residues in the Q-rich domain? Are they required for phase separation? Also - based on your previous bioinformatics analysis, are the charges of the IDR and the Q-rich domains conserved across McdB homologs?

      Data from this report, and as described by reviewer #1, suggest that charge in the CTD, and not the central region, may be important. Our previous report (MacCready et al., Mol Biol Evol. 2020) touches on the conservation of charge in the NTD and CTD, which we have now added to the discussion section titled “McdB homologs have polyampholytic properties between their N- and C-termini”. However, we were unable to experimentally verify electrostatic associations between the NTD and CTD because the CTD was not amenable to mutagenesis, as shown in our new data added to the manuscript (Figs. 5, S7).

      6) In previous work, the authors showed that an RKR segment in the IDR is highly conserved but missing in S. elongatus PCC 7942 (MacCready et al., Mol Biol Evol. 2020). Given the current finding, it would be important to understand whether the RKR deletion carries functional implications for phase separation behavior.

      The RKR segment is not missing, but likely relates to the KKR residues from S. elongatus PCC 7942. We describe this in more detail elsewhere (MacCready et al., Mol Biol Evol. 2020). However, as we show here, these specific residue locations do not seem to be especially important for condensate formation, but instead the overall net charge of the IDR mediates condensate solubility regardless of the specific residues mutated (Fig. 6).

      7) McdB proteins with 2Q left mutated vs. 2Q middle and 2Q right seem to result in condensates with different material properties (e.g., DIC pictures show different droplet morphologies for the different constructs). Is that the case? And if so, can you comment on that?

      We have included a brief mention of this in the text. However, the overall interpretation of these results remains that regardless of the residues mutated, there is a comparable degree of condensate solubilization for constructs with the same IDR net charge (Fig. 6).

      Reviewer #3 (Public Review):

      Through a series of rigorous in vitro studies, the authors determined McdB's domain architecture, its oligomerization domains, the regions required for phase separation, and how to fine-tune its phase separation activity. The SEC-MALS study provides clear evidence that the α-helical domains of McdB form a trimer-of-dimers hexamer. Through analysis of a small library of domain deletions by microscopy and SDS-PAGE gels of soluble and pellet fractions, the authors conclude that the Q-rich domain of McdB drives phase separation while the N-terminal IDR modulates solubility. A nicely executed study in Figure 4 demonstrated that McdB phase separation is highly sensitive to pH and is influenced by basic residues in the N terminal IDR. The study demonstrates that net charge, as opposed to specific residues, is critical for phase separation at 100 micromolar. In addition, the experimental design included analysis of McdB constructs that lack fluorescent proteins or organic dyes that may influence phase separation. Therefore, the observed material properties have full dependence on the McdB sequence.

      Thank you for the kind words and this perspective. We have added a brief mention to it in the discussion section titled “McdB condensate formation follows a nuanced, multi-domain mechanism”: “Furthermore, it should be noted that the McdB constructs used in our in vitro assays were free from fluorescent proteins, organic dyes, or other modification that may influence phase separation. Therefore, the observed material properties of these condensates have full dependence on the McdB sequence.”

      Studies of proteins often neglect short, disordered segments at the N- or C- terminus due to unclear models for their potential role. This study was interesting because it revealed a short IDR as a critical regulator of phase separation. This includes experiments that remove the IDR (Fig 2 & 3) and mutate the basic residues to show their importance towards McdB phase separation. In a nice set of SDS-PAGE experiments, the authors showed that as the net charge of the IDR decreased the construct became more soluble.

      One challenge in the experimental design when mutating residues is assessing their impact on phase separation. The authors avoided substitutions to alanine, as alanine substitutions have synthetically stimulated phase separation in other systems. The authors, therefore, have a good rationale for selecting potentially milder mutations of lysine/arginine to glutamine. A potential caveat of mutation to glutamine is that stretches of glutamines have been associated with amyloid/prion formation. So, the introduction of glutamines into the IDR may also have unexpected effects on material properties. Despite these caveats, the authors show mutation of six basic residues in the short IDR abolished phase separation at 100 mM.

      Thank you for the thoughtful consideration and appreciation of our work! Reviewer 1 had reservations about the Gln substitutions as well. We also used alanine in new data added to the manuscript. But as the reviewer notes, the alanine mutations artificially drove further phase separation activity, and even aggregation. We show that mutants with introduced glutamines, however, remain soluble in vitro and in E. coli even at very high concentrations. Furthermore, we now include SEC-MALS of the McdB variant with 6 glutamines introduced in the IDR and show that there is no impact on oligomeric state. Together, the data show no amyloidogenic properties of these glutamine-enriched mutants.

      We have added a note to this potential caveat in the discussion section “McdB condensate formation follows a nuanced, multi-domain mechanism”: “Glutamine-rich regions are known to be involved in stable protein-protein interactions such as in coiled-coils and amyloids (52, 53), and expansion of glutamine-rich regions in some proteins lead to amylogenesis and disease (54, 55). However, when we introduced glutamines into the IDR of McdB solubility was increased both in vitro and in vivo, and without any impact on hexamerization. Together, the data show that increasing the glutamine content in the IDR of McdB did not lead to amylogenesis, but rather increased solubility. Our findings therefore underpin the importance of positive charge in the IDR specifically for stabilizing McdB condensates.”

      Computational studies (Fig 7) also suggest that this short N-IDR region may play a role as a MORF upon potential binding to a second protein McdA. The formulation of this hypothesis is strengthened by the fact that for other ParA/MinD-family ATPases, the associated partner proteins have also been shown to interact with their cognate ATPase via positively charged and disordered N-termini. This aspect of understanding McdB's N-IDR as a MORF is at a very early stage. This study lacks experimental evidence for an N-IDR: McdA interaction and experimental data showing conformational change upon McdA binding. However, the computation study sets up the future to consider whether and how the phase separation activity of McdB is related to its structural dynamics and interactions with McdA.

      Based on these comments and those from Reviewer 1, we have removed the MoRF analyses entirely. The MoRF analysis will be coupled to another study in the lab focused on McdB interactions with McdA.

      In summary, this study provides a strong foundation for the contribution of domains to McdB's in vitro phase separation. This knowledge will inform and impact future studies on how McdB regulates carboxysomes and on the related ParA/MinD-family ATPases and their cognate regulatory proteins. For example, it is unknown if and how McdB's phase separation is utilized in vivo for carboxysome regulation. However, the revealed roles of the Q-rich domain and N-IDR will provide valuable knowledge in developing future research. In addition, the systematic domain analysis of McdB can be combined with a similar analysis of a broad range of other biomolecular condensates in bacteria and eukaryotes to understand the design principles of phase separating proteins.

    1. Author Response

      Reviewer #1 (Public Review):

      When we tilt our heads, we do not perceive objects to be tilted or rotated. In this study, the authors investigate the underlying neural underpinnings by characterizing how neurons in monkey IT respond to objects when the entire body is tilted. They performed two experiments. In the first experiment, the authors record single neuron responses to objects rotating in the image plane, under two conditions - when the animals were tilted +20° or -20° relative to the gravitational vertical. Their main finding is that neural tuning curves for object orientation were highly correlated under these conditions. This high correlation is interpreted by the authors as indicative of encoding of object orientations relative to an absolute gravitational reference frame. To control for the possibility that the whole-body tilt could have induced compensatory torsional rotations of the eyes, the authors estimated the eye torsional rotation between the ±20° whole-body tilt to be only ±6°. In the second experiment, the authors recorded neural responses to objects rotated in the image plane with no whole-body tilt but with a visual horizon that could be tilted by the same ±20° relative to the gravitational vertical. Here too they find many neurons whose tuning curves were correlated between the two horizon tilt conditions. Based on these results, the authors argue that IT neurons represent objects relative to the gravitational or absolute vertical.
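
      The logic of comparing tuning curves across the two tilt conditions can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' analysis: the von Mises-style tuning function, the 20° orientation sampling, the preferred orientation and all names are hypothetical, and ocular counter-roll is ignored. A neuron tuned in a gravitational frame gives matching curves when both are indexed by orientation relative to gravity, whereas a neuron tuned in a retinal frame matches only after each curve is re-indexed by retinal orientation.

```python
import math

STEP = 20    # sampling of object orientation in degrees (assumed)
TILT = 20    # body tilt magnitude in degrees (from the review)
KAPPA = 2.0  # tuning sharpness (assumed)
ORIENTATIONS = list(range(0, 360, STEP))  # orientation relative to gravity


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def tuning_curve(pref_deg, frame, tilt_deg):
    """Responses at each gravitational orientation for a toy neuron."""
    curve = []
    for g in ORIENTATIONS:
        # Retinal orientation is gravitational orientation minus body tilt.
        angle = g if frame == "gravity" else g - tilt_deg
        delta = math.radians(angle - pref_deg)
        curve.append(math.exp(KAPPA * (math.cos(delta) - 1.0)))
    return curve


def frame_correlations(frame, pref_deg=90):
    """Correlation across tilts in the gravitational and retinal frames."""
    plus = tuning_curve(pref_deg, frame, +TILT)
    minus = tuning_curve(pref_deg, frame, -TILT)
    n, k = len(plus), TILT // STEP
    grav = pearson(plus, minus)
    # Re-index each curve by retinal orientation r = g - tilt.
    plus_ret = [plus[(i + k) % n] for i in range(n)]
    minus_ret = [minus[(i - k) % n] for i in range(n)]
    return grav, pearson(plus_ret, minus_ret)


grav_neuron = frame_correlations("gravity")
ret_neuron = frame_correlations("retina")
print("gravity-tuned neuron (gravitational, retinal):", grav_neuron)
print("retina-tuned neuron  (gravitational, retinal):", ret_neuron)
```

      In this toy model, whichever frame yields the higher correlation identifies the frame the neuron is tuned in. Real data add noise, torsional eye movements and intermediate cases, so the comparison is less clear-cut than it is here.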

      The question of whether the visual system encodes objects relative to the gravitational vertical is an interesting and basic one, and I commend the authors for attempting this question through systematic testing of object selectivity under conditions of whole-body tilt. However, I found this manuscript extremely difficult to read, with important analyses and controls described in a very cursory fashion. I also have several major concerns about these results.

      First, the high tuning correlation in the ±20° whole-body tilt conditions could also occur if IT neurons encoded object orientation relative to other fixed contextual cues in the surrounding, such as the frame of the computer monitor. The authors ideally should have some experiment or analysis to address this potential confound, or else acknowledge that their findings can also be interpreted as the encoding of object orientation relative to contextual cues, which would dilute their overall conclusions.

      We think there are three possible interpretations of this comment. First, that visible edges, including the horizon and ground plane (in the scene stimuli), and the screen edges and other gravitationally aligned edges in the room, could serve as visual cues for the orientation of gravity. We agree with this wholeheartedly, and in fact showed a strong degree of gravitational alignment based purely on visual scene cues in Figures 3 and 4. This is consistent with our previous results, which suggest computation of gravity’s direction in the middle channel of IT (Vaziri et al., Neuron 2014; Vaziri and Connor, Current Biology 2016). Our findings would not be diluted by the fact that multiple cues, not just vestibular/somatosensory but also visual, could help in computing the direction of gravity.

      Second, that overlap between objects and horizon could produce a shape-configuration interaction that changes with object orientation and produces a tuning effect that remains consistent across monkey tilts. We agree this was a possibility, and that is why we tested neurons in the isolated object condition. We have added text to better explain this concern and the control importance of the isolated object condition in the discussion of Fig. 1: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).”

      The comparable results in the isolated object condition address the reasonable concern about the horizon/object shape configuration interaction: “Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation.”

      Third, that the object and screen edges in the isolated object condition have an orientation interaction that influences tuning in a way that remains consistent across monkey tilt. If this was intended, we do not think this is a reasonable concern that needs mentioning in the paper itself. The closest screen edges on our large display were 28° in the periphery, and there is no reason to suspect that IT encodes orientation relationships between distant, disconnected visual elements. Screen edges have been present in all or most studies of IT, and no such interactions have been reported. We will discuss this point in online responses.

      Second, I do not fully understand torsional eye movements myself, but it is not clear to me whether this is a fixed or dynamic compensation. For instance, have the authors measured torsional eye rotations on every trial? Is it fixed always at ±6° or does it change from trial to trial? If it changes, then could the high tuning correlation between the whole-body rotations be simply driven by trials in which the eyes compensated more? The authors must provide more data or analyses to address this important control.

      We now clarify that we could only measure ocular rotation outside the experiment with high-resolution closeup color photography, which was not possible on individual trials. The extensive literature on ocular counter-rotation has no indication that the degree of rotation is changed by any conditions other than tilt. Our measurements were consistent with previous reports showing that counterroll is limited to 20% of tilt. Moreover, they are consistent with our analyses showing that maximum correlation with retinal coordinates is obtained with a 6° correction for counterroll, indicating equivalent counterroll during experiments. Our analytical compensation for counterroll was based on this value, which optimized results in the retinal reference frame, so our measurements of counterroll are used only to confirm this value. Ocular rotation would need to be five times greater than any previous observations to completely compensate for tilt and mimic the gravitational tuning we observed. For these reasons, counterroll is not a reasonable explanation for our results:

      “Compensatory ocular counter-rolling was measured to be 6° based on iris landmarks visible in high-resolution photographs, consistent with previous measurements in humans [6,7], and larger than previous measurements in monkeys [41], making it unlikely that we failed to adequately account for the effects of counterroll. Eye rotation would need to be five times greater than previously observed to mimic gravitational tuning. Our rotation measurements required detailed color photographs that could only be obtained with full lighting and closeup photography. This was not possible within the experiments themselves, where only low-resolution monochromatic infrared images were available. Importantly, our analytical compensation for counter-rotation did not depend on our measurement of ocular rotation. Instead, we tested our data for correlation in retinal coordinates across a wide range of rotational compensation values. The fact that maximum correspondence was observed at a compensation value of 6° (Figure 1–figure supplement 1) indicates that counterrotation during the experiments was consistent with our measurements outside the experiments.”
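
      The compensation sweep described in the quoted passage can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis code: the von Mises-shaped tuning curve, the 1° sampling of orientations, the 20° tilt magnitude, and all function names are hypothetical choices made for illustration. A purely retinal neuron is simulated with a known counterroll, and the correlation between the two tilt conditions is then computed in retinal coordinates for each candidate compensation value.

```python
import math

def tuning(theta_deg, pref_deg=40.0, kappa=2.0):
    """Hypothetical von Mises-shaped tuning over retinal orientation (degrees)."""
    return math.exp(kappa * (math.cos(math.radians(theta_deg - pref_deg)) - 1.0))

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rotate(curve, shift_deg):
    """Re-index a 1-degree-sampled curve: out[i] = curve[(i + shift_deg) % 360]."""
    n = len(curve)
    return [curve[(i + shift_deg) % n] for i in range(n)]

TILT = 20             # assumed body tilt magnitude (degrees)
TRUE_COUNTERROLL = 6  # assumed counterroll used to simulate the data (degrees)

# Simulated responses of a purely retinal neuron at each screen orientation.
# The net eye rotation relative to the screen is tilt minus counterroll.
net = TILT - TRUE_COUNTERROLL
resp_pos = [tuning(theta - net) for theta in range(360)]  # tilt = +TILT
resp_neg = [tuning(theta + net) for theta in range(360)]  # tilt = -TILT

def retinal_correlation(pos, neg, tilt, compensation):
    """Correlate the two tilt conditions after mapping both to retinal coordinates."""
    shift = tilt - compensation
    return pearson(rotate(pos, shift), rotate(neg, -shift))

# Sweep candidate compensation values and keep the one with the highest correlation.
sweep = {c: retinal_correlation(resp_pos, resp_neg, TILT, c) for c in range(0, 21)}
best_compensation = max(sweep, key=sweep.get)
print(best_compensation)
```

      In this simulation the sweep peaks at the counterroll that was used to generate the data, which mirrors the logic of the argument: the compensation value that maximizes retinal-frame correlation serves as an independent estimate of counterroll during the experiment.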

      Third, I find that when the objects were presented against a visual horizon, different object features are occluded at each orientation. This could reduce the correlation between the neural response in the retinal reference frame, thereby biasing all results away from purely retinal encoding. The authors should address this either through additional analyses or acknowledge this issue appropriately throughout.

      This idea of a shape interaction between object and horizon/ground is essentially the same concern discussed as the second interpretation of the first point, above. As outlined there, we addressed this concern in the best way possible, by removing the horizon/background (in the isolated object condition) and showing that the same results obtained. This comment raises the related point (also cured by the isolated object condition) of differential partial occlusion at the bottom of the object, 15% (by virtual mass) of which was buried below ground to provide a realistic physical interpretation for unbalanced orientations.

      We make both concerns explicit in the revised manuscript: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).”

      And we report that the control produces similar results in the absence of horizon/background: “Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation.”

      Reviewer #3 (Public Review):

      This is a very interesting study examining for the first time the influence of lateral tilt of the whole body on orientation tuning in macaque IT. They employed two types of displays: one in which the object was embedded in a scene that had a horizon and textured ground surface, and a second one with only the object. For the first type, they examined the orientation tuning with and without tilting the subject. However, the effect of tilt for the scene stimuli is difficult to interpret in terms of gravitational reference frame since varying the orientation of the object relative to the horizon leads to changes in visual features between the horizon and object. If neurons show tolerance for the global orientation of the scene (within the 50° manipulation range) then the consistent orientation tuning across tilts may just reflect tuning for the object-horizon features (like the angle between the object and the horizon line/surface) that is tolerant for the orientation of the whole scene. Thus, the effects of tilt can be purely visually-driven in this case and may reflect feature selectivity unrelated to gravitation. The difference between retinal and gravitational effects can just reflect neurons that do not care about the scene/horizon background but only about the object and neurons that respond to the features of the object relative to the background. Thus, I feel that the data using scenes cannot be used unambiguously as evidence for a gravitational reference frame. The authors also tested neurons with an object without a scene, and these data provide evidence for a gravitational reference frame. The authors should concentrate on these data and downplay the difficult-to-interpret results using scenes.

      We still believe it is important to present these two experimental conditions in parallel, because we believe that visual driving of gravitational tuning by environmental cues is important in real life, and this is substantiated by the effects of visual cues alone. However, in this revision we have tried, in response to these comments and to comments from other reviewers, to clarify the potential concerns about visual effects in the full scene experiment, the importance and meaning of the isolated object condition as a control for concerns about other kinds of tuning, and the relationships between the two experimental conditions:

      Concerns about full scene experiment and the control importance of the isolated object condition: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface) …

      Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation. However, we cannot rule out a contribution of visual cues for gravity in the visual periphery, including screen edges and other horizontal and vertical edges and planes, which in the real world are almost uniformly aligned with gravity and thus strong cues for its orientation (but see Figure 2–figure supplement 1). Nonetheless, the Fig. 2b result confirms that gravitational tuning did not depend on the horizon or ground surface in the background condition.”

      Cell-by-cell comparisons of scene and isolated stimuli, for those cells tested with both, are shown in Figure 2–figure supplement 6. This figure shows 8 neurons with significant gravitational tuning only in the floating object condition, 11 neurons with tuning only in the scene condition, and 23 neurons with significant tuning in both. Thus, a majority of significantly tuned neurons were tuned in both conditions. A two-tailed paired t-test across all 79 neurons tested in this way showed that there was no significant tendency toward stronger tuning in the scene condition. The 11 neurons with tuning only in the scene condition by themselves might suggest a critical role for visual cues in some neurons. However, the converse result for 8 cells, with tuning only in the floating condition, suggests a more complex dependence on cues or a conflicting effect of interaction with the background scene for a minority of cells.

      Main text: “This is further confirmed through cell-by-cell comparison between scene and isolated for those cells tested with both (Figure 2–figure supplement 6).”
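
      For readers unfamiliar with the paired comparison mentioned above, the following sketch shows how a paired t statistic is formed from per-neuron differences between two conditions. It is an illustration only: the tuning-strength values are made up, the function name is hypothetical, and the authors' actual measure of tuning strength is not specified here. Converting the statistic to a two-tailed p-value requires the t distribution with n - 1 degrees of freedom, which is available in statistics packages such as SciPy (`scipy.stats.ttest_rel`).

```python
import math

def paired_t_statistic(cond_a, cond_b):
    """Return (t, degrees_of_freedom) for paired samples cond_a and cond_b."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the per-pair differences.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_diff / math.sqrt(var_diff / n), n - 1

# Made-up tuning strengths for five hypothetical neurons tested in both conditions.
scene = [0.62, 0.15, 0.48, 0.30, 0.71]
isolated = [0.58, 0.22, 0.45, 0.35, 0.69]
t_stat, dof = paired_t_statistic(scene, isolated)
print(t_stat, dof)
```

      Because each neuron contributes one difference, the test asks whether the mean within-neuron change between conditions differs from zero, rather than comparing the two group means independently.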

      Furthermore, the analysis of the single object data should be improved and clarified.

      We have added Figure 1–figure supplements 3–10, which expand the analysis of example cells and additional cells to include all stimuli shown and smoothed tuning curves for individual repetitions of the orientation range.

      We also now present results for individual monkeys in Figure 2–supplements 2,3, and the anatomical locations of individual neurons in Figure 2–supplements 4,5.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      Response: Thank you to all the reviewers for their helpful efforts on behalf of our manuscript. At present, we have addressed most of the reviewers’ major comments, including providing additional replicates for many experiments and clarifying ambiguous points in the text. Related data, figures and text have been adjusted accordingly. We believe that these changes have improved our manuscript, both strengthening our main conclusions and clarifying ambiguous text.

      Several still-ongoing experiments are elaborated below. These experiments are well within the abilities of our lab and can be completed in short order.

      Specific responses to the individual concerns addressed by the reviewers are outlined below.

      Please feel free to contact me if I can be of any help in the decision process.

      2. Description of the planned revisions

      [Reviewer #1]

      Comment: Across the manuscript, NIX levels appear to be unresponsive to most treatments in the MDA-MB-231 line, including hypoxia treatment. This is an unusual result and raises questions about the role of NIX in MDA-MB-231 line, mainly that BNIP3 is the primary driver of mitophagy in this system. Indeed, Figure 7D indicates that there is very little mitophagy contribution by NIX since knockout of BNIP3 is sufficient to abolish mitophagy almost completely. Therefore, the effects seen on mitophagy following EMC3 knockout in Figure 7 might be smaller in a line that is responsive to NIX mitophagy. It would be beneficial to analyse basal mitophagy flux in an additional cell line, for example U2OS (FigS1E) in which NIX is responsive to hypoxia.

      Response: Thank you for bringing this intriguing insight to our attention. We have seen that EMC3 knockout prevents lysosomal delivery of BNIP3 in U2OS cells (Fig S2D). However, we don’t know what the effects on mitophagy are in U2OS, or the extent to which mitophagy is dependent on BNIP3 and/or NIX. To test this, we will perform the suggested experiment, using mt-Keima-expressing U2OS cells to test the role of NIX and/or BNIP3 in mitophagy.

      Comment: Following on from comment 1 above, Figure 7 would benefit from an analysis of hypoxia (or DFP, or cobalt chloride) stimulation of mitophagy to assess whether mitophagy levels are higher in EMC3 KOs. The authors argue that BNIP3 is trafficked to the ER during mitophagy and is not turned over by mitophagy itself. It would therefore be interesting to test whether preventing BNIP3 from being removed from mitochondria would affect the rate or levels of mitophagy under stimulating conditions.

      Response: To address this question, we will perform mitoflux analysis on EMC3 KO cells +/- hypoxia.

      Comment: Figure 4B: The localisation of tf-BNIP3 is reminiscent of ER in BTZ treated samples. How much of the protein is on mitochondria in the presence of BTZ? Does MLN4924 cause a similar issue?

      Response: To address this question, we will perform fluorescence microscopy of tf-BNIP3 cells co-expressing mito-BFP under these treatments and utilize our Coloc2 plugin pipeline to monitor correlation.

      Comment: Can the authors assess whether BNIP3 that is on mitochondria is transferred to the ER (perhaps through photoswitchable GFP-BNIP, activated on mitos and then observe its transfer to ER)? This seems important in order to address the possibility that BNIP3 that is being turned over by the endolysosome is being delivered directly to the ER.

      Response: This is an interesting question and a curiosity also shared by Reviewer #2. To test this hypothesis, we will utilize a photo-switchable Dendra2 fluorophore to track BNIP3 in the cell via microscopy.

      [Reviewer #2]

      Comment: How is BNIP3 inserted into the outer membrane? A previous study from the Weissman lab proposed that MTCH2 serves as insertase. The authors did not mention MTCH1 and MTCH2 in context of Fig. 2B. Were these proteins not found? Did the authors test the relevance of MTCH2 in their assay? This aspect should be addressed and mentioned.

      Response: Thank you for the insight and suggestion. We were intrigued when the Weissman/Voorhees paper characterizing MTCH1/2 was published. Consistent with their findings, MTCH2 was found in the “suppressor” population of our tf-BNIP3 CRISPR screen, but given our 0.5-fold change threshold, the gene was not validated (fold change value = 0.46, Table S1). We suspect the lack of significance stems from the redundancy with MTCH1. Consequently, we hypothesize that MTCH1/2 are the responsible insertases. To formally address this suggestion, we plan to genetically perturb MTCH1/2 and look at BNIP3 localization and mitophagy.

      Comment: The authors generated an interesting BNIP3 mutant with a C-terminal Fis1 anchor. This variant is constantly located in the outer membrane (which is shown here). The physiological consequence of the constitutive distribution on mitochondria is however only superficially studied. The authors should characterize this interesting mutant in some more depth.

      Response: In the original manuscript, we characterized BNIP3(Fis1TMD) for lysosomal delivery and mitophagy. Going forward, we will perform Seahorse oxygen consumption experiments and mitochondrial network analysis to view the physiological consequences of constitutive expression of BNIP3(Fis1TMD) on the outer membrane.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      [Reviewer #1]

      Comment: Continuing from comment 2, given that the authors conclude that BNIP3 is not turned over by mitophagy, can they examine whether BNIP3 is excluded from sealed mitophagosomes?

      Response: We have softened the wording of our conclusions to reflect that the vast majority of BNIP3 lysosomal degradation is by this alternative pathway and not mitophagy. However, we do not wish to completely dismiss that BNIP3 is present on mitophagosomes. Rather, if mitophagosomes contain BNIP3, they seemingly account for only a very small portion of BNIP3 degradation in the cell, to the extent that it is not easily detectable by our assays (Lines 414-419). Definitively identifying whether BNIP3 is in sealed mitophagosomes will be part of future studies using CLEM or FIB-SEM techniques.

      Comment: Is the BNIP3(Fis1TMD) expressed to equivalent levels to WT BFP-BNIP3? Given that the Fis1 form of BNIP3 cannot traffic to endolysosomes, its levels might be higher. In addition, overexpression of the BNIP3-Fis construct was used to make the argument that dimerization is not important for mitophagy. But the authors should also take into account the possibility that with overexpression, the potential efficiency afforded to mitophagy via dimerization of endogenous proteins may be negated, and therefore hidden. Given this, I don’t think that the authors can confidently conclude that dimerization does not contribute to mitophagy, and that instead its main role is ER-endolysosomal turnover of BNIP3.

      Response: We thank the reviewer for pointing out the possible over-interpretation of our data. Overexpression is an important caveat to consider. We would expect the Fis1 form of BNIP3 to be higher in protein levels given its deficiency in endolysosomal trafficking. Still, as the reviewer points out, over-expression could be mitigating the effect of our dimerization mutants. This caveat is now discussed in the manuscript and our interpretations regarding this fact have been greatly softened (Lines 373-376, Lines 449-462).

      Comment: Please include molecular weight markers for all western blots.

      Response: All western blots have now been labeled with molecular weight markers.

      Comment: Figure 5A-G: These data do not make a convincing case for the role of dimerization and are very difficult to follow. Only the mislocalized S172A mutant was responsive to Baf treatment, while the LG swap mutant which is mitochondrial and cannot dimerize is unaffected by Baf treatment. Figure 5H-I utilize a construct of BNIP3 that is missing most of the protein and which has very low turnover (Figure 5B). Unfortunately these results don’t make a highly convincing case about the biology of native, full length, mitochondrial BNIP3. The authors are advised to either strengthen the dimerization argument, or perhaps lighten the language around the main conclusions from these data.

      Response: Thank you for bringing the lack of clarity to our attention. Both dimer mutants of BNIP3 (S172A and LG swap) are insensitive to Baf-A1 treatment. These results hold for full-length BNIP3 using either the tf (Fig 5D) or IRES (Fig 5I) reporter. To demonstrate that defects in lysosomal transport were due to dimerization defects (and not other, unanticipated effects of the mutations), we looked at whether chemically induced dimerization could reverse the trafficking defects. Indeed, forced dimerization of the ER-restricted variant rescued ER-to-lysosome trafficking. From this, we conclude that dimerization is a critical facet of BNIP3 trafficking to the lysosome.

      We have re-worked the relevant text (both in results and discussion) to clarify major points and lighten the language around the conclusions from these data (described below).

      First, as mentioned above, we have added a significant discussion about the limitations of our assay and of possible interpretations (Lines 300-303, Lines 323-326, Lines 483-489).

      Second, with regards to the specific construct used in this experiment, we have expanded the results section to better describe our rationale and approach (Lines 304-308). In short, because dimerization of native BNIP3 occurs within the membrane, we aimed to place the DmrB domain as close to the TM segment as possible. Due to the topology of TA proteins, a C-terminal tag isn’t possible. Therefore, we used the shortest truncation version of BNIP3 (117-end) that undergoes measurable lysosomal delivery. This was an important experimental consideration, and one we did not sufficiently rationalize in the original manuscript. We now include this point in the text.

      [Reviewer #2]

      Comment: The authors show that BNIP3 on the ER is not stable but degraded by the proteasome. Does this require ERAD factors? Is the mitochondrial BNIP3 protein likewise degraded by proteasomal degradation? It is not clear whether both BNIP3 pools are constantly turned over or whether degradation exclusively/predominantly occurs on the ER surface.

      Response: These are fascinating mechanistic questions. We hope to thoroughly address these questions in a subsequent study. However, as a teaser, we have included the basic answer to these questions in Fig 5I.

      To preliminarily characterize the proteasomal degradation of ER- and mitochondrial-BNIP3, we utilized our IRES reporter system - adapted from Steve Elledge’s system for degron monitoring (Fig 5I). Strikingly, our ER-restricted BNIP3 mutation (S172A) is sensitive to inhibition of both the proteasome and the AAA-ATPase p97/VCP, a key extractase for ERAD substrates. These data tentatively suggest an ERAD-dependent degradation mechanism (although many follow-up studies will be needed to confirm the mechanistic details). In sharp contrast, our mitochondrial-restricted mutant (LG Swap) is sensitive to proteasome inhibition by Bortezomib, but it is insensitive to VCP inhibition. The differential requirement for VCP suggests that proteasomal degradation occurs on both cellular pools of BNIP3 albeit through different mechanisms.

      Comment: The results of the screen shown in Fig. 2B are particularly interesting for readers. The glutathione peroxidase GPX4 was found as a top hit among the EMC components. GPX4 protects membranes (including those of mitochondria) against oxidative damage, is a major component of ferroptosis and linked to mitochondrial dysfunction and mitophagy. The authors should mention this interesting hit in the context of their discussion of the lipid-sensing properties of the dimerizing TM domains of BNIP3.

      Response: Thank you to Reviewer #2 for bringing this to our attention. The relationship between GPX4 and BNIP3 flux is very interesting. We have incorporated GPX4 into the discussion section (Lines 457-459).

      [Reviewer #3]

      Comment: For all of the tf-BNIP3 FACS data (all violin plots), it is unclear how many biological replicates were performed. The authors only stated that at least 10,000 cells were analyzed per sample, but I believe this is for each biological replicate. To better demonstrate the biological replicates, the authors should consider using bar graphs of the medians (triplicates) with error bars.

      Response: We have included biological replicates of FACS data in all primary figures (except for Fig.1C). Biological replicates, represented as medians (in triplicate), are indicated in figure legends.

      Comment: In Fig 3D, it is unclear why there is no basal state accumulation of BNIP3 protein levels compared to the BafA1-treated condition, especially with USO1 and SAR1A KO samples. Is this because BNIP3 is targeted for proteasomal degradation? I think Fig 3D should include a BTZ treatment next to BafA1 to account for the lack of basal state accumulation of BNIP3.

      Response: We apologize for the lack of clarity on this point. Yes, the reviewer’s interpretation of the data is correct. This point is more clearly elaborated in the text of our revised manuscript (Lines 219-223). Our results indicate that when lysosomal degradation is diminished, the expected increase in total BNIP3 protein levels is attenuated by proteasomal degradation (as evidenced by the hyperstability of BNIP3 upon Bortezomib treatment in mutant backgrounds). As requested, we have included the same knockout panel, now treated with BTZ (Fig S2E). These genetic data are further supported by Fig 3E, where a small molecule inhibitor of vesicle trafficking, Brefeldin-A, ameliorates the effect of lysosomal inhibition (BafA1) but exacerbates the effect of proteasome inhibition.

      Comment: Truncation of proteins could affect their protein stability even during their synthesis. For Fig 5B and 6B, the authors should show the blots for the expression of the different truncated mutants to prove that the change in BNIP3 stability and their effect on mitoflux (or lack thereof) is not due to poor expression of these mutants.

      Response: These were important potential caveats to document, and we thank the reviewer for their comment.

      We note that, due to differences in transduction efficiency, western blot data is an incomplete measure for relative expression levels – it cannot distinguish between fraction of cells transduced and expression level per cell. However, RFP fluorescence (Fig 5B) and BFP fluorescence (Fig 6B) are fluorescent internal controls allowing us to assess expression levels with single cell resolution. We have provided histograms of RFP and/or BFP intensity (new Fig S4A, Fig S5B), which provides support that overall expression levels of these constructs are similar. Critically, any variation we observe does not correlate with any of the effects we report.

      In addition, we have clarified the figure axis in Fig 5B to indicate that the value we are reporting is the “fold-stabilization upon BafA1 treatment”. The original figure legend wasn’t clear. Our metric (fold-stabilization) is internally normalized to compensate for differences in expression level. This is an important clarification.

      Comment: For the data in Fig 7, the authors demonstrated that treating cells with proteasomal inhibitor increases mitoflux. Since the proteasome targets monomeric BNIP3 for degradation, the logical assumption is that BTZ drives dimerization of BNIP3. Can the authors demonstrate this in an approach similar to Fig 5C? This simple experiment will add significant insight into the study.

      Response: Thank you for the suggestion. As Fig 5C relied on BNIP3 over-expression, we thought it even more informative to assess the effects of BTZ on dimerization of endogenous BNIP3. Indeed, we see accumulation of an SDS-resistant BNIP3 dimer in cells treated with BTZ (new Fig S2E, line 221). We hypothesize that BTZ indirectly drives dimerization of BNIP3 by accumulating the total levels of the protein, potentiating monomers to form additional stable dimers.

      Comment: In line 168-169, "In addition, multiple suppressor genes identified from our screen had previously been reported including TMEM11..." -- Unclear what biology they are reported to be involved in

      Response: We have clarified this line to read: "In addition, we recovered multiple known suppressors of BNIP3 flux, including outer membrane protein spatial restrictor TMEM11, mitochondrial protein import factors DNAJA3 and DNAJA11, and mitochondrial chaperone HSPA9"

      Comment: Along the lines of Major comment 2, the explanation for Fig 3D needs to be better elaborated, perhaps to include the role of the proteasome already at this point (if the authors think this is the reason why basal BNIP3 levels remain low with USO1 and SAR1A KO).

      Response: We have included a discussion about compensation by the proteasome in these genetic backgrounds (lines 219-226) and have referred to the newly incorporated western blot (new Fig S2E).

      Comment: Line 302-304, I believe that statement only refers to Fig S4C and the statement for Fig5G is in the next sentence. Please remove Fig5G from line 304. It was confusing to read.

      Response: The reference to Fig 5G has been removed.

      Comment: Line 367, there is a reference for Fig S5C but that figure is missing.

      Response: The spurious reference has been removed.

      Comment: Line 410-411, are there any reported clinical cases of EMC mutations with phenotypes that could be explained by elevated mitophagy?

      Response: Thank you for the suggestion. There are clinical presentations of EMC mutations and splice variants in diseases and conditions related to the central nervous system (PMID: 23105016, PMID: 26942288, PMID: 29271071). However, all characterization has been done in the clinical setting looking at clinical presentations/symptoms and not molecular or cellular characterization. We have added a line to the discussion about this speculative correlation between EMC deficiency and mitophagy (lines 516-519).

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      [Reviewer #1]

      Comment: Figure 3B: Are the red puncta observed in USO1 and SAR1A cells a product of higher levels of ER-phagy owing to BNIP3's high presence on the ER membrane?

      Response: This is an intriguing hypothesis. We will test whether this is true using a USO1/ATG9A dual KO. However, we don’t think this result is critical to the overall arc of the manuscript and we will not include these data if they indicate otherwise.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] This study brings a lot of new information on the regulation of flagellar genes, from the identification of novel sigma 28-dependent sRNAs to their effects on flagella production and motility. It represents a considerable amount of work; the experimental data are clear and solid and support the conclusions of the paper. Even though mechanistic details underlying the observed regulations by MotR or FliX sRNAs are lacking, the effect of these sRNAs on fliC, several rps/rpl genes, and flagellar genes and motility is convincing.

      The connection between r-protein genes regulation and flagellar operons is exciting and raises a few questions. First, from the RILseq data, chimeric reads with mRNA for r-proteins (including rpsJ) are not restricted to the sigma 28-dependent sRNAs (e.g. rpsJ-sucD3'UTR, rpsF-DicF, rplN-DicF, rplK-ChiX, rplU-CyaR, rpsT-CyaR, rpsK-CyaR, rpsF-MicA...), suggesting that regulation of r-protein synthesis by sRNAs is not necessarily related to flagella/motility. Second, it would be interesting to know if the flagellar operons are more sensitive than other long operons to antitermination following MotR overexpression? In other words, does pMotR similarly affect antitermination in rrn or other long operons?

      The general effect of pMotR or pFliX on the expression of multiple middle and late flagellar genes is also interesting even though the mechanism is not clear. While it may be difficult to fully address it, testing whether some of these regulatory events depend on the control of fliC and/or the S10 operon could be relevant (by analyzing the effects in strains deleted for fliC or nusB for instance).

      We also think the connection between r-protein genes regulation and flagellar operons is exciting and raises some intriguing questions. While there are other RIL-seq chimeras for r-protein genes, the highest numbers are found for MotR and FliX. Nevertheless, understanding the impact of these other sRNAs on the r-protein operons and elucidating which long operons are most sensitive to antitermination following MotR overexpression are important directions for further studies.

      Reviewer #2 (Public Review):

      [...] This is a very interesting study that shows how sRNA-mediated regulation can create a complex network regulating flagella synthesis. The information is new and gives a fresh outlook at cellular mechanisms of flagellar synthesis. The presented work could benefit from additional experiments to confirm the effect of endogenous sRNAs expressed at natural level.

      We agree that experiments regarding the effects of endogenous sRNAs are important. We provide such data in Figures 8 and S14 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, expression of flagellar genes by RT-qPCR and western analysis. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with expression of motA and fliC given that MotR and FliX encompass the 5’ and 3’ UTRs respectively.

      Reviewer #3 (Public Review):

      [...] Overall, this comprehensive study expands the repertoire of characterized UTR-derived sRNAs and integrates new layers of post-transcriptional regulation into the highly complex flagellar regulatory cascade. Moreover, these new flagella regulators (MotR, FliX) act non-canonically, and impact protein expression of their target genes by base-pairing with the CDS of the transcripts. Their findings directly connect flagella biosynthesis and motility, highly energy consuming processes, to ribosome production (MotR and FliX) and possibly to carbon metabolism (UhpU).

      Specific points to be considered:

      • The authors use a crl- hyper-motile strain as WT strain for the study and sometimes also a crl+ strain is used. Can the authors comment on potential reasons why some phenotypes (e.g., UhpU and MotR effects on motility) are only detectable in the crl+ strain or vice versa? Is σS regulation important for the function of these sRNAs?

      • In several experiments, a variant of MotR sRNA, MotR*, that harbors a 3 nt mutation upstream of the seed sequence is used and seems to mediate stronger phenotypes (impact on flagellar number) upon overexpression compared to WT or phenotypes not retrieved for WT MotR (increased flagellin expression). It would be helpful to have some more clarification throughout the text as to why this variant was used, even when OE of WT MotR already has an impact on the target, and how these three mutated nucleotides impact target regulation. For example, does MotR* show increased RNA stability or Hfq binding compared to MotR? Does the mutation in MotR* impact MotR structure (e.g., based on secondary structure predictions) or increase the complementarity with selected targets at potential secondary binding sites (e.g., based on target predictions)? For example, Fig. S7 shows additional regions of interaction between MotR and fliC mRNA beside the seed sequence. It is also suggested that MotR might have multiple interaction sites on rpsJ mRNA. Additional structure probing or biocomputational predictions could clarify these points.

      • It is suggested that UhpU impacts motility via regulation of LrhA, which represses transcription of flhDC, and therefore the flagellar cascade. While LrhA-mediated regulation by UhpU is validated based on reporter genes, the effect of UhpU OE on FlhDC levels is not directly examined (Fig. 3). Furthermore, as deletion of LrhA de-represses the flagellar cascade and UhpU was also shown to increase motility, the conclusions could be further strengthened by examining flhDC levels and/or the effect of ∆UhpU (if the sRNA part can be deleted) on motility (reduction) due to relieved down-regulation of LrhA.

      • This study provides many opportunities for future follow-up work. Now that the four sRNAs and some of their targets and opposing effects on flagella biogenesis have been identified, it will be interesting to see how the sRNAs themselves are temporally regulated throughout the flagella biogenesis cascade and which other targets are regulated by them. Future studies could also provide insights into the mechanism and function of FlgO sRNA, which seems to act via a different mechanism than base-pairing to target RNAs, as well as the global effects of regulation of ribosomal genes via FliX and MotR.

      We thank the reviewer for the constructive comments about the variation between the crl- and crl+ strains, and about the use of MotR versus MotR*, and will address these points in a revised version of the manuscript. Regarding the UhpU-mediated regulation, we agree that assays of flhDC expression will strengthen our conclusions. We share the reviewer's opinion regarding the many opportunities for future follow-up work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Dear Editor and reviewers,

      We would like to thank the three reviewers for their thorough review of our manuscript and their detailed comments and very helpful suggestions to improve the manuscript. Overall, we thought the reviews were very positive with the reviewers commenting that our discovery of a novel genetic code variant is a “cause for celebration” and that our study is “technically solid” and “rigorous”. All three reviewers agree that our manuscript would “stimulate new discussions in the field of genetic code evolution” and also be of broad interest to evolutionary cell biologists, protistologists and the translation/protein synthesis community at large. The reviewers highlight the particular novelty of the genetic code variant described here due to it being an exception to the wobble hypothesis which adds a new level of complexity to stop-codon reassignment. The reviewers share our frustration about the lack of proteomics data due to being unable to establish a stable culture but acknowledge that we address this limitation frankly in our discussion and agree that it is “frustrating but it's not a limitation”.

      We present an updated and improved version of the manuscript after taking on board the reviewers’ suggestions. Our point-by-point responses to their comments and our modifications are detailed below in bold.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      This study by J. McGowan and colleagues reports the discovery of a ciliate species that uses a variant genetic code where the codons UAA and UAG, which are stop codons in the canonical code, instead code for lysine and glutamate respectively. The primary data are genomic and transcriptomic sequence libraries from single cells. The genetic code was predicted by aligning coding sequences to references from other species and examining the most frequent amino acids in positions homologous to putative coding-UAA/UAGs. They also identified suppressor tRNAs for UAA and UAG, and tandem in-frame stop UGAs (but not UAA/UAG) in the 3'-UTR, which further support the recoding of UAA and UAG.

      A limitation of this study (and several other recent studies on variant genetic codes) is that the predictions are based on nucleic acid sequencing, without confirmation from proteomics. The authors acknowledge and briefly but frankly discuss the limitations in their manuscript (lines 258-261).

      Major comments

      Controls against contamination and sequence chimeras

      The ciliate species studied here was an environmental isolate, and sequence libraries were prepared by amplification from small pools of cells sorted by FACS. The genome assembly was produced by co-assembly of multiple amplified libraries. Given the potential for contamination and amplification artefacts (such as sequence chimeras) associated with these methods, I think it is important to demonstrate that the data truly originate from one species, so as to rule out the possibility that the co-assembly may be chimeric, i.e. representing two or more organisms with different genetic codes (one with UAA recoded and the other with UAG recoded, for instance). Even if the cell sorting was accurate, contamination could still enter down the line during library preparation so it would be important to show internal evidence from the sequence data too.

      We understand the reviewer's concerns about the possibility of contamination as it can be a major issue in environmental single cell sequencing experiments. We have addressed the individual points below in detail to demonstrate that we have generated a clean genome assembly of a single ciliate species but also summarise here:

      • The cells we sequenced originated from the same clonally isolated cell propagated in culture
      • We have manually curated the assembly
      • The assembly has a unimodal GC content peak with a low BUSCO duplication score
      • Most genes (95.9 %) contain both in-frame UAA and UAG codons
      • We recovered a single identical ciliate 18S rRNA gene across all 10 samples
      • De novo assemblies of the 10 individual gDNA libraries are virtually identical in terms of average nucleotide identity
      • We also predicted the genetic code for each of the genome and transcriptome samples individually
      • 85% of the final assembly is taxonomically classified as Ciliophora. The remainder is either unclassified (i.e. no hits) or has spurious/inconsistent hits

        Specifically:

      (a) From the description in Methods under "Sampling, Ciliate isolation, culturing, and cell-sorting", it is not clear whether all the cells that were ultimately sequenced originated from the same clone (i.e. the same well in the 96-well plate described in line 389). Could the authors confirm whether this was the case?

      Yes. All the sorted cells originated from the same ciliate clone. A single-cell was isolated and cleaned (without removing all the environmental bacteria). The ciliate single-cell divided and we established a mono-clonal ciliate culture that we used for the cell sorting and sequencing. This culture grew but only for a relatively short period. We could not establish a long term culture.

      (b) What % of genes have in-frame coding UAA, UAG, or both? How many per gene on average? Counts are given for the conserved genes/domains identified by PhyloFisher or Codetta (lines 192-207), and overall frequencies per codon are addressed later in lines 263 onward, but how often do they occur together in the same genes?

      My reasoning behind this is that if genes with both in-frame coding UAA and UAGs are common then it is very unlikely to be the result of chimeric sequence artefacts from whole-genome amplification.

      We have updated the text to include this information. From the PhyloFisher analysis, we had reported that 58 genes contained in-frame UAA codons and 46 genes contained in-frame UAG codons. We have now added the text “Amongst the genes identified by PhyloFisher, 27 contained both an in-frame UAA codon and an in-frame UAG codon.”

      Additionally, from our annotated gene set, we had reported that 98.6% of genes contain at least one UAA codon and 96.4% of genes contain at least one UAG codon. We have now added text to report how many genes contain both codons “The reassigned codons are widely used across genes with 95.9% of genes containing both a UAA codon and a UAG codon”.

      The example gene (tubulin gamma chain protein) shown in Figure 1 contains both in-frame UAA codons and in-frame UAG codons, with the UAA codons aligning to lysine and the UAG codons to glutamic acid.
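      The per-gene counts described above could be reproduced along the following lines. This is only a minimal sketch, not the authors' actual pipeline: the function names and the toy sequences are hypothetical, and it assumes each coding sequence is supplied in frame, starting at the first codon.

```python
def inframe_codons(cds):
    """Return the set of codons read in frame 0 of a coding sequence."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA alphabet
    return {seq[i:i + 3] for i in range(0, len(seq) - 2, 3)}


def summarize_reassigned_codon_usage(genes):
    """Fraction of genes with an in-frame UAA, an in-frame UAG, and both."""
    n = len(genes)
    has_uaa = has_uag = has_both = 0
    for cds in genes.values():
        codons = inframe_codons(cds)
        uaa, uag = "UAA" in codons, "UAG" in codons
        has_uaa += uaa
        has_uag += uag
        has_both += uaa and uag
    return {"UAA": has_uaa / n, "UAG": has_uag / n, "both": has_both / n}


# Toy, made-up sequences (hypothetical, for illustration only).
toy_genes = {
    "gene1": "AUGAAAUAAGGGUAGCCCUGA",  # in-frame UAA and UAG
    "gene2": "AUGUAAGGGUGA",           # in-frame UAA only
    "gene3": "AUGGGUAGCUGA",           # UAG present, but out of frame
    "gene4": "AUGUAGUGA",              # in-frame UAG only
}
usage = summarize_reassigned_codon_usage(toy_genes)
```

      Reading codons strictly in frame matters here: a UAG that spans two codons (as in the hypothetical gene3) should not be counted as a reassigned codon.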

      (c) What is the sequence identity of conserved marker sequences between the individual amplified replicate libraries?

      I would naively expect that individual replicates may not have the full set of markers because of uneven amplification, but if the sequences originate from the same clone they should have overlapping coverage of the conserved markers, and these should be +/- identical between replicates (save for allele variants). If so this would support the claim that contaminant sequences were mostly removed during sequence QC and that the cells were clonal.

      We generated an individual assembly for each of the 10 gDNA libraries and calculated average nucleotide identity at the whole assembly level. On average, the 10 assemblies are 99.43% identical to each other, with the least similar pair being 99.37% identical to each other. This level of variation includes not only allelic variants but also sequencing/assembly errors as the individual libraries are relatively low coverage. In terms of assembly alignment coverage (i.e. the fraction of each assembly that is aligned to another assembly), the average value is 76.5% and the value for the lowest pair is 59.1%. We have now also made the individual 10 assemblies available in the Zenodo repository (10.5281/zenodo.7944379) and updated the methods section.

      Furthermore, as an additional quality control step, we predicted the genetic code for each of the 10 individual genome assemblies and obtained the same predictions that UAA encodes lysine and UAG encodes glutamic acid for all 10 individual assemblies. We also predicted the genetic code for each individual RNA-Seq sample based on individual transcriptome assemblies which yielded consistent predictions.
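      The summary of pairwise identities reported above (a mean and a least similar pair) can be sketched as follows. The identity values below are made up, the function name is hypothetical, and the sketch assumes one identity value per unordered pair of assemblies; it does not reflect the specific ANI tool the authors used.

```python
from itertools import combinations
from statistics import mean


def summarize_pairwise(identity):
    """Return (mean identity, least similar pair, its identity)."""
    worst_pair = min(identity, key=identity.get)
    return mean(identity.values()), worst_pair, identity[worst_pair]


# Ten assemblies give 45 unordered pairs to compare.
n_pairs = len(list(combinations(range(10), 2)))

# Hypothetical values for three assemblies, for illustration only.
toy_identity = {("A", "B"): 99.5, ("A", "C"): 99.4, ("B", "C"): 99.3}
avg, worst_pair, worst_value = summarize_pairwise(toy_identity)
```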

      (d) Line 392: "Non-axenic" presumably refers to environmental prokaryotes. This also appears to contradict the statement that the cells were "free of any other contaminant" (line 387). Could authors confirm whether they mean "non-axenic but monoeukaryotic"?

      In line 387, when we say "free of any other contaminant” we mean that we isolated a ciliate single-cell from the environmental sample, and the picked ciliate cell was washed 3 times until it was free of any other eukaryotes, but still containing environmental bacteria. In line 392, when we say non-axenic, we mean that the mono-clonal ciliate culture contained environmental bacteria and was monoeukaryotic.

      We have modified the text in the methods section to say “free from any other eukaryote” and “non-axenic but monoeukaryotic”.

      (e) Lines 448-451: More details should be given on the criteria used to identify and bin out contaminants. MetaBAT typically bins prokaryotic genomes quite well, but not eukaryotic ones. What did the bins look like and how were the eukaryotic ones chosen?

      We routinely use MetaBAT2 to assist with separating bacterial contigs from protist genomes. From our experience we find that it generally performs well but requires careful manual curation. We only use tetranucleotide frequencies when binning single-cell assemblies and not coverage variance as this is heavily skewed due to amplification bias from single-cell amplification. We integrated the binning results from MetaBAT2 with taxonomic classification from tools such as CAT, Blobtools and Tiara, and manually curated the assembly.

      We have modified both the results and methods section to clarify that the assembly was manually curated to remove contaminant contigs.

      For example, using CAT, which taxonomically classifies contigs based on blast/diamond hits to open reading frames:

      The final curated assembly is 69.7 Mb in length.

      59.5 Mb (85.4%) is classified as Ciliophora.

      9.7 Mb (13.9%) is unclassified.

      The remaining 0.5 Mb (0.7%) have inconsistent, low-identity hits to 22 different Eukaryotic and Bacterial phyla (due to lack of closely related species in public databases).

      Furthermore, we recovered only a single ciliate 18S rRNA gene and the final curated assembly has a unimodal GC content peak with a low BUSCO duplication score and high cDNA mapping rate.
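      As a quick arithmetic check, the percentages quoted above follow from the reported sizes in Mb. The variable names are illustrative; only the four sizes are taken from the response.

```python
total_mb = 69.7  # final curated assembly length, as reported above
parts_mb = {
    "Ciliophora": 59.5,
    "unclassified": 9.7,
    "inconsistent hits": 0.5,
}

# Percentage of the assembly in each class, rounded to one decimal place.
percent = {name: round(100 * mb / total_mb, 1) for name, mb in parts_mb.items()}

# The three classes should account for the whole assembly.
parts_sum = round(sum(parts_mb.values()), 1)
```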

      Minor comments

      Line 52: Not strictly true, some germline-limited segments contain mobile elements with coding sequences, e.g. TBE elements in Oxytricha (doi:10.1371/journal.pgen.1003659)

      Thank you for pointing this out. We have rephrased “excision of non-coding sequences” to “excision of micronucleus-limited sequences” to describe the process of macronuclear development more generally.

      Lines 229-231, Supplementary Table 1: Presenting the identity matrix as a distance tree may make it easier to see the pattern of similarity between the tRNAs

      We have added a phylogenetic network of tRNA genes as a supplementary figure to better visualise the relationships between tRNA genes.

      Lines 274-275: Suggest stating the criterion for classifying genes as "highly expressed" on the first mention of this in the Results, although it's explained later on in the Methods.

      We have clarified this in the results section by adding the text: ‘We defined a subset of genes as “highly expressed” based on the 10% of genes with the highest transcripts per million (TPM) values for comparison below.’
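      One way to select such a subset is sketched below. The function name, the toy TPM values, and the handling of ties and rounding are assumptions; the response only states that the top 10% of genes by TPM were used.

```python
def highly_expressed(tpm, top_percent=10):
    """Return the gene IDs in the top `top_percent` of genes ranked by TPM."""
    # Integer arithmetic avoids floating-point surprises; keep at least one gene.
    n_top = max(1, len(tpm) * top_percent // 100)
    ranked = sorted(tpm, key=tpm.get, reverse=True)
    return set(ranked[:n_top])


# Hypothetical TPM values: 20 genes with TPM 1..20.
toy_tpm = {f"g{i}": float(i) for i in range(1, 21)}
top_genes = highly_expressed(toy_tpm)
```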

      Lines 298-299: What is the frequency of tandem UGA stops in the 3'-UTR in genes with coding-UAA/UAG vs. genes without, and is there a significant difference? The argument in this paragraph is that UAA+UAG reassignment increases selective pressure to minimize translational readthrough. Therefore I think that it would make sense to compare the frequency in genes with and without these codons.

      Following the reviewer’s suggestion, we have looked at tandem UGA stop codons in the 3’-UTR of genes that don’t use UAA and genes that don’t use UAG. We found similar enrichment for in-frame UGA codons at the beginning of the 3’-UTR in these small subsets of genes.

      To clarify, the hypothesis from the literature is that there may be stronger selective pressure to maintain tandem stop codons in ciliates with reassigned genetic codes, particularly those that use only UGA as a stop codon. Within a genome, we wouldn’t expect a difference if a gene contains UAA/UAG codons.
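      A check for an in-frame UGA near the start of a 3’-UTR could look like the sketch below. It assumes the UTR sequence begins immediately after the annotated stop codon, so that frame 0 of the UTR continues the reading frame of the gene; the function name and the five-codon window are hypothetical choices, not values from the manuscript.

```python
def has_tandem_uga(utr3, window_codons=5):
    """True if an in-frame UGA occurs within the first `window_codons` codons."""
    seq = utr3.upper().replace("T", "U")  # accept DNA or RNA alphabet
    end = min(len(seq), 3 * window_codons)
    codons = [seq[i:i + 3] for i in range(0, end - 2, 3)]
    return "UGA" in codons


in_frame = has_tandem_uga("GCAUGAUUU")               # GCA UGA UUU
out_of_frame = has_tandem_uga("GUGAUUU")             # GUG AUU, UGA is out of frame
beyond_window = has_tandem_uga("A" * 18 + "UGA")     # UGA is the 7th codon
```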

      Lines 353-354, Figure 5: Suggest marking the internal nodes where genetic code changes likely occurred. At the moment only the leaves of the tree are annotated with the genetic codes of the respective species. This would make it clearer how one counts the numbers of independent origins as reported in the text (e.g. "... a fourth independent origin of UGA being translated as tryptophan").

      We have decided not to label the internal nodes on the phylogeny. We think that deeper sampling will reveal that some of these genetic code changes occurred independently, so we don’t want the figure to be misleading. Also, for the species with the genetic code UAA=Q, UAG=Q and UGA=W, we can’t determine the order of events.

      Lines 371-372: Question out of curiosity (not necessary to address for the manuscript at hand): Do the authors think the recoding of UAA and UAG happened simultaneously in both codons or stepwise, or is there insufficient information to speculate?

      An initial guess would be that it happened as a stepwise process but without deeper sampling of this lineage it is not possible to determine the order of events.

      This highlights the need for deeper sampling and sequencing across undersampled lineages of ciliates and demonstrates the utility of single-cell OMICs approaches for species that are not yet amenable to culturing.

      Line 395: "10uL" should use the actual symbol for "micro" prefix. Also, the choice of spacing or no spacing between numerical figure and units should be made consistent in manuscript.

      Fixed

      Line 403: "Biotynilated" should be "Biotinylated"

      Fixed

      Line 414 and elsewhere: "2" in MgCl2 should be subscripted

      Fixed

      Lines 419-420: Clarify whether the "r" and "+" symbols are to be read as prefixes or suffixes, i.e. is the modified base the preceding or succeeding one.

      We have clarified in the text that these symbols are to be read as prefixes.

      Table 1: What is the difference between the two sets of BUSCO completeness scores reported? One is given under "Genome assembly" and the other under "Genome annotation", but the annotation is based on the same assembly, right? I'm assuming this has to do with different modes in which BUSCO can be run, but this should be explained in the Methods (lines 452-453, 496-497) and briefly explained in the Table caption.

      Yes, this is because we ran BUSCO in two different modes. BUSCO is run in genome mode on the genome assembly and in protein mode on the genome annotation. In genome mode, gene prediction is performed by Augustus guided by amino acid BUSCO group block-profiles, while in protein mode the gene set described in our methods is the input to BUSCO classification. The superior BUSCO results for the protein mode reflect the superiority of our final annotation over that generated by BUSCO Augustus. We have added text to the methods section and to the table caption to clarify which mode was used.

      **Referee Cross-commenting** I generally agree with the other reviewers' comments. Specifically I like reviewer #3's suggestion #3 to have a more detailed summary of the codon frequencies, perhaps as a graphic, and to compare the tandem stop frequencies with other ciliate species, especially those with all three canonical stops.

      Reviewer #1 (Significance (Required)):

      Any new genetic code variant discovered is a cause for celebration! This is a basic biological fact with inherent significance and should be generally interesting to biologists because the rarity of variant codes stands in contrast to the diversity of most biological systems.

      This variant code would also stimulate new discussions in the field of genetic code evolution specifically because, as the authors point out, when both UAA and UAG are recoded they both usually encode same amino acid, but here they are recoded to different ones. This is an apparent exception to the "wobble" hypothesis for why these codons often evolve in concert, which was well explained with relevant citations in the Introduction.

      For context: My expertise is in genomics and environmental microbiology.

      END reviewer 1

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This study reports the reassignment of the UAA and UAG stop codons to lysine and glutamic acid, respectively, in the ciliate Oligohymenophorea sp PL0344. The paper is nicely written, easy to read and the experimental approach, ideas and questions are easy to follow. The work is technically solid both at the NGS - in house library preparation, sequencing and data interpretation - as well as phylogeny levels. The conclusions are consistent with the comparative genomic and transcriptomic data obtained by the study.

      Reviewer #2 (Significance (Required)):

      The work extends current knowledge on codon reassignment in ciliates, confirming previous discoveries of existence of very high stop codon assignment flexibility in these organisms. The assignment of UAA and UAG to two different amino acids by two different tRNAs is very interesting and reinforces the idea that stop codon reassignment in ciliates is rather common. It also raises important questions about the parallel evolution of the release factor-1 (eRF1), Lysine and Glutamine tRNAs, as the reassignment requires loss of recognition of both UAA and UAG by eRF1 with parallel appearance of the new Lysine and Glutamic Acid suppressor tRNAs.

      The main issue of this work is the inability to cultivate the ciliate Oligohymenophorea sp PL0344 in the laboratory to prepare protein extracts for direct analysis of the amino acids inserted at UAA and UAG sites by Mass Spectrometry. The comparative genomic and transcriptomic data, as well as the identification of cognate tRNA anticodons for UAA and UAG, are likely correct, but provide indirect evidence for the assignment of UAA to Lysine and UAG to Glutamic Acid. This issue is relevant because one cannot exclude the possibility of insertion of other amino acids at UAA and UAG sites beyond Lysine and Glutamic Acid, respectively; nor can one exclude the possibility that such amino acids are inserted at high level. The authors do acknowledge the limitations of the unavailability of protein extracts for direct MS analysis of the reassignment, but should consider, in particular in the discussion, the possibility of multiple amino acid insertions in a context where Lysine and Glutamic Acid are the major but not the only amino acid species being inserted at those sites.

      Based on my expertise of studying codon reassignments in fungi of the CTG clade, I believe this work is very interesting and appealing to the genetic code community, and is of relevance to the evolution and protein synthesis research communities at large.

      We thank the reviewer for their positive review. They raise an important point about the possibility of amino acids other than lysine and glutamic acid being inserted for UAA/UAG codons which we hadn’t considered. We have added text and relevant references to our discussion to highlight this possibility:

      “Additionally, while the genomic and transcriptomic data provide strong evidence that lysine and glutamic acid are the major translation products of UAA and UAG codons, respectively, we cannot rule out the possibility that other amino acids are (mis)incorporated at these sites which could be detected using mass-spectrometry [38, 39].”

      Krassowski T, Coughlan AY, Shen X-X, Zhou X, Kominek J, Opulente DA, et al. Evolutionary instability of CUG-Leu in the genetic code of budding yeasts. Nat Commun. 2018;9:1887. Mordret E, Dahan O, Asraf O, Rak R, Yehonadav A, Barnabas GD, et al. Systematic Detection of Amino Acid Substitutions in Proteomes Reveals Mechanistic Basis of Ribosome Errors and Selection for Translation Fidelity. Molecular Cell. 2019;75:427-441.e5.

      END reviewer 2

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: from genome and transcriptome sequencing of what appears to be a novel ciliate from the class Oligohymenophorea, McGowan et al provide convincing evidence of a protist in which the stop codons UAA and UAG have almost certainly been recoded to specify incorporation of different amino acids (UAA = K; UAG = E) during translation. Several ciliates from different classes use a non-standard genetic code (as do a narrow variety of other protists), but this is an unusual observation in that stop codons which differ only in the wobble position code for different amino acids in the ciliate identified here.

      I say 'almost certainly' the stop codons have been recoded in Oligohymenophorea sp. PL0344 because in the absence of being able to retain the ciliate in culture the authors have not been able to complete the proteomics which would unequivocally (a) show stop codons now code for amino acids and (b) confirm the identity of the amino acids now encoded (the authors discuss this issue on p12).

      Comments: overall this manuscript is straightforward to read and the analyses are taken as far as is realistic in the absence of a continuous culture method. My suggested revisions should be straightforward for the authors to address.

      1) The manuscript appears to report the identification and genome/transcriptome sequencing of a novel ciliate species - clarity should be provided by the authors. However, it disappointed me that this manuscript was crafted entirely from nucleotide sequencing. I would have welcomed seeing the morphology of the ciliate identified here and would have anticipated that there was sufficient material to perform microscopy at the light level (for DIC images) and by scanning or transmission electron microscopy.

      Yes, based on the 18S rRNA sequence and phylogenies of protein-coding genes, this is a novel species that hasn’t been described before. The most similar hits to the 18S rRNA gene are to other unnamed/environmental sequences. We haven’t attempted to name or describe this species as we weren’t able to establish a culture, so have referred to it as Oligohymenophorea sp. PL0344. We have clarified in the text that this is a novel, unnamed ciliate species.

      The genomic and transcriptomic data were generated from a single-cell isolate propagated into micro-cultures of tens of cells. This was done under the strictest conditions in an attempt to minimise contamination. Consistent with this approach, it was not possible to obtain useful SEM/TEM images, as it would be very hard to recover EM imaging from tens of cells (a process that would have drastically reduced our ability to do replete genome sampling). Similarly, our approach to culturing limited our ability to acquire useful DIC images. After discovering that this ciliate uses a novel genetic code, we attempted on a number of occasions to re-isolate the same species from the same and surrounding water bodies but failed.

      2) It is unfortunate that the ciliate could not be maintained in culture (or cryopreserved). Coordinates for the University Parks pond are provided, but I got the impression that this ciliate could be repeatedly isolated. Thus, in the absence of culture methods could the authors indicate the points in the year when the ciliate could be isolated (i.e. is there a season element to when PL0344 could be isolated) and how frequently when sampling was performed could PL0344 be seen? From the environmental sequence data that is publicly available is there any evidence for the presence of PL0344 anywhere else in the world? I'd be surprised if this was a UK-specific ciliate.

      The water sample from which this ciliate was isolated was collected in April 2021. After having sequenced its genome and identifying the genetic code change, we made several attempts to reisolate it from the same pond but were unsuccessful. Regarding the geographic distribution of this ciliate, in the text we mention that the most similar 18S rRNA sequence in GenBank is to an unnamed species recovered in a metabarcoding study in France with 99.81% identity. We assume that this is the same species. We also examined other publicly available environmental datasets such as the PR2/metaPR2 database. The most similar match in the metaPR2 database was to a sequence “OLIGO4_XX_sp”. In the metaPR2 database this sequence is unique to Lake Garda in Italy (sample name: “Lake_Garda-LTER-euphotic-water”). However, this hit was only 98% identical with a partial alignment so we did not discuss it in the text. We agree that it is very unlikely that this is a UK-specific ciliate but cannot determine its geographic range based on the publicly available environmental sequence data, other than the single hit to a sequence from France. We think it is important to stress that it was not the aim of our paper to describe the taxonomy and biogeographical range of this ciliate but rather to report the exciting shift in codon usage.

      3) I felt the statistics presented on pages 13-14 (lines 277-301) for codon usage were a little superficial. It would be helpful to see how frequently other E and K codons are used in PL0344 and ideally to see how similar codon usage differs in the more model ciliates Paramecium, Tetrahymena or Stentor. To complete an analysis and justify/confirm conclusions drawn, I would also like to see how frequently in-frame, downstream stop codons are seen in ciliates where stop codons have NOT been reassigned - although the data in Fig 5 indicates genome/transcriptome sequences are not necessarily complete for many ciliate species (where stop codons are not reassigned), there is certainly more varied data to look at than when Fleming and Cavalcanti published their PLoS One work (which is cited in the manuscript).

      We have shortened this section about UAA and UAG usage, with supplementary table 3 showing usage of all codons in all genes compared to our subset of highly expressed genes.

      We have also added a sentence stating how many genes contain both in-frame UAA and UAG codons based on the point from Reviewer 1: “The reassigned codons are widely used across genes with 95.9% of genes containing both a UAA codon and a UAG codon.“
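A statistic of this kind (the share of genes containing both reassigned codons in frame) can be computed from annotated coding sequences. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequences are hypothetical, and it assumes each coding sequence starts in frame at its first base.

```python
def inframe_codons(cds):
    """Split a coding sequence into in-frame codons, using the RNA alphabet."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fraction_with_both(cds_list, codon_a="UAA", codon_b="UAG"):
    """Fraction of sequences that contain both codons in frame."""
    if not cds_list:
        return 0.0
    hits = 0
    for cds in cds_list:
        codons = set(inframe_codons(cds))
        if codon_a in codons and codon_b in codons:
            hits += 1
    return hits / len(cds_list)


# Hypothetical toy genes, each terminated by UGA (written as DNA).
toy_genes = [
    "ATGTAAGGATAGTGA",  # in-frame UAA and UAG
    "ATGTAAGGAAAATGA",  # in-frame UAA only
    "ATGGTAAGTAGCTGA",  # TAA and TAG occur only out of frame
]
fraction = fraction_with_both(toy_genes)
```

Counting only in-frame codons matters here, because out-of-frame TAA or TAG triplets (as in the third toy gene) say nothing about how the codon is translated.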

      To our knowledge, there are no new genome assemblies available for ciliates that use the canonical genetic code since the Fleming and Cavalcanti publication from 2019, certainly none with annotated gene sets available for comparison. The species in Fig 5 which use the canonical genetic code are all represented by transcriptome data (other than Stentor) that have generally low completeness. We do not think low-quality transcriptome assemblies would make a fair comparison, as they would be biased towards transcripts with higher expression. Furthermore, they likely include many fragmented transcripts which are not suitable for detailed comparisons of the stop codon/3′-UTR region.

      4) Given the presence of just one stop codon in PL0344, have the authors looked genome-wide at nucleotide composition 5' and 3' to UGA? The nucleotide sequences 5' and 3' to a stop codon can influence whether readthrough occurs, and could thus limit the frequency of, or tendency for, unwanted readthrough.

      We thank the reviewer for this suggestion which is something we did not investigate initially but have now added a short section in the manuscript to address. Many studies in model organisms have demonstrated that UGA is the least robust stop codon and the most prone to read through. As the reviewer alludes to, this is particularly interesting for ciliates with reassigned genetic codes that use only UGA as a stop codon. Experimental data from model organisms have shown that the sequence composition surrounding a stop codon can influence the frequency of read through, with the nucleotide immediately downstream of the stop codon (“+4 position”) being particularly important.

      We have now looked at the sequence composition around stop codons for Oligohymenophorea sp. PL0344 and our results show that cytosine tends to be avoided following the UGA stop codon. From the literature, presence of a cytosine following UGA (i.e., UGAC) leads to a substantial increase in translational read through. Furthermore, when examining the subset of highly expressed genes, there are significantly fewer cases of UGAC when compared to all genes. This trend has previously been reported in Paramecium and Tetrahymena based on EST data (Salim, Ring and Cavalcanti; 2008).
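The "+4 position" tally described above can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code used in the manuscript: `plus4_counts` is a hypothetical helper, and each transcript is assumed to be supplied as a coding sequence ending in the stop codon plus the 3′ sequence that follows it.

```python
from collections import Counter


def plus4_counts(transcripts, stop="TGA"):
    """Count the nucleotide immediately downstream of the terminal stop codon.

    Each item is a (cds, utr3) pair: cds ends with the stop codon and
    utr3 is the sequence directly after it.
    """
    counts = Counter()
    for cds, utr3 in transcripts:
        cds, utr3 = cds.upper(), utr3.upper()
        if cds.endswith(stop) and utr3:
            counts[utr3[0]] += 1
    return counts


# Hypothetical toy data (DNA alphabet).
toy = [
    ("ATGAAATGA", "ATTT"),
    ("ATGGGGTGA", "TTAA"),
    ("ATGCCCTGA", "CAAA"),
    ("ATGCCCTGA", "AGGG"),
]
counts = plus4_counts(toy)
total = sum(counts.values())
freqs = {nt: counts[nt] / total for nt in "ACGT"}
```

Running the same tally on all genes and on the highly expressed subset would give the two frequency tables needed to compare how often UGAC occurs in each set.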

      We have added a short section to the text reporting this and a supplementary figure showing a sequence frequency logo around the stop codon for all genes and for the subset of highly expressed genes. We note with caution, however, that there is a paucity of experimental studies investigating stop codon robustness in ciliates. While several publications hypothesise that readthrough may happen at higher rates in ciliates due to a combination of factors (e.g., eRF1 mutations, presence of tandem stop codons, competition from suppressor/near-cognate tRNA genes, etc.), we are careful not to speculate without experimental evidence.

      Reviewer #3 (Significance (Required)):

      Strengths - I found this a straightforward manuscript to read - aside from the interesting and unexpected observation about genetic code use in PL0344, Fig 5 draws together a lot of earlier published information into an easily accessible form - I felt this a particularly useful part of the manuscript.

      I don't feel the absence of proteomics to back up the genome/transcriptome analysis is a notable limitation - it's perhaps frustrating but it's not a limitation. However, the work does perhaps inevitably feel a little bit observational - there's not really a lot of insight or new insight into why the genetic code can be revised in some microbial eukaryotes - in contrast, for instance, to a recently published study of the aptly named Blastocrithidia nonstop. McGowan et al's manuscript, however, will be of interest and should be formally published.

      Descriptions of organisms that have tweaked the standard genetic code are not new; coupled to the limited insight into why the genetic code can be rewritten so readily in ciliates, this limits the general appeal of the work. However, the study executed is rigorous and it should be of interest to a wide variety of protistologists, evolutionary cell biologists, and researchers in the translation field.

      END reviewer 3

    1. Author Response:

      The following is the authors' response to the original reviews.

      eLife assessment

      This study presents important findings regarding the quantification of dynamics in fish communities in changing ecosystems by combining a large-scale environmental DNA metabarcoding time series with novel statistical approaches. The methods are convincing, with controlled experiments, thorough statistical analyses, and a substantial dataset covering two years of detailed observation, which can provide sufficient power to detect fine-scale ecological interactions. This work is relevant for informing future research on assessing community stability under climate change.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our study important. We have revised our manuscript according to the helpful comments to further improve our manuscript.

      Reviewer #1 (Public Review):

      […] Their work provides a highly relevant approach to perform species-interaction strength analysis based on eDNA biodiversity assessments, and as such provides a research framework to study marine community dynamics by eDNA, which is highly relevant in the study of ecosystem dynamics. The models and analytical methods used are clearly described and made available, enabling application of these methods by anyone interested in applying it to their own site and species group of interest.

      Thank you so much for your time and effort to evaluate our manuscript. We are very pleased to hear that you found our study interesting. We have further revised the manuscript according to your comments and hope that the revised manuscript is now better than the original one.

      Strengths: The authors have a study setup that is suitable to measure the effects of temperature of the eDNA diversity, and have taken a large number of samples and all appropriate controls to be able to accurately measure and describe these dynamics. The applied internal spike in to enable relative eDNA copy number quantification is convincing.

      We are happy to hear that you found the study design and the method to estimate eDNA copy number are suitable and convincing.

      Weaknesses: The authors aim to study the relationship between species interaction strength and ecosystem complexity, and how temperature will influence this. However, there is only limited ecological context discussed explaining their results, and a link with climate change scenarios is also limited. A further discussion of this would have strengthened the manuscript.

      Thank you so much for the comment. We have added a discussion about how our study contributes to understanding the fish community assembly process and predicting the community-level response under ongoing climate change. We have added one subsection, "Implications for fish community assembly and the effect of global climate change", at L679. As for the ecological discussion for each specific fish-fish interaction, we provided this in Supplementary file 1c.

      The authors were able to find a correlation between water temperature and interaction strengths observed. However, since water temperature is dependent on many environmental variables that are either directly or indirectly influencing ecosystem dynamics, it is hard to prove a direct correlation between the observed changes in community dynamics and the temperature alone.

      Thank you for pointing this out. We have discussed the possibility of the effects of other environmental variables (e.g., oxygen) and how we could overcome this issue at L661. Some of the sentences were originally in the subsection "Interaction strengths and environmental variables", but were moved to the subsection "Potential limitations of the present study and future perspectives".

      Reviewer #2 (Public Review):

      In this work Ushio et al. combine environmental DNA metabarcoding with novel statistical approaches to demonstrate how fish communities respond to changing sea temperatures over a seasonal cycle. These findings are important due to the need for new techniques that can better measure community stability under climate change. The eDNA metabarcoding dataset of 550 water samples over two years is, I feel, of sufficient scale to provide power to detect fine-scale ecological interactions, the experiments are well controlled, and the statistical analysis is thorough.

      Thank you so much for your time and effort to evaluate our manuscript. We are happy to hear that you found our study technically sound and important. We have revised the manuscript according to your comments to improve our manuscript further.

      The major strengths of the manuscript are: (1) the magnitude of the dataset, which provides densely replicated sampling that can overcome some of the noise associated with eDNA metabarcoding data and scale up the number of data points to make unique inferences; (2) the novel method of transforming the metabarcode reads using endogenous qPCR "spike-in" data from a common reference species to obtain estimates of DNA concentration across other species; and (3) the statistical analysis of time-series and network data and translating it into interaction strengths between species provides a cross-disciplinary dimension to the work.

      Thank you for your positive comments. We are very pleased to hear that (1) our intensive and extensive water sampling, (2) our method for using the common fish species eDNA as "spike-in," and (3) our nonlinear time series analysis were positively evaluated.

      I feel like this kind of study showcases the power of eDNA metabarcoding to answer some really interesting questions that were previously unobtainable due to the complexities and cost of such an exercise. Notwithstanding the problems associated with PCR primer bias and PCR stochasticity, the qPCR "spike-in" method is easy to implement and will likely become a standardised technique in the field. Further studies will examine and improve on it.

      We must admit that our endogenous "spike-in" method does not overcome all problems associated with PCR. However, we agree with you and believe that we are heading in the correct direction. The method does not require the addition of external internal standard DNAs and enables post-hoc evaluation of eDNA absolute concentrations. Although this approach requires an additional experiment (qPCR), the method may be an alternative for quantifying eDNA concentrations.
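The idea behind an endogenous "spike-in" can be illustrated with a simple proportional conversion. This is a deliberately simplified sketch and may differ from the procedure in the manuscript: it assumes a single per-sample scaling factor derived from one reference species quantified by qPCR, and the function name, species labels, and numbers are all hypothetical.

```python
def reads_to_copies(reads, ref_species, ref_copies_qpcr):
    """Scale read counts to estimated DNA copies for one sample.

    The copies-per-read factor comes from a reference species whose copy
    number in this sample was measured independently by qPCR.
    """
    ref_reads = reads.get(ref_species, 0)
    if ref_reads <= 0:
        raise ValueError("reference species has no reads in this sample")
    copies_per_read = ref_copies_qpcr / ref_reads
    return {species: n * copies_per_read for species, n in reads.items()}


# Hypothetical sample: read counts per species and a qPCR estimate
# of 400 copies for the reference species.
sample = {"reference_fish": 2000, "species_a": 500, "species_b": 100}
estimates = reads_to_copies(sample, "reference_fish", ref_copies_qpcr=400.0)
```

The conversion inherits any primer bias between species, which is consistent with the authors' caveat that the method does not overcome all PCR-related problems.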

      Overall I found the manuscript to be clear and easy to follow for the most part. I did not identify any serious weaknesses or concerns with the study, although I am not able to comment on the more complex statistical procedures such as the "unified information-theoretic causality" method devised by the authors. The section on limitations of the study is important and acknowledges some issues with interpretation that need to be explained. The methods, while brief in parts, are clear. The code used to generate the results has been made available via a GitHub repository. The figures are clear and attractive.

      We are very happy to hear that you found our manuscript clear and not containing any serious weakness.

      Reviewer #1 (Recommendations For The Authors):

      This is a very nice manuscript discussing highly relevant methods to use eDNA analysis to study interactions in marine ecosystems. There are some minor concerns that we will address below:

      - As already mentioned above, based on the statements in the introduction we expected a very elaborate discussion section concerning the ecological interaction observed between species. This is however missing, and a more extensive general discussion of the biological interactions would be appreciated, either based on existing literature, or by suggesting further experiments. Alternatively, the claims made in e.g. line 124-128 (Overcoming these difficulties....) could be amended so this expectation is not raised.

      Thank you so much for the comment. As answered in the response above, we have added a discussion at L679 about how our study contributes to understanding the fish community assembly process and predicting the community-level response under ongoing climate change.

      Specifically, we argued that our study provides a piece of evidence that temperature exerts influences on fish-fish interactions under field conditions at a relatively short time scale (weeks to months). We suggested that temperature effects on fish community assembly involve effects at different time scales, and thus, integrating results from different temporal (and spatial) scales is necessary to understand the fish community assembly process in nature. As stated above, we provided the detailed ecological discussion for each specific fish-fish interaction in the Supporting Information.

      - A lot of negative controls were taken and described in the material & methods. However, there is no clear mention of what was done with the outcome of these negative controls. How did the results of the negative controls influence your analysis? Or were they all completely negative?

      Thank you for pointing this out. The negative controls produced negligible reads (177 ± 665 reads [mean ± S.D.]), which accounted for ca. 0.1% of the positive sample reads. Moreover, all the reads were assigned to non-target taxa, such as fish species that had never been observed in the study region and freshwater fish species. Therefore, we conclude that any contaminations in our experiments were negligible, and we discarded the sequence reads from the negative control samples. We have explained this in L533–L539 in the main text.

      - Line 423 states: "..suggesting that weak interactions are key to the maintenance of species-rich communities." We are wondering if this can be stated like this, as it seems the other way around would also be true, since in a species rich community it can be expected that most interactions are weak?

      Thank you for pointing this out. We agree that there is a possibility that the high species diversity could be a cause of weak interactions. To clarify this, we have revised the sentence as follows in L568: "...suggesting that understanding the causes and effects of weak interactions is key to understanding the maintenance of species-rich communities."

      - There is a correlation between DNA concentration and temperature (e.g. shown in fig. S2b). We are wondering what could be an argument to not correct for this temperature effect on eDNA concentrations (as now described) or if it would be better to apply a correction factor for this, as it is also shown that there is a correlation between DNA concentration and interaction strengths.

      In the unified information theoretic (UIC) analysis, we took the effect of temperature into account if temperature had statistically clear influence on eDNA dynamics of a particular fish species (L439). This means that temperature was included as a conditional variable in the calculation of TE (i.e., Zt in Eqn. [1]). Other environmental variables were also included if they had statistically clear influence. Similarly, in the MDR S-map, we included temperature or other environmental variables as conditional variables if they had statistically clear influence on eDNA dynamics of a particular fish species. We explained this in L479.

      - The models used for the interaction dynamics calculations are extensively discussed in this manuscript, although these details are also present in the original papers describing these models, and therefore the manuscript could be shortened by removing some of this explanation.

      Thank you for your suggestion. As you noted, the details of the methods (S-map and MDR S-map) are available in Sugihara (1994), Chang et al. (2021), and elsewhere. However, we have kept the explanation so that readers who are not familiar with the methods can briefly understand them without needing to read the details of the previous studies.

      Reviewer #2 (Recommendations For The Authors):

      L50-L72: I feel like the abstract could be snappier, i.e. quicker to read with less detail. Consider reducing it a little.

      Thank you for your suggestion. We have deleted some redundant phrases and shortened the abstract a little.

      L173-L176: I don't understand exactly what is suggested here. Perhaps rephrase?

      We have revised the sentence as follows (L165): " As our eDNA time series was taken twice a month, the interactions detected should also have the same time scale (e.g., the interactions detected may cause changes in the population size at the same time scale), which means that we tend to focus on behavior-level interactions (e.g., schooling) rather than birth-death process in the present study (except for predation)."

      L228: How many PCR replicate reactions were undertaken per sample?

      We performed eight technical replicates for the same eDNA template. This information is described in the third paragraph of the section "Paired-end library preparation and MiSeq sequencing." This section has been moved from the previous supplementary methods to the main text in the revision.

      L236: There is no mention later of how these blanks are used to clean up or filter the dataset from the effects of contamination. Consider adding this information.

      Thank you for pointing this out. As in the responses above, we have described the negative controls in L533–L539 in the main text. The negative controls generated negligible reads, so we simply discarded the sequence reads.

      L252-L253: "Primer sequences were removed from merged reads and reads without the primer sequences underwent quality filtering"? Wouldn't all of the reads not have primers after the primers were trimmed off? Or is something else intended here?

      All primer sequences were removed after merging the paired-end reads (see "Sequence analysis"). There is no specific reason for this order, and we think that removing the primers before merging the paired-end reads would generate the same results.

      L264-L265: "To refine the above taxon assignments". I assume because there were lots of assignments to species that were not known from the study area? Explain why this was done.

      At present, the reference sequences are available for about 70% of 4,500 fish species in Japan. However, due to the unknown degree of intraspecific variation, using a uniform threshold of 98.5% to delineate species can result in over-splitting or over-clustering MOTUs. To solve this issue, the manual refinement of the taxon assignments was performed based on the phylogenetic tree. This has been explained in L335.

      L274: More details of the qPCR assay are required, or a citation of previous study or supporting information.

      The details of the qPCR assay are provided in the section "Quantitative PCR and estimation of DNA copy numbers." This section has been moved from the previous supplementary methods to the main text in the revision.

      L327: Explain further how seasonality was treated here? This is an important part of the study, so deserves further attention.

      We included water temperature (if it had statistically clear influence on fish eDNA dynamics) as a conditional variable z(t) in the calculation of TE, and this took the effect of the seasonality in detecting causation into account. We have described this in L436–444.

      L407: Consider giving the code repository a DOI to cite.

      We have archived the analysis codes at Zenodo and provided the DOI in L39 and L521.

      L411: How many MiSeq runs exactly?

      We performed 21 MiSeq runs (often with other eDNA samples). We have described this in the main text (L299).

      L411: What proportion of your total sequencing data were assigned to fishes? This is a useful statistic to compare methods between studies.

      About 98% of the total sequence reads were assigned to fish. We have described this in the main text (L528).

      Figure 2: There does not appear to be a key to the color-coded species ecologies.

      We have added a legend for the fish ecology in Figure 2.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Consolidated response to public comments:

      We are grateful to the reviewers for their careful examination of our manuscript and for their insights for improving our work. We appreciate that they recognize the potential of the TARDIS approach for diverse transgenesis applications.

      We address two primary concerns that the reviewers raise. First is a concern that this approach is not as innovative as stated. We acknowledge that our work builds upon previous studies in the field, such as those by Nonet, Mouridi et al., with Malaiwong coming after our initial preprint. However, we believe that our approach offers a unique contribution, in that prior work does not provide a protocol or process for large-scale multiplexed transgenesis. Specifically, we introduce large sequence library arrays (TARDIS Library Arrays or TLAs). While high-throughput multiplexed transgenesis is discussed in the Nonet and Mouridi manuscripts, it is never demonstrated. It is the combination of library construction, heritable transmission of the library itself, and then induced transgenesis of library components at a defined location within single individuals that makes this approach particularly useful.

      Second, there were concerns that we have not demonstrated that this approach will work beyond C. elegans. We agree that our discussion of the potential application of TARDIS beyond C. elegans is speculative at this point. Our intention was to highlight the potential for future development and application in other systems. In some cases, large integrations into the genome are possible, such as in the case of H11 locus in mice, which could provide a means to inherit a sequence library. We are hopeful that our success in C. elegans will inspire work in other systems. The motivation for this will naturally depend on the usefulness of actual TARDIS implementations, which will be forthcoming in due course.

      Reviewer #1 (Recommendations For The Authors):

      1. Section titled "Integration from TARDIS array to F1" beginning on line 161 has some missing details that make it difficult to follow. Many of those details are present in the following section titled "Generation and Integration of TARDIS promoter library", but should have been present sooner.
         a. How many barcodes were in the array in line PX786?
         b. Clarify the use of G-418, heat shock, hygromycin, etc. in this paragraph.
         c. Please clarify that the L1 death is due to selection with G-418 - "We found that a portion of the initially plated worms die, likely due to lack of array inheritance." is confusing unless you add that they are selected in this step.
         d. "These results suggest that approx. 100-200 worms need to be heat shocked to obtain an integrated line" - the math actually looks like 200-300, and this would be to get a single integrant.
      2. In general, the barcoding study and results reported here read like a teaser/proof-of-concept but do not really robustly demonstrate the application of the method for barcoding and tracing individual lineages in a population of C. elegans. How many barcodes were in the array, and how many ended up in F1s? Would one need to screen for duplicate barcodes after integration?
      3. The promoter library study is impressive but again, rather limited.
      4. The Discussion section about extending this technology to other systems is fairly balanced, acknowledging the limitations that would need to be overcome. The language in the abstract and introduction is less balanced and oversells the current translation of this approach to systems outside C. elegans.

      Reviewer #2 (Recommendations For The Authors):

      As I mentioned in the Public Review, I appreciate the design of the selection markers for integration. However, I do not see a major advance in the field. The use of barcoding of individuals to address a biological question would change that impression.

      Regarding the integration of promoters, I think this is something that anyone could address in diverse forms using existing knowledge.

      Suggestions:
      - Use one or two more landing pads for barcoding of animals and check numbers, efficacy, enrichments, etc. About 500 sequences overrepresented may be too much for future applications;
      - Increase the number of landing pads for inserting promoters. Genomics context matters and this could help to have a better summary of the real expression patterns driven by the promoter of interest;
      - Other references about landing pads would be Vicencio et al, Genetics 2019, and Nonet microPublication Biology 2021.

      In addition to the general comments, the reviewers provided useful suggestions to the text that we have used to clarify the manuscript.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses.

      (Top) ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. The data represented are the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps more relevant are the differences in the electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.
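To make the difference in stimulation strength concrete, the quoted parameters can be compared by charge per pulse. This is only a rough sketch: it assumes rectangular pulses and uses charge (current × pulse width) as the sole measure of stimulus strength, which the response itself does not state. The function name is illustrative.

```python
# Charge delivered by one rectangular current pulse: Q = I * t.
# Amplitudes and widths are the values quoted in the response; treating the
# pulses as rectangular and comparing by charge alone is an assumption.

def charge_nC(current_uA: float, width_ms: float) -> float:
    """Charge in nanocoulombs for a rectangular pulse (uA * ms = nC)."""
    return current_uA * width_ms

dasilva = charge_nC(500, 1.0)     # 1 ms pulses of 500 uA
this_low = charge_nC(10, 0.2)     # 0.2 ms pulses of 10 uA
this_high = charge_nC(100, 0.2)   # 0.2 ms pulses of 100 uA

print(f"Dasilva et al.: {dasilva:.0f} nC per pulse")
print(f"This work:      {this_low:.0f}-{this_high:.0f} nC per pulse")
print(f"Ratio:          {dasilva / this_high:.0f}x to {dasilva / this_low:.0f}x")
```

Under these assumptions, each pulse in Dasilva et al. carries 25 to 250 times the charge of the pulses used in this work.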

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
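The invariance claimed here can be checked on synthetic data. The sketch below uses made-up voltages and an arbitrary channel count, not the authors' recordings or pipeline. It computes global field power as the standard deviation across channels at each time point, before and after average referencing.

```python
import random
import statistics

random.seed(0)
n_channels, n_samples = 8, 5  # arbitrary sizes for illustration

# Synthetic EEG: one list of channel voltages per time sample.
# A large component shared by all channels is added at every sample.
raw = []
for _ in range(n_samples):
    common = random.gauss(0, 50)
    raw.append([common + random.gauss(0, 5) for _ in range(n_channels)])

def average_reference(sample):
    """Subtract the across-channel mean from every channel."""
    mean = statistics.fmean(sample)
    return [v - mean for v in sample]

def global_field_power(sample):
    """GFP at one time point: standard deviation across channels."""
    return statistics.pstdev(sample)

gfp_raw = [global_field_power(s) for s in raw]
gfp_avg = [global_field_power(average_reference(s)) for s in raw]

for a, b in zip(gfp_raw, gfp_avg):
    print(f"raw GFP = {a:.6f}   average-referenced GFP = {b:.6f}")
```

Subtracting the same value from every channel shifts the across-channel mean but leaves the spread unchanged, so the two GFP columns agree to within floating-point error.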

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observe significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. Instead, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective, and is a valid tool that can be used to probe and characterize the cortico-thalamo-cortical network of any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. (1998) list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons. We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
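The size of the 1/cos correction can be worked out directly for the angles quoted above. This is a minimal sketch; the function name is illustrative and not taken from the authors' code.

```python
import math

def thickness_factor(deviation_deg: float) -> float:
    """Factor by which the 'effective' cortical thickness grows when the
    probe deviates from perpendicular by the given angle (1 / cos(angle))."""
    return 1.0 / math.cos(math.radians(deviation_deg))

# Angles reported in the response (degrees from perpendicular).
angles = [("minimum", 5.2), ("mean", 15.3), ("Figure 1B", 31.2), ("maximum", 36.6)]

for label, angle in angles:
    print(f"{label:>9}: {angle:4.1f} deg -> effective thickness x{thickness_factor(angle):.3f}")
```

At the mean deviation the stretch is roughly 4%, and at the largest deviation it is roughly 25%. In both cases it is a uniform scaling along the probe, which is why the relative ordering of sinks and sources is preserved.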

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point that it will be worth discussing is the reported values of the z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation in neuronal activity for both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).
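The two normalizations under discussion can be compared on a toy example. The firing rates below are hypothetical, and the exact baseline definition (mean and standard deviation of the pre-stimulus bins) is an assumption; the manuscript's Methods define the procedure actually used.

```python
import statistics

def zscore_to_baseline(rates, n_baseline):
    """Z-score every bin against the mean and s.d. of the pre-stimulus bins."""
    base = rates[:n_baseline]
    mu = statistics.fmean(base)
    sd = statistics.pstdev(base)
    return [(r - mu) / sd for r in rates]

def normalize_to_max(rates, n_baseline):
    """Divide every bin by the maximum post-stimulus rate (range [0, 1] here)."""
    peak = max(rates[n_baseline:])
    return [r / peak for r in rates]

# Hypothetical binned firing rate (Hz): 4 pre-stimulus bins, then
# excitation, two bins of quiescence, and a rebound.
rates = [9.0, 11.0, 10.0, 10.0, 40.0, 0.0, 0.0, 25.0]
n_baseline = 4

z = zscore_to_baseline(rates, n_baseline)
m = normalize_to_max(rates, n_baseline)

for r, zi, mi in zip(rates, z, m):
    print(f"rate = {r:5.1f} Hz   z = {zi:7.2f} s.d.   max-normalized = {mi:.3f}")
```

The z-scored trace goes negative during quiescence, which expresses suppression relative to baseline rather than a negative rate. The max-normalized trace stays non-negative but no longer shows where the baseline sits without extra annotation.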

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather it is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. Then we integrated the z-score response time series to capture the dynamic of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
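A short sketch shows why a time-integrated z-score can reach values near 50 s.d.∙s. The trace, sampling interval, and window length are hypothetical. Taking the absolute value before integrating is also an assumption, since the response does not say whether the area is signed.

```python
def integrated_magnitude(z, dt):
    """Trapezoidal integral of |z| over the window; units are s.d. * seconds.
    Using |z| (rather than the signed value) is an assumption of this sketch."""
    area = 0.0
    for a, b in zip(z[:-1], z[1:]):
        area += 0.5 * (abs(a) + abs(b)) * dt
    return area

dt = 0.001           # s, hypothetical sampling interval
z = [100.0] * 501    # hypothetical response held at 100 s.d. for 0.5 s

magnitude = integrated_magnitude(z, dt)
print(f"magnitude = {magnitude:.1f} s.d.*s")
```

Because the metric multiplies amplitude by duration, its value depends on the length of the response window and is not directly comparable to a single z-score.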

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.
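The three-tier convention can be written as a small lookup. The thresholds come from the text above; how exact boundary values (for example p = 0.01) are binned and the "n.s." label for non-significant results are assumptions, since the response uses strict inequalities throughout.

```python
def significance_label(p: float) -> str:
    """Map a p-value to the asterisk convention described in the response.
    Boundary handling and the 'n.s.' label are assumptions of this sketch."""
    if p < 0.001:
        return "***"   # very strong evidence to reject the null hypothesis
    if p < 0.01:
        return "**"    # strong evidence
    if p < 0.05:
        return "*"     # weak evidence
    return "n.s."

for p in (0.0004, 0.0039, 0.0313, 0.2500):
    print(f"p = {p:.4f} -> {significance_label(p)}")
```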

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.
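The point about identical p-values can be illustrated with an exact two-sided Wilcoxon signed-rank test computed by brute-force enumeration. This is a sketch for small samples with no zero or tied differences. It is not the authors' code, and the data are invented.

```python
from itertools import product

def exact_wilcoxon_p(diffs):
    """Two-sided exact p-value for the Wilcoxon signed-rank test.
    Assumes no zero differences and no ties among absolute differences."""
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under the null, each rank is positive or negative with equal probability.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= w_obs:
            count += 1
    return count / 2 ** n

# Two invented samples (N = 6) with very different magnitudes but the same
# sign pattern: every difference is positive.
small_effects = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0]
large_effects = [10.0, 30.0, 20.0, 70.0, 45.0, 90.0]

print(exact_wilcoxon_p(small_effects))  # 0.03125
print(exact_wilcoxon_p(large_effects))  # 0.03125
```

With N = 6 there are only 64 possible sign assignments, so the test can return only a handful of distinct p-values, and any two samples whose signed ranks match receive exactly the same one.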

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomena will be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your comment above and as has been proposed by many, including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. ENeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. SLEEP, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305/METRICS

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during nrem and rem sleep. ELife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements [optional]

      Thank you for your letter dated on May 5, 2023 concerning our manuscript (MS# RC-2023-01906) entitled “Activation of Nedd4L Ubiquitin Ligase by FCHO2-generated Membrane Curvature.”

      We thank the reviewers for their constructive comments and suggestions. We have considered all reviewers’ comments and plan to revise our manuscript accordingly.

      We believe that our revision plan will greatly improve the quality of our manuscript.

      1. Description of the planned revisions

      Reviewer #1

      I enjoyed reading the paper by Sakamoto and colleagues, where they show that Nedd4L ubiquitin ligase activity is stimulated by membranes and in particular positive membrane curvature. This paper is a conceptual advance that hopefully will be extended by many other groups where membranes topology participates in the activation of associated enzymes, giving rise to added complexity but also specificity and further compartmentalization. It is an important paper for all cell biologists to understand.

      1. My comments are all relatively minor and I hope can improve the readability of the paper, but will not alter the overall conclusion as this is well backed up. In general I would like to see more/better statistics/quantitation and better figure legends. I found that often one had to read the paper to understand a figure where reading the figure legend should suffice.

      Reply: According to the reviewer’s comment, we will quantify the experiments (Fig. 1C, Fig. 2, Fig. 9B, and Fig. 10B) and add descriptions of statistics (Fig. 5, Fig. 6, B and D, and Fig. 7C). We will also write better figure legends to enable the readers to easily understand experiments.

      1. This paper reminds me of a paper from Gilbert Di Paolo's lab on the activation of synaptojanin PIP2 hydrolysis by high membrane curvature. One would expect that there may be many such proteins whose activities will be dependent on their membrane environment. I find it conceptually rather likely that a protein which interacts with membranes via a C2 domain (which has membrane insertions and will thus likely be curvature sensitive) will likely show some positive curvature sensitivity. Can I suggest this paper is referenced and discussed in the light of the discussion statement "Thus, our findings provide a new concept of signal transduction in which a specific degree of membrane curvature serves as a signal for activation of an enzyme that regulates a number of substrates."

      Reply: According to the reviewer’s comment, we will cite the paper entitled “synaptojanin-1-mediated PI(4,5)P2 hydrolysis is modulated by membrane curvature and facilitates membrane fission” by Chang-Ileto et al. (Dev. Cell 20, 206–18, 2011). We will also discuss this paper in the light of the discussion statement.

      1. Where the paper could be improved (or I have not understood fully). In figure 1 there is a robust endocytosis of ENaC that is FCHo2 and Nedd4L sensitive. There is a rescue for FCHo2 in a fluorescence image (unquantified), so it would be good to have the more quantitative approach of rescue with both FCHo2 and Nedd4L in the biochemical assay.

      Reply: Although the reviewer suggests a rescue experiment in the biochemical assay, the experiment is difficult because the transfection efficiency is low (about 50%). On the other hand, we agree with the reviewer that a quantitative approach is required in the rescue experiment (Fig. 1C). Therefore, we plan to quantify the rescue experiment for FCHO2 in the immunofluorescence assay. The reviewer also suggests a rescue experiment for Nedd4L as well as FCHO2. However, since the involvement of Nedd4L in ENaC endocytosis is well established, we do not think that the rescue experiment for Nedd4L is further required.

      1. In figure 2 there is nice co-localisation between clathrin/FCHo2 and ENaC but not with Nedd4L. It would be good to have some quantitation of the co-localisation. But also one should use a Nedd4L mutant or a mutant of ENaC and so be able to visualise co-localisation between receptor and ub-ligase. I find it strange that there is no (or much less) Nedd4L-GFP visible in the cells overexpressing ENaC... Is there an explanation? Does overexpression of ENaC lead to more auto-ubiquitination of Nedd4L. Also the Nedd4L-GFP signal in other cells is punctate, while in the next figure Myc-Nedd4L is not.

      Reply: According to the reviewer’s comment, we will perform quantitative colocalization analysis in Fig. 2.

      We have found that a catalytically inactive Nedd4L mutant, C922A, co-localizes with cell-surface αENaC and FCHO2 in αβγENaC-HeLa cells. According to the reviewer’s comment, these data will be added in the revised manuscript.

      In Fig. 2C, Nedd4L was transiently transfected in cells stably expressing ENaC. In Nedd4L-transfected cells, overexpression of Nedd4L stimulated ENaC internalization, resulting in the disappearance of ENaC at the cell surface. On the other hand, in non-transfected cells, cell-surface ENaC was detected. Thus, Nedd4L-negative cells are non-transfected cells (cell-surface ENaC positive cells). This explanation will be added in the revised manuscript.

      The staining pattern of Nedd4L depends on what section of the cell a confocal microscope was focused on. Nedd4L-GFP signals were punctate at the bottom section of the cell in Fig. 2, whereas Myc-Nedd4L was diffusely distributed at the upper section (cytoplasm) of the cell (Fig. 3). Thus, Nedd4L shows distribution throughout the cytoplasm and punctate staining at the bottom (cell surface). The staining pattern of Nedd4L is also affected by the expression amount of Nedd4L in cells. When Nedd4L was highly expressed in COS7 and HEK293 cells in Fig. 3, the punctate staining was hardly detected. This localization pattern of Nedd4L will be clearly described in the revised manuscript.

      1. In figure 3 it appears to me that there is co-localization between ENaC and amphiphysin. Is this not a positive piece of information? I am not sure that FBP17 is a good F-BAR domain to use given its oligomerization may well prevent membrane association of Nedd4L. Minor comment: I don't see tubules for amphiphysin in panel B.

      __Reply: __The reviewer states that there is co-localization between Nedd4L and amphiphysin1 (Fig. 3A). However, Nedd4L was not recruited to membrane tubules generated by amphiphysin1. We will clearly show that there is no colocalization between Nedd4L and amphiphysin1.

      The reviewer states that FBP17 may not be a good F-BAR domain to use because its oligomerization may well prevent membrane association of Nedd4L. However, we have shown that FCHO2, like FBP17, forms oligomers (Uezu et al. Genes Cells, 16, 868-878, 2011). Furthermore, we have found that FCHO2 inhibits the membrane binding and catalytic activity of Nedd4L when the PS percentage in liposomes is elevated (unpublished data and Fig. 9C). Thus, since FBP17 and FCHO2 probably have similar properties, we presume that FBP17 is a good F-BAR domain to use.

      As the reviewer pointed out, membrane tubules generated by amphiphysin1 were hardly detected in HEK293 cells (Fig. 3B). Amphiphysin1 showed punctate staining, but did not co-localize with Nedd4L. This description will be added in the revised manuscript.

      1. Figure 5: The affinity of Nedd4 C2 domain for calcium is quite high given we normally assume a cytosolic concentration of 100nM (approximate). The authors have rightly buffered the calcium with EGTA. Normally we would check that the buffering is sufficient by varying the protein concentration and making sure the affinity is still the same, so can I suggest the authors use 3 or 4 times the amount of C2 domain and make sure the curve does not change (provided liposomes are not limiting). Minor comment: How many experiments and what are error bars (SD?).

      __Reply: __According to the reviewer’s comment, we will check that the buffering is sufficient by varying the protein concentration (Fig. 5). We will also add a description of statistics to the legend to Fig. 5.
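      The buffering check suggested by the reviewer can be illustrated with a simple mass-balance calculation. The following is only a minimal sketch, assuming independent 1:1 binding of Ca2+ to EGTA and to the C2 domain; the function name `free_calcium` and all concentrations and Kd values are hypothetical placeholders, not values from the manuscript. If the buffer is in sufficient excess, raising the protein concentration fourfold should barely shift the free Ca2+ concentration.

```python
def free_calcium(total_ca, binders, iterations=200):
    """Return free Ca2+ (mol/L) given total Ca2+ and a list of 1:1 binders.

    binders: iterable of (total_concentration, kd) pairs, all in mol/L.
    Assumes independent 1:1 binding sites (a simplification).
    """
    binders = list(binders)

    def total_from_free(free):
        # Mass balance: free Ca2+ plus Ca2+ bound to each binder.
        return free + sum(conc * free / (kd + free) for conc, kd in binders)

    # total_from_free is monotonically increasing, so bisection is safe.
    lo, hi = 0.0, total_ca
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if total_from_free(mid) > total_ca:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Hypothetical numbers, chosen only for illustration.
EGTA = (1e-3, 1e-7)   # 1 mM EGTA with an assumed apparent Kd of 100 nM
TOTAL_CA = 0.5e-3     # 0.5 mM total Ca2+
C2_KD = 1e-7          # assumed Kd of the C2 domain for Ca2+

free_1x = free_calcium(TOTAL_CA, [EGTA, (1e-6, C2_KD)])  # 1 uM protein
free_4x = free_calcium(TOTAL_CA, [EGTA, (4e-6, C2_KD)])  # 4 uM protein
relative_shift = (free_1x - free_4x) / free_1x

print(f"free Ca2+ at 1x protein: {free_1x * 1e9:.2f} nM")
print(f"free Ca2+ at 4x protein: {free_4x * 1e9:.2f} nM")
print(f"relative shift: {relative_shift:.2%}")
```

      With a weaker or less concentrated buffer, the same calculation would show a larger shift, which is the situation the reviewer's control is meant to rule out.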

      1. Figure 6: Controls have been performed to ensure that liposomes are pelleted, according to methods. In Figure 6B can the authors show that there is the same amount of liposomes in each sample by showing more of the coomassie gel so that the reader can see the Neutravidin band is the same in each sample. Also I believe a student t-test should not be used in this experiment (but perhaps an Anova test), and in panel D there does not appear to be a description of statistics.

      __Reply: __To ensure that the same amounts of liposomes were pelleted, the reviewer suggests that we show more of the Coomassie gel to present the neutravidin bands in Fig. 6B. However, as the molecular weight of neutravidin is about 15 kDa, neutravidin runs off the gel (7% SDS-PAGE gel) where Nedd4L (As the reviewer pointed out, we will use an ANOVA test in Fig. 6B. We will also add a description of statistics in Fig. 6D.
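      For readers unfamiliar with the distinction raised by the reviewer, a one-way ANOVA compares three or more groups in a single test instead of running repeated pairwise t-tests. The sketch below computes only the F statistic and its degrees of freedom in plain Python; the function name `one_way_anova_f` and the example numbers are illustrative assumptions and are not data from Fig. 6B. A p-value additionally requires the F distribution, which a statistics package (for example `scipy.stats.f_oneway`) would supply.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of measurements, one inner list per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up example: three conditions, three replicates each.
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```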

      1. Figure 11: In panel B I note that the FCHo2 BAR domain on small liposomes appears to inhibit Ubiquitination. Is this consistent with the BAR domain not preventing Nedd4L binding?

      __Reply: __The FCHO2 BAR domain enhances the liposome binding and catalytic activity of Nedd4L when the strength of interaction of Nedd4L with liposomes (20% PS) is weak. In contrast, we have also found that the FCHO2 BAR domain inhibits the membrane binding and catalytic activity of Nedd4L when the interaction of Nedd4L with liposomes is increased by elevating the PS percentage in liposomes (unpublished data and Fig. 9C). We consider the reason for the different effects of FCHO2 on Nedd4L to be as follows. When liposomes (20% PS) are used (the interaction of Nedd4L with PS in liposomes is weak), Nedd4L binds to liposomes mainly through ENaC (Fig. 8F). The liposome binding is hardly mediated by PS. Addition of the FCHO2 BAR domain increases the strength of interaction of Nedd4L with PS by generating membrane curvature. Consequently, the FCHO2 BAR domain newly induces the PS-mediated liposome binding of Nedd4L, resulting in the enhancement of liposome binding and catalytic activity of Nedd4L. On the other hand, when the interaction of Nedd4L with PS in liposomes is increased by elevating the PS percentage in liposomes (50% PS), the liposome binding of Nedd4L is mainly mediated by PS. Addition of the FCHO2 BAR domain inhibits the PS-mediated liposome binding of Nedd4L. Since both FCHO2 and Nedd4L are PS-binding proteins, they compete with each other to bind to PS in liposomes. Therefore, the results in Fig. 11B are consistent, because the interaction of Nedd4L with PS is increased by 0.05 µm pore-size liposomes. This explanation will be added in the revised manuscript.

      __Reviewer #2 __

      The authors have reported the involvement of the BAR domain-containing protein FCHO2 in the Nedd4L-mediated endocytosis of ENaC. They propose a model in which the membrane curvature induced by the BAR domain-FCHO2 relieves the auto-inhibition of E3 ligase causing its activation and recruitment. The paper describes a series of in vitro reconstituted experiments that are interesting but not fully connected with the mechanism of ENaC endocytosis. Additional experiments are needed to fully support the authors' conclusions.

      Major comments:

      1. Although the data reported by the authors regarding FCHO2 and Nedd4L involvement in ENaC endocytosis are convincing, it is suggested that the authors perform the same ENaC endocytosis assay presented in Fig.1B under conditions of FBP17 and amphiphysin1 siRNA to formally prove the selective involvement of FCHO2 in the process among other BAR-containing proteins.

      __Reply: __The reviewer suggests the same ENaC endocytosis assay presented in Fig. 1B under conditions of FBP17 and amphiphysin1 siRNA to prove the selective involvement of FCHO2 in ENaC endocytosis. There seems to be a misunderstanding. Similar to FCHO2, FBP17 and amphiphysin are well known to be involved in clathrin-mediated endocytosis. As ENaC is internalized through clathrin-mediated endocytosis, FBP17 and amphiphysin siRNA presumably inhibit ENaC endocytosis. We cannot understand the significance of FBP17 and amphiphysin1 siRNA in the ENaC endocytosis assay.

      1. According to the previous point, it will be interesting to see not only a snapshot image of the internalisation assay performed by immunofluorescence (Fig.1C) but a more quantitative analysis of the different time points (as in Fig.1B) in condition of FCHO2 siRNA and eventually FBP17 and amphiphysin1 siRNA.

      __Reply: __According to the reviewer’s comment, we will perform a quantitative analysis in Fig. 1C. The reviewer also suggests performing the immunofluorescence assay at different time points in Fig. 1C. However, we already show the time course of ENaC internalization in Fig. 1B. We do not think that a time course in the immunofluorescence assay is additionally required. As for FBP17 and amphiphysin siRNA, our response is the same as our response to comment 1 of this reviewer.

      1. In Fig.2B, overexpression of the catalytically inactive version of Nedd4L (Nedd4L C922A) would help to see Nedd4L-ENaC co-localization.

      __Reply: __This comment is the same as comment 4 of reviewer #1.

      1. In Fig.4D, the authors need to analyse ENaC ubiquitination in the same experimental setting as Fig. 4A instead of transfecting cells with increasing amounts of Nedd4L in the presence or absence of FCHO2 BAR. It is also recommended to include Nedd4L C922A as an additional control.

      __Reply: __The reviewer requests that we analyse ENaC ubiquitination in the same setting as Fig. 4A. However, an in vivo autoubiquitination assay is widely used to determine the catalytic activity of E3 Ub ligases, because E3 activity is typically reflected in their autoubiquitination. Therefore, the autoubiquitination assay is sufficient to show that Nedd4L is specifically activated by membrane tubules generated by FCHO2 in cells. Furthermore, we have found it very difficult to compare ENaC ubiquitination among many GFP-BAR proteins (GFP alone, GFP-FCHO2, GFP-FBP17, amphiphysin1-GFP, GFP-FCHO2 mutant) in the same experimental setting as Fig. 4A. In Fig. 4A, three types of cDNA (HA-Ub, Myc-Nedd4L, and GFP-BAR protein) were transfected in cells. The expression amounts of Myc-Nedd4L were similar among the GFP-BAR proteins. On the other hand, in Fig. 4D, four types of cDNA (HA-Ub, Myc-Nedd4L, GFP-BAR protein, and FLAG-αENaC) were transfected in cells. Under these conditions, it is very difficult to adjust the expression amounts of Nedd4L and αENaC among many GFP-BAR proteins. Even when comparing two GFP-BAR proteins (GFP alone and GFP-FCHO2), it was necessary to assess the expression amounts of Nedd4L by transfection with various cDNA amounts of Nedd4L (Fig. 4D). Moreover, as shown in Fig. 4D, enhancement of ENaC ubiquitination by FCHO2 is decreased at higher expression of Nedd4L (1.0 and 1.5 μg DNA), although the reason is unknown. Therefore, we are not sure that we will be able to accurately analyse ENaC ubiquitination in the same setting as Fig. 4A instead of transfecting cells with increasing amounts of Nedd4L.

      According to the reviewer’s comment, we will examine the effect of Nedd4L C922A on ENaC ubiquitination.

      1. While discussing the role of hydrophobic residues in Nedd4L C2 domain, the authors never mentioned the publication by Escobedo et al., Structure 2014 (DOI:10.1016/j.str.2014.08.016), which highlighted how I37 and L38 are directly involved in Ca2+ binding. This aspect should be discussed since the authors show the importance of Ca2+ for PS binding in the sedimentation assay.

      __Reply: __According to the reviewer’s comment, we will cite the reference (Escobedo et al.) and discuss the aspect (I37 and L38 are directly involved in Ca2+ binding).

      1. As stated by the authors those two residues I37 and L38 are also involved in E3 enzyme activation by relieving C2-HECT interaction. It is important to further demonstrate the effect of these mutations on ENaC substrate.

      __Reply: __To prove that the I37 and F38 residues are involved in E3 enzyme activation by relieving C2-HECT interaction, the reviewer requests that we further demonstrate the effect of Nedd4L I37A+F38A on ENaC ubiquitination. However, these two residues are critical not only for Nedd4L activation but also for membrane binding and curvature sensing of Nedd4L. We also show that membrane binding of Nedd4L is critical for ENaC ubiquitination. Indeed, we have found that the Nedd4L I37A+F38A mutant, which loses membrane binding, shows little ENaC ubiquitination (unpublished data), whereas it enhances autoubiquitination (Fig. 4C). Thus, the effect of the I37A+F38A mutant on ENaC ubiquitination is not an appropriate test of whether the two residues are involved in E3 enzyme activation.

      1. There are some concerns regarding the in vitro ubiquitination assay performed in Fig.8 and following figures. The Nedd4L proteins used during the assay have been produced with a His tag at the C-terminus; it was reported (Maspero et al, Nat Struct Mol Biol 2013 DOI: 10.1038/nsmb.2566), at least for the isolated HECT domain, that modification of the C-terminal residue of the protein affects its activity. It would be important to judge the activity of the purified proteins used in the assay. Moreover, as an additional control, the introduction of a mSA-ENaC PY mutant protein is suggested. The authors claimed the importance of membrane localized PY motif for recruitment and activation of Nedd4L, so it would be informative to perform the experiment in the presence of PY mutated ENaC.

      __Reply: __The reviewer states that there are some concerns regarding His-tagged Nedd4L proteins. We have prepared Nedd4L that has no tag at its N- or C-terminus. N-terminal GST-tagged, C-terminal untagged Nedd4L was expressed in E. coli and purified by Glutathione-Sepharose column chromatography. The GST tag was cleaved off and Nedd4L was further purified by Mono Q anion-exchange column chromatography. Using this purified sample, we have examined the catalytic activity of untagged Nedd4L. We have found that concerning Ca2+-dependency, PS-dependency, and curvature-sensing, the properties of untagged Nedd4L are similar to those of C-terminal His-tagged Nedd4L (unpublished data).

      According to the reviewer’s comment, we will perform the experiment in the presence of PY-mutated ENaC.

      1. It is not clear why increasing the concentration of PS (from 20% to 50%) the presence of BAR domain doesn't allow ENaC ubiquitination (Fig.9C), is Nedd4L not recruited to the pellet? It would be interesting to see the sedimentation experiment of Fig.9A done in presence of 50% PS.

      __Reply: __This comment is essentially the same as comment 8 of reviewer #1. We have found that the FCHO2 BAR domain inhibits the membrane binding of Nedd4L when the PS percentage in liposomes is elevated (~50%) (unpublished data). According to the reviewer’s comment, these data will be added in the revised manuscript.

      1. This reviewer is not an expert of lipids biology, thus the explanations related to the effect of FCHO2 BAR in presence of PI(4,5)P2 (Fig. 10) or 0.05 pore-size liposomes (Fig.11) were not clear. Does FCHO2 BAR have a different effect in inducing membrane tubulation in these two conditions? Is this parameter measurable by tubulation assay?

      __Reply: __According to the reviewer’s comment, we will write more clearly the explanation related to the effect of FCHO2 BAR domain in the presence of PI(4,5)P2 or 0.05 μm pore-size liposomes.

      Minor Comments

      1. It would be appreciated if a nuclei staining panel is included in all immunofluorescence images, as it would help to identify the number of cells in the field of view (e.g., Fig. 1C, Fig. 2B).

      __Reply: __According to the reviewer’s comment, we will show immunofluorescence images to identify the number of cells in Fig. 1C and Fig. 2B.

      1. It would be recommended to include colocalization analysis, such as Pearson's correlation coefficient or Manders coefficient in immunofluorescence images.

      __Reply: __According to the reviewer’s comment, we plan to perform quantitative colocalization analysis in Fig. 2.
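      The two coefficients named by the reviewer can be sketched as follows. This is a minimal illustration in plain Python, not the analysis pipeline that will be used for Fig. 2: it assumes each channel has already been flattened into a list of pixel intensities of equal length, uses the simple form of the Manders coefficients (the fraction of one channel's total intensity found in pixels where the other channel exceeds a threshold), and the function names and pixel values are hypothetical.

```python
from math import sqrt


def pearson(ch1, ch2):
    """Pearson's correlation coefficient between two pixel-intensity lists."""
    n = len(ch1)
    if n == 0 or n != len(ch2):
        raise ValueError("channels must be non-empty and of equal length")
    mean1 = sum(ch1) / n
    mean2 = sum(ch2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return cov / sqrt(var1 * var2)


def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2).

    M1: fraction of channel-1 intensity in pixels where channel 2 > thr2.
    M2: fraction of channel-2 intensity in pixels where channel 1 > thr1.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must be of equal length")
    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 == 0 or total2 == 0:
        raise ValueError("a channel with no signal has no Manders coefficient")
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2


# Made-up pixel values for two channels of a four-pixel image.
red = [0, 10, 20, 30]
green = [0, 10, 20, 30]
print(pearson(red, green), manders(red, green))
```

      In practice the thresholds matter a great deal for the Manders coefficients, so background subtraction or an automated threshold would normally be applied before this step.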

      1. It is not clear how the quantitation of mSA-ENaC ubiquitination in Fig.8D, 8C, and 9B was performed. Did the authors normalise the detected Ub signal over the amount of unmodified mSA-ENaC?

      __Reply: __We did not normalize the detected Ub signals over the amount of unmodified mSA-ENaC, because the same amount of mSA-ENaC was added in each assay. The chemiluminescence intensity of Ub signals was quantified by scanning using ImageJ. According to the reviewer’s comment, we will clearly describe how the quantification of mSA-ENaC ubiquitination was performed.

      __Reviewer #3 __

      --- Summary ---

      The manuscript by Sakamoto et al. describes how the ubiquitin ligase Nedd4L is activated by membrane curvature generated by the endocytic protein FCHO2. For their experiments, the authors use the epithelial sodium channel (ENaC) as a model Nedd4L target and CME cargo. The authors start their manuscript by showing in cells the importance of FCHo2 and Nedd4L in ENaC internalization. Using a combination of experiments in cells and biochemistry, the authors show that Nedd4L binds preferentially to membranes with the same curvature generated by FCHO2. Next, the authors show that a combination of membrane composition (PS), calcium concentration, PY domain presence and membrane curvature all act in concert to recruit Nedd4L to membranes and fully release its ubiquitination activity. Crucially, the authors show that the role of FCHO2 in Nedd4L recruitment is not direct, with FCHO2 simply generating an optimal membrane curvature for Nedd4L binding. Taken together, the authors suggest a mechanism by which the curvature of early clathrin coated pits, generated by FCHO1/2, defines an optimal environment for the recruitment and activation of the ubiquitin ligase Nedd4L.

      The manuscript convincingly shows the membrane curvature-dependent mechanism of Nedd4L activation. The biochemistry experiments in the manuscript are well designed and the results are clear. The quality of these experiments is very high. The experiments in cells are, however, not of the same level of quality.

      --- Major comments ---

      1) The results do not show convincingly that Nedd4L is recruited to CCPs. There is plenty of indirect evidence, but to support the model shown in the last figure, authors need to show more than the staining in figure 2C. Live-cell imaging showing the post-FCHo2 recruitment of Nedd4L would be required. I understand that the recruitment would possibly occur in a fraction of events and may be difficult to catch. The cmeAnalysis script from the Danuser lab (https://doi.org/10.1016/j.devcel.2013.06.019) can facilitate the identification of these events.

      __Reply: __According to the reviewer’s comment, we plan to examine by live-cell TIRF microscopy whether Nedd4L is recruited to CCPs.

      2) What happens to ENaC in Nedd4L and FCHO2 knockdown cells? One would expect accumulation of the receptor on the surface.

      __Reply: __We have found that upon Nedd4L or FCHO2 knockdown, αENaC accumulates at the cell surface in αβγENaC-HeLa cells. According to the reviewer’s comment, we will show these data in the revised manuscript.

      *3) In the experiments in figure 1, it would be important to use a standard CME cargo as an internal control (transferrin). This will serve as a functional confirmation of FCHO2 knockdown and help the reader to put the Nedd4L knockdown experiments into the context of CME.*

      __Reply: __According to the reviewer’s comment, we will use a standard CME cargo as an internal control (transferrin).

      *4) Quantification for the rescue experiment is required (figure 1C). If not possible, at least a picture where the reader can see transfected and non-transfected cells side-by-side is necessary.*

      __Reply: __This comment is the same as comment 3 of reviewer #1 and comment 2 of reviewer #2. According to the reviewer’s comment, we plan to quantify the rescue experiment (Fig. 1C).

      *--- Minor comments --- *

      *1) The experiments in figure 3 must be presented in order as they are in the text. For example, figure 3E is cited in the text into the context of figure 7. It is very confusing. *

      __Reply: __According to the reviewer’s comment, we will present the experiments in Fig. 3 in the order in which they appear in the text.

      *2) A better explanation of the assay in 1C would facilitate its understanding for the non-specialist reader. The reader needs to read the methods section to understand how it was done. *

      __Reply: __According to the reviewer’s comment, we will write a better explanation of the assay in the Fig. 1C legend to enable readers to understand how it was done.

    1. non lasciarmi pensare alle mie montagne

      Very often, when we think about ‘Il canto di Ulisse’, we tend to recall only the most famous pages in which Levi tries to remember Dante’s canto. The depth and sense of urgency of the Ulyssean passages are so overwhelming and passionate that they may distract us from other elements in the chapter. However, if we go back to the text and read it closely, we cannot avoid noticing that, after a brief opening in which Levi introduces Pikolo and narrates how he came to be Pikolo’s ‘fortunate’ chaperone to collect the soup for the day, ‘Il canto di Ulisse’ also dwells quite significantly on a moment of domestic memories. While going to the kitchens, Levi writes: ‘Si vedevano i Carpazi coperti di neve. Respirai l’aria fresca, mi sentivo insolitamente leggero’. This is the first moment in the chapter in which Levi refers to the mountains as something that revitalises him and makes him feel fresh and light, both physically and mentally.

      This moment foreshadows another, also in this chapter, when Levi goes back to his mountains, those close to Turin, and compares them to the mountain that the protagonist of Dante’s canto, Ulysses, encounters just before his shipwreck with his companions:

      ... Quando mi apparve una montagna, bruna

      Per la distanza, e parvemi alta tanto

      Che mai veduta non ne avevo alcuna.

      Sì, sì, ‘alta tanto’, non ‘molto alta’, proposizione consecutiva. E le montagne, quando si vedono di lontano... le montagne... oh Pikolo, Pikolo, di’ qualcosa, parla, non lasciarmi pensare alle mie montagne, che comparivano nel bruno della sera quando tornavo in treno da Milano a Torino! Basta, bisogna proseguire, queste sono cose che si pensano ma non si dicono. Pikolo attende e mi guarda. Darei la zuppa di oggi per saper saldare ‘non ne avevo alcuna’ col finale.

      The significance of the mountains in Levi’s narration is confirmed in this passage. For him, the mountains represent his experience of belonging, his youthful years, and his work as a chemist – the job he was doing when he commuted by train from Turin to Milan. At the same time, Levi’s own memories of the mountains intertwine and overlap with another mountain, Dante’s Mount Purgatory. Here, a deep and perhaps not fully conscious intertextual game starts to emerge and to characterise Levi’s writing. The lines that Levi does not remember are these:

      Noi ci allegrammo, e tosto tornò in pianto,

      ché de la nova terra un turbo nacque,

      e percosse del legno il primo canto.

      For Dante’s Ulysses, Mount Purgatory signifies the final moment of his adventure and his desire for knowledge. The marvel and enthusiasm that Ulysses and his company feel when they see the mountain is suddenly transformed into its contrary. From the mountain, a storm originates that will destroy the ship and swallow its crew: ‘Tre volte il fe’ girar con tutte l’acque, | Alla quarta levar la poppa in suso | E la prora ire in giù, come altrui piacque’. Dante’s Mount Purgatory, so majestic and spectacular, represents the end of any desire for knowledge that aims to find new answers to and interpretations of human existence in the world without God’s word.

      Going back to Levi’s text, we find that, instead, in a kind of reverse overlapping between his image and that of Ulysses, the image of the mountain of Purgatory suggests to Levi a very different set of thoughts that, although seemingly and similarly overwhelming, opens up new interpretations: ‘altro ancora, qualcosa di gigantesco che io stesso ho visto ora soltanto, nell’intuizione di un attimo, forse il perché del nostro destino, del nostro essere oggi qui’. For a moment, it is almost as if Levi, a new Dantean Ulysses in a new Inferno, stands in front of Mount Purgatory and forgets the terzine and the shipwreck. Maybe Levi cannot or does not want to remember those terzine because the mountain in Purgatory represents something very different for him than for Dante’s Ulysses. Levi’s view of the mountain does not lead to a moment of recognition of sin, as it does in Dante’s Ulysses. For him, the mountain, like his mountain range, is the gateway to knowledge, enrichment, and illumination and to a world that lies beyond the imposed limits of traditional, constricting, and distorted views and that awaits discovery (‘qualcosa di gigantesco che io stesso ho visto ora soltanto’). Something about and beyond the Lager.

      To better understand how the mountains are central in ‘Il canto di Ulisse’, we have to remember that Levi’s view of the mountains strongly depends on his anti-Fascism, which he expressed particularly vigorously in two moments of his life: during his months in the Resistance, just before he was captured and sent to Fossoli, and, even more intensely, during the adventures of his youth, when he was a free young man who enjoyed climbing the mountains surrounding Turin. As Alberto Papuzzi has suggested, ‘le radici del suo rapporto con la montagna sono ben piantate in quella stagione più lontana: radici intellettuali di cittadino che cercava sulla montagna, nella montagna, suggestioni e risposte che non trovava nella vita, o meglio nell’atmosfera ispessita di quella vita torinese, senza passato e senza futuro’ (OC III, 426-27). Indeed, reports Papuzzi, Levi confirms that:

      Avevo anche provato a quel tempo a scrivere un racconto di montagna […]. C’era tutta l’epica della montagna, e la metafisica dell’alpinismo. La montagna come chiave di tutto. Volevo rappresentare la sensazione che si prova quando si sale avendo di fronte la linea della montagna che chiude l’orizzonte: tu sali, non vedi che questa linea, non vedi altro, poi improvvisamente la valichi, ti trovi dall’altra parte, e in pochi secondi vedi un mondo nuovo, sei in un mondo nuovo. Ecco, avevo cercato di esprimere questo: il valico.

      The heart of that epic story made its way into the chapter ‘Ferro’ in Il sistema periodico. The discovery of this (brave) new world, ‘mondo nuovo’, is an integral part and a direct achievement of Levi’s experience in the mountains. The mountains open a new understanding and a new perspective on the world.

      Something that escapes common understanding is revealed through the experience of the mountains, both in Levi’s memories of his youth and in his literary recounting of Auschwitz. Reciting Dante in ‘Il canto di Ulisse’ is therefore not only an intertextual exercise for Levi. Only by inserting Levi’s literary references in the complexity of his own experience – before, during, and after Auschwitz – can we fully capture the depth of his reflections. Levi mentally and metaphorically brought to Auschwitz not only Dante but also his ‘metafisica dell’alpinismo’. Together, they contributed to his attempt to come to terms with that reality.

      VG

    2. Considerate

      My reflections here build on Lino Pertile’s 2010 essay, ‘L’inferno, il lager, la poesia’. Pertile notes the profound correspondence between the opening poem of the book (OC I, 139) and this chapter. He points out how the main theme of Levi’s book, the dehumanising experience in the Lager, based on the annihilation of people’s identity, is expressed in the poem and resurfaces explicitly again in the chapter dedicated to Dante’s Ulysses. The key term revealing the correspondence of themes and intentions is ‘Considerate [consider]’, used twice in Levi’s poem (‘Consider if this is a man | … | Consider if this is a woman’) and rooted in the memory of Dante’s famous tercet where Ulysses addresses his crew as they sail towards the horizon of their last journey beyond the pillars of Hercules: ‘Considerate la vostra semenza: | fatti non foste a viver come bruti, | ma per seguir virtute e canoscenza’ (Inf. 26, 118-20 and OC I, 228).

      There are many other correspondences between the chapter of Ulysses and the opening poem, besides the ‘Considerate’, and they are profound and filtered through the theme of memory, an eminently Dantean theme: the urgency to fix in the memory itself what is or will be necessary to tell, or the urgency to express and recount what is deposited in memory. Indeed, for Levi, the memory of each individual person contains that person’s humanity.

      Memory is immediately activated as Primo and Jean exit the underground gas tank (‘He [Jean] climbed out and I followed him, blinking in the brightness of the day. It was warm [tiepido] outside; the sun drew a faint smell of paint and tar from the greasy earth that made me think of [mi ricordava] a summer beach of my childhood’). Temporarily escaping hell by means of a ladder (a sort of Dantesque ‘natural burella’), it is the tiepido sun and a characteristic smell that evoke the childhood memory and that at the same time the reader cannot avoid connecting to the tiepide case of the initial poem (‘You who live safe | in your heated houses [tiepide case]’ [my emphasis]). It is then around the memory ‘of our homes, of Strasbourg and Turin, of the books we had read, of what we had studied, of our mothers’ that another theme in the chapter coalesces, the theme of friendship (‘He and I had been friends for a week’), a theme that had already emerged in a more general connotation in the opening poem (‘visi amici’). Warmth, friendship (visi amici…Jean), the kitchens as destination for Primo and Jean’s walk (the walk from the tank with the empty pot is ‘the ever welcomed opportunity of getting near the kitchens’, not for that hot food [cibo caldo] evoked in the poem, but for the soup of the camp, an alienating incarnation of Dantesque ‘pane altrui’ whose various names are dissonant). During the respite of the one hour walk from the tank to the kitchens, the intermittent memory of Dante’s canto emerges as if from an underground consciousness, the memory of Inferno as a partial and imperfect mirror of the human condition in the Lager, Ulysses as poetic memory, a sudden epiphany of a semenza, a seed, of humanity that the Lager is made to suppress, and Primo’s wondering in the face of this sudden internal revelation of still possessing an intact humanity.

      Primo’s memory of his home resurfaces as if springing from the memory of Dante’s text: the ‘montagna bruna’ of Purgatory is reflected in the memory of ‘my mountains, which would appear in the evening dusk [nel bruno della sera] when I returned from Milan to Turin!’ But the real, familiar landscape is too heartbreaking a memory of ‘sweet things cruelly distant’, one of those hurtful thoughts, ‘things one thinks but does not say’. There is an epiphanic memory then, the poetic memory that surfaces during the walk and that reveals to Primo that he still is a man, a memory to which he clings despite the sense of his own audacity (‘us two, who dare to talk about these things with the soup poles on our shoulders’); there is also a more intimate memory, equally pulsating with life and humanity - but dangerous, because it makes Primo vulnerable to despair, threatening his own survival in the camp.

      The urgent need to remember Dante’s verses in this chapter develops the theme of memory, which has been central from the opening poem. In Levi’s poem, though, memory is perceived from a different angle: the readers (who live safe…) must honour that memory and transmit it as an imperative testimony of what happened in the concentration camp from generation to generation, testifying to the suffering of the man and the woman ‘considered’ in the poem. This is a memory to be carved in one’s heart, which must accompany those who receive it in every action and in every moment of each day like a prayer. Not coincidentally the poem follows the text of the most fundamental prayer of Judaism, the Shemà Israel, which is read twice a day, a memory to be passed on to one’s own children, a responsibility which is a sign of one’s humanity. The commandment to remember of the opening poem (‘I consign these words to you. | Carve them into your hearts’) issues a potential curse to the reader, threatening the destruction of what most fundamentally characterises their humanity - home, health, children: ‘Or may your house fall down, | May illness make you helpless, | And your children turn their eyes from you’. Finally, Primo’s act of remembering during the walk to the kitchens is submerged by the Babelic soup (‘Kraut und Rüben…cavoli e rape…Choux et navets…Kàposzta és répak…Until the sea again closed – over us’) and yet the memory of it becomes part of his testimony in such a central chapter of the book written after surviving the Shoah. If the memory of Dante’s verses contributed to Primo’s faith in his own humanity and his psychological and physical survival in the camp, he then accomplishes the commandment of memory and his responsibility as a man through his own writing.

      CS

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1:

      Major comments:

        • The relevance of these findings to human biology remains unclear. In Figures 1-4, the authors present data showing that AATBC is enriched in thermogenic fat, and they argue that it regulates thermogenesis and mitochondrial biology. However, in Figures 6-7, where the authors look at AATBC in different human cohorts, they actually find that it is enriched in visceral fat, which is thought of as being the least thermogenic fat depot. The authors do not explain this seeming paradox, and thus, the role of AATBC in fat remains uncertain. *

      RESPONSE: We thank the reviewer for this comment and have clarified the discussion to address this point. It has recently been shown (PMID: 28529941) that the pattern of browning genes in human white adipose tissue depots is inverted relative to mice, making visceral adipose tissue in humans more thermogenic than subcutaneous adipose tissue. This aligns well with our finding that AATBC is predominantly expressed in thermogenic adipose tissue.

      • In many of the experiments, insufficient controls are provided, or the data are not at all convincing. For example:*

      (a) The first four figures rely on in vitro adipocyte models, but the authors do not present data to show these cells differentiate properly and equally. This is especially relevant for the gain and loss of function studies.

      RESPONSE: We agree with the reviewer that equal differentiation is necessary for in vitro adipocyte models. Therefore, we added Oil-Red-O stainings and the corresponding quantifications to Supp. Fig. 4 (see below) for the differentiation of hMADS in the absence of AATBC. We also want to emphasize that the expression level of PLIN1, a surrogate marker for differentiation, was unchanged in our experiments, as already shown in the initial draft of the manuscript. Moreover, in all experiments presented in the original draft of the manuscript, AATBC gene expression was only altered in mature adipocytes.

      (b) Some of the experiments in Figure 1 (K-L) seem to only show an N of 1.

      RESPONSE: Figure 1 highlights a screening process to find new lncRNA regulated during thermogenesis. The forskolin sample was included to achieve an additional dimension in the filtering process. The displayed values in K&L demonstrate the validity of the sample. The validation of AATBC as a target was performed with statistical power in the work displayed in the following figures.

      (c) The RNAscope data in Figure 2 is not at all convincing for nuclear localization

      RESPONSE: We respectfully disagree. In our opinion, the RNAscope data are convincing for nuclear localization of the lncRNA. However, we have repeated the experiments with different probes, which strengthens our data (see figure for the reviewer).

      (d) The ASO mediated knockdown of AATBC in Figure 3 only reduced expression slightly. A more complete knockdown or deletion may elicit a stronger phenotype.

      RESPONSE: We thank the reviewer for the feedback. We have repeated the knockdown experiments but were not able to reduce the expression further, even after designing additional ASOs. However, even with the current approach, the reduction in AATBC expression elicited a phenotype, highlighting the importance of AATBC in a dose-dependent manner.

      (e) In Figure 4, OPA1 is shown as a single band in panel E and a doublet in panel N. Based on this, are the authors certain they are detecting OPA1, or could this be a nonspecific band?

      RESPONSE: We thank the reviewer for this comment. Protein extraction has been performed at different research institutes with slightly different buffers. Multiple bands (cleaved/uncleaved) have been described for OPA1 in the past, therefore we are certain that the correct protein has been detected.

      *(f) The correlations in Figure 6 I-L and Figure 7 do not include any statistical analysis. *

      RESPONSE: For better readability, the statistical analysis is reported in the figure legend. The reviewer might have overlooked this information.

      • The gain of function studies in mice are problematic. The authors have performed a large amount of invasive studies in a short period of time. The animals will undoubtedly lose weight after each study and with insufficient time to recover, this could influence the subsequent studies.*

      RESPONSE: These general concerns are valid, but all controls are in place and the animals gained weight during the experiments, as would be expected for animals of that age (see below).

      *In addition, since the authors present data in Figures 1-4 arguing that AATBC overexpression is associated with increased thermogenesis, it is surprising that the authors never looked at this in Figure 5 (aside from measuring Ucp1 mRNA). It would be interesting to measure energy expenditure by indirect calorimetry and cold tolerance. *

      RESPONSE: We agree with the reviewer on this point but, due to animal protocol limitations in conjunction with the viral approach, we are unable to perform these experiments.

      • The authors do not provide any mechanistic insights into how AATBC may be acting.*

      RESPONSE: Certainly, more mechanistic insight into the direct mode of action of AATBC would be interesting. To address this point, over the past year we made multiple attempts to pull down AATBC using the ChIRP technology. However, we were unable to achieve sufficient enrichment to identify direct interaction partners of AATBC. Nevertheless, we believe that our data regarding mitochondrial dynamics, which we have now also confirmed in in vivo experiments, explain the connection between AATBC and thermogenicity. In the future, we aim to work on this point further, but for multiple reasons we have decided to close this chapter here.

      Minor comments:

      • The introduction is rather long and would benefit from being condensed.*

      RESPONSE: We have edited the text for better readability.

      Reviewer #2:

      Major Comments:

        • The key conclusion that AATBC is a novel obesity-linked regulator of adipocyte plasticity is made relatively clear with the comparison between various stages of adipocytes and the loss and gain of function with AATBC. - Figure 1 H and J do not seem to be consistent with the data in Figure 1F in LINC00473 level-There is no difference in Control vs NE in the heatmap but in Figure1J, the difference seems to be quite obvious; Figure 1K does not seem to be consistent with AATBC level-The measurement in Control VS Fsk group showed no difference in AATBC in heatmap, but in Figure K, there seem to be a dramatic increase. Therefore, the claims that there is a difference in these two lncRNA expression in these cell groups needs further clarification. *

      RESPONSE: To combine the different approaches to identify novel lncRNA into one heatmap the data need to be normalized over experiments. As the fold change of the expression of AATBC in BAT compared to WAT (on average ~100x) is higher than with forskolin (~4x), this will stand out in the heatmap and will to some extent overshadow the smaller fold changes. The same holds true for LINC00473, which is drastically induced with forskolin, which to some extent masks the higher expression in the other approaches. Therefore, we decided to show both the heatmap to represent the general approach and the “zoomed in” versions to show the consistent increases. We are confident this clarifies the issue.
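      The masking effect described in this response can be illustrated with a minimal sketch. It is not the authors' normalization procedure: the min-max scaling, the function name `min_max_scale`, and the condition labels are assumptions made for illustration, and the fold changes are only the approximate values quoted above (~100x for BAT versus WAT, ~4x for forskolin).

```python
# Illustrative sketch only: shows how scaling values from several
# experiments onto one shared color scale can hide a small fold change.
# The scaling scheme and labels are assumptions, not the authors' method.

def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Approximate fold changes quoted in the response (control set to 1).
fold_changes = {"control": 1.0, "forskolin": 4.0, "BAT_vs_WAT": 100.0}

scaled = dict(zip(fold_changes, min_max_scale(list(fold_changes.values()))))

for condition, value in scaled.items():
    print(f"{condition}: {value:.3f}")
```

      On a shared scale of this kind, the fourfold forskolin induction occupies only about 3% of the color range, so it is nearly indistinguishable from the control, whereas a separate "zoomed in" panel restores the contrast.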

      • Figure 4H and I, the difference in the representative immunoblot seem to be minimal and inconsistent with the decrease shown in the bar graph. *

      RESPONSE: We agree with the reviewer and have removed the claim from our manuscript.

      • In Figure 5, after overexpressing human AATBC in murine adipose tissue, is it possible to look at the mitochondrial changes that were seen before in cell lines? If there are similar changes in murine adipose tissue, then it would prove that the changes in vitro hold up in the in vivo model. But if the mitochondrial changes were not seen, then it would indicate that the changes in leptin and triglyceride levels may be due to other mechanisms. The length of the suggested experiment to look into the mitochondrial differences in mice may vary depending on whether there are preserved samples from previous experiments. If there are, then the time period would be a couple of weeks for immunoblot and analysis. If there are no samples preserved, then the estimated period for the suggested experiments may be around 1.5 to 2 months at least.*

      RESPONSE: We thank the reviewer for the suggestion. We performed Western Blot analysis on the tissues from the in vivo study and have included them in Fig. 5, further strengthening the link between AATBC and mitochondrial dynamics (please see figure on the right).

      • The data are convincing overall in that the replicates are clearly marked with dots in many figures. However, some immunoblots and expression levels are inconsistent with other data showing the same results. *

      RESPONSE: We thank the reviewer and have removed the necessary quantifications.

      • Figure 6 and 7 are provocative and significant, reporting strong associations of AATBC with well-known markers of metabolism in adipocytes. The sex difference for adiponectin and AATBC expression is particularly intriguing. Further discussion of this point would be interesting. However, there is no information provided about the medication status of the obese subjects that were consented for samples used in the analysis. Specifically, many of the obese subjects (mean BMI 45 or more with a range going up to 97.3) would be expected also to have metabolic diagnoses and to be treated with numerous medications, including Metformin, GLP1 agonists, Orlistat, Liraglutide, Bupropion/Naltrexone and combinations. It is unreasonable to ignore possible effects of major medications on AATBC expression. Please comment on the strengths and weaknesses of the analysis that ignores medications, or if some annotations of clinical data are available, perhaps to explain outliers in the plots, please discuss. *

      RESPONSE: We thank the reviewer for this suggestion. Unfortunately, we are unable to exclude additional diagnoses and medication of our patients due to the points the reviewer stated. However, given the large size of the cohorts we are confident that such effects are being compensated for. We have added a part on weaknesses of the study in the discussion.

      Minor Comments:

      • The labeling of figure 2 A-K is not clear because the use of the same color of bars is easily misunderstood as the same source of cells, but it is in fact not. For example, the grey color that appeared in 2B and 2C are not the same source but can be misunderstood. *

      RESPONSE: The coloring of Fig.2A&G has been changed.

      • Figure 3 ASO-AATBC has two repeats #1 and #2, and over-expression of AATBC has one, even though there are enough repeats. It would be less confusing to present all of the repeats in ASO_AATBC together in one bar.*

      RESPONSE: The two different ASOs target different regions of AATBC. In line with general guidelines for ASO use, they are not pooled but used separately, which is why the results are also shown separately. As the overexpression introduces additional genomic information of AATBC, it is impossible to use different variants in this case; therefore, only one bar for overexpression is shown.

      • The experimental outline can be a bit more detailed and explain some of the words like Thermo versus Browning.*

      RESPONSE: The manuscript has been revised regarding this point.

      • Some of the panels in Figure 7 could be put into supplementary if space is at a premium, and present the representative graph would be enough*

      RESPONSE: We think that all the data in Fig. 7 warrant enough attention to be presented in a main figure, but if space is scarce, we are very happy to oblige. We would kindly ask the editors for input on this matter.

      Reviewer #3:

        • Throughout the study, the data provided are mainly correlative and in some cases not robust. In Fig. 2, AATBC expression is described to be elevated in the so-called "thermogenic condition", which contained prolonged PPARg agonist treatment (rosiglitazone) known to promote adipogenesis. Consistent with this notion, adipogenic markers, such as PLIN1 and FABP4, are higher in "thermogenic adipocytes" (Suppl Fig. 2). As such, the result may only suggest that AATBC has higher expression in mature adipocytes vs pre-adipocytes. *

      RESPONSE: We thank the reviewer for the suggestion. We have added Oil-Red-O-Stainings to Suppl. Fig. 2 to show unchanged lipid content upon modulation of AATBC gene expression, which can be seen as a surrogate for differentiation. Concerning the use of rosiglitazone as a browning agent, we want to emphasize that rosiglitazone was used during the entirety of differentiation until day 9, where it was removed in the “non-thermogenic” group. At this point we already observe fully differentiated adipocytes. This is an established protocol. Furthermore, the data is in line with using norepinephrine or forskolin as a short-term inducer of browning, making it very likely that the effect seen is due to the “more thermogenic” character of the adipocytes.

      • Along the same vein, whether and how AATBC affects adipogenesis is unclear. Suppl Fig. 3H and 3L (misplaced as Suppl Fig. 4) show the adipocyte differentiation marker FABP4 is down-regulated by both ASO- and AV-AATBC. Since mitochondrial respiration (and other parameters including UCP1 expression) is tightly linked to adipogenic efficiency, the authors need to address whether these manipulations affect adipocyte differentiation. *

      RESPONSE: We agree with the reviewer that differences in differentiation capacity would falsify our data on mitochondrial dynamics. We have added Oil-Red-O-Staining to Suppl. Fig. 2 to show that no significant difference in lipid content exists during modulation of AATBC gene expression, which can be seen as a surrogate for differentiation. Furthermore, in all experiments presented in the manuscript, the modulation of AATBC occurs in already fully differentiated adipocytes. Accordingly, we are confident that AATBC does not influence differentiation but mainly acts through the modulation of mitochondrial dynamics.

      • The data in Fig. 4 supporting a role for AATBC in regulating mitochondrial dynamics are superficial and not robust. Fig. 4A/4J do not have high enough resolution to provide accurate assessment of the mitochondrial network.*

      RESPONSE: We respectfully disagree with the reviewer on this point. State-of-the-art methods and algorithms were used to image and analyze the mitochondrial network. Furthermore, we have used multiple established markers of mitochondrial dynamics in western blot analysis to further strengthen our assessment of the immunofluorescence. In summary, we feel that we have given enough evidence for an accurate assessment of the mitochondrial network.

      • The level of loading control TUBB is clearly lower in siAATBC in Fig. 4H. In addition, OPA1 should have multiple isoforms and Fig. 4E/4N show inconsistent patterns. As such, mitochondrial dynamics is not likely an underlying mechanism. *

      RESPONSE: We agree with the reviewer on the assessment of the expression of complex 5 and have removed this claim from the manuscript. Regarding the expression of OPA1, protein extraction has been performed at different research institutes with slightly different buffers. Multiple bands (cleaved/uncleaved) have been described for OPA1 in the past, therefore we are certain that the correct protein has been detected.

      • Notably, RNAseq data in Suppl Fig. 4 (misplaced as Suppl Fig. 3) seem to indicate that AATBC over-expression promotes TG synthesis, while AATBC knockdown modulates cell death. The authors should consider exploring the leads from RNAseq analysis?*

      RESPONSE: We thank the reviewer for the feedback. The small number of altered genes in the RNA-seq data makes us believe that AATBC has a rather post-transcriptional role. We investigated cell death and oxidative stress response, as these GO terms were highlighted in the analysis, but we were unable to detect any differences in the absence of AATBC, pointing to a minimal effect at the transcriptional level (see figure below for the reviewers).

      • In Fig. 5, the AV-AATBC transduction in WAT/BAT is localized, transient and not homogeneous. Not surprisingly, this manipulation does not produce any robust effects. The difference in circulating leptin/leptin expression appears to be driven by 4-5 mice in the control group (Fig. 5H/5N). The correlation data in Fig. 6 and Fig. 7, although relevant, do not provide additional mechanistic insights. Unfortunately, the efforts in Fig. 5-7 fail to lead to information related to the biological function of adipose AATBC.*

      RESPONSE: We agree with the reviewer on the limitations of the AV model, but we have performed these experiments to the highest technical standard. As the reviewer states, the overexpression, especially in WAT, has different magnitudes depending on the individual mouse, but the overexpression is present and consistently high in every animal. We would expect even bigger alterations in a genetic model, which, however, is beyond the scope of this first manuscript on AATBC in adipocytes. We are disappointed that the reviewer does not value the human data presented, as they strongly hint at a relevant function of our human lncRNA in vivo through robust correlations with established biomarkers, mirroring the effects seen in vitro and in the mouse model. A limitation of human studies is that in virtually every case they are based on correlations, as manipulation of gene expression, which would be necessary to delineate a biological process as requested by the reviewer, is not possible in humans. We do not concur with dismissing our human data on that basis.

    1. Actually, as Davidson argues, multitasking helps us see more and do more, and experience texts and tasks in different ways. There’s no evidence that anyone ever was deeply reading for hours on end with no interruptions. All we have are claims from Plato saying that writing is going to kill our ability to memorize. Our minds have always been wandering; we’ve always been distractible. We’ve always been doodling on the sides of pages, or thinking about our lunch, or stopping to converse with someone. Now we just have distraction that’s more readily available and purposefully attuned to distracting us — like popup ads, notifications; things that quite literally fly across your screen to distract you. But the fact that we have students who have grown up with those and have trained themselves to deal with those in such interesting ways is something that I think we should bring into the classroom and be talking about and critically thinking about

      1) the point that multitasking can offer different experiences with texts and tasks is interesting to me. initially, the comparison between multitasking and single-tasking seems like a clear distinction between what is beneficial (focus) and what is detrimental (distraction)

      2) taking a bold stance, i would venture to say that there exists a significant number of individuals who engage in deep work, which is perhaps one of the most profound pursuits throughout human history. after all, most of us have experienced a state of flow at least once, to some extent, and our brains subconsciously crave this state of heightened focus and productivity

      3) this observation all the more underscores the rarity of deep work in a world that is perpetually plagued by distractions

      here is one of my notes from deep work by cal newport:

      the connection between depth and meaning in human experience is undeniable. whether approached from the perspectives of neuroscience, psychology, or philosophy, there appears to be a profound correlation between engaging in deep, meaningful activities and a sense of fulfillment. this suggests that our species may have evolved to thrive in the realm of deep work and purposeful engagement

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *The current manuscript by Shiryaev et al describes their observation of the new function of zika NS2B-NS3 proteases. They have shown that NS2B-NS3 protease lacking the helicase domain binds to RNA and the interaction can be affected by protease inhibitors. Main two new findings are presented in the manuscript: super open conformation of the protease; RNA binding activity of the protease region. Although the manuscript is interesting, the design of the experiments is not convincing. *

      Major issues:

        • the claim of a super open conformation is problematic. Using an artificial construct lacking the C-terminal portion of NS2B will of course generate the open conformation. This is a wrong definition unless you observe such a conformation in living cells.*
      1. We understand the skepticism towards the less well-known super-open conformation of the flavivirus NS2B-NS3pro complex. In addition to our own structure of ZIKV NS2B-NS3pro (PDB ID 7M1V), the crystal structure of the orthologous Japanese encephalitis virus (JEV) NS2B-NS3pro (PDB ID 4R8T) was reported in 2015 1. However, no functional analysis was provided for this crystal structure, so it received little attention from the research community. We computed the overlay of the ZIKV NS2B-NS3 protease structure in the super-open conformation (PDB ID 7M1V, deposited by us in 2021) with the crystal structure of the JEV protease (PDB ID 4R8T) (Rebuttal Figure 1). We observed an almost identical organization of the critical NS3pro C-terminal loop between these two structures (RMSD 0.6 Å). Polypeptides with over 35% identity are very likely to have a similar fold 2. Given over 50% identity(!) between flaviviral proteases across the family 3,4, we posit that the super-open conformation demonstrated for JEV and ZIKV NS2B-NS3pro is a common feature of the Flaviviridae family. Further, the NS2B peptide is always tightly associated with NS3pro via a three-strand beta-barrel (aa 49-58 of NS2B), which remains intact in all NS3pro conformations. The C-terminal portion of NS2B progressively loses association with NS3pro, being mostly associated in the closed conformation, less so in the open, and even less in the super-open conformation. The G4SG4 linker between NS2B and NS3pro remains unstructured in all conformations. The native C-terminal portion of NS2B (TGKR) is equally unstructured when competed out of the protease active site by another substrate. It is unclear to us why “lacking the C-terminal portion of NS2B will of course generate the open conformation”.

      2. It is odd that authors made homology model to generate open conformation structures. the authors did not cite the two papers of eZiPro (Phoo et al 2016 NC) and bZiPro (Zhang et al 2016, Science). these two structures show the closed conformation of protease in the absence and presence of a natural substrate.*

      3. We agree with the reviewer that both constructs of ZIKV NS2B-NS3pro, eZiPro 5 and bZiPro 6, are likely to exist in the closed conformation, as documented by the crystal structures. However, in both cases, the active center of ZIKV NS2B-NS3pro is occupied by a short peptide fragment, which is sufficient to induce the closed conformation of the NS2B-NS3 protease. We superimposed eZiPro (PDB ID 5GJ4) with bZiPro (PDB ID 5GPI) to better demonstrate that the active center in both structures is occupied either by the tetrapeptide TGKR (T127-G128-K129-R130) originating from the NS2B C-terminus (eZiPro) or by the tetrapeptide KKGE (K14-K15-G16-E17) originating from a neighboring NS3 molecule (bZiPro) (Rebuttal Figure 2). Indeed, Zhang et al., 2016 6 stated that: “the structure (bZiPro) does capture the protease in complex with a reverse peptide. The tetrapeptide K14K15G16E17 folds into a small hairpin loop to occupy the active site.” Further, Phoo et al., 2016 5 stated that “binding of the ‘TGKR’ peptide to the catalytic site stabilizes the protease (eZiPro)”. To the best of our knowledge, so far there are no crystal structures of flaviviral NS2B-NS3 proteases in the closed conformation without a peptide or inhibitor in the active center. We take this as a hint that the closed conformation is always induced by a substrate present in the active center.

      Finally, we would like to draw the attention of this reviewer to the fact that the 15N R2 NMR signal from NS2B residues 65-85 is missing in bZiPro alone but re-appears when AcKR is added. This is consistent with the idea that without AcKR, bZiPro exists in the open conformation where much of the C-terminal part of NS2B is dissociated from NS3Pro and remains unstructured, thus resulting in the lack of NMR signal.

      • RNA binding is novel, but is it observed in cells? only one method was used for testing the interactions, not other biophysical methods are used.*

      • Given the complex network of protein-RNA interactions and the fact that NS3pro and NS3hel are connected in a single polypeptide, separating dynamic binding of the 11 kb RNA to NS3pro from binding to NS3hel in a native cell is a major technical challenge beyond the scope of this work. We employed a fluorescence polarization assay to demonstrate ssRNA and ssDNA binding to ZIKV NS2B-NS3pro. Subsequently, we employed a proteolytic activity assay with a labeled peptide mimicking the natural protease substrate to demonstrate that the presence of ssRNA or ssDNA can efficiently inhibit proteolytic activity. To the best of our knowledge, this is the first indication that ssRNA or ssDNA could block the proteolytic activity of any serine protease, let alone a viral protease. Therefore, we consider the proteolytic activity assay used in the current work an orthogonal biochemical method supporting ssRNA binding to ZIKV NS2B-NS3pro.

      • binding studies with RNA used artificial construct, how about the one with KTGR present like eZiPro. Keep in mind that the P1-P4 residues are present under native conditions.*

      • As mentioned by the reviewer, the TGKR peptide was found in the active center in the eZiPro crystal. Indeed, the junction region between NS2B and the NS3 protease contains a native cleavage site for NS2B-NS3pro and is naturally cleaved by the protease during viral polyprotein processing. However, the TGKR peptide representing the P1-P4 positions has to leave the active center after the cleavage to ensure enzyme processivity, that is, cleavage of additional targets (otherwise, the protease would get stuck after the first cleavage). The proteolytic activity assay utilizes a fluorogenic peptide labeled with FAM (such as TGKR-FAM, where FAM is a group representing the P1’ position in this case). The TGKR-FAM peptide will compete with and easily replace the cleaved TGKR peptide in the active center in the proteolytic activity assay. In sum, the C-terminal end of NS2B will be competed out of the protease active center by the next substrate, and there is no evidence that it will naturally return to the active center after each round of proteolytic activity. Indeed, several crystal structures of flaviviral NS2B-NS3pro in the open conformation lack the C-terminal part of NS2B in the active center. Our unpublished NMR studies demonstrated that the C-terminal part of NS2B is unstructured in solution if neither a substrate peptide nor a small-molecule inhibitor is present in the active center of the protease.

      • authors built up nice models, it is great to consider the full length NS2B, but authors haven't taken into account the effect of NS2B on the open or closed conformation of the protease. *

      • All crystal structures of flavivirus NS2B-NS3pro in the closed, open, or super-open conformations have NS2B associated with NS3pro via a beta-barrel (Rebuttal Figure 3), which is located on the opposite side from the RNA binding site. The transition from the closed to the open and to the super-open conformation is associated with the progressive dissociation of NS2B from NS3pro. Therefore, the effect of NS2B on NS3pro is progressively diminished. In the closed conformation of NS3pro, the negatively charged C-terminal part of NS2B is associated with the same positively charged groove as the RNA in the open conformation of NS3pro. The C-terminal part of NS2B is dissociated from NS3pro in the open conformation.

      Minor issues:

      *This manuscript shows the novel function of zika protease and conclude that protease binds to RNA. This is a novel finding, but the conclusion needs to be further confirmed, to avoid misinterpretations by future readers *

      • closed, and super open conformations. But the definition was not carefully compared with current literatures. I am surprised that the two important papers are not cited. It is well known the G4SG4 linker affect the conformation of the protease.*

      • The crystal structures and the proteolytic activities of gZiPro, eZiPro, and bZiPro are rather similar. In fact, the Km values (μM) are 2.86 ± 0.90 for gZiPro and 6.332 ± 2.41 for bZiPro, and the IC50 values of BPTI inhibition for gZiPro, eZiPro and bZiPro are 350, 76 and 12 nM, respectively. NS2B and NS3pro have a large binding area in the closed conformation. Upon changing to the open conformation (and even more so to the super-open conformation), the C-terminal part of NS2B progressively dissociates from NS3pro. Therefore, possible minor effects introduced by the G4SG4 linker are unlikely to affect any of the conclusions in our work.

      • Authors need to show super open conformation is present in nature e.g. the model in which full length NS2B and NS3pro.*

      • A full-length NS2B has 2 transmembrane domains, which tether the NS2B-NS3pro complex to the cell membrane (we have modeled the presence of such transmembrane domains to account for the orientation of NS2B-NS3pro with respect to the cell membrane). The full-length complex has never been crystallized or tested in any assay due to the major technical challenges associated with the modeling of complex transmembrane proteins.

      • RNA is a charged molecule under some conditions, NS3 also has charged residues, it is important to show whether the binding between RNA-protease is relevant to the function (Luo, 2010; Chernov, 2008; Xu, 2019), or is this due to the application of the artificial constructs used in this study. Why so many mutants are used? *

      • The requirement of NS3pro for the helicase function was shown by several investigators 7–9. Given the structural independence of NS3pro and NS3hel, which mostly rules out an allosteric effect, RNA binding by NS3pro is a newly proposed function of NS3pro in support of the helicase activity. We demonstrated biochemically that RNA bound to NS3pro inhibits its protease function. A variety of mutants were used to constrain the conformations of NS2B-NS3pro (e.g. enforce the super-open conformation) for crystallization studies.

      • Using a construct close to the native protease, at least the P1-P4 residues should be present. Using a peptide in the assay is also useful.*

      • We were unable to interpret this critique.

      • Test binding of RNA with protease using another method such as biophysical methods, or even gel shift assay*

      • We thank the reviewer for this suggestion. Although the gel-shift assay seems to be a reasonable method to test the binding, given the ease of spontaneous conformational change (i.e. into the super-open conformation), this assay could result in a progressive loss of bound RNA during migration in the gel.

      • I don't know the correlation between Figure 7 and Figure 6. The authors describe poly A binding to the protease, while Figure 7 is about helicase binding to dsRNAs. *

      • There is no correlation. Figure 6 describes the models for NS2B-NS3pro binding to ssRNA. Figure 7 describes a separate point, the direction of dsRNA processing by NS3hel.

      • I am glad to see the consideration of full length NS2B, NS3 in the models Figure 8, 9 and 11, but there is no data to support any of the model proposed. *

      • There is no experimental data. We have modeled the N-terminal and C-terminal parts of full NS2B, which are predicted to be inserted into the cell membrane due to their characteristic amphipathic helical structure.

      • Is the linker a poly G, not G4SG4? *

      • The linker is GGGGSGGGG (G4SG4), as stated in the Materials and Methods section of the manuscript.

      • Do the mutants retain their protease activity? *

      • All mutants with intact catalytic centers have protease activity, except the mutants with a disulfide bridge that fixes the polypeptides in the super-open conformation.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *The manuscript by Shiryaev et al., submitted to BioRXiv is an exploration of the ability of NS2B-NS3protease to bind RNA and its subsequent role in NS3 helicase processivity. The authors first utilize fluorescence polarization assays to demonstrate that NS2B-NS3protease can bind ssRNA with a strong affinity (and also ssDNA with lower affinity). They subsequently utilize mutational and small molecule inhibitor strategies in these assays to force the NS2B-NS3protease into different conformations, with the associated results inferring that the "open" conformation is responsible for ssRNA binding affinity. Furthermore, they demonstrate that ssRNA binding impairs protease activity, suggesting these roles may be exclusive in the viral life cycle. They also identified a number of small molecule ligands that target the putative ssRNA binding channel, and demonstrate that these ligands inhibit ssRNA binding by NS2B-NS3protease, providing potential inhibitor candidates for ZIKV. Finally, the authors utilized their crystal structures and others for the various conformations of NS2B-NS3protease to model ssRNA binding by the domain and the full NS3 protein, and used these models to propose a reverse inchworm model for NS3 travelling along ssRNA as it unwinds the dsRNA duplex. Overall, the authors utilize a comprehensive approach to demonstrate a number of novel findings (ssRNA binding by NS2B-NS3protease, small molecule ligands that inhibit this interaction) that would be of interest to both virologists and structural biologists. However, there are some important experimental design limitations and viral life cycle considerations that the authors should address before acceptance of the manuscript. Major and minor comments intended to improve the manuscript are outlined in more detail below. *

      Major Comments:

      1. *While the quantity of indirect data (ruled out closed and super-open, inhibitors of ssRNA binding pocket) suggest that the open conformation of NS2B-NS3protease is associated with ssRNA binding, the argument would be greatly strengthened by direct experimental data. Is there a mutational or small molecule approach to locking the NS2B-NS3 protease in the open conformation? If so, the authors should perform such experiments to strengthen the foundation of their argument.*

      • Unfortunately, despite significant efforts, mutations or small molecules that lock the NS2B-NS3 protease in the open conformation have not been identified for the ZIKV protease. However, several structures of NS2B-NS3 proteases have been documented in other flaviviruses (i.e., DENV PDB IDs 2FOM and 5T1V; WNV PDB ID 2GGV). Polypeptides with over 35% identity are very likely to have a similar fold [2]. Given the over 50% identity between flaviviral proteases across the family [3,4], there is little doubt that the ZIKV NS2B-NS3 protease adopts an open conformation similar to that of other flaviviral proteases. Our modeling demonstrated that there are no steric or structural problems in folding the NS2B-NS3 protease into the open conformation.

      2. *A negative control should be used in Figure 4A to strengthen the claim that ssRNA binding in the open conformation impairs protease activity (ie. include a curve for dsRNA). Such an experiment would lend support to ssRNA inhibition being due to specific binding instead of some other non-specific effect of increasing local nucleic acid concentration.*

      • To address this critique, we modeled dsRNA binding to the open conformation of NS2B-NS3pro. The model revealed that dsRNA cannot be accommodated by the open conformation of the NS2B-NS3pro complex (Rebuttal Figure 4). Indeed, dsRNA has a very different, rigid structure compared to the extended form of the ssRNA chain. The dsRNA is unable to provide continuous interactions between the negatively charged RNA backbone and the positively charged amino acid side chains of NS3pro. The continuous interface on the NS2B-NS3 protease that interacts with ssRNA is an extension of the exit groove for one of the ssRNA strands leaving the NS3 helicase after unwinding. Therefore ssRNA, but not dsRNA, is naturally always present in close proximity to the NS2B-NS3pro complex.

      3. *Due to the highly coupled roles of NS5 and NS3 in replication, the authors should include some more consideration of the role of NS5 in their complex. They very briefly address this interplay in the fifth paragraph of the discussion, but then neglect to discuss the implications any further. In particular (perhaps in a brief comparison to an NS3/NS5 modeling approach such as Brands et al., 2017; WIRES), the authors should consider some of the following questions: could the channel on protease domain lead to ssRNA entry site on RdRp?*

      • Indeed, our model suggests that the negative strand (-)ssRNA exits from the NS2B-NS3 protease facing the ER membrane, in the area where the protease is anchored to the ER membrane via the NS2B transmembrane domains. It is possible that NS3pro interacts with the NS5 polymerase and “hands off” (-)ssRNA to it. This scenario would modify the model of Brands et al. (2017) by adding the NS2B-NS3pro complex between NS3hel and NS5. However, at present, the NS3-NS5 (or NS2B-NS3-NS5) complex has not been crystallized. It would be logical for the NS5 polymerase to access the (-)ssRNA strand after it is released from NS2B-NS3pro, since (-)ssRNA strands are used as a template for the (+)ssRNA, which is used for polyprotein synthesis and packaging into viral particles.

      *would NS5 interaction constrain or augment inchworm model of NS2B/NS3 translocation?*

      • Yes, integrating the NS5 interaction with NS2B-NS3pro handling of (-)ssRNA will augment the utility of the suggested reverse inchworm model.

      *how does increased activity of NS3 when complexed with NS5 (Xu et al. 2019) align with proposed inchworm model?*

      • We appreciate the reviewer's question. We think that NS2, NS3, NS4, and NS5 work in concert as one coordinated complex in which various subunits of NS2 and NS4 may anchor the entire complex to the ER membrane. Indeed, such a complex has recently been proposed [6]. Also, see our response to the previous reviewer’s point (#4). We have incorporated this discussion into the revised manuscript.

      Minor Comments: 1. Introduction, 4th paragraph, NS3-NS4 should read NS3-NS4A.

      • We corrected this sentence.

      • Throughout the manuscript, the authors should denote some key amino acid residues in each figure to help orient the reader better to the observed structural changes and rotations. The crystal structures of the mutants solved herein should also be included, at least in the supplement.*

      • We annotated the key residues in all figures (e.g. catalytic residues, loop interacting with the membrane, position of NS2B, and other elements) and kept the same orientation of complexes in all figures.

      • Section: RNA binding inhibits the proteolytic activity of ZIKV NS2B-NS3pro, last sentence, NS2N-NS3pro should be NS2B-NS3pro*.

      • We corrected this sentence.

      • Section: Allosteric inhibitors of NS2B-NS3 protease interfere with RNA binding, first sentence: "The open conformation of NS2B-NS3pro is achieved by the rearrangement of NS2B cofactor (its dissociation from the C-terminal half of NS3pro) leading to a loss of proteolytic activity [32]." The reference is not correct. I could not find the reference the authors refer to here and had not heard before that NS2B cofactor was able to dissociate from the C-terminal half of NS3pro; hence, this really needs to be appropriately referenced. *

      • We have revised this sentence and added references: “The open conformation of NS2B-NS3pro is achieved by the rearrangement of NS2B cofactor (partial dissociation from NS3pro), leading to a loss of proteolytic activity [4,11].”

      • Section: Modeling RNA binding to ZIKV NS2B-NS3, first sentence - unwinds should be unwind*.

      • We corrected this sentence.


      • With respect to the results of Figure 3A, the authors should address that adding the linker alone to the NS3 protease may not be an accurate examination of its role/importance. The linker in this scenario is only constrained at its N-terminus, while it is always constrained at both termini during infection (and even more so by the interactions of those two linked domains [protease and helicase] with each other). As such, the authors statement that "observations suggests that the 12-aa linker region modulates RNA binding to NS2B-NS3pro" should be more strongly qualified to this effect. In addition, it would be interesting to see the effects of the linker mutations on ssRNA binding in the context of the full NS3 protein, albeit admittedly more complex due to the confounding ssRNA binding by the helicase domain.*

      • We agree with the reviewer that the native protease-helicase linker is restrained at both termini. We have rephrased the statement in the revised manuscript. The goal of the experiment shown in Figure 3A was to examine whether a negatively charged linker can compete with ssRNA binding, as we expected from the structural model. Mutational analysis of the protease-helicase linker is, indeed, a very interesting subject that is, however, beyond the scope of this work.

      7. The NS#hel should be changed to NS3hel in part (C) of the figure legend for Figure 11.

      • We corrected this error.

      • The authors data in Figure 4A (and even more so the nature of the viral life cycle where 1000s of viral polyproteins are created from the first genome during infection) disputes the depiction in the inchworm model of how NS3 protease cleaves the polyprotein while the helicase binds ssRNA. At minimum, the authors need to discuss this discrepancy, and it is recommended that they modify the cartoon in their model to not include the ssRNA binding on the protease side of the equation (or show as alternative on that side to the existing cartoon).*
      • Indeed, as proposed by our reverse inchworm model, ssRNA is not bound to NS3pro in the closed conformation, in which NS2B-NS3pro has a protein substrate in the active center (Figure 11A). We agree that NS2B-NS3pro in the closed conformation cannot bind ssRNA, as we demonstrated in the competitive cleavage assay. Only large amounts of ssRNA can shift the balance towards the open conformation, which binds ssRNA. We think that most of the time NS2B-NS3pro cycles between the open and super-open conformations while handling ssRNA (Figure 11B-D), but as soon as a protein substrate becomes available (typically a loop from a transmembrane viral polypeptide), NS2B-NS3pro quickly switches to the closed, proteolytically active conformation to act as a protease.

      • In the third paragraph of the discussion, the authors state "An alternative model of coupled transcription and translation where viral RNA is associated with ribosomes right after the release from NS2B-NS3 is also possible". Considering there is abundant evidence that translation and replication are exclusive and that translation does not take place in ROs, it would be prudent to remove such statements from the discussion. Without any supporting evidence, these statements will be misleading to readers by providing a false equivalency. The preceding discussion of RFs would be sufficient to contextualize your inchworm model in the broader viral life cycle (which was done quite well). *

      • We have adjusted the discussion in the revised manuscript to avoid a false equivalency.

      10. There were a number of aspects I appreciated about the manuscript and will briefly list a few here:

        i) the focus on how different non-structural proteins affect the structure and function of each other during the viral life cycle, which forms a more comprehensive and informative model

        ii) the use of structural and functional assays as complementary approaches to studying the intra- and inter-protein relationships of NS3

        iii) the depiction of the forks in Figure 10, which effectively demonstrated the channels and oriented the reader to the conservation data

        iv) the use of small molecule inhibitors to modify structure and function of NS3, which greatly deepened the richness of the story from both a basic and applied science view point

      • We are very grateful to the Reviewer for these kind remarks.

      Reviewer #2 (Significance (Required)):

      Strengths and limitations:

      - provides some experimental and modeling data to provide a new model for RNA interactions with the NS3pro-hel; may help inform models for enzyme function, mostly consistent with previous literature

      - leaves out the NS5 RdRp, known to contribute to NS3 activity.

      - some suggestions are made which might strengthen the conclusions and inclusions of additional controls would improve the data.

      Advance:

      - conceptual, perhaps may provide some insight into mechanism; although limited by the lack of NS5 RdRp which is crucial to helicase activity. It is unclear if the ssRNA would be oriented this way given interactions with NS5 RdRp and MT domains (is the ssRNA routed to NS5 or along NS3, or potentially are both happening?)

      Audience:

      - quite specialist, but may include structural biologists and virologist alike.

      Expertise of the reviewer(s):

      - molecular virologists, RNA viruses - including flaviviruses; replication complex biogenesis, protein-RNA and RNA-RNA interactions. While comfortable with the concepts regarding complex formation, the appropriateness of computational modeling and RNA docking tools as well as structural biology is out of our area of expertise.






      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This paper investigates the nucleic acid binding properties of Zika virus protease. In particular the data suggest that single stranded RNAs and DNAs are capable of binding to and inhibiting ZIKV protease at micromolar concentrations. With the use of active site inhibitors and mutants that lock the protease in closed and super-open conformation, the authors concluded that RNA binds to the open conformation. Through extensive modeling of the protease and helicase domains, this manuscript provides a model of how ssRNAs can bind to all conformations of the protease, but the open conformation provides two positively charged forks that should be available to bind RNA.*

      SECTION A - Evidence, reproducibility, and clarity

      Major comments:

      • The main conclusions of this paper rely on the existence of the super-open conformation, however this conformation has not been reported in the scientific literature previously. Structures deposited in the PDB are referenced in this manuscript, however no citation for an accompanying publication is provided. This calls into question the biological relevance of this super open conformation. This is of particular concern because in other highly-homologous flaviviral proteases, structures that have been observed crystallographically (e.g. the open conformation of dengue virus protease) appear to be only very sparsely populated in solution. What is the evidence that the super-open conformation exists in solution?*

      • Please see our reply to question #1 from Reviewer 1.

      • The activity of each of the constructs used was not reported making it impossible to directly compare the impact of these changes on intrinsic activity. In particular, the NS2B-NS3 long construct is predicted to exist in the super-open conformation. If this is correct, it should show no activity against a peptide substrate. *
      • We appreciate these concerns. The NS2B-NS3pro-long construct is proteolytically active (only NS2B-NS3pro-short construct is proteolytically inactive because its NS3pro C-terminal part is too short to fold into the closed conformation). It is unconstrained and likely capable of adopting all possible conformations (closed, open, super open). As we suspected, the negatively charged linker interferes with RNA binding, potentially via direct competition. Investigating the role of the protease-helicase linker is an exciting subject of a separate manuscript in preparation.

      • This paper reports that the IC50 is much weaker than the Kd for binding of ssRNA to ZIKV NS2B-NS3pro. Are orthogonal assays, such as thermal shift assay, available which could distinguish between the reported IC50 and the Kd? *
      • Binding of ssRNA occurs in an area distinct from the protease active center. We think that there is constant competition between binding/release of the NS2B C-terminal part and binding/release of ssRNA on NS3pro. We think that ssRNA “catches” the moment when the protease is in the open conformation and freezes that conformation by blocking the C-terminal part of NS2B from binding to NS3pro. With regard to the thermal shift assay, the structure of NS3pro is unchanged; only the C-terminal part of NS2B is affected. Note that the 15N R2 NMR signal from NS2B residues 65-85 is missing in bZiPro alone but reappears when AcKR is added [6]. This is consistent with the idea that without AcKR, bZiPro exists in the open conformation, where much of the C-terminal part of NS2B is dissociated from NS3pro and remains unstructured, resulting in the lack of NMR signal. Taken together, these observations suggest that a thermal shift assay is unlikely to be of much help.

      • *This paper suggests that ssRNA binds to the open conformation of ZIKV NS2B-NS3pro, however no experimental evidence, only modeling has been used to suggest binding to the open conformation. In Dengue virus protease, the M84P variant has been reported to lock the protease into the open conformation. How does the F84P variant of ZIKV NS2B-NS3pro impact ssRNA binding? *

      • We appreciate this question. Indeed, the M84P mutation shifts Dengue NS3pro to the open conformation, which is proteolytically inactive [12], consistent with our reverse inchworm model. We have not investigated the effect of this mutation on ZIKV NS3pro. We expect the corresponding mutation to have a similar effect in ZIKV NS3pro as in Dengue NS3pro.

      • The relevance of the discussion on the co-crystallization of NSC86314 with the Mut7 was not clear. What point was being made?

      • We provide a proof-of-principle for a novel class of allosteric inhibitors that specifically target newly identified druggable pockets present in the open and super-open conformations of ZIKV NS2B-NS3pro. Our results suggest that such allosteric inhibitors can interfere with the RNA-binding activities of NS2B-NS3pro in addition to blocking the protease activity. The co-crystallization of NSC86314 with Mut7 confirms a novel pocket bound by NSC86314.

      *- These data show that both active site and allosteric inhibitors block binding of ssRNA to the protease. The paper also suggests that ssRNA only binds to the open conformation. What is the evidence that the allosteric inhibitors do not enable or promote formation of the open conformation? *

      • We thank this reviewer for an interesting question. Indeed, we have no evidence of whether allosteric inhibitors enable or promote the formation of the open conformation. This is formally possible and will need to be investigated.

      • This paper makes two claims about the function of the protease. The title should specify what those dual functions are (proteolytic activity and ssRNA-recruitment).*
      • We appreciate this reviewer's suggestions for the title.

      • The discussion of Figures 6 and 9 are highly similar. The main takeaway points for both figures seem to be nearly identical: the presence of two positively charged pitchfork on the open conformation. The distinction between these two figures should be more significantly and explicitly stated. *
      • Figure 6 presents several models that provide evidence for the open conformation of ZIKV NS2B-NS3pro being uniquely suitable to bind RNA. Figure 9 presents several models of the entire RNA-NS2B-NS3pro-NS3hel complex anchored into the ER membrane. Figure 9 illustrates that the open conformation of NS2B-NS3pro provides two positively charged/polar forks, contiguous with the positively charged groove on NS3hel. Figure 6 does not illustrate that point.

      *- Mention explicitly in the materials and methods if the 12-amino acid linker is present in all the mutants used. *

      • This is mentioned explicitly and shown in Supplementary Figure 2A.

      Minor comments:

      · Figure 1. The rotation that promotes the transitions from orientation in panel A to that in panel B should be drawn.

      · FAM should be defined in the legend of Figure 2.

      · The term Cold should be changed to unlabeled.

      · Please check labels for the supplementary Figure 2. For example one label states 1-1 but it should be 1-170.

      · Figure 1C does not exist and it is referenced in the results section under "NS2B-NS3pro substrate-mimicking inhibitors compete with RNA binding."

      · As discussed above, if the super open conformation is going to be addressed in this paper, then either a reference for the manuscript describing those structures should be included, or this manuscript should include in the materials and methods the procedure on crystallization, data collection, structure determination, refinement, and analysis as well as a table for crystallographic data and refinement statistics.

      · Adjust figure arrangement (ABCED to ABCDE) in Figure 11.

      • We thank this reviewer for all minor comments. We corrected the above-mentioned errors in the manuscript.

      Reviewer #3 (Significance (Required)):

      It is well established that the flaviviral proteases exist in different conformations but most of the structures published are concentrated on the closed conformation which is the one required for effective substrate processing. The open conformation has recently been the subject of increased interest, especially with the discovery of allosteric inhibitors for which modeling suggests that these compounds result in the dissociation of the C-terminal region of NS2B from the NS3. This paper adds important insights into the function of the open conformation and in general implicitly shows the importance of the dynamic nature of ZIKV NS2B-NS3pro. In addition to these insights, this paper aptly demonstrates that ssRNA can bind and inhibit these proteases as has not been shown previously.

      I am a senior graduate student working on characterizing and understanding the mechanism of action of allosteric compounds against viral proteases, specifically proteases from Zika and dengue viruses.

      References.

      1. Weinert T, Olieric V, Waltersperger S, Panepucci E, Chen L, Zhang H, Zhou D, Rose J, Ebihara A, Kuramitsu S, Li D, Howe N, Schnapp G, Pautsch A, Bargsten K, Prota AE, Surana P, Kottur J, Nair DT, Basilico F, Cecatiello V, Pasqualato S, Boland A, Weichenrieder O, Wang BC, Steinmetz MO, Caffrey M, Wang M. Fast native-SAD phasing for routine macromolecular structure determination. Nat Methods. 2015 Feb;12(2):131–133. PMID: 25506719
      2. Solis AD, Rackovsky SR. Fold homology detection using sequence fragment composition profiles of proteins. Proteins. 2010 Oct;78(13):2745–2756. PMCID: PMC2933786
      3. Brinkworth RI, Fairlie DP, Leung D, Young PR. Homology model of the dengue 2 virus NS3 protease: putative interactions with both substrate and NS2B cofactor. J Gen Virol. 1999 May;80(Pt 5):1167–1177. PMID: 10355763
      4. Aleshin AE, Shiryaev SA, Strongin AY, Liddington RC. Structural evidence for regulation and specificity of flaviviral proteases and evolution of the Flaviviridae fold. Protein Sci. 2007 May;16(5):795–806. PMCID: PMC2206648
      5. Phoo WW, Li Y, Zhang Z, Lee MY, Loh YR, Tan YB, Ng EY, Lescar J, Kang C, Luo D. Structure of the NS2B-NS3 protease from Zika virus after self-cleavage. Nat Commun. 2016 Nov 15;7:13410. PMCID: PMC5116066
      6. Zhang Z, Li Y, Loh YR, Phoo WW, Hung AW, Kang C, Luo D. Crystal structure of unlinked NS2B-NS3 protease from Zika virus. Science. 2016 Dec 23;354(6319):1597–1600. PMID: 27940580
      7. Luo D, Wei N, Doan DN, Paradkar PN, Chong Y, Davidson AD, Kotaka M, Lescar J, Vasudevan SG. Flexibility between the protease and helicase domains of the dengue virus NS3 protein conferred by the linker region and its functional implications. J Biol Chem. 2010 Jun 11;285(24):18817–18827. PMCID: PMC2881804
      8. Chernov AV, Shiryaev SA, Aleshin AE, Ratnikov BI, Smith JW, Liddington RC, Strongin AY. The two-component NS2B-NS3 proteinase represses DNA unwinding activity of the West Nile virus NS3 helicase. J Biol Chem. 2008 Jun 20;283(25):17270–17278. PMCID: PMC2427327
      9. Xu S, Ci Y, Wang L, Yang Y, Zhang L, Xu C, Qin C, Shi L. Zika virus NS3 is a canonical RNA helicase stimulated by NS5 RNA polymerase. Nucleic Acids Res. 2019 Sep 19;47(16):8693–8707. PMCID: PMC6895266
      10. Klema VJ, Padmanabhan R, Choi KH. Flaviviral Replication Complex: Coordination between RNA Synthesis and 5’-RNA Capping. Viruses. 2015 Aug 13;7(8):4640–4656. PMCID: PMC4576198
      11. Shiryaev SA, Aleshin AE, Muranaka N, Kukreja M, Routenberg DA, Remacle AG, Liddington RC, Cieplak P, Kozlov IA, Strongin AY. Structural and functional diversity of metalloproteinases encoded by the Bacteroides fragilis pathogenicity island. FEBS J. 2014 Jun;281(11):2487–2502. PMCID: PMC4047133
      12. Lee WHK, Liu W, Fan JS, Yang D. Dengue virus protease activity modulated by dynamics of protease cofactor. Biophys J. 2021 Jun 15;120(12):2444–2453. PMCID: PMC8390872