10,000 Matching Annotations
  1. Oct 2024
    1. #ifdef CONFIG_COMPACTION

      This option acts as a policy control that determines whether the memory compaction feature is built into the kernel. Because the compaction-related code is wrapped in #ifdef CONFIG_COMPACTION ... #endif [Line 522], those sections are compiled only when the option is enabled. This affects how the kernel manages memory and handles fragmentation. CONFIG_COMPACTION is defined in https://elixir.bootlin.com/linux/v6.6.42/source/mm/Kconfig#L637 and defaults to y (enabled).

    2. ```c
       #ifdef CONFIG_COMPACTION
       static bool suitable_migration_source(struct compact_control *cc,
                                             struct page *page)
       {
           int block_mt;

           if (pageblock_skip_persistent(page))
               return false;

           if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
               return true;

           block_mt = get_pageblock_migratetype(page);

           if (cc->migratetype == MIGRATE_MOVABLE)
               return is_migrate_movable(block_mt);
           else
               return block_mt == cc->migratetype;
       }
       ```

      This snippet from the Linux kernel's memory compaction code implements the policy for deciding whether a pageblock is a suitable migration source during compaction. It is compiled only when CONFIG_COMPACTION is enabled, so its presence is itself configuration-dependent. suitable_migration_source first rejects pageblocks marked as persistently skipped. In every mode other than asynchronous direct compaction, any remaining pageblock is accepted. In asynchronous direct compaction, the pageblock's migratetype must match the requested one; when the request is MIGRATE_MOVABLE, any movable migratetype is accepted.
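      To make the order of the checks concrete, here is a minimal Python model of the same decision. It is a sketch, not kernel code: the enum values are arbitrary stand-ins, the persistent-skip flag and the pageblock's migratetype are passed in directly instead of being read with pageblock_skip_persistent() and get_pageblock_migratetype(), and is_migrate_movable() is assumed to accept MOVABLE and CMA pageblocks (as it does when CONFIG_CMA is enabled).

```python
from dataclasses import dataclass

# Stand-in enum values; only the names mirror the kernel identifiers.
MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_CMA = range(4)
MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC = range(3)


@dataclass
class CompactControl:
    mode: int                # compaction mode (async / sync-light / sync)
    direct_compaction: bool  # True when compaction runs on the allocation path
    migratetype: int         # migratetype the allocation is asking for


def is_migrate_movable(mt: int) -> bool:
    # Assumption: models the CONFIG_CMA case, where CMA blocks count as movable.
    return mt in (MIGRATE_MOVABLE, MIGRATE_CMA)


def suitable_migration_source(cc: CompactControl, block_mt: int,
                              skip_persistent: bool = False) -> bool:
    # Pageblocks flagged to be skipped persistently are never used as sources.
    if skip_persistent:
        return False
    # Only async direct compaction is picky; every other mode accepts the block.
    if cc.mode != MIGRATE_ASYNC or not cc.direct_compaction:
        return True
    # Movable requests accept any movable-type block; others need an exact match.
    if cc.migratetype == MIGRATE_MOVABLE:
        return is_migrate_movable(block_mt)
    return block_mt == cc.migratetype
```

      For example, an asynchronous direct compaction for a movable allocation rejects an unmovable pageblock, while the same request in MIGRATE_SYNC mode accepts it.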

    1. ```c
       batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
       batch /= 4;     /* We effectively *= 4 below */
       if (batch < 1)
           batch = 1;

       /*
        * Clamp the batch to a 2^n - 1 value. Having a power
        * of 2 value was found to be more likely to have
        * suboptimal cache aliasing properties in some cases.
        *
        * For example if 2 tasks are alternately allocating
        * batches of pages, one task can end up with a lot
        * of pages of one half of the possible page colors
        * and the other with pages of the other colors.
        */
       batch = rounddown_pow_of_two(batch + batch/2) - 1;
       ```

      Determines the number of pages to allocate per batch using a heuristic, and clamps the result to a 2^n - 1 value to reduce cache-aliasing problems.

      But I think it could also be categorized as a configuration policy, because this code is only executed when CONFIG_MMU is enabled.
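      The arithmetic can be checked with a small Python model. It assumes 4 KiB pages (so SZ_1M / PAGE_SIZE is 256); pcp_batch_size is a hypothetical name for the computation quoted above, and rounddown_pow_of_two is reimplemented with int.bit_length().

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
SZ_1M = 1 << 20


def rounddown_pow_of_two(n: int) -> int:
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def pcp_batch_size(managed_pages: int) -> int:
    """Hypothetical name for the batch heuristic quoted above."""
    # Roughly 1/1024 of the zone, but never more than 1 MiB worth of pages.
    batch = min(managed_pages >> 10, SZ_1M // PAGE_SIZE)
    batch //= 4            # the kernel effectively multiplies by 4 later
    if batch < 1:
        batch = 1
    # Clamp to 2^n - 1 to avoid power-of-two cache-aliasing effects.
    return rounddown_pow_of_two(batch + batch // 2) - 1
```

      For example, a zone with 4,000,000 managed pages gives min(3906, 256) = 256, then 64 after the division by 4, and rounddown_pow_of_two(96) - 1 = 63.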

    2. ```c
       if (!node_isset(node, *used_node_mask)) {
           node_set(node, *used_node_mask);
           return node;
       }

       for_each_node_state(n, N_MEMORY) {

           /* Don't want a node to appear more than once */
           if (node_isset(n, *used_node_mask))
               continue;

           /* Use the distance array to find the distance */
           val = node_distance(node, n);

           /* Penalize nodes under us ("prefer the next node") */
           val += (n < node);

           /* Give preference to headless and unused nodes */
           if (!cpumask_empty(cpumask_of_node(n)))
               val += PENALTY_FOR_NODE_WITH_CPUS;

           /* Slight preference for less loaded node */
           val *= MAX_NUMNODES;
           val += node_load[n];

           if (val < min_val) {
               min_val = val;
               best_node = n;
           }
       }

       if (best_node >= 0)
           node_set(best_node, *used_node_mask);
       ```

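      Selects the next best node using a heuristic that considers node distance, whether the node has CPUs, and node load. If the starting node has not been used yet, it is returned immediately. Otherwise, closer nodes score better; nodes with a lower ID than the current node get a small penalty ("prefer the next node"); nodes with CPUs are penalized so that headless nodes are preferred; and the node's load is added as a smaller, secondary term after the score is multiplied by MAX_NUMNODES. The chosen node is then set in the used-node mask so that it is not selected again.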
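      The scoring can be sketched in Python as below. This is a simplified model with stated assumptions: PENALTY_FOR_NODE_WITH_CPUS is taken as 1 and MAX_NUMNODES as 4, every node is assumed to have memory (so the loop visits all nodes instead of using for_each_node_state(n, N_MEMORY)), and the distance table, CPU flags, and loads are made-up inputs.

```python
PENALTY_FOR_NODE_WITH_CPUS = 1  # assumption
MAX_NUMNODES = 4                # assumption: small example configuration


def find_next_best_node(node, used, distance, has_cpus, node_load):
    """Simplified model; `used` is a set of node IDs and is updated in place."""
    if node not in used:
        used.add(node)
        return node

    best_node, min_val = -1, float("inf")
    for n in range(len(distance)):     # assumes every node has memory
        if n in used:                  # never hand out a node twice
            continue
        val = distance[node][n]        # closer nodes score lower (better)
        val += int(n < node)           # small penalty for lower-numbered nodes
        if has_cpus[n]:                # prefer headless (CPU-less) nodes
            val += PENALTY_FOR_NODE_WITH_CPUS
        val *= MAX_NUMNODES            # make room for the load term
        val += node_load[n]            # slight preference for less-loaded nodes
        if val < min_val:
            min_val, best_node = val, n

    if best_node >= 0:
        used.add(best_node)
    return best_node
```

      With three nodes at equal distance from node 0 (already used), node 2 wins over node 1 when only node 2 is headless, because node 1 pays the CPU penalty.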

    1. ```c
       if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
           !writeback_in_progress(wb))
           wb_start_background_writeback(wb);
       ```

      This is a configuration policy that decides whether to start background writeback. If laptop_mode (which reduces disk activity to save power) is not set, the number of reclaimable dirty pages exceeds the bg_thresh threshold, and no writeback is already in progress for this wb, the kernel starts background writeback.
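      The condition reduces to a simple predicate. In this sketch, should_start_background_writeback is a hypothetical helper, not a kernel function, and its arguments are plain values standing in for laptop_mode, nr_reclaimable, gdtc->bg_thresh, and writeback_in_progress(wb).

```python
def should_start_background_writeback(laptop_mode: bool, nr_reclaimable: int,
                                      bg_thresh: int,
                                      writeback_in_progress: bool) -> bool:
    """Hypothetical helper mirroring the quoted condition."""
    return (not laptop_mode
            and nr_reclaimable > bg_thresh      # strictly above the threshold
            and not writeback_in_progress)      # skip if writeback is already running
```

      Because the comparison is strict (>), reaching exactly bg_thresh is not enough to start writeback.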

    2. ```c
       /* throttle according to the chosen dtc */
       dirty_ratelimit = READ_ONCE(wb->dirty_ratelimit);
       task_ratelimit = ((u64)dirty_ratelimit * sdtc->pos_ratio) >>
                                               RATELIMIT_CALC_SHIFT;
       max_pause = wb_max_pause(wb, sdtc->wb_dirty);
       min_pause = wb_min_pause(wb, max_pause,
                                task_ratelimit, dirty_ratelimit,
                                &nr_dirtied_pause);

       if (unlikely(task_ratelimit == 0)) {
           period = max_pause;
           pause = max_pause;
           goto pause;
       }
       period = HZ * pages_dirtied / task_ratelimit;
       pause = period;
       if (current->dirty_paused_when)
           pause -= now - current->dirty_paused_when;
       /*
        * For less than 1s think time (ext3/4 may block the dirtier
        * for up to 800ms from time to time on 1-HDD; so does xfs,
        * however at much less frequency), try to compensate it in
        * future periods by updating the virtual time; otherwise just
        * do a reset, as it may be a light dirtier.
        */
       if (pause < min_pause) {
           trace_balance_dirty_pages(wb, sdtc->thresh, sdtc->bg_thresh,
                   sdtc->dirty, sdtc->wb_thresh, sdtc->wb_dirty,
                   dirty_ratelimit, task_ratelimit, pages_dirtied,
                   period, min(pause, 0L), start_time);
           if (pause < -HZ) {
               current->dirty_paused_when = now;
               current->nr_dirtied = 0;
           } else if (period) {
               current->dirty_paused_when += period;
               current->nr_dirtied = 0;
           } else if (current->nr_dirtied_pause <= pages_dirtied)
               current->nr_dirtied_pause += pages_dirtied;
           break;
       }
       if (unlikely(pause > max_pause)) {
           /* for occasional dropped task_ratelimit */
           now += min(pause - max_pause, max_pause);
           pause = max_pause;
       }

       pause:
       trace_balance_dirty_pages(wb, sdtc->thresh, sdtc->bg_thresh,
               sdtc->dirty, sdtc->wb_thresh, sdtc->wb_dirty,
               dirty_ratelimit, task_ratelimit, pages_dirtied,
               period, pause, start_time);
       if (flags & BDP_ASYNC) {
           ret = -EAGAIN;
           break;
       }
       __set_current_state(TASK_KILLABLE);
       bdi->last_bdp_sleep = jiffies;
       io_schedule_timeout(pause);

       current->dirty_paused_when = now + pause;
       current->nr_dirtied = 0;
       current->nr_dirtied_pause = nr_dirtied_pause;
       ```

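      This part of the code makes an algorithmic decision about how long a task should be throttled, based on the rate at which it dirties memory pages. The task's allowed rate (task_ratelimit) is the writeback's dirty_ratelimit scaled by pos_ratio. The pause period is the time needed to dirty pages_dirtied pages at that rate, minus any time already elapsed since the previous pause, capped at max_pause; if the result is below min_pause, the task is not put to sleep this round.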
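      The core of the pause calculation can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: RATELIMIT_CALC_SHIFT is taken as 10 with pos_ratio as a fixed-point value (1.0 == 1 << 10), dirty_ratelimit is in pages per second, all times are in jiffies, and the min_pause branch, tracing, and the adjustment of `now` are left out. throttle_pause is a hypothetical name.

```python
RATELIMIT_CALC_SHIFT = 10  # assumption: pos_ratio is fixed point, 1.0 == 1 << 10


def throttle_pause(dirty_ratelimit, pos_ratio, pages_dirtied, hz,
                   now, dirty_paused_when, max_pause):
    """Return (period, pause) in jiffies; simplified, no min_pause handling."""
    task_ratelimit = (dirty_ratelimit * pos_ratio) >> RATELIMIT_CALC_SHIFT
    if task_ratelimit == 0:
        # The allowed rate collapsed to zero: sleep for the maximum pause.
        return max_pause, max_pause
    # Time needed to dirty pages_dirtied pages at the allowed rate.
    period = hz * pages_dirtied // task_ratelimit
    pause = period
    if dirty_paused_when:
        # Credit the time that has already passed since the last pause ended.
        pause -= now - dirty_paused_when
    if pause > max_pause:
        pause = max_pause
    return period, pause
```

      For example, with HZ = 1000, a dirty_ratelimit of 1000 pages/s, and pos_ratio = 0.5 (512), dirtying 100 pages yields a 200-jiffy period; with max_pause = 50 the task would sleep for 50 jiffies.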

    1. it is hard to invoke the code under test conditions

      ??? Is it really that hard to switch the server off and on again while it is running???

    1. A Russian agency is pushing new rules of conduct on Central Asian migrants that severely restrict the use of their native languages and warn them about praying in public and sacrificing animals for religious purposes. Central Asian migrants are told about the strict rules and code of behavior in a 70-minute course created by Russia's Federal Agency of Ethnic Affairs (FADN) and delivered in seminars in certain parts of the country. It requires them to know the Russian language and the country's migration laws, not to use their native language when they talk about Russians, and not to whistle at members of the opposite sex. The course also tells them "not to even whisper" in public using their mother tongue, Kommersant reported. Moreover, the common Central Asian practice of addressing people as "brother" or "sister" is said to be unacceptable when referring to Russians. Along with the language and behavioral restrictions, there are also religious limitations. The slaughtering of animals for religious worship in public will also be prohibited.

      Russia cracks down on use of Central Asian languages.

    1. A recipe is like a block of code in programming: once it is written, it can be executed repeatedly, like a loop, and the cook is the one who executes it. The underlying logic model is very similar.

    1. ```c
       if ((vm_flags & VM_EXEC) && folio_is_file_lru(folio)) {
           nr_rotated += folio_nr_pages(folio);
           list_add(&folio->lru, &l_active);
           continue;
       }
       ```

      The goal is to identify executable code and keep it in memory under moderate memory pressure. The heuristic checks whether the VM_EXEC flag is set and whether the folio is file-backed, and if both hold, it puts the folio back on the active list.
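      A toy Python model of this rotation rule follows. It is a sketch under assumptions: the Folio class and the VM_EXEC bit value are stand-ins, and the `referenced` field models the fact that in shrink_active_list() this check only runs for folios that folio_referenced() reports as recently referenced (that call is also what fills in vm_flags). sort_active_folios is a hypothetical name.

```python
from dataclasses import dataclass

VM_EXEC = 0x4  # stand-in bit value for the example


@dataclass
class Folio:
    nr_pages: int
    file_backed: bool   # models folio_is_file_lru()
    referenced: bool    # models folio_referenced() returning non-zero
    vm_flags: int = 0   # flags of the mappings that referenced the folio


def sort_active_folios(folios):
    """Return (l_active, l_inactive, nr_rotated) for a batch of active folios."""
    l_active, l_inactive = [], []
    nr_rotated = 0
    for folio in folios:
        if folio.referenced and (folio.vm_flags & VM_EXEC) and folio.file_backed:
            # Recently used executable file pages stay on the active list.
            nr_rotated += folio.nr_pages
            l_active.append(folio)
            continue
        l_inactive.append(folio)   # everything else is deactivated
    return l_active, l_inactive, nr_rotated
```

      Anonymous memory with VM_EXEC, non-executable file pages, and unreferenced executable pages all end up on the inactive list in this model.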

    1. ```c
       /* Soft offline could migrate non-LRU movable pages */
       if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
           return true;
       ```

      The code includes a policy for soft offlining pages (MF_SOFT_OFFLINE). Soft offlining is a feature in which the kernel migrates the contents of a page that is suspected to be faulty and then stops using that page. The policy allows non-LRU movable pages (movable pages that are not on the LRU, or Least Recently Used, lists) to be migrated as well.

    1. #ifdef CONFIG_HIBERNATION

      If CONFIG_HIBERNATION is defined, the kernel includes code to write a snapshot of the system's memory to swap space (a swap partition or swap file) before powering down the system. This involves allocating swap slots for the memory image and making sure the data is stored correctly.

    1. Table of contents: Sexual exploitation of minor girls in care. Source: excerpt from the podcast episode "Prostitution : des ados placées exploitées sexuellement" ("Prostitution: teenage girls in care sexually exploited") by Le Parisien (October 2, 2024)

      I. Introduction (0:02 - 0:50)

      Presentation of the Code Source podcast and its subject: the sexual exploitation of minor girls placed in the care of the Aide Sociale à l'Enfance (ASE, the French child welfare service).

      Introduction of the Le Parisien journalists Elsa Mari and Stéphanie Forestier, who conducted the investigation.

      II. The ASE context and the mechanisms of prostitution (0:51 - 4:48)

      How the ASE works and the profile of the children in its care. (0:51 - 2:18)

      Criticism of the ASE's lack of resources and the consequences for children in care (overcrowded care homes, shortage of care workers, rising number of placements). (2:19 - 3:08)

      Methods pimps use to target teenage girls in care (the internet, waiting outside care homes, "lover boys"). (3:09 - 4:29)

      Analysis of what makes teenage girls in care vulnerable (search for affection, lack of self-confidence, difficult family history). (4:30 - 4:48)

      III. Testimony: the case of Clara and Dorine (4:49 - 8:07)

      Presentation of the case of Dorine, the mother of Clara; Clara was placed with the ASE after family difficulties and a traumatic event. (4:49 - 8:07)

      IV. The hell of prostitution: methods and consequences (8:08 - 12:14)

      Dorine gradually discovers her daughter's prostitution through her social media accounts. (8:08 - 10:16)

      Description of the downward spiral of prostitution: from seduction to violence, drugs, and isolation. (5:26 - 6:25)

      Where and how the sexual exploitation is organized (rented apartments, Airbnb, websites). (6:26 - 6:58)

      Difficulties faced by care workers confronted with this phenomenon: helplessness, a sense of failure, burnout. (7:03 - 7:52)

      Repeated reports concerning Clara and her repeated arrests, showing her situation and how difficult it is to help her. (11:52 - 12:14)

      V. Hope and consequences (12:15 - 14:00)

      Decision to place Clara in a specialized home for underage victims of prostitution, far from her usual environment and her family. (12:15 - 12:58)

      Dorine's intention to file a complaint against the ASE for failing in its duty to protect. (13:01 - 13:29)

      Dorine's hope of getting her daughter back and seeing her rebuild her life. (13:30 - 14:00)

      VI. A wider scourge and a call to action (14:01 - 18:06)

      Testimony of a care worker worried about a teenage girl under the control of a man she met online. (14:01 - 14:34)

      The difficulty of counting how many minors in care are victims of prostitution, and the scale of the phenomenon beyond the ASE. (14:35 - 15:24)

      Link between the explosion of child prostitution and the health crisis (more time spent online, normalization of online prostitution). (15:25 - 15:46)

      The case of Eva, 16, who pimped two underage Ukrainian girls, illustrating how normalized prostitution has become among young people. (15:47 - 17:31)

      Call to action against the spread of this scourge and on the need to protect minors. (17:32 - 17:53)

      Conclusion of the podcast and invitation to listen to the other episodes. (18:07 - 18:39)

    1. We will not consider those to be bots, since they aren’t run by a computer.

      I was attempting to use code to automate publishing tweets on Reddit, but the website detected my activity, and my account was banned. I'm now wondering how I can prevent this in the future when running a bot. How can I make the bot’s actions more discreet and avoid detection?

    2. Note that sometimes people use “bots” to mean inauthentically run accounts, such as those run by actual humans, but are paid to post things like advertisements or political content. We will not consider those to be bots, since they aren’t run by a computer. Though we might consider these to be run by “human computers” who are following the instructions given to them, such as in a click farm:

      From this, one defining factor of a "bot" is that it is driven by a computer program. Humans may initiate the action, but the programmed code is what is essential. The passage also distinguishes bots from inauthentic accounts that humans run by following instructions.

    1. Create code so that after x and y are defined, they are compared and if the value of x is less than y it sets the variable result to "x is less than y"; if x is greater than y then result is set to "x is greater than y"; and result is "x and y must be equal" if the values are equal.

      This should also work, right?

      ```python
      x = 10
      y = 10

      if x < y:
          result = "x is less than y"
      if x > y:
          result = "x is greater than y"
      else:
          result = "x and y must be equal"
      ```
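      With x = 10 and y = 10 the snippet above does produce the right string, but not for every input. The second `if` starts a new, independent statement: when x < y, the first branch sets result, and then the `if x > y` / `else` pair still runs and overwrites it with "x and y must be equal". Chaining the tests with `elif` makes the three cases mutually exclusive. Both versions are wrapped in functions here (the function names are illustrative) so they can be compared directly.

```python
def compare_with_two_ifs(x, y):
    # Same logic as the snippet above: two separate if statements.
    if x < y:
        result = "x is less than y"
    if x > y:
        result = "x is greater than y"
    else:
        result = "x and y must be equal"   # also runs when x < y
    return result


def compare_with_elif(x, y):
    # One if/elif/else chain: exactly one branch runs.
    if x < y:
        result = "x is less than y"
    elif x > y:
        result = "x is greater than y"
    else:
        result = "x and y must be equal"
    return result
```

      For example, compare_with_two_ifs(5, 10) returns "x and y must be equal", while compare_with_elif(5, 10) returns "x is less than y".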

    1. An earlier example of single source publishing is One Document Does it all (or ODD). Its principle is to write the documentation for an XML (or TEI) schema, the encoding rules, and the customization details all in the same file (Viglianti 2019). More fundamentally, it is about writing code and documentation in a single process, and it is the origin of the word processors: the developers started to write their documentation behind the code—with a typewriter—before the creation of the first word processor, Electric Pencil (Kirschenbaum 2016, 100).

      That's interesting.

    1. Reviewer #1 (Public review):

      Summary:

      In the manuscript the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons.
      The work is useful for better understanding the hemodynamic response at the network/graph level.

      Strengths:

      The manuscript provides a pipeline that detects changes in vessel diameter and, at the same time, locates the neurons driven by stimulation.
      The resulting data could provide interesting insights into the graph-level mechanisms that regulate activity-dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.
      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) does not seem a particularly useful comparison.
      (3) The manuscript does not clearly visualize the performance of the segmentation pipeline (e.g. via 2D sections, also highlighting errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails, or what kinds of errors to expect.
      (4) The pipeline is not fully open-source due to its use of MATLAB. Also, the pipeline code was not made available during review, contrary to the authors' claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      Detailed remarks to the revision and new manuscript:

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful.
        However, the visualizations make it difficult to see how well the pipeline performs, where it fails, etc. The 3D visualizations are not particularly helpful in this respect.
        In addition, the Dice measure seems quite low, indicating that roughly 20-40% of voxels do not overlap between the inferred segmentation and the ground truth. I did not notice this high discrepancy earlier. In my view, a thorough discussion of the errors appearing in the segmentation pipeline would be necessary to better assess the quality of the pipeline.

    1. And, as a nice side note, you happen to now know what the phrase “fully qualified base64 primitives” in KERIpy means. All that means is that your encoded value has been pre-padded, pre-conversion, and has had its type code added to the front, as we did here with substitution, with the exception that some CESR primitives

      BONUS!!!! Thx for pointing this out

    1. Where procedural games differ from walking simulators is in their lack of curation: they let you walk wherever you want, including perhaps into uninteresting places, rather than down a well-prepared path that tells a particular story.

      Even though I am not a huge gamer myself, I have noticed that when I do play games, I often do find myself getting frustrated with parts of games that I deem “unnecessary” to the progression of the story. For example, even during my gameplay of “Gone Home,” I began to get frustrated when I couldn’t find the code to open the safe in the basement, which was making the gameplay take longer than anticipated. Overall, “Gone Home” is an uncomfortable game to play as it goes against many norms and conventions present in more typical online games, such as the existence of a clear and explicit objective or goal that keeps the player interested in the outcome (ex. “winning” the game, defeating a monster, etc.).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reports:

      In the public reports there is only one point we would like to discuss. It concerns our use of a computational model to analyse spatial tumour growth. Citing from the eLife assessment, which reflects several comments of the referees:

      The paper uses published data and a proposed cell-based model to understand how growth and death mechanisms lead to the observed data. This work provides an important insight into the early stages of tumour development. From the work provided here, the results are solid, showing a thorough analysis. However, the work has not fully specified the model, which can lead to some questions around the model’s suitability.

      The observables we use to determine the (i) growth mode and the (ii) dispersion of cells are model-independent. The method to determine the (iii) rate of cell death does not use a spatial model. Throughout, our computational model of spatial growth is not used to analyze data. Instead, it is used to check that the observables we use can actually discriminate between different growth modes, given the limitations of the data. We have expanded the description of the computational model in the revised version and have released our code on GitHub. However, the conclusions we reach do not rely on a computational model. Instead, where we estimate parameters, we use population dynamics as described in section S5. The other observables are parameter-free and model-independent. We view this as a strength of our approach.

      Recommendations for the authors:

      Reviewer #1:

      (1.1) In Figure 1, the data presented by Ling et al. demonstrate a distinctive “comb” pattern. While this pattern diverges from the conventional observations associated with simulated surface growth, it also differs from the simulated volume growth pattern. Is this discrepancy attributable to insufficient data? Alternatively, could the emergence of such a comb-like structure be feasible in scenarios featuring multiple growth centers, wherein clones congregate into spatial clusters?

      We are unsure what you are referring to. One possibility is that you refer to the honeycomb structure formed by the samples of the Ling et al. data shown in Fig. 1A of the main text. This is an artefact arising from cutting the histological section into four quadrants; see Fig. S1 in the SI of Ling et al. The apparent horizontal and vertical "white lines" in our Fig. 1A stem from the lack of samples near the edges of these quadrants. We have added this information to the figure caption.

      Alternatively, you may be referring to the peaks in Fig. 2A of the main text. Three of these peaks indeed stem from individual clones. We have placed additional figures in the SI (S2 B and S2 C) to disentangle the contributions from different clones. The peaks have a simple explanation: each clone contributes the same weight to the histogram. If a clone has only a few offspring, this statistical weight is concentrated on only a few angles; see SI Figure S2 B.

      (1.2) I am not sure why there are two sections about “Methods” in the main text: Line 50 as well as Line 293. Furthermore, the methods outlined in the main paper lack the essential details necessary for readers to navigate through the critical aspects of their analysis. While these details are provided in the Supplementary Information, they are not adequately referenced within the methods section of the main text. I would recommend that the authors revise the method sections of the main text to include pertinent descriptions of key concepts or innovations, while also directing readers to the corresponding supplementary method section for further elucidation.

      We have merged the Section “Materials and Methods” at the end of the main text with the SI description of the data in SI 4.2 and placed a reference to this material in the main body.

      (1.3) The impact of the particular push method (proposed in the model) on the resultant spatial arrangement of clones remains unclear. For instance, it’s conceivable that employing a different pushing method (for example, with more strict constraints on direction) could yield a varied pattern of spatial diversity. Furthermore, there is ambiguity regarding the criteria for determining the sequence of the queue housing overlapping cells.

      Regarding the off-lattice dynamics we use, there are indeed many variants one could use. In non-exhaustive trials, we found that the details of the off-lattice dynamics did not affect the results. The reason may be that at each computational step, each cell only moves a very small amount, and differences in the dynamics tend to average out over time.

      We deliberately do not give constraints on the direction. Such constraints emerge in lattice-based models (when preferred directions arise from the lattice symmetry), but these are artifacts of the lattice.

      At cell division, the offspring is placed in a random direction next to the parent, regardless of whether this introduces an overlap. Cells then push each other along the axis connecting their two centers of mass. Unlike in lattice-based models, a sequence of pushes does not propagate through the tumor straight away but sets off a cascade of pushes. Equal pushing of two cells (i.e. displacing both cells, as opposed to pushing only one of the two) results in the same patterns of directed, low-dispersion surface growth and undirected, high-dispersion volume growth, but is computationally much more expensive because it reintroduces overlaps that were resolved in the previous step.

      We have rewritten the description of the pushing queue in SI Section 1. The choice of the pushing sequence is somewhat arbitrary, but we found that it also has no noticeable effect on the growth mode. Contrasting it with depth-first approaches may help to illustrate this: we tried two queueing schemes for iterating through overlapping cells, breadth-first and depth-first. In both cases, we begin by scanning a given cell's (the root's) neighborhood for overlaps and shuffle the list of overlapping neighbours. In the breadth-first approach, we then add this list to the queue. Subsequent iterations append their lists of overlapping cells to the queue, so that we always resolve overlaps within the neighborhood of the root first. A depth-first approach follows a sequence of pushes by immediately checking a pushed cell's neighborhood for new overlaps and adding these to the front of the queue (which then works more like a stack). This can be implemented efficiently by recursion, but it has no noticeable performance advantage and results in the same patterns of directed, low-dispersion surface growth and undirected, high-dispersion volume growth. In our opinion, the breadth-first approach of first resolving overlaps in the immediate neighborhood is more intuitive, which is why we adopted it for our simulation model.
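      The breadth-first queue described above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' released code: cells are equal-sized discs in 2D, a push moves the pushed cell along the axis joining the two centres until the pair just touches (the authors note that in their simulation each cell moves only a very small amount per step), and resolve_overlaps, eps, and max_steps are hypothetical names.

```python
import math
import random
from collections import deque


def resolve_overlaps(cells, root, diameter, rng, max_steps=10_000):
    """Breadth-first overlap resolution (sketch). `cells` holds [x, y] centres
    and is modified in place; returns the number of pushes examined."""
    queue = deque()
    eps = 1e-9  # tolerance so cells that just touch do not count as overlapping

    def enqueue_overlaps(i):
        nbrs = [j for j in range(len(cells))
                if j != i and math.dist(cells[i], cells[j]) < diameter - eps]
        rng.shuffle(nbrs)                    # random order among overlapping neighbours
        queue.extend((i, j) for j in nbrs)   # append at the back: breadth-first

    enqueue_overlaps(root)
    steps = 0
    while queue and steps < max_steps:       # guard against endless push cycles
        steps += 1
        i, j = queue.popleft()
        d = math.dist(cells[i], cells[j])
        if d >= diameter - eps:
            continue                         # already resolved by an earlier push
        if d == 0.0:
            # Coincident centres: push in a random direction.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            ux, uy = math.cos(angle), math.sin(angle)
        else:
            ux = (cells[j][0] - cells[i][0]) / d
            uy = (cells[j][1] - cells[i][1]) / d
        # Only the pushed cell moves; it ends up exactly one diameter away.
        cells[j] = [cells[i][0] + ux * diameter, cells[i][1] + uy * diameter]
        enqueue_overlaps(j)                  # the push may create new overlaps
    return steps
```

      Replacing `queue.extend(...)` with `queue.extendleft(...)` puts new overlaps at the front, so the deque behaves like a stack and the sketch becomes the depth-first variant.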

      (1.4) For the example presented in S5.1, how can the author identify from genomic data that mutation 3 does not replace its ancestral clade mutation 2? In other words, if mutation 2, 3 and 4 are linked meaning clone 4 survives but 2 and 3 dies, how does one know if clone 3 dies before clone 2? I understand that this is a conceptual example, but if one cannot identify this situation from the real data, how can the clade turnover be computed?

      Thank you for this comment, which points to an error of ours in the turnover example of the SI: Clade 3 does in fact replace 2 and contributes to the turnover! (The algorithm correctly annotated clade 3 as orphaned and computes a turnover of 3/15 for this example). We have corrected this.

      In this example, it does not matter for the clade turnover whether clone 3 dies before clone 2. As long as its ancestor (clone 2) becomes extinct, it adds to the clade turnover. The term "replaces" applies to the clade of 3, which has a surviving subclone and thereby eventually replaces clade 2. The clade turnover is based solely on the presence of the mutations (which define their clades) and not on the individual clones.

      (1.5) After reviewing reference 24 (Li et al.), I noticed that the assertions made therein contradict the findings presented in S3 (Mutation Density on Rings). Specifically, Li et al. state that “peripheral regions not only accumulated more mutations, but also contained more changes in genes related to cell proliferation and cell cycle function” (Page 6) and “Phylogenetic trees show that branch lengths vary greatly with the long-branched subclones tending to occur in peripheral regions” (Page 4). However, upon re-analysis of their data, the authors demonstrated a decrease in mutation density near the surface. It is crucial to comprehend the underlying cause of such a disparity.

      The reason for this disparity is the way Li et al. labelled samples as belonging to peripheral or central regions of the tumour. We have added a new figure to the SI to show this: Fig. S14 shows the number of mutations found in samples of Li et al. against their distances from the centre, along with the classification of samples as center/periphery given in Li et al. In the case of tumor T1, the classification of a sample in Li et al. does not agree with its distance from the center: samples classified as core are often more distant from the center than those classified as peripheral. Furthermore, Lewinsohn et al. (see below) show in their Fig. 5 that the samples classified as 'center' by Li et al. all fall into a single clade, and we believe this affects all results derived from this classification. For this reason, we do not consider the classification in reference 24 (Li et al.) further. We now briefly discuss this in Section S3.3.

      (1.6) The authors consider coinciding mutations to occur when offspring clades align with an ancestral clade. Nevertheless, since multiple mutations can arise simultaneously in a single generation (such as kataegis), it becomes essential to discern its impact on clade turnover and, consequently, the estimation of d/b.

      The mutational signatures found here show no sign of kataegis. Also, the number of polymorphic sites in the whole-exome data is small, and the mutations are spread uniformly across the exome. The point is well taken, however: the method requires a single mutation per generation. In practice, this can be achieved by subsampling a random part of the genome or exome (see [45]). We tested this point by processing the data from only a fraction of the exome; this did not change the results. In particular, Figure S30 shows the turnover-based inference for different subsampling rates L of the Ling et al. data. Subsampling sites reduces the exome-wide mutation rate, and the inferred rate scales linearly with L, as expected.

      (1.7) I could not understand Step 2 in Section S2.1, an illustration may be helpful.

      We have added figure S2 explaining the directional angle algorithm to Section S2.1 in the supplementary information.

      (1.8) Figure S2: does a large ρ_c lead to volume growth rather than surface growth, and not the other way around?

      Thank you for catching this mix-up!

      Reviewer #2:

      I do have a few minor comments/questions, but I am confident the authors will be able to address them appropriately.

      (2.1) Line 56: I am not sure what the units of "average read depth 74X" are in terms of SI units.

      This number gives the number of sequence reads covering a particular nucleotide and is dimensionless. We have added this information.

      (2.2) Lines 63 - 68: I am unsure what is meant by the terms “T1 of ca.” and “T2 of ca.”. Can these also be explained/defined please?

      These refer to the approximate (circa) diameters of tumor 1 and tumor 2 in the data by Li et al. We have expanded the abbreviations.

      (2.3) Line 69: I would like to see a more extensive description of the cell-based model here in the main text, such as how do the cells move. Moreover, do cells have a finite reach in space, do they have a volume/area?

      We have expanded the model description in the main body of the paper and placed information there that previously was only in the SI.

      (2.4) Line 76: You have said cells can "push" one another in your model. Do they also "pull" one another? Cell adhesion is known to contribute to tumour integrity, so this seems important for a model of this nature.

      We have not implemented adhesive forces between pairs of cells so far. This would cause a higher pressure under cell growth (which can have important physiological consequences). However, the hard potential enforcing a distance between adjacent cells would still lead to cells pushing each other apart under population growth, so we expect to see the dispersion effect we discuss even when there is adhesion.

      (2.5) Lines 80-81: "due to lack of nutrient". Is nutrient included in this work? My understanding is that it is not. That is no problem; it is just that this line makes it seem as though nutrient is included and important. If it is not, the authors should mention this in the same sentence.

      Thank you for pointing out this source of misunderstanding, your understanding is correct and we have modified the text to remove the ambiguity.

      (2.6) Line 94-95: Since you are interested in tissue growth, recent work has indicated how the cell boundary (and therefore tissue boundary) description influences growth. Please also be sure to indicate this when you describe the model.

      We presume you refer to the recent paper by Lewinsohn et al. (Nature Ecology and Evolution, 2023), which reports a phylogenetic analysis based on the Li et al. data. Lewinsohn et al. find that cells near the tumour boundary grow significantly faster than those in the tumour’s core. This is at variance with what we find; we were not aware of this paper at the time of submission. We now refer to this paper in the main text, and also have included a new section S3.4 in the SI accounting for this discrepancy. If you refer to a different paper, please let us know.

      Briefly, we repeat the analysis of Lewinsohn et al., using their algorithm on artificial data generated by our model under volume growth. Samples were placed exactly as they were placed in the tumor analyzed by Li et al. We find that, even though the data were generated by volume growth, the algorithm of Lewinsohn et al. finds a signal of surface growth, in many cases even stronger than the signal that Lewinsohn et al. find in the empirical data. We have added subsection S3.4 with new figure S15 to the Supplementary Information.

      (2.7) Line 107: “thus no evidence for enhanced cell growth near the edge of the tumour”. It is unclear to me how this tells us anything about the tumour edge. It seems to me this could be an artifact of there being fewer cells to compare with at the edge of the tumour. Could you please expand on this a bit?

      The direction angles tell us whether new mutations arise predominantly radially outwards. With this observable, surface growth would lead to a non-uniform distribution of these angles even if we restrict the analysis to samples from the interior of the tumor (which, under surface growth, was once near the surface). So the effect is not linked to there being fewer cells for comparison. Also, we have checked the direction angles in simulations under different growth modes with the samples placed in the same way as in the data (see Figs. S3 and S4, right panels). We have expanded the Results section of the main text accordingly.

      (2.8) I really enjoyed the clear explanation between lines 119 and 122 regarding cell dispersion!

      Thank you!

      (2.9) Figure 2B: Since you are looking at a periodic feature in theta, I would have expected the distribution to be periodic too, and therefore equal at theta = -180° and theta = 180°. Can you explain why it is different, please? Interestingly, your simulated data do seem to obey this!

      The distribution of theta is periodic, but the bin boundaries and bin midpoints were chosen badly. We have replotted the diagram with bin boundaries that handle the edge points -180/180 correctly. Thank you for pointing this out.
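      One way to handle the -180/180 edge is to wrap every angle into the half-open interval [-180°, 180°) before binning. That way +180° and -180° land in the same bin. The sketch below assumes angles in degrees and equal-width bins whose edges start at -180°. The function names are hypothetical, and this is not the authors' plotting code.

```python
def wrap_angle(theta_deg: float) -> float:
    """Map any angle in degrees onto the half-open interval [-180, 180)."""
    return (theta_deg + 180.0) % 360.0 - 180.0


def periodic_histogram(angles_deg, n_bins: int = 12):
    """Count angles in n_bins equal-width bins with edges starting at -180.

    Wrapping first means +180 and -180 fall into the same (first) bin,
    so both ends of the histogram describe the same direction.
    """
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for theta in angles_deg:
        idx = int((wrap_angle(theta) + 180.0) // width)
        counts[min(idx, n_bins - 1)] += 1  # guard against round-off at the top edge
    centers = [-180.0 + (i + 0.5) * width for i in range(n_bins)]
    return centers, counts


centers, counts = periodic_histogram([-180.0, 180.0, 0.0, 90.0], n_bins=4)
print(centers)  # [-135.0, -45.0, 45.0, 135.0]
print(counts)   # [2, 0, 1, 1]
```

With this choice the first and last bins are neighbours on the circle, so a periodic distribution no longer shows a spurious jump at the plot edges.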

      (2.10) Figure 3B: This plot does not have a title. Also, what do the red vertical lines in plots 3B, 3C and 3D indicate?

      We have added the title. The red lines indicate the expectation values of the distributions.

      (2.11) Figure 4: I am unsure how to read the plot in 4B. Also, what does the y-axis represent in 4C and 4D?

      We have added explanations for 4B and have placed the labels for 4C and 4D in the correct position on the y-axes.

      (2.12) Lines 194-199: you discuss your inferred parameters here, but you do not indicate how you inferred them. Could you please briefly mention how these were inferred?

      These were inferred using the turnover method explained in the preceding paragraph; we have expanded this information. A full account is given in SI Section S5.

      (2.13) Line 258-260: “... mutagen (aristolochic acid) found in herbal traditional Chinese medicine and thought to cause liver cancer.” I do not see what this sentence adds to the work. Could you please be clearer with the claim you are making here?

      Mutational signatures allow one to infer the underlying mutational processes. The strongest signature found in the data is associated with a mutagen that has in the past been used in traditional Chinese medicines. The patients from whom the tumours were biopsied were from China, so past exposure to this potent mutagen is possible. We are not making a big claim here: the mutational signature of aristolochic acid and its carcinogenic nature have been well studied and are referenced here. The result is interesting in our context because in one of the datasets (Li et al.) the signature is present in early (clonal) mutations but absent in later ones, which allows inferences about the past to be drawn from present-day data. We have added the information that the patients were from China.

      (2.14) In your Supplementary Information, S1, I believe your summation should not be over i, since you state afterwards that it runs over cells within 7 cell radii. Please fix this, possibly by defining the set of cells within 7 cell radii.

      We have done this.

    1. Create code that sets var to the sentence “It takes us 165 minutes to get home from camp.”. Then append the sentence “165 minutes is also 2 hours and 45 minutes.” to the variable. The blocks have been mixed up and include extra blocks that aren’t correct.

      The "correct" answer gives 'It takes us 165 minutes to get home from camp. 165 minutes is also 2 hours and 45.' instead of 'It takes us 165 minutes to get home from camp. 165 minutes is also 2 hours and 45 minutes.', so it is actually wrong: it is missing '+ "minutes" '.
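      For reference, a version that produces the intended sentence might look like the sketch below. It assumes the exercise is in Python, since the language is not stated in the exercise text.

```python
# Set var to the first sentence.
var = "It takes us 165 minutes to get home from camp."
# Append the second sentence: the leading space separates the two sentences,
# and the final word "minutes" is the piece the reported answer drops.
var = var + " 165 minutes is also 2 hours and 45 minutes."
print(var)
```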

    1. Epigenesis

      Epigenesis = the process by which environmental influences cause changes in how genes function, without the genetic code itself changing

  2. Sep 2024
    1. Welcome back. In this video I want to talk about SSL and TLS.

      At a very high level they do the same thing.

      SSL stands for Secure Sockets Layer, whereas TLS is Transport Layer Security.

      TLS is just a newer and more secure version of SSL.

      Now we've got a lot to cover so let's jump in and get started.

      TLS and historically SSL provide privacy and data integrity between client and server.

      If you browse to this site, to Netflix, to your bank or to almost any responsible internet business, TLS will be used for the communications between the client and the server.

      TLS performs a few main functions and while these are separate, they're usually performed together and referred to as TLS or SSL.

      First, TLS ensures privacy and it does this by ensuring communications made between a client and server are encrypted so that only the client and server have access to the unencrypted information.

      When using TLS the process starts with an asymmetric encryption architecture.

      If you've watched my encryption 101 video, you'll know that this means that a server can make its public key available to any clients so that clients can encrypt data that only that server can decrypt.

      Asymmetric encryption allows for this trustless encryption where you don't need to arrange for the transfer of keys over a different secure medium.

      As soon as possible though you should aim to move from asymmetric towards symmetric encryption and use symmetric encryption for any ongoing encryption requirements because computationally it's far easier to perform symmetric encryption.

      So part of the negotiation process which TLS performs is moving from asymmetric to symmetric encryption.

      Another function that TLS provides is identity verification.

      This is generally used so that the server that you think you're connecting to, for example Netflix.com, is in fact Netflix.com.

      TLS is actually capable of performing full two-way verification but generally for the vast majority of situations it's the client which is verifying the server and this is done using public key cryptography which I'll talk more about soon.

      Finally TLS ensures a reliable connection.

      Put very simply, the protocol protects against the alteration of data in transit.

      If data is altered then the protocol can detect this alteration.
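      Detecting alteration rests on a message authentication code (MAC). The sender and receiver share a key, the sender attaches a tag computed over the data, and the receiver recomputes the tag and compares. The sketch below uses HMAC-SHA256 from Python's standard library to show the idea only. It is not the TLS record format, and the key and messages are made up. TLS 1.2 suites with CBC ciphers use HMAC in this way, while TLS 1.3 relies on AEAD ciphers such as AES-GCM, which authenticate the data as part of encryption.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time; False means the data was altered."""
    return hmac.compare_digest(tag(key, message), received_tag)


key = b"hypothetical shared session key"
original = b"pay 10 to alice"
t = tag(key, original)
print(verify(key, original, t))            # True
print(verify(key, b"pay 99 to alice", t))  # False: the message was tampered with
```

Anyone who changes the data in transit without the key cannot produce a matching tag, so the receiver detects the change.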

      Now in order to understand TLS a little better let's have a look at the architecture visually.

      When a client initiates communications with a server and TLS is used there are three main phases to initiate secure communication.

      First, cipher suites are agreed, then authentication happens, and then keys are exchanged.

      These three phases start from the point that a TCP connection is active between the client and the server so this is layer four.

      And at the end of the three phases there's an encrypted communication channel between a client and a server.

      Each stage is responsible for one very specific set of functions.

      The first stage focuses on cipher suites.

      Now a cipher suite is a set of protocols used by TLS.

      This includes a key exchange algorithm, a bulk encryption algorithm and a message authentication code algorithm or MAC.

      Now there are different algorithms and versions of algorithms for each of these, and specific versions and types grouped together are known as a cipher suite.

      So to communicate, the client and server have to agree on a cipher suite to use.
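      TLS 1.2-style cipher suite names spell out these components. For example, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 means key exchange with ECDHE, RSA authentication, AES-128 in CBC mode as the bulk cipher, and HMAC-SHA256 as the MAC. The hypothetical helper below splits such a name into its parts. It assumes the TLS 1.2 naming pattern and deliberately rejects TLS 1.3 names such as TLS_AES_128_GCM_SHA256, which omit the key exchange.

```python
from typing import NamedTuple


class CipherSuite(NamedTuple):
    key_exchange: str  # key exchange / authentication, e.g. "ECDHE_RSA"
    bulk_cipher: str   # symmetric cipher for the data, e.g. "AES_128_CBC"
    mac: str           # hash for HMAC (for AEAD suites like *_GCM_*, this is the PRF hash instead)


def parse_tls12_suite(name: str) -> CipherSuite:
    """Split a TLS 1.2-style name of the form TLS_<kx>_WITH_<cipher>_<hash>."""
    prefix, sep, rest = name.partition("_WITH_")
    if not sep or not prefix.startswith("TLS_"):
        raise ValueError(f"not a TLS 1.2-style suite name: {name}")
    bulk, _, mac = rest.rpartition("_")
    return CipherSuite(prefix[len("TLS_"):], bulk, mac)


print(parse_tls12_suite("TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"))
# CipherSuite(key_exchange='ECDHE_RSA', bulk_cipher='AES_128_CBC', mac='SHA256')
```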

      Now let's step through this visually.

      We have a client and a server, and at this starting point we already have a TCP connection, so TCP segments are flowing between the client and the server.

      The first step is that the client sends a client hello, which contains the SSL or TLS version, a list of cipher suites that the client supports and other things like a session ID and extensions.

      Hopefully at this point the server supports one of the cipher suites that the client also supports.

      If not, then the connection fails.

      If it does, then it picks a specific one and returns this as part of the server hello.
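      The server's choice can be sketched as picking the first suite on its own preference list that also appears in the client's offer. This is a simplification: real TLS libraries make it a configuration choice whether the server's or the client's ordering wins. The function below is hypothetical.

```python
def choose_cipher_suite(client_offer, server_preference):
    """Return the first suite in the server's preference order that the client offered.

    Returns None when there is no overlap, in which case the handshake fails.
    """
    offered = set(client_offer)
    for suite in server_preference:
        if suite in offered:
            return suite
    return None


client_hello_suites = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
server_suites = ["TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384", "TLS_RSA_WITH_AES_128_CBC_SHA"]
print(choose_cipher_suite(client_hello_suites, server_suites))
# TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
```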

      Now included in this server hello is also the server certificate which includes the server's public key.

      Now this public key can be used to encrypt data which the client can send to the server which only the server can decrypt using its private key.

      But keep in mind that this is asymmetric encryption and it's really computationally heavy and we want to move away from this as soon as possible.

      Now at some point in the past the server has generated a private and public key pair and it's the public part of this which is sent back to the client.

      But, and this is important, part of TLS is ID validation.

      If the client simply assumed that the server it's communicating with is valid, then this could be exploited.

      I could create a server which pretends to be Netflix.com without being Netflix.com and this is suboptimal.

      So it's important to understand and I'll talk more about this in a second that part of the functionality provided by TLS is to verify that the server that you're communicating with is the server that it claims to be.

      The next step of the TLS process is authentication.

      The client needs to be able to validate that the server certificate the server provides is valid, that its public key is valid and as such that the server itself is valid.

      To illustrate how this works let's rewind a little from a time perspective.

      So the server has a certificate.

      Now the certificate you can think of as a document, a piece of data which contains its public key, its DNS name and other pieces of organizational information.

      Now there's another entity involved here known as a public certificate authority or CA.

      Now there are a few of these, run by independent companies, and your operating system and browser trust many of these authorities; which ones are trusted is controlled by the operating system and browser vendors.

      Now at some point in the past our server, and let's say this is for Categoram.io, created a public and private key pair, and in addition it generated a certificate signing request or CSR.

      It provided the CSR to one of the public certificate authorities and in return this CA delivered back a signed certificate.

      Because the CA signed the certificate, anyone can verify that it was that CA which signed it.

      If your operating system or browser trusts the certificate authority then it means your operating system or browser can verify that the CA that it trusts signed that cert.

      This means that your OS or browser trusts the certificate and now the Categoram.io server that we're using as an example has this certificate and that certificate has been provided to the client as part of the server hello in Stage 1 of the TLS negotiation.

      In stage 2, authentication, our client, which has the server certificate, validates that the public certificate authority signed that certificate.

      It makes sure that it was signed by that specific CA, it makes sure that the certificate hasn't expired, it verifies that the certificate hasn't been revoked and it verifies that the DNS name that the browser is using in this case Categoram.io matches the name or the names on the certificate.
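      The four checks listed above can be sketched as follows. The Certificate class and validate function are hypothetical simplifications. A real client verifies the CA's signature cryptographically and builds a chain to a trusted root, checks revocation via CRLs or OCSP, and applies wildcard rules when matching names. None of that is modelled here, and the hostnames and CA name are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Certificate:
    subject_names: list[str]  # DNS names the certificate covers
    issuer: str               # CA that signed the certificate
    not_after: datetime       # end of the validity period
    serial: int               # serial number, used for revocation lookups


def validate(cert, hostname, trusted_cas, revoked_serials, now):
    """Return the list of failed checks; an empty list means the certificate passed."""
    problems = []
    # Stand-in for signature verification: only checks that the issuer is a trusted CA.
    if cert.issuer not in trusted_cas:
        problems.append("untrusted issuer")
    if now > cert.not_after:
        problems.append("expired")
    if cert.serial in revoked_serials:
        problems.append("revoked")
    # Exact, case-insensitive match only; real clients also handle wildcards.
    if hostname.lower() not in {name.lower() for name in cert.subject_names}:
        problems.append("name mismatch")
    return problems


cert = Certificate(["www.example.com"], "Example Root CA",
                   datetime(2030, 1, 1, tzinfo=timezone.utc), serial=42)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(validate(cert, "www.example.com", {"Example Root CA"}, set(), now))  # []
print(validate(cert, "shop.example.com", {"Example Root CA"}, {42}, now))  # ['revoked', 'name mismatch']
```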

      This proves that the server ID is valid and it does this using this third party CA.

      Next the client attempts to encrypt some random data and send it to the server using the public key within the certificate and this makes sure that the server has the corresponding private key.

      This is the final stage of authentication.

      If we're at this point and everything is good, the client trusts the server, its ID has been validated and the client knows that the server can decrypt data which is being sent.

      It's at this point that we move on to the final phase which is the key exchange phase.

      This phase is where we move from asymmetric encryption to symmetric encryption.

      This means it's much easier computationally to encrypt and decrypt data at high speeds.

      We start this phase with a valid public key on the client and a matching private key on the server.

      The client generates what's known as a pre-master key and it encrypts this using the server's public key and sends it through to the server.

      The server decrypts this with its private key and so now both sides have the exact same pre-master key.

      Now based on the cipher suite that's being used, both sides now follow the same process to convert this pre-master key into what's known as a master secret.

      And because the same process is followed on the same pre-master key, both sides then have the same master secret.

      The master secret is used over the lifetime of the connection to create many session keys and it's these keys which are used to encrypt and decrypt data.
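      To make the "same process, same input, same output" point concrete, here is a sketch of the TLS 1.2 derivation from RFC 5246, which calls the pre-master key the pre-master secret. The overview above skips two details. First, the derivation also mixes in the random values from the client hello and server hello. Second, session keys are later expanded from the master secret with the label "key expansion". The sketch assumes a cipher suite whose PRF uses SHA-256. TLS 1.3 uses HKDF instead.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 P_SHA256 data expansion (RFC 5246, section 5)."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()               # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()  # HMAC(secret, A(i) + seed)
    return output[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random)[0..47]."""
    return p_sha256(pre_master, b"master secret" + client_random + server_random, 48)


# After the key exchange both sides hold the same three inputs...
pre_master = os.urandom(48)
client_random, server_random = os.urandom(32), os.urandom(32)
# ...so running the same derivation independently yields the same master secret.
client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)
print(client_side == server_side, len(client_side))  # True 48
```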

      So at this point both sides confirm the process and from this point onwards the connection between the client and server is encrypted using different session keys over time.

      So this is the process that's followed when using TLS.

      Essentially we verified the identity of the server that we're communicating with, we've negotiated an encryption method to use, we've exchanged asymmetric for symmetric encryption keys and we've initiated this secure communications channel.

      And this process happens each and every time that you communicate with the server using HTTPS.

      Now that's everything I wanted to cover within this video so go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies. 

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful, and the data generally support the conclusions. 

      Strengths: 

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled, and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions. 

      Weaknesses: 

      (1) Based on the exceedingly small volume of solution used to form the hydrogel in the well, there may be many unexpanded cells in the well and possibly underneath the expanded hydrogel at the end of the protocol. How would this affect the image acquisition, analysis, and interpretation of HiExM data? 

      The hydrogel footprint covers approximately 5% of the surface within an individual well, and only cells within this area are embedded in the polymerized hydrogel for subsequent processing steps. Cells outside this footprint are not incorporated into the gel; instead, they are digested by Proteinase K and washed away by the excess water exchange in the gel swelling step. Note that different cell types may require higher or lower concentrations of Proteinase K to adequately digest cells for expansion while maintaining fluorescence signal. Given the compatibility of HiExM with 96-well plates, this titration can be performed rapidly in a single experiment. Although cells outside of the hydrogel footprint are removed prior to imaging, we do occasionally observe Hoechst signal that appears to be underneath the gels. We believe this signal is likely from excess DNA from digested cells that was not fully washed out in the gel swelling step. This signal is both spatially and morphologically distinct from the nuclear signal of intact cells, and it does not affect image acquisition, analysis, or data interpretation. 

      (2) It is unclear why the expansion factor is so variable between plates (e.g., Figure 2H). This should be discussed in more detail. 

      The variability in expansion factor across plates can likely be attributed to the small volume of gel solution (~250 nL) required for expansion within 96-well plates. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, gels in HiExM are more sensitive to evaporation because their volume is ~1000x smaller than that of standard expansion gel preparations, which gives them a larger air-liquid interface relative to their volume. Evaporation in HiExM gels would increase monomer and cross-linker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These considerations are discussed in the revised manuscript.
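      The evaporation argument is simple arithmetic. If only water evaporates, the amounts of monomer and cross-linker stay fixed, so their concentration scales as c = c0 * V0 / (V0 - dV). The minimal sketch below uses hypothetical numbers: a 250 nL gel solution losing 25 nL of water.

```python
def concentration_after_evaporation(c0: float, v0: float, evaporated: float) -> float:
    """Concentration of a non-volatile solute after solvent loss: c0 * v0 / (v0 - evaporated).

    Assumes only water leaves the droplet, so the amount of monomer and
    cross-linker is unchanged while the volume shrinks.
    """
    remaining = v0 - evaporated
    if not 0 < remaining <= v0:
        raise ValueError("evaporated volume must be >= 0 and less than the initial volume")
    return c0 * v0 / remaining


# Hypothetical example: a 250 nL gel solution losing 25 nL of water.
factor = concentration_after_evaporation(1.0, 250.0, 25.0)
print(round(factor, 3))  # 1.111
```

A 10% volume loss therefore raises every solute concentration by about 11%. That is enough to plausibly shift the expansion factor if the loss differs between plates.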

      (3) The authors claim that CF dyes are more resistant to bleaching than other dyes. However, in Figure S3, it appears that half of the CF dyes tested still show bleaching, and no data are shown supporting the claim that Alexa dyes bleach. It would be helpful to include data supporting the claim that Alexa dyes bleach more than CF dyes, and the claim that CF dyes in general are resistant to bleaching should be modified to more accurately reflect the data shown. 

      We did not show data using Alexa dyes because these fluorophores are highly sensitive to photobleaching during Irgacure-initiated polymerization, and thus we could not obtain images. In contrast, some CF dyes, including CF488A, CF568, and CF633, are more robust to bleaching in HiExM. We have recently adapted our protocol to Photo-ExM chemistry, which is compatible with a wider range of fluorophores, as described by Günay et al. (2023) and as shown in Fig. S16.

      (4) Related to the above point, it appears that Figure S11 may be missing the figure legend. This makes it hard to understand how HiExM can use other photo-inducible polymerization methods and dyes other than CF dyes.

      We revised the legend for revised Fig. S11 (now Fig. S16) as follows: Example of a cell expanded in HiExM using Photo-ExM gel chemistry. Photo-ExM does not require an anoxic environment for gel deposition and polymerization, improving ease of use of HiExM. Mitochondria were stained with an Alexa 647 conjugated secondary antibody, demonstrating that HiExM is compatible with additional fluorophores when combined with Photo-ExM.

      (5) The use of automated high-content imaging is impressive. However, it is unclear to me how the increased search space across the extended planar area and focal depths in expanded samples is overcome. It would be helpful to explain this automated imaging strategy in more detail. 

      We imaged plates on the Opera Phenix using the PreciScan Acquisition Software in Harmony. In brief, each well is imaged at 5x magnification in the Hoechst channel to capture the full well at low resolution. Hoechst is used for this step given its signal brightness, ubiquity across established staining protocols, and spectral independence from most fluorophores commonly conjugated to secondary antibodies. Using this information, the microscope detects regions of interest (nuclei) based on criteria including size, brightness, circularity, etc. Finally, the positional information for each region is stored, and the microscope automatically images those regions at 63x magnification. The working distance for the objective used in this study is 600 µm, which is sufficient to capture the entirety of expanded cells in the Z direction. This strategy minimizes off-target imaging and allows robust image acquisition even in cultures with lower seeding density. A detailed description of the automated imaging strategy is included in the methods section of the revised manuscript.
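      The pre-scan step that picks nuclei for 63x imaging can be sketched as a filter over detected objects. This is a hypothetical illustration, not the Harmony/PreciScan interface. The field names and thresholds are assumptions, and circularity uses the common 4πA/P² definition.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectedObject:
    x_um: float            # stage position of the object's centre
    y_um: float
    area_um2: float
    perimeter_um: float
    mean_intensity: float  # mean Hoechst signal in the low-magnification pre-scan


def circularity(obj: DetectedObject) -> float:
    """4*pi*area / perimeter**2: 1.0 for a perfect circle, smaller for irregular shapes."""
    return 4 * math.pi * obj.area_um2 / obj.perimeter_um ** 2


def select_targets(objects, min_area, max_area, min_intensity, min_circularity):
    """Keep objects that look like nuclei and return their positions for high-magnification imaging."""
    return [
        (o.x_um, o.y_um)
        for o in objects
        if min_area <= o.area_um2 <= max_area
        and o.mean_intensity >= min_intensity
        and circularity(o) >= min_circularity
    ]


# Hypothetical pre-scan objects: a round nucleus, an elongated streak, and a dim round object.
nucleus = DetectedObject(10.0, 20.0, math.pi * 100, 2 * math.pi * 10, 500.0)
streak = DetectedObject(30.0, 40.0, 314.0, 200.0, 500.0)
dim = DetectedObject(50.0, 60.0, math.pi * 100, 2 * math.pi * 10, 20.0)
print(select_targets([nucleus, streak, dim], 50, 2000, 100, 0.7))  # [(10.0, 20.0)]
```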

      (6) The general method of imaging pre- and post-expansion is not entirely clear to me. For example, on page 5 the authors state that pre-expansion imaging was done at the center of each gel. Is pre-expansion imaging done after the initial gel polymerization? If so, this would assume that the gelation itself has no effect on cell size and shape if these gelled but not yet expanded cells are used as the reference for calculating expansion factor and isotropy. 

      Pre-expansion imaging is performed after staining is complete, but prior to the application of AcX, which is the first step of the HiExM protocol. Following staining and imaging, plates can be sealed with parafilm and stored at 4°C for up to a week prior to starting the expansion protocol. We typically image 61 fields of view at the center of each well (where the gel will be deposited) to obtain sufficient pre-expansion images, as shown in Figure 2b (left). After pre-expansion imaging, we perform the HiExM protocol followed by image acquisition. We then tile all the images, as shown in Figure 2b, and compare tiled images from the same well pre- and post-expansion to manually identify the same cells. Comparisons of the pre- and post-expansion images of the same cell are used to calculate expansion factor and isotropy measurements as described. A detailed description of this process is included in the revised manuscript.
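      One way to turn a matched pre- and post-expansion cell into an expansion factor is to compare distances between the same landmarks in both images. The sketch below assumes landmark coordinates that have already been converted to micrometres and matched point for point. The choice of landmarks and the use of the median are assumptions, not the authors' described procedure. A wide spread of the individual ratios would point to anisotropic expansion.

```python
import math
from statistics import median


def expansion_factor(pre_points, post_points):
    """Median ratio of post- to pre-expansion distances over all landmark pairs.

    pre_points[i] and post_points[i] must be the same landmark in both images.
    """
    if len(pre_points) != len(post_points) or len(pre_points) < 2:
        raise ValueError("need at least two matched landmarks")
    ratios = []
    for i in range(len(pre_points)):
        for j in range(i + 1, len(pre_points)):
            d_pre = math.dist(pre_points[i], pre_points[j])
            d_post = math.dist(post_points[i], post_points[j])
            if d_pre > 0:
                ratios.append(d_post / d_pre)
    return median(ratios)


# Hypothetical landmarks: the post-expansion image is the pre-expansion one scaled 4x and shifted.
pre = [(0, 0), (3, 0), (0, 4), (5, 5)]
post = [(4 * x + 10, 4 * y - 7) for x, y in pre]
print(expansion_factor(pre, post))  # 4.0
```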

      (7) In the dox experiments, are only 4 expanded nuclei analyzed? It is unclear in the Figure 3 legend what the replicates are because for the unexpanded cells, it says the number of nuclei but for expanded it only says n=4. If only 4 nuclei are analyzed, this does not play to the strengths of HiExM by having high throughput.

      We performed the doxorubicin titration assay across four different well plates (n=4). For each condition, the total number of expanded nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. For SEM calculations, we included the number of independent experiments to avoid underestimating error. We revised the Fig. 3 legend to include these experimental details.
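      The point about error can be illustrated by computing the SEM across per-plate means (n = 4 plates) rather than across all pooled nuclei. Pooling treats each nucleus as an independent replicate and typically yields a smaller SEM. A minimal sketch with made-up values:

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def summarize_by_plate(per_plate):
    """Average nuclei within each plate first, then compute mean and SEM across plates (n = number of plates)."""
    plate_means = [mean(nuclei) for nuclei in per_plate]
    return mean(plate_means), sem(plate_means)


# Hypothetical per-nucleus measurements from four plates.
per_plate = [[1, 1], [2, 2], [3, 3], [4, 4]]
m, s = summarize_by_plate(per_plate)
pooled = [value for plate in per_plate for value in plate]
print(m, round(s, 3), round(sem(pooled), 3))  # 2.5 0.645 0.423
```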

      (8) I am not sure if the analysis of dox-treated cells is accurate for the overall phenotype because only a single slice at the midplane is analyzed. It would be helpful to show, at least in one or two example cases, that this trend of changing edge intensity occurs across the whole 3D nucleus.  

      For this analysis, the result is heavily dependent on the angle at which the edge of the nucleus intersects the image plane in the orthogonal view. For this reason, we opted to only use the optimal image plane for each nucleus. We repeated our analysis on an image using multiple optical sections to demonstrate this point. These new data are included as Fig. S11 of the revised manuscript.

      (9) It would be helpful to provide an actual benchmark of imaging speed or throughput to support the claims on page 8 that HiExM can be combined with autonomous imaging to capture thousands of cells a day. What is the highest throughput you have achieved so far?  

      The parameters that dictate imaging speed in HiExM include exposure time, z-stack height, and number of fluorophore channels. Depending on the signal intensity for a given channel, exposure times vary from 200 ms to 1000 ms. For z-stack height, we found that imaging 65 sections with 1 µm spacing allowed for robust identification of each region of interest in the 5x pre-scan. As an example, collecting images for a full well plate (e.g., 20 images per well with 4 channels) requires approximately 24 hours of autonomous image acquisition using the Opera Phenix. Depending on cell size, this process yields imaging data for 1200 cells (1 cell per field of view) to 6000 cells (5 cells per field of view). Other autonomous imagers, as well as improved staining techniques that increase the signal-to-noise ratio, can be expected to decrease exposure times significantly and to reduce the number of z-sections needed for each region.

      Reviewer #2 (Public Review): 

      Summary: 

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit the expansion of the gel. A device was engineered that can spot a small droplet of hydrogel solution and keep it in place as it polymerizes. It occupies only a small portion of space at the center of each well, the gel can expand in all directions, and imaging and staining can be carried out by liquid handling robots and an automated microscope. 

      Strengths: 

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and with automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal. 

      Weaknesses: 

      The assay they chose to demonstrate what high-throughput ExM could be useful for is not very convincing. But for this reviewer that is not important. 

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      Reviewer #3 (Public Review):

      Summary: 

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand the toroidal gel within each well.  

      Strengths: 

      This configuration eliminates the need for transferring gels to other dishes or wells, thereby enhancing the throughput and reproducibility of parallel expansion microscopy. This methodological uniqueness indicates the applicability of HiExM in detecting subtle cellular changes on a large scale. 

      Weaknesses: 

      To demonstrate the potential utility of HiExM in cell phenotyping, drug studies, and toxicology investigations, the authors treated hiPS-derived cardiomyocytes with a low dose of doxorubicin (dox) and quantitatively assessed changes in nuclear morphology. However, this reviewer is not fully convinced of the validity of this specific application. Furthermore, some data about the effect of expansion require reconsideration. 

      The application we chose was intended as a methods proof-of-concept that could enable future deep biological investigations using HiExM. We believe the data provide an example of the utility of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM. 

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, HiExM gels are more sensitive to evaporation because they are ~1000x smaller than standard expansion gel preparations and therefore have a larger air-liquid interface relative to their volume. Evaporation in HiExM gels likely increases monomer and cross-linker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that the expansion factor can be more variable between plates, likely due to differences in gel volumes and evaporation. Future iterations of the platform are expected to control for these environmental conditions. These differences are discussed in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please include a scale bar in Figure 3a.

      A scale bar has been added to Figure 3a.

      (2) Please show the data related to nuclear volume after dox treatment.

      We have added a supplementary figure (Fig. S10) showing nuclear volume and sphericity for post-expansion nuclei as well as nuclear area and circularity for pre-expansion nuclei.

      (3) I think it would be extremely helpful for the method as a whole if analysis code and files for device fabrication were made publicly available rather than upon request.

      The analysis code has been included in the supplementary files as CM_Hoechst_Analysis_for publication.ipynb. Device design files are also available at the supplementary files link as hiExM_device.SLDPRT (96-well plate device) and MultiExM_24_July28_2022.SLDPRT (24-well plate device).

      (4) Some details are missing from the methods, such as the concentration of AcX used for HiExM, the concentration of antibodies, etc. Related, how long does the photopolymerization take? Just the 60 seconds that the UVA light is on?

      Additional protocol details are included in the methods section of the revised manuscript. Yes, the photopolymerization takes only 60 seconds.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first three references are chosen a little strangely here. I suggest citing STED, SIM, and PALM/STORM from the original manuscripts here. Also, EM is technically not a super-resolution technique as it is within the resolution of electron beams. This reviewer would stay with light microscopy methods when discussing "super-resolution".

      We removed the reference to EM and added citations to the original publications for SIM, STED, and STORM.

      (2) The sentence after citation 4 is a little off in its meaning.

      We have edited the sentence to improve clarity.

      (3) It is highly useful and great that the authors include the observations on the effect of photopolymerization with Irgacure 2959 on dyes.

      (4) In the discussion, the authors could mention new high NA silicone oil objectives that may further optimise the resolution in their scheme.

      We added a sentence in the discussion to reflect this important point.

      (5) The files for the manufacture of the HiExM devices must be in the supplementary data rather than available on request.

      The SolidWorks designs for the 96- and 24-well plate devices are included in the supplementary files as hiExM_device.SLDPRT and MultiExM_24_July28_2022.SLDPRT, respectively.

      (6) It would be useful if the authors could discuss their thoughts on the high throughput processing of expansion factors in the data analysis routine.

      We added details to the methods section describing how images are processed and analyzed.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) In the experiments depicted in Figure 3, the authors attempted cellular phenotyping using hiPSC-derived cardiomyocytes treated with doxorubicin (dox). They reported that the relative intensity of Hoechst at the nuclear periphery increased solely in post-expansion images, although this trend is not clearly evidenced in the provided data (e.g., DMSO control vs. 1 nM dox, Figure 3b). Moreover, this observed phenomenon lacks clear biological significance and may not be suitable as a demonstration for proof-of-concept (POC) acquisition. It is crucial to delineate the biological processes linked with the specific enhancement of DNA binding dye signals in the nuclear periphery and how to rule out the possibility of heterogeneous redistribution of nuclear components rather than enhancing resolution. For instance, if this change can be associated with a biological process such as DNA damage, quantitative detection of the accumulated proteins related to DNA repair, or the specific histone marks, may be more suitable and less susceptible to heterogeneous expansion factors. Additionally, the authors noted the absence of significant changes in nuclear volume, yet the corresponding data was not presented. Moreover, the application insufficiently demonstrated HiExM's scalable feature employing various well plates. If only acquiring images of dozens of nuclei (Figure 3 legend, p15), a single well per condition would suffice. Therefore, it is necessary to elucidate why this application necessitates a 96-well format for demonstration purposes. The potential experimental design should also incorporate the requirement for well-to-well replication and the acquisition of features at the individual well level, rather than at the single-cell level. Also, related to Figure S10, whether outer gradient slope, but not inner gradient slope, is linked to apoptosis (Page 8, Line 2-4) remains unclear in the H2O2-treated cells.

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of the HiExM method. As discussed in the manuscript, dox treatment is associated with DNA damage, cellular stress, and apoptosis, which are commonly observed at high dox concentrations (>200 nM) in in vitro studies using conventional microscopy. Our data suggest that cardiomyocytes exhibit sensitivity to lower concentrations of dox than previously anticipated. Although direct evidence specifically linking dox to increased DNA condensation at the nuclear periphery is limited, the known proapoptotic effects of dox strongly suggest that our observations correlate with these changes. We have now included the data analysis on nuclear morphology in revised Fig. S10. We agree that deeper biological interpretation of the observed changes in Hoechst signal upon dox treatment (or other cellular stressors such as H2O2) using HiExM, and of whether these changes are correlated with DNA damage or other cellular alterations, remains an exciting future direction toward a more sensitive platform for assessing drug responses.

      For expanded samples, we performed the doxorubicin titration assay across four different well plates (n = 4). For each condition, the total number of nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1 nM, 10 nM, 100 nM, and 1 µM, respectively. We apologize for the confusion regarding the number of replicates and cells analyzed. For SEM calculations, we used the number of independent experiments (plates) rather than the number of nuclei, to avoid underestimating error.
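      To illustrate why the choice of n matters, here is a minimal Python sketch with made-up numbers (not data from this study): four hypothetical plates of four nuclei each. Summarizing each plate by its mean and using n = 4 plates gives a larger, more conservative SEM than pooling all 16 nuclei as if each were an independent sample. The helper name `sem` and all values are illustrative assumptions.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical per-nucleus measurements grouped by plate (illustrative only,
# not data from the study). Each inner list is one independent plate.
plates = [
    [1.02, 1.05, 0.98, 1.01],
    [1.10, 1.12, 1.08, 1.11],
    [0.95, 0.97, 0.94, 0.96],
    [1.04, 1.06, 1.03, 1.05],
]

# Per-experiment SEM: summarize each plate by its mean, then n = number of plates.
plate_means = [statistics.mean(p) for p in plates]
sem_experiments = sem(plate_means)

# Pooled SEM: treats every nucleus as independent, so n = total number of nuclei.
all_nuclei = [x for p in plates for x in p]
sem_pooled = sem(all_nuclei)

print(f"SEM with n = {len(plate_means)} plates: {sem_experiments:.4f}")
print(f"SEM with n = {len(all_nuclei)} nuclei: {sem_pooled:.4f}")
```

      Because nuclei from the same plate share plate-level variation (for example, in expansion factor), pooling them inflates n and shrinks the error bar; with these illustrative numbers the pooled SEM is less than half the per-plate SEM.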

      (2) In Figure 2b, do the orange arrows indicate the same cell with a unique shape in both the pre- and post-expansion images? Additionally, in Figure 3b, why do the pre- and post-expansion nuclei exhibit such different global shapes? Considering that the gel may freely rotate within the well during expansion, it raises doubts about whether one can identify cells with consistent shapes in both the pre- and post-expansion images. Furthermore, this reviewer observed a similar issue regarding reproducibility among different well plates, as shown in Figure 2h. The panel illustrates that different plates yielded distinct populations of gel sizes. The expansion factors provided in the figure legend (page 13) ranged from 3.5x to 5.1x across gels, indicating a relatively large variation in expansion size. What is the reason behind these variations, and how can they be minimized? These variations could become critical when considering large-scale screening across multiple plates.

      The orange arrow indicates the same cell with a unique shape in both the pre- and post-expansion images, albeit in a different orientation, because the gel is not fixed within the well. We agree that improved methods to identify the same cells pre- and post-expansion could facilitate error measurements. In the discussion of the revised manuscript, we now reference recent methods that could be combined with HiExM to automate and improve error and distortion detection.

      Fig. 2 illustrates the ability of HiExM to achieve reproducible gel formation with minimal error within gels, wells, and across plates, with measurements consistent with proExM. While uniform within gels, the expansion factor is somewhat variable between gels and plates. We attribute these differences primarily to the small size of the gels, which makes them vulnerable to evaporation between experiments. We note that this variability should be taken into consideration for studies where absolute length measurements between plates are important for biological interpretation. Future iterations of the platform that allow precise delivery of gel volumes and minimize environmental exposure are expected to improve the reproducibility of the expansion factor across plates, further enabling the use of HiExM as a tool for high-throughput nanoscale imaging.

      Minor:

      (1) Considering the signal loss due to photobleaching and fluorophore dilution during expansion, protein imaging may occasionally lack the sensitivity required to detect subtle morphological changes in cellular machinery. This potential limitation should be addressed or discussed in the text.

      A sentence reflecting this point has been added to the manuscript.

      (2) On page 15, the figure legend for panel d states, "Heatmaps of nuclei in b showing..." However, it appears that the panel referred to in this sentence corresponds to panel c.

      The typo has been fixed.

      (3) The type of glass 96-well plate utilized in this study should be specified, as the quality of the product could impact the expansion results.

      The supplier and product number of the well plate used in our study have been added to the methods section.

      (4) In Figure S3, the raw pixel values of CF305 dye are exceptionally low. Is there a specific reason for the very low signals observed when using this dye?

      CF® 350 (305 was a typo) does not excite well at 405 nm, which is the excitation wavelength for the channel we used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Meisner and colleagues described a novel take on a classic social cognition paradigm developed for marmosets. The classic pull task is a powerful paradigm that has been used for many years across numerous species, but its analog approach has several key limitations. As such, it has not been feasible to adopt the task for neuroscience experiments. Here the authors capture the spirit of the classic task but provide several fundamental innovations that modernize the paradigm - technically and conceptually. By developing the paradigm for marmosets, the authors leverage the many advantages of this primate model for studies of social brain functions and their particular amenability to freely-moving naturalistic approaches.

      Strengths:

      The current manuscript describes one of the most exciting paradigms in primate social cognition to be developed in many years. By allowing for freely-moving marmosets to engage in high numbers of trials, while precisely quantifying their visual behavior (e.g. gaze) and recording neural activity this paradigm has the potential to usher in a new wave of research on the cognitive and neural mechanisms underlying primate social cognition and decision-making. This paradigm is an elegant illustration of how naturalistic questions can be adapted to more rigorous experimental paradigms. Overall, I thought the manuscript was well written and provided sufficient details for others to adopt this paradigm. I did have a handful of questions and requests about topics and information that could help to further accelerate its adoption across the field.

      Weaknesses:

      LN 107 - Otters have also been successful at the classic pull task (https://link.springer.com/article/10.1007/s10071-017-1126-2)

      We have added this reference to the manuscript.

      LN 151 - Can you provide a more precise quantification of timing accuracy than the 'sub-second level'? This helps determine synchronization with other devices.

      We have included more precise timing details, noting that data is stored at the millisecond level.

      Using this paradigm, the marmosets achieved more trials than in the conventional task (146 vs 10). While this is impressive, given that only ~50 are successful Mutual Cooperation trials it does present some challenges for potential neurophysiology experiments and particular cognitive questions. The marmosets are only performing the task for 20 minutes, presumably because they become sated and are no longer motivated. This seems a limitation of the task and is something worth discussing in the manuscript. Did the authors try other food rewards, reduce the amount of reward, food/water restrict the animals for more than the stated 1-3 hours? How might this paradigm be incorporated into in-cage approaches that have been successful in marmosets? Any details on this would help guide others seeking to extend the number of trials performed each day.

      We have added a discussion addressing the use of liquid rewards, minimal food and water restriction, and the potential for further optimization to increase task engagement and trial numbers. This is now reflected in the revised manuscript.

      Can you provide more details on the DLC/Anipose procedure? How were the cameras synchronized? What percentage of trials needed to be annotated before the model could be generalized? Did each monkey require its own model, or was a single one applied to all animals?

      We have added more detailed information on the DLC and Anipose tracking which can be found in the Multi-animal 3D tracking section under Materials & Methods.

      Will the schematics and more instructions on building this system be made publicly available? A number of the components listed in Table 1 are custom-designed. Although it is stated that CAD files will be made available upon request, sharing a link to these files in an accessible folder would significantly add to the potential impact of this paradigm by making it easier for others to adopt.

      We have made the SolidWorks CAD files publicly available. They can now be found in the Github repository alongside the apparatus and task code.

      In the Discussion, it would be helpful to have some discussion of how this paradigm might be used more broadly. The classic pulling paradigm typically allows one to ask a specific question about social cognition, but this task has the potential to be more widely applied to other social decision-making questions. For example, how might this task be adopted to ask some of the game-theory-type approaches common in this literature? Given the authors' expertise in this area, this discussion could serve to provide a roadmap for the broader field to adopt.

      Although this paradigm was developed specifically for marmosets, it seems to me that it could readily be adopted in other species with some modifications. Could the authors speak to this and their thoughts on what may need to be changed to be used in other species? This is particularly important because one of the advantages of the classic paradigm is that it has been used in so many species, providing the opportunity to compare how different species approach the same challenge. For example, though both chimps and bonobos are successful, their differences are notably illuminating about the nuances of their respective social cognitive faculties.

      We have expanded the discussion of the broader applications of this apparatus, both for other decision-making research questions and for its adaptability to other species.

      Reviewer #2 (Public Review):

      Summary:

      In this important work, Meisner et al. developed an automated apparatus (MarmoAAP) to collect a wide array of behavioral data (lever pulling, gaze direction, vocalizations) in marmoset monkeys, with the goal of modernizing collection of behavioral data to coincide with the investigation of neurological mechanisms governing behavioral decision making in an important primate neuroscience model. The authors present a variety of proof-of-principle demonstrations that this apparatus can collect a wide range of behavioral data, with higher behavioral resolution than traditional methods. For example, the authors highlight that typical behavioral experiments on primate cooperation provide around 10 trials per session, while using their approach the authors were able to collect over 100 trials per 20-minute session with the MarmoAAP.

      Overall the authors argue that this approach has a few notable advantages:
      (1) it enhances behavioral output which is important for measuring small or nuanced effects/changes in behavior;
      (2) allows for more advanced analyses given the higher number of trials per session;
      (3) significantly reduces the human labor of manually coding behavioral outcomes and experimenter interventions such as reloading apparatuses for food or position;
      (4) allows for more flexibility and experimental rigor in measuring behavior and neural activity simultaneously.

      Strengths:

      The paper is well-written and the MarmoAAP appears to be highly successful at integrating behavioral data across many important contexts (cooperation, gaze, vocalizations), with the capacity to measure many more behavioral contexts (for many of which the authors offer suggestions).

      The authors provide substantive information about the design of the apparatus and how it can be built, including a detailed list of apparatus parts, and present data from a wide range of behavioral and neurological measures. The findings are important for the field of social neuroscience, and the evidence is solid in terms of the ability of the apparatus to perform as described, at least in marmoset monkeys. Collecting neural and freely-moving behavioral data concurrently is a significant advantage.

      Weaknesses:

      While this paper has many significant strengths, there are a few notable weaknesses in that many of the advantages are not explicitly demonstrated within the evidence presented in the paper. There are data reported (as shown in Figures 2 and 3), but in many cases, it is unclear if the data is referenced in other published work, as the data analysis is not described and/or self-contained within the manuscript, which it should be for readers to understand the nature of the data shown in Figures 2 and 3.

      (1) There is no data in the paper or reference demonstrating training performance in the marmosets. For example, how many sessions are required to reach a pre-determined criterion of acceptable demonstration of task competence? The authors reference reliably performing the self-reward task, but this was not objectively stated in terms of what level of reliability was used. Moreover, in the Mutual Cooperation paradigm, while there is data reported on performance between self-reward vs mutual cooperation tasks, it is unclear how the authors measured individual understanding of mutual cooperation in this paradigm (cooperation performance in the mutual cooperation paradigm in the presence or absence of a partner; and how, if at all, this performance varied across social context). What positive or negative control is used to discern gained advantages between deliberate cooperation vs two individuals succeeding at self-reward simultaneously?

      Thank you for your comment. This Tools & Resources paper is focused solely on the development of the apparatus and methods. Future publications will provide more details on training performance, learning behaviors, and include appropriate controls to distinguish deliberate cooperation from simultaneous success in self-reward tasks.

      (2) One of the notable strengths of this approach argued by the authors is the improved ability to utilize trials for data analysis, but this is not presented or supported in the manuscript. For example, the paper would be improved by explicitly showing a significant improvement in the analytical outcome associated with a comparison of cooperation performance in the context of ~150 trials using MarmoAAP vs 10-12 trials using conventional behavioral approaches beyond the general principle of sample size. The authors highlight the dissection of intricacies of behavioral dynamics, but more could be demonstrated to specifically show these intricacies compared to conventional approaches. Given the cost and expertise required to build and operate the MarmoAAP, it is critical to provide an important advantage gained on this front. The addition of data analysis and explicit description(s) of other analytical advantages would likely strengthen this paper and the advantages of MarmoAAP over other behavioral techniques.

      Thank you for the suggestion. While this manuscript focuses on the apparatus and methods, the increase in trial numbers itself provides clear advantages, including greater statistical power and more robust analyses of behavioral dynamics. Future publications will offer more in-depth analyses comparing the performance and cooperation behavior observed with MarmoAAP, further demonstrating these analytical benefits.

      Reviewer #3 (Public Review):

      Summary:

      The authors set out to devise a system for the neural and behavioral study of socially cooperative behaviors in nonhuman primates (common marmosets). They describe instrumentation to allow for a "cooperative pulling" paradigm, the training process, and how both behavioral and neural data can be collected and analyzed. This is a valuable approach to an important topic, as the marmoset stands as a great platform to study primate social cognition. Given that the goals of such a methods paper are to (a) describe the approach and instrumentation, (b) show the feasibility of use, and (c) quantitatively compare to related approaches, the work is easily able to meet those criteria. My specific feedback on both strengths and weaknesses is therefore relatively limited in scope and depth.

      Strengths:

      The device is well-described, and the authors should be commended for their efforts in both designing this system but also in "writing it up" so that others can benefit from their R&D.

      The device appears to generate more repetitions of key behavior than other approaches used in prior work (with other species).

      The device allows for quantitative control and adjustment to control behavior.

      The approach also supports the integration of markerless behavioral analysis as well as neurophysiological data.

      Weaknesses:

      A few ambiguities in the descriptions are flagged below in the "Recommendations for authors".

      The system is well-suited to marmosets, but it is less clear whether it could be generalized for use in other species (in which similar behaviors have been studied with far less elegant approaches). If the system could impact work in other species, the scope of impact would be significantly increased, and would also allow for more direct cross-species comparisons. Regardless, the future work that this system will allow in the marmoset will itself be novel, unique, and likely to support major insights into primate social cognition.

      Thank you for this feedback. We have expanded the discussion to include how the apparatus could be adapted for use in other species, highlighting the potential modifications required, such as adjusting the size and strength of the servo motor and components. These changes would enable broader applications and facilitate cross-species comparisons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Understanding large-scale neural activity remains a formidable challenge in neuroscience. While several methods have been proposed to discover the assemblies from such large-scale recordings, most previous studies do not explicitly model the temporal dynamics. This study is an attempt to uncover the temporal dynamics of assemblies using a tool that has been established in other domains.

      The authors previously introduced the compositional Restricted Boltzmann Machine (cRBM) to identify neuron assemblies in zebrafish brain activity. Building upon this, they now employ the Recurrent Temporal Restricted Boltzmann Machine (RTRBM) to elucidate the temporal dynamics within these assemblies. By introducing recurrent connections between hidden units, RTRBM could retrieve neural assemblies and their temporal dynamics from simulated and zebrafish brain data.

      Strengths:

      The RTRBM has been previously used in other domains, and training of the model has already been established. This study is an application of such a model to neuroscience. Overall, the paper is well-structured, the methodology is robust, and the analysis is solid enough to support the authors' claims.

      Weaknesses:

      The overall degree of advance is very limited. The performance improvement by RTRBM compared to their cRBM is marginal, and insights into assembly dynamics are limited.

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      See below in the recommendations section.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      See below in the recommendations section.

      Recommendations:

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      We agree with the reviewer that our analysis does not explore the data far enough to reach the level of new biological insights. For practical reasons unrelated to the science, we cannot explore the data further in this direction at this point; however, funding permitting, we will pick up this question at a later stage. The only change we have made to the corresponding figure at this stage was to adapt the thresholds, which better emphasizes the locality of the resulting clusters.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      We thank the reviewer kindly for the comments on the performance comparison between the two models. We would like to highlight that the small range of accuracy values for the predictive performance is due to both the sparsity and stochasticity of the simulated data, and is not reflective of the actual percentage in performance improvement. To this end, we have opted to use a rescaled metric that we call the normalised Mean Squared Error (nMSE), where the MSE is equal to 1 minus the accuracy, as the visible units take on binary values. This metric is also more in line with the normalised Log-Likelihood (nLLH) metric used in the cRBM paper in terms of interpretability. The figure shows that the RTRBM can significantly predict the state of the visible units in subsequent time-steps, whereas the cRBM captures the correct time-independent statistics but has no predictive power over time.
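      The relation between MSE and accuracy for binary units can be checked directly. In the minimal Python sketch below (hypothetical unit states, not data from the study; helper names are illustrative), each squared error is 1 on a mismatch and 0 on a match, so the MSE equals 1 minus the accuracy. It also shows why sparse activity compresses the range of accuracy values: a trivial "all units off" prediction already scores high. The normalisation that turns the MSE into the nMSE is not specified here, so the sketch omits it.

```python
import math


def accuracy(pred, target):
    """Fraction of binary units whose predicted state matches the observed state."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def mse(pred, target):
    """Mean squared error between predicted and observed unit states."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical sparse binary visible-unit states at one time step (illustrative only).
observed = [0, 1, 0, 0, 1, 0, 0, 0]
predicted = [0, 1, 1, 0, 1, 0, 0, 0]  # one mismatch, at index 2

# For states in {0, 1}, (p - t) ** 2 is 1 on a mismatch and 0 on a match,
# so the MSE is the mismatch rate, i.e. 1 - accuracy.
assert math.isclose(mse(predicted, observed), 1 - accuracy(predicted, observed))

# With sparse activity, predicting "all units off" already scores high,
# which is why raw accuracy values cluster in a narrow range near 1.
all_off = [0] * len(observed)
print(accuracy(all_off, observed))    # 0.75
print(accuracy(predicted, observed))  # 0.875
```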

      We also thank the reviewer for pointing out that there is no predictive performance evaluation on the neural data. We chose to omit it for two reasons. First, it is clear from Fig. 2 that the (c)RBM has no temporal dependencies, meaning that its predictive performance is determined mostly by the average activity of the visible units. If this corresponds well with the actual mean activity per neuron, the nMSE will be around 0. This correspondence is already evaluated in the first panel of Fig. 3F. Second, as this is real data, we cannot estimate a lower bound on the MSE that is due to neural noise. Because of this, the scale of the predictive performance score would be arbitrary, making it difficult to quantitatively assess the difference in performance between the two models.

      (3) The interpretation of the hidden real variable $r_t$ lacks clarity. Initially interpreted as the expectation of $\mathbf{h}_t$, its interpretation in Eq (8) appears different. Clarification on this link is warranted.

      We thank the reviewer kindly for the suggested clarification. However, we think the link between both values should already be sufficiently clear from the text in lines 469-470:

      “Importantly, instead of using binary hidden unit states 𝐡[𝑡−1], sampled from the expected real valued hidden states 𝐫[𝑡−1], the RTRBM propagates these real-valued hidden unit states directly.”

      In other words, the two are directly related: one could sample a binary-valued 𝐡[𝑡−1] from the real-valued 𝐫[𝑡−1], e.g., through a Bernoulli distribution, in which case 𝐫[𝑡−1] acts as the expectation of 𝐡[𝑡−1]. However, the RTRBM formulation keeps the real-valued 𝐫[𝑡−1] and propagates it to the next time-step. The motivation for this choice is discussed further in the original RTRBM paper (Sutskever et al. 2008).
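      A minimal Python sketch of this distinction, assuming the logistic hidden units of the standard RTRBM formulation (Sutskever et al. 2008), in which the hidden expectation is $r_t = \sigma(W v_t + U r_{t-1} + b)$. The weights, sizes, and helper names are hypothetical and are not taken from the authors' implementation. `sample_hidden` draws a binary $h_t$ whose expectation is $r_t$, whereas the RTRBM passes $r_t$ itself forward to the next time-step.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_expectation(W, U, b, v_t, r_prev):
    """r_t = sigmoid(W v_t + U r_{t-1} + b), computed unit by unit.

    W: hidden-by-visible weights, U: hidden-by-hidden recurrent weights,
    b: hidden biases. All values used below are hypothetical.
    """
    r_t = []
    for j in range(len(b)):
        drive = b[j]
        drive += sum(W[j][i] * v_t[i] for i in range(len(v_t)))
        drive += sum(U[j][k] * r_prev[k] for k in range(len(r_prev)))
        r_t.append(sigmoid(drive))
    return r_t


def sample_hidden(r, rng):
    """Draw binary h ~ Bernoulli(r), so that E[h] = r."""
    return [1 if rng.random() < p else 0 for p in r]


# Hypothetical model: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
U = [[0.1, -0.5], [0.6, 0.0]]
b = [0.0, -0.1]

v_t = [1, 0, 1]
r_prev = [0.3, 0.8]  # real-valued hidden state carried over from t-1

r_t = hidden_expectation(W, U, b, v_t, r_prev)  # what the RTRBM propagates to t+1
h_t = sample_hidden(r_t, random.Random(0))      # a binary sample; not propagated
```

      Propagating $r_t$ rather than a sampled $h_t$ keeps the recurrent pass deterministic given the visible sequence, which is the property the original RTRBM paper relies on for its training procedure.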

      (4) In Figure 3 panel F, the discrepancy in x-axis scales between upper and lower panels requires clarification. Explanation regarding the difference and interpretation guidelines would enhance understanding.

      Thank you for pointing out the discrepancy in x-axis scales between the upper and lower panels of Figure 3F. The reason why these scales are different is that the activation functions in the two models differ in their range, and showing them on the same scale would not do justice to this difference. But we agree that this could be unclear for readers. Therefore we added an additional clarification for this discrepancy in line 215:

      “While a direct comparison of the hidden unit activations between the cRBM and the RTRBM is hindered by the inherent discrepancy in their activation functions (unbounded and bounded, respectively), the analysis of time-shifted moments reveals a stronger correlation for the RTRBM hidden units ($r_s = 0.92$, $p<\epsilon$) compared to the cRBM ($r_s = 0.88$, $p<\epsilon$)”
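      One way such a comparison could be computed, as a minimal Python sketch under our own assumptions: the time-shifted moment of a pair of units is taken as $\langle x_i[t]\,x_j[t+1] \rangle$ averaged over time, and the moments of model-generated activity are compared with those of the data via Spearman's rank correlation $r_s$ (the Pearson correlation of average ranks). The lag, the pairing of units, the toy sequences, and all helper names are illustrative, not the authors' code.

```python
def time_shifted_moments(series, lag=1):
    """<x_i[t] x_j[t+lag]> for every pair (i, j), averaged over time.

    series: list of time steps, each a list of unit activations.
    """
    T = len(series) - lag
    n = len(series[0])
    return [
        sum(series[t][i] * series[t + lag][j] for t in range(T)) / T
        for i in range(n)
        for j in range(n)
    ]


def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical hidden-unit activity (2 units, 5 time steps) from "data" and "model".
data = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
model = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 1]]
r_s = spearman(time_shifted_moments(data), time_shifted_moments(model))
```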

      (5) Assessing model performance at various down-sampling rates in zebrafish data analysis would provide insights into model robustness.

      We agree that we would have liked to assess this point in real data, to verify that it also holds for the zebrafish whole-brain data. The main reason we did not do so is that we can only further downsample the data. Current whole-brain datasets are collected at a few Hz (here 4 Hz, only 2 Hz in other datasets), which we consider likely to be slower than the actual interaction speed in neural systems, which is on the order of milliseconds between neurons and on the order of ~100 ms (~10 Hz) between assemblies. Therefore, by reducing the rate further, we would expect to see only a reduction in quality, which we considered less interesting than finding an optimum. Higher imaging rates in light-sheet microscopy are currently achievable only by imaging single planes (which defeats the goal of whole-brain recordings), but may become possible in the future once the limiting factors (focal-plane stepping and imaging) are addressed. For completeness, we have now performed the downsampling for the experimental data, which showed the expected decrease in performance. The results have been integrated into Figure 4.
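      The timing argument can be made concrete with a minimal Python sketch. It assumes downsampling by simple decimation (keeping every k-th frame); whether frames were decimated or binned is not stated here, and the helper names are hypothetical. At 4 Hz the time step is already 250 ms, longer than the ~100 ms assembly interactions mentioned above, and each further factor of 2 doubles it.

```python
def downsample(frames, factor):
    """Keep every `factor`-th frame (simple decimation, an assumption)."""
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    return frames[::factor]


def time_step_ms(rate_hz, factor=1):
    """Interval between retained frames, in milliseconds."""
    return 1000.0 * factor / rate_hz


frames = list(range(12))      # 12 hypothetical frames recorded at 4 Hz (3 s)
print(downsample(frames, 2))  # [0, 2, 4, 6, 8, 10], i.e. effectively 2 Hz
for factor in (1, 2, 4):
    # time steps of 250, 500 and 1000 ms for 4 Hz recordings
    print(factor, time_step_ms(4.0, factor))
```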

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors propose an extension to some of the last author's previous work, where a compositional restricted Boltzmann machine was considered as a generative model of neuron-assembly interaction. They augment this model by recurrent connections between the Boltzmann machine's hidden units, which allow them to explicitly account for temporal dynamics of the assembly activity. Since their model formulation does not allow the training towards a compositional phase (as in the previous model), they employ a transfer learning approach according to which they initialise their model with a weight matrix that was pre-trained using the earlier model so as to essentially start the actual training in a compositional phase. Finally, they test this model on synthetic and actual data of whole-brain light-sheet-microscopy recordings of spontaneous activity from the brain of larval zebrafish.

      Strengths:

      This work introduces a new model for neural assembly activity. Importantly, being able to capture temporal assembly dynamics is an interesting feature that goes beyond many existing models. While this work clearly focuses on the method (or the model) itself, it opens up an avenue for experimental research where it will be interesting to see if one can obtain any biologically meaningful insights considering these temporal dynamics when one is able to, for instance, relate them to development or behaviour.

      Weaknesses:

      For most of the work, the authors present their RTRBM model as an improvement over the earlier cRBM model. Yet, when considering synthetic data, they actually seem to compare with a "standard" RBM model. This seems odd considering the overall narrative, and it is not clear why they chose to do that. Also, in that case, was the RTRBM model initialised with the cRBM weight matrix?

      Thank you for raising the important point regarding the RTRBM comparison in the synthetic data section. Initially, we aimed to compare the performance of the cRBM with the cRTRBM. However, we encountered significant challenges in getting the RTRBM to reach the compositional phase. To ensure a fair and robust comparison, we opted to compare the RBM with the RTRBM.

      A few claims made throughout the work are slightly too enthusiastic and not really supported by the data shown. For instance, when the authors refer to the clusters shown in Figure 3D as "spatially localized", this seems like a stretch, specifically in view of clusters 1, 3, and 4.

      Thanks for pointing out this inaccuracy. When going back to the data/analyses to address the question about locality, we stumbled upon a minor bug in the implementation of the proportional thresholding, causing the threshold to be too low and therefore too many neurons to be considered.

      Fixing this bug reduces the number of neurons, thereby better revealing the local structure of the clusters. Furthermore, if one were to lower the threshold within the hierarchical clustering, smaller and more localized clusters would appear. We deliberately chose to keep this threshold high so as not to overwhelm the reader with the number of identified clusters. We hope the reviewer agrees with these changes and that the clusters presented are indeed rather spatially localized.

      Moreover, when they describe the predictive performance of their model as "close to optimal" when the down-sampling factor coincided with the interaction time scale, it seems a bit exaggerated given that it was more or less as close to the upper bound as it was to the lower bound.

      We thank the reviewer for catching this error. Indeed, the best-performing model does not lie very close to the estimated performance of an optimal model. The text has been updated to reflect this.

      When discussing the data statistics, the authors quote correlation values in the main text. However, these do not match the correlation values in the figure to which they seem to belong. Now, it seems that in the main text, they consider the Pearson correlation, whereas in the corresponding figure, it is the Spearman correlation. This is very confusing, and it is not really clear as to why the authors chose to do so.

      Thank you for identifying the discrepancy between the correlation values mentioned in the text and those presented in the figure. We have updated the manuscript so that the correlation coefficient values in the text match those in the figure.
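      Why Pearson and Spearman values for the same data can disagree is easy to illustrate with a small pure-Python sketch on toy data (not data from the manuscript): for a relationship that is monotonic but nonlinear, Spearman's rank correlation is 1 while Pearson's linear correlation is lower. Tied values receive the average of their ranks, which is the usual convention.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 4 for v in x]      # monotonic but strongly nonlinear
r_pearson = pearson(x, y)    # noticeably below 1
r_spearman = spearman(x, y)  # 1, up to floating-point rounding
```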

      Finally, when discussing the fact that the RTRBM model outperforms the cRBM model, the authors state it does so for different moments and in different numbers of cases (fish). It would be very interesting to know whether these are the same fish or always different fish.

      Thank you for pointing this out. Keeping track of the same fish across the different metrics makes sense. We updated the figure to include a color code for each individual fish. As it turns out, the same fish perform significantly better each time.

      Recommendations:

      Figure 1: While the schematic in A and D only shows 11 visible units ("neurons"), the weight matrices and the activity rasters in B and C and E and F suggest that there should be, in fact, 12 visible units. While not essential, I think it would be nice if these numbers would match up.

      Thank you for pointing out the inconsistency in the number of visible units depicted in Figure 1. We agree that this could have been confusing for readers. The figure has been updated accordingly. As you suggested, the schematic representation now accurately reflects the presence of 12 visible units in both the RBM and RTRBM models.

      Figure 3: Panel G is not referenced in the main text. Yet, I believe it should be somewhere in lines 225ff.

      Thank you for mentioning this. In line 233, we added a reference to Figure 3G, which shows the performance of the cRBM and RTRBM on the different fish.

      Line 637ff: The authors consider moments <v_i h_μ> and <v_i h_j>, and from the context, it seems they are not the same. However, it is not clear why, because, judging from the notation, they should be the same.

      The second-order statistic <v_i h_j> on line 639 was indeed already mentioned and denoted as <v_i h_μ> on line 638. It has now been removed accordingly in the updated manuscript.

      I found the usage of U^ and U throughout the manuscript a bit confusing. As far as I understand, U^ is a learned representation of U. However, maybe the authors could make the distinction clearer.

      We understand that the usage of Û and U throughout the text may be confusing for the reader. However, we would like to point out that the distinction between these two variables is explained in line 142: “in addition to providing a close estimate (Û) to the true assembly connectivity matrix U”. For added clarity, we have included further mentions of the estimated nature of Û throughout the text in the updated manuscript.

      Equation 3: It would be great if the authors could provide some more explanation of how they arrived at the identities.

      These identities have previously been widely described in the literature. For this reason, we decided not to include their derivation in our manuscript. However, for completeness, we kindly refer to:

      Goodfellow, I., Bengio, Y., & Courville, A. (2016). Chapter 20: Deep generative models [In Deep Learning]. MIT Press. https://www.deeplearningbook.org/contents/generative_models.html

      Typos:

      -  L. 196: "connectiivty" -> "connectivity"

      -  L. 197: Does it mean to say "very strong stronger"?

      -  L. 339: The reference to Dunn et al. (2016) should appear in parentheses.

      -  L. 504f: The colon should probably be followed by a full sentence.

      -  Eq. 2: In the first line, the potential V still appears, which should probably be changed to show the concrete form (-b * h) as in the second line.

      -  L. 351: Is there maybe a comma missing after "cRBM"?

      -  L. 271: Instead of "correlation", shouldn't it rather be "similarity"?

      -  L. 218: "Figure 3D" -> "Figure 3F"

      We thank the reviewer for pointing out these typos, which have all (except one) been fixed in the text. We do emphasize the potential V to show that there are alternative hidden unit potentials that can be chosen. For instance, the cRBM utilizes dReLu hidden unit potentials.

      Reviewer #3 (Public Review):

      With ever-growing datasets, it becomes more challenging to extract useful information from such a large amount of data. For that, developing better dimensionality reduction/clustering methods can be very important to make sense of analyzed data. This is especially true for neuroscience where new experimental advances allow the recording of an unprecedented number of neurons. Here the authors make a step to help with neuronal analyses by proposing a new method to identify groups of neurons with similar activity dynamics. I did not notice any obvious problems with data analyses here, however, the presented manuscript has a few weaknesses:

      (1) Because this manuscript is written as an extension of previous work by the same authors (van der Plas et al., eLife, 2023), fully understanding this paper requires first reading the previous paper, as the authors often refer to their previous work for details. Similarly, to understand the functional significance of the neuronal assemblies identified here, one needs to consult the previous paper.

      We agree that the present Research Advance has been written in a way that builds on our previous publication. It was our impression that this was the intention of the Research Advance format, as spelled out in its announcement: "eLife has introduced an innovative new type of article – the Research Advance – that invites the authors of any eLife paper to present significant additions to their original research". This was more evident in eLife's previous formatting guidelines, which strongly limited the number of figures and words; however, the present, more liberal guidelines also place an emphasis on the relation to the previous article. We have nonetheless tried in several places to fill in details that might simplify the reading experience.

      (2) The problem of discovering clusters in data with temporal dynamics is not unique to neuroscience. Therefore, the authors should also discuss other previously proposed methods and how they compare to the RTRBM method presented here. Similarly, there are other methods using neural networks for discovering clusters (assemblies) (e.g. t-SNE: van der Maaten & Hinton 2008, Hippocluster: Chalmers et al. 2023, etc.), which should be discussed to give readers better background information.

      The clustering methods suggested by the reviewer do not model any time dependence, which is the crucial advance presented here by the introduction of the RTRBM, in extending the (c)RBM. In our previous publication on the cRBM (van der Plas et al., eLife, 2023), this comparison was part of the discussion, although it focused on a different set of methods. While clustering methods like t-SNE, UMAP and others certainly have their value in scientific analysis, we think it might mislead the reader into thinking that they achieve the same task as an RTRBM, which adds the crucial dimension of temporal dependence.

      (3) The above point about better describing other methods is especially important because the performance of the method presented here is not much better than that of previous work. For example, the RTRBM outperforms the cRBM on only ~4 out of 8 fish datasets. Moreover, as the authors nicely describe in the Limitations section, this method currently works only on a single time scale, and clusters have to be estimated first with the previous cRBM method. Thus, an overview of other methods that could be used for similar analyses would be helpful.

      We think that the perception that the RTRBM performs only slightly better is based on a misinterpretation of the performance measure, which we have tried to address in this rebuttal (see comments above) and in the manuscript. In addition, we would like to emphasize that, as shown for the simulated data, the RTRBM makes improved structural estimates (the structure is still modified by the RTRBM and only seeded by the cRBM's output). This is important even in cases where the performance is comparable, which can happen if the RBM absorbs temporal dependencies of assemblies into a modified structure of the assemblies. We have now clarified this in the discussion.

      Recommendations:

      (1) Line 181: it is not explained how a reconstruction error is defined.

      Dear reviewer, thanks for pointing this out. A definition of the (mean square) reconstruction error is added in this line.
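      For readers who want a concrete form, the following is a minimal sketch of a mean squared reconstruction error, assuming the model's reconstruction v̂ has the same shape as the recorded activity v (time points × neurons) and that the error is averaged over all entries. How the reconstruction itself is generated is not specified here, and the function name is hypothetical.

```python
from typing import Sequence


def mean_squared_reconstruction_error(
    data: Sequence[Sequence[float]],
    reconstruction: Sequence[Sequence[float]],
) -> float:
    """Average of (v - v_hat)^2 over all time points and neurons."""
    if len(data) != len(reconstruction):
        raise ValueError("data and reconstruction must have the same number of frames")
    total, count = 0.0, 0
    for v_t, v_hat_t in zip(data, reconstruction):
        if len(v_t) != len(v_hat_t):
            raise ValueError("frames must have the same number of neurons")
        for v, v_hat in zip(v_t, v_hat_t):
            total += (v - v_hat) ** 2
            count += 1
    return total / count


data = [[1.0, 0.0], [0.0, 1.0]]
reconstruction = [[0.5, 0.0], [0.0, 1.0]]
error = mean_squared_reconstruction_error(data, reconstruction)  # 0.25 / 4 entries
```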

      (2) How was the number of hidden neurons chosen and how does it affect performance?

      Thank you for pointing this out. Because we use transfer learning, the number of hidden units in the RTRBM is set by the number of hidden units used to train the cRBM. In further research, when the RTRBM operates in the compositional phase, we can use a grid search over a set of hyperparameters to determine the optimal number of hidden units and other parameters.
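      A grid search of the kind mentioned above could look roughly like the sketch below. The scoring function is a hypothetical stand-in (in practice it might be a held-out likelihood or reconstruction error), and the parameter names n_hidden and learning_rate are illustrative assumptions rather than the authors' settings.

```python
import math
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(
    score_fn: Callable[..., float],
    grid: Dict[str, Sequence[Any]],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate score_fn on every combination in grid; return the best (higher is better)."""
    names = sorted(grid)
    best_params: Dict[str, Any] = {}
    best_score = -math.inf
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_hidden: int, learning_rate: float) -> float:
    # Hypothetical validation metric that peaks at n_hidden=200, learning_rate=0.01.
    return -((n_hidden - 200) / 100) ** 2 - (math.log10(learning_rate) + 2) ** 2


best, score = grid_search(
    toy_score,
    {"n_hidden": [50, 100, 200, 400], "learning_rate": [0.001, 0.01, 0.1]},
)
```

      An exhaustive grid is simple and easy to reproduce, but its cost grows multiplicatively with each added hyperparameter, since every model has to be trained once per combination.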

    1. How do you divide out responsibility for a bot's actions between the person writing the code and the person running the program?

      I feel that most of the responsibility should lie in the person who ran the code. They had the final say on what the bot would do rather than the programmer who just created the bot.

    1. About a year ago, the Wall Street Journal reported that Microsoft's $10-a-month GitHub Copilot (which generates code and suggests changes to the software you're building) loses the company on average $20-a-user-a-month, and in some cases costs Microsoft as much as $80-a-user-a-month. While it's possible that Microsoft could have found ways to make Github Copilot more efficient, this seriously suggests that Microsoft 365 Copilot loses money in much the same way, though generating code is a little more compute-intensive.

      Really? REALLY really?!?

    1. Standards Back to main menu Standards Understand the various specifications, their maturity levels on the web standards track, and their adoption. Explore web standards About W3C web standards W3C standards & drafts Types of documents W3C publishes Translations of W3C standards & drafts Reviews & public feedback Liaisons Promote web standards Groups Back to main menu Groups A variety of groups develop Web Standards, guidelines, or supporting materials. About W3C groups Working Groups Interest Groups Community Groups Business Groups Technical Architecture Group Invited Experts Participant guidebook Positive work environment Get involved Back to main menu Get involved W3C works at the nexus of core technology, industry needs, and societal needs. Find ways to get involved Browse our work by industry Become a Member Member Home (restricted) Mailing lists Make a donation Sponsor an event Resources Back to main menu Resources Master Web fundamentals, use our developer tools, or contribute code. Learn from W3C resources Developers Validators & tools Accessibility fundamentals Internationalization Translations of W3C standards and drafts Code of conduct Reports News & events Back to main menu News & events Recent content across news, blogs, press releases, media; upcoming events. Follow news & events News Blog Press releases Press & media Events Annual W3C Conference (TPAC) Code of conduct About Back to main menu About Understand our values and principles, learn our history, look into our policies, meet our people. Find out more about us Our mission Leadership Staff Evangelists Careers Diversity Corporation Sponsoring W3C Media kit Contact Policies & legal information Help Search

      Clear, well-organized menus simplify navigation for all users, including those relying on screen readers or keyboard navigation. It enhances usability by reducing confusion and making content easier to find.

    1. There are two varieties of compilers: standard compilers, which take a whole computer program and turn it all into binary so it can be run later; and interpreters, which turn the computer language code into binary as the program is running.

      I don't have any knowledge surrounding programming but I wonder whether compilers or interpreters would run code faster. I would assume compilers would but perchance it's negligible.
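      The distinction can be illustrated with a toy example. It is hypothetical and greatly simplified: real compilers emit machine code, whereas this one emits instructions for a tiny stack machine. The interpreter walks the expression tree every time it runs; the compiler translates the whole expression once into a flat instruction list that can be executed later, as many times as needed.

```python
import operator
from typing import List, Tuple, Union

# An expression is either an integer or a tuple (operator symbol, left, right).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]
Instruction = Tuple[str, Union[int, str]]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(expr: Expr) -> int:
    """Interpreter: evaluate the tree directly, re-walking it on every run."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op](interpret(left), interpret(right))


def compile_expr(expr: Expr) -> List[Instruction]:
    """Compiler: translate the whole expression up front into stack-machine code."""
    code: List[Instruction] = []

    def emit(e: Expr) -> None:
        if isinstance(e, int):
            code.append(("PUSH", e))
        else:
            op, left, right = e
            emit(left)
            emit(right)
            code.append(("APPLY", op))

    emit(expr)
    return code


def run(code: List[Instruction]) -> int:
    """Execute previously compiled instructions; no re-translation is needed."""
    stack: List[int] = []
    for kind, arg in code:
        if kind == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


program = ("*", ("+", 2, 3), ("-", 10, 4))
compiled = compile_expr(program)  # translated once, can be run many times
```

      Skipping the translation step on every run is one reason compiled programs are often faster, but the real-world gap depends heavily on the language and its implementation.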

    1. Research participation was confidential and on a voluntary basis. All interviews were recorded with respondents’ permission asked at the beginning of each interview (see Appendix B). The study was conducted according to the Ethical Principles of Psychologists and Code of Conduct of the American Psychological Association, 2019. An ethics approval was not required by institutional guidelines or national regulations in line with the “German Research Foundation” guidelines, as the used data were anonymized, and no disclosure outside the research is possible.

      no ethical concerns


    1. And when I inevitably need to review something a few months from now, I know exactly where to look. For example, I will want to measure whether the app is actually getting faster, and I will want to use the exact same methodology and code as at the start. Thankfully, both are right there in my memex trail.

      I find that writing something and coming back a couple of days later to reread it with a fresh mind is very helpful. A "memex" medium would remind me to do that: if you spend 3 hours writing something, it should call you up and ask, "Hey, want to read through this again, so that future you, a long time from now, will understand it?"

    2. Imagine that you and I are working in the same company. I tell you there’s a new project for us two to work on. I explain it to you and you get reasonably excited. And then I tell you that I’ve started a new “bloorp” in BloorpyBase, a piece of software from 2012 that almost nobody uses. You grudgingly install BloorpyBase. The app doesn’t use the same keyboard shortcuts you’re used to. The shortcut normally assigned to adding a comment instead minimizes all windows. Sigh. You try to link some exploratory source code to it, but BloorpyBase only works with Mercurial. Sigh. You read some of my initial thoughts and try to respond but you don’t know what’s the best way to do it. Should you create another bloorp? Should you make a suggestion, or an edit? You spend half an hour reading a “How To Bloorp” guide on the internet but come back empty handed. Sigh.

      This is too real...

    3. A piece of software that works with your existing files, and which people around you can use, will generally win over some new way of doing things that you first need to migrate to, and then also ask others around you to migrate to as well.

      I got a friend of mine using Obsidian, but they don't know how to share stuff with people... teaching people Git can be hard.

      Multi-user Git is a nightmare when you are not writing code.

    1. “dynamically typed”,

      a programming language where the interpreter assigns a type to a variable based on the variable's value at runtime. Interpreter: translates the code line by line as the code runs.
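      A short Python illustration of dynamic typing, using Python only as an example of a dynamically typed language (strictly speaking, in Python the type belongs to the value a variable currently refers to). The same variable can hold values of different types, and a type mismatch is detected only when the offending line actually runs.

```python
def type_name(value: object) -> str:
    return type(value).__name__


x = 42
first_type = type_name(x)   # 'int': the type comes from the current value
x = "forty-two"
second_type = type_name(x)  # 'str': same variable, new value, new type


def add_one(value):
    # No declared parameter type; the check happens only when this line runs.
    return value + 1


try:
    add_one("forty-two")
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True  # str + int is rejected at runtime, not beforehand
```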

    1. In the contemporary era, both print and electronic texts are deeply interpenetrated by code. Digital technologies are now so thoroughly integrated with commercial printing processes that print is more properly considered a particular output form of electronic text than an entirely separate medium. Nevertheless, electronic text remains distinct from print in that it literally cannot be accessed until it is performed by properly executed code

      This statement emphasises the convergence of print and digital media, as both are influenced by code, but also highlights a fundamental difference: electronic texts depend on the execution of code for access and perception. Whereas print is a static outcome of digital processes, e-literature requires a dynamic interaction where the text only fully exists when the technology is activated. This emphasises the performative nature of digital texts and their dependence on code for meaning and accessibility, which distinguishes them from traditional print.

    2. The immediacy of code to the text's performance is fundamental to understanding electronic literature, especially to appreciating its specificity as a literary and technical production. Major genres in the canon of electronic literature emerge not only from different ways in which the user experiences them but also from the structure and specificity of the underlying code.

      This is the most interesting thing about EL. The interweaving of computer and media technologies creates really surprising things, including new forms of literature. Coding makes it possible to introduce interactivity, visibility, and other ways of working with text. Even now, I use these technologies, which would not be possible without electronic devices and coding. It would make no sense in printed form (and, in fact, it is not possible).

    1. minifiers

      To minify JS, CSS and HTML files, comments and extra whitespace are removed and (in JavaScript) variable names are shortened, so as to minimize the code and reduce file size. The minified version provides the same functionality while reducing the bandwidth used by network requests.
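      A deliberately naive CSS minifier shows the first two steps, removing comments and unnecessary whitespace. This is a regex-based sketch for illustration only: it does not handle strings, url() values, or selectors such as `a :hover` where whitespace is significant, and it does not attempt variable renaming, which applies to JavaScript and requires a real parser.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier: strips comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* ... */ comments
    css = re.sub(r"\s+", " ", css)                    # collapse whitespace runs
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)      # drop spaces around punctuation
    css = css.replace(";}", "}")                      # final semicolon in a block is optional
    return css.strip()


source = """
/* header styles */
h1 {
    color: red;
    margin: 0 auto;
}
"""
minified = minify_css(source)  # 'h1{color:red;margin:0 auto}'
```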

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus? And since cell cycle stages are not biologically independent (a delay in G1, could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. We have modified the text to make it clear that cell cycle occupancy is a compositional phenotype. Like the reviewer, we also noticed the distal trans eQTL hotspot on Chr XI in Cross B, but we were not able to identify compelling candidate gene(s) or variant(s) despite extensive effort.
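      Why cell cycle occupancy is a compositional phenotype can be seen from a small sketch: per-genotype stage fractions must sum to one, so a higher G1 fraction necessarily comes at the expense of the other stages. The centred log-ratio (CLR) transform at the end is one common way to analyse such data; it is shown only as an option and is not claimed to be what the authors did. The stage names follow the five stages used in the manuscript, and the cell counts are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable

STAGES = ["M/G1", "G1", "G1/S", "S", "G2/M"]


def occupancy(stage_labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of cells in each cell cycle stage (the fractions sum to 1)."""
    counts = Counter(stage_labels)
    total = sum(counts[s] for s in STAGES)
    return {s: counts[s] / total for s in STAGES}


def clr(fractions: Dict[str, float], pseudocount: float = 1e-6) -> Dict[str, float]:
    """Centred log-ratio: each log fraction minus the mean log fraction."""
    logs = {s: math.log(f + pseudocount) for s, f in fractions.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {s: v - mean_log for s, v in logs.items()}


# Made-up stage labels for cells carrying two alleles at a hypothetical locus.
allele_a = ["M/G1"] * 10 + ["G1"] * 40 + ["G1/S"] * 20 + ["S"] * 15 + ["G2/M"] * 15
allele_b = ["M/G1"] * 12 + ["G1"] * 25 + ["G1/S"] * 25 + ["S"] * 20 + ["G2/M"] * 18

occ_a, occ_b = occupancy(allele_a), occupancy(allele_b)
g1_shift = occ_a["G1"] - occ_b["G1"]                                  # about +0.15
other_shift = sum(occ_a[s] - occ_b[s] for s in STAGES if s != "G1")  # about -0.15
```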

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.
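      The idea of inferring genotypes from RNA-seq reads can be sketched for a haploid segregant. At a SNP that distinguishes the two parents, each read supports one parental allele; with a per-read error rate e, the binomial coefficients cancel and the log-likelihood ratio between the two parental genotypes reduces to (a - b) * log((1 - e) / e), where a and b are the read counts for the two alleles. This is a simplified sketch under assumed values (e = 0.01, a call threshold of log 100). With sparse single-cell coverage, a real pipeline would likely pool information across linked markers more carefully, and the authors' actual method may differ; all names here are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def genotype_log_lr(parent_a_reads: int, parent_b_reads: int, error_rate: float = 0.01) -> float:
    """log P(reads | parent A allele) - log P(reads | parent B allele) for a haploid cell."""
    return (parent_a_reads - parent_b_reads) * math.log((1 - error_rate) / error_rate)


def call_genotype(
    marker_counts: Iterable[Tuple[int, int]],
    error_rate: float = 0.01,
    min_log_lr: float = math.log(100),
) -> Optional[str]:
    """Pool (A reads, B reads) over linked markers; return 'A', 'B' or None if ambiguous.

    Pooling assumes no recombination between the markers in the window.
    """
    llr = sum(genotype_log_lr(a, b, error_rate) for a, b in marker_counts)
    if llr >= min_log_lr:
        return "A"
    if llr <= -min_log_lr:
        return "B"
    return None
```

      A single read is not enough evidence under these settings, which is why pooling across linked markers matters for sparse single-cell data.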

      I do have several questions/comments.

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.
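      The averaging the reviewer describes (often called a pseudobulk profile) could be sketched as follows, assuming per-cell expression values that are already normalized and one genotype assignment per cell, with ambiguous cells left out. The resulting per-genotype profiles could then be correlated with the bulk RNA-seq profiles of the same segregants. All names and numbers here are hypothetical.

```python
from typing import Dict, List, Optional


def pseudobulk_by_genotype(
    expression: Dict[str, List[float]],
    genotype_of_cell: Dict[str, Optional[str]],
) -> Dict[str, List[float]]:
    """Average per-cell expression profiles within each genotype (segregant)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for cell, profile in expression.items():
        genotype = genotype_of_cell.get(cell)
        if genotype is None:
            continue  # skip cells whose genotype could not be called
        if genotype not in sums:
            sums[genotype] = [0.0] * len(profile)
            counts[genotype] = 0
        sums[genotype] = [s + x for s, x in zip(sums[genotype], profile)]
        counts[genotype] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


cells = {"c1": [1.0, 3.0], "c2": [3.0, 5.0], "c3": [10.0, 0.0], "c4": [7.0, 7.0]}
genotypes = {"c1": "seg1", "c2": "seg1", "c3": "seg2", "c4": None}
profiles = pseudobulk_by_genotype(cells, genotypes)
```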

      As mentioned in our preliminary response, while it is possible to expand the analysis along these lines, this is not relevant for the subsequent one-pot experiments. We have made all the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle? I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We have added two new figures (S2 and S3) that represent alternative visualizations of the cell-cycle that are not dependent on UMAP. Figure S2 shows plots of different pairs of principal components, with each cell colored by its assigned cell-cycle stage. We do not observe a periodic pattern in the first 3 principal components as the reviewer expected, but when we explore the first 6 principal components, we see combinations of components that clearly separate the cell cycle clusters. We emphasize that the clusters were generated using the Louvain algorithm and assigned to cell-cycle stages using marker genes, and that UMAP was used only for visualization.

      We could not create a figure similar to Macosko et al. because of differences between the cell cycle categories we used and those of Spellman et al (PMID 9843569). We instead created Figure S3 to address the reviewer's comment. This figure uses a heatmap in a style similar to that of Macosko et al. to display cell-cycle-dependent expression of the 22 genes we used as cell cycle markers across each of the five cell cycle stages (M/G1, G1, G1/S, S, G2/M).

      We have renumbered the supplementary figures after incorporating these two additional supplementary figures into the manuscript.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered about a couple of things. Is it possible to identify either an aging or a growth-rate signature from single-cell data? A number of papers from David Botstein's group culminated in a paper showing that a gene expression signature could be used to predict instantaneous growth rate (PMID 19119411), and I wondered a) whether this is possible from single-cell data, and b) whether the slower-growing cells show markers of aging, whether these two signatures might impact the ability to detect eQTLs, and, if they are detected, whether they could in some way be accounted for to improve detection.

      As mentioned in our preliminary response, we are not sure how to look for gene expression signatures of aging in yeast scRNA-seq data. We believe that the proposed analyses are beyond the scope of the current paper. As noted above, we have made all the data available so that anyone interested can explore these hypotheses.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for the comments. We believe that a discussion of trade-offs between different approaches for constructing mapping populations, such as AIL and F2 segregants, is beyond the scope of this paper.

      10x vs SPLiT-Seq

      10x is a well-established but fairly expensive approach for scRNA-Seq. I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We thank the reviewer for the comments. We believe that a discussion of cost trade-offs between 10x and other approaches is beyond the scope of this paper, especially given the rapidly evolving costs of different technologies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Throughout the results section the authors point to File S1 for additional information. This file is a tarball with about 20 Excel documents in it, each with several sheets embedded. The authors should provide a detailed README describing how to understand the organization of the files in File S1 and the many embedded sheets in each file. Statements made in the manuscript about File S1 should explicitly direct the reader to a specific spreadsheet and table to refer to.

      We have added an additional README file to the tarball that explains the organization of File S1 and describes the data contained in each sheet. Throughout the text, we now reference specific spreadsheets to assist the reader. In addition, these spreadsheets have been added to a GitHub repository https://github.com/theboocock/finemapping_spreadsheets_single_cell

      Neither of the two GitHub repositories referenced under "Code availability" has adequate documentation that would allow a reader to try and reproduce the analyses presented here. The one entitled https://github.com/joshsbloom/single_cell_eQTL has no functional README, while https://github.com/theboocock/yeast_single_cell_post_analysis is somewhat better but still hard to navigate. Basic information on expected inputs, file formats, file organization, output types, and formats, etc. is required to get any of these pipelines to run and should be provided at a minimum.

      We thank the reviewer for the comment. In response, we have refactored both GitHub repositories and added extensive documentation to improve usability. We also updated the versions of software and packages; this is reflected in the methods section.

      S. cerevisiae strains are preferentially diploid in nature and many genes involved in the mating pathway are differentially regulated in diploids vs haploids. Have the authors explored the fitness effects of the GPA1 82R allele in diploids? What is the dominance relationship between 82W and 82R?

      We thank the reviewer for the comment. In diploid yeast, the mating pathway is repressed, and thus we would not expect there to be any fitness consequences due to the presence of different alleles of GPA1.

      The diploid expression profiling (page 5 and Table S9) doesn't implicate GPA1; can the authors comment on this in light of their finding in haploids?

      The mating pathway, including GPA1, is repressed in diploids, and hence the expression of GPA1 cannot be studied in these strains (PMID: 3113739). In addition, allele-specific expression differences only identify cis-regulatory effects. We know that the GPA1 variant results in a protein-coding change, which may or may not influence the levels of mRNA in cis, so that even if GPA1 were expressed in diploids, there would be no expectation of an allele-specific difference in expression.

      With respect to the candidate CYR1 QTL -- note that strains with compromised Cyr1 function also generally show increased sporulation rates and/or sporulation in rich media conditions (cAMP-PKA signaling represses sporulation). Is this the case in diploids with the CBS2888 allele at CYR1? If the CBS2888 allele is a CYR1 defect one might expect reduced cAMP levels. It is possible to estimate adenylate cyclase levels using a fairly straightforward ELISA assay. This would provide more convincing evidence of the causal mechanism of the alleles identified.

      We thank the reviewer for the comment, and we agree that a functional study of the CYR1 alleles would provide more convincing evidence for the causal mechanism of the connection between cell cycle occupancy, cAMP levels, and growth. However, we believe that the proposed experiments are beyond the scope of our current study. The evidence we provide is sufficient to establish that CYR1 is a strong candidate gene for the eQTL hotspot.

      Re: CYR1 candidate QTL -- The authors should reference the work of [Patrick Van Dijck](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Van+Dijck+P&cauthor_id=20924200) and [Johan M Thevelein](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Thevelein+JM&cauthor_id=20924200) on CYR1 allelic variation, and other papers besides the Matsumoto/Ishikawa papers, as the effects of cAMP-PKA signaling on stress can be quite variable. cAMP pathway variants, including in CYR1, have popped up in quite a few other yeast QTL mapping and experimental evolution papers. These should be referenced as well.

      We thank the reviewer for these references; we have added a comment about the relationship between stress tolerance and CYR1 variation, and cited the relevant references accordingly.

      Figure S10 - the subfigure showing the frequency of the GPA1 82R allele compared to 82W suggests a fairly large, deleterious fitness effect of this allele, on the order of 7-8% fewer cells per cell cycle stage than the 82W allele. Can the authors reconcile this with the more modest growth rate effect they report on page 8?

      Figure S12C displays the allele frequency of the 82R allele across the cell cycle in the single-cell data from allele-replacement strains. These strains were grown separately and processed in two separate 10x Chromium runs. The resulting sequenced library had 11,695 cells with the 82R allele and 14,894 cells with the 82W allele. Because we attempted to pool cells in equal numbers from separate mid-log cultures, the 7-8% difference in the number of cells is due to slight differences in the number of captured cells per run, not to growth differences.

      The proportion of cells in G1 increases by ~3% in strains with the 82R allele relative to the baseline proportion of cells in the experiment, which, to the reviewer's point, is still larger than the ~1% growth difference we observed. Cell cycle occupancy is a compositional phenotype: as shown in Figure S12C, the 82R variant increases the fraction of cells in G1 and slightly decreases the fraction of cells in M/G1. There is no obvious expectation for quantitatively translating a change in cell cycle occupancy into a change in growth rate.

      The authors refer to the Lang et al. 2009 paper with respect to GPA1 variant S469I, but that paper seems to have explored a different GPA1 allele, GPA1-G1406T, with respect to growth rates.

      We thank the reviewer for their comment. The S469I variant is the same as the G1406T variant, one denoting the amino acid change at position 469 in the protein and the other denoting the corresponding nucleotide change at position 1406 in the DNA coding sequence. We have altered the text to make this clear to the reader.
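      The equivalence of the two names follows from codon arithmetic: in a 1-based coding sequence, codon n spans nucleotides 3n − 2 to 3n, so codon 469 covers nucleotides 1405 to 1407 and position 1406 is its middle base. The short Python sketch below checks this. The GPA1 reference codon itself is not given in the text; the sketch only assumes, from the S→I change, that it is one of the two serine codons with G in the middle position (AGT or AGC), and shows that a G→T change there yields an isoleucine codon. The helper names are hypothetical.

```python
# Hypothetical helper: relate a 1-based CDS nucleotide position to its codon.
def codon_of(nt_pos: int) -> tuple[int, int]:
    """Return (codon_number, position_within_codon), both 1-based."""
    if nt_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# Small subset of the standard genetic code: the two serine codons that
# carry G at codon position 2, and the isoleucine codons they become.
CODE_SUBSET = {"AGT": "S", "AGC": "S", "ATT": "I", "ATC": "I"}


def substitute(codon: str, pos: int, base: str) -> str:
    """Replace the base at 1-based position `pos` of a codon."""
    return codon[: pos - 1] + base + codon[pos:]


codon_number, pos_in_codon = codon_of(1406)
print(codon_number, pos_in_codon)  # 469 2

for ser_codon in ("AGT", "AGC"):
    mutant = substitute(ser_codon, pos_in_codon, "T")
    print(ser_codon, "->", mutant, CODE_SUBSET[ser_codon], "->", CODE_SUBSET[mutant])
```

      Either serine codon gives the same answer, so the nomenclature equivalence does not depend on which of the two is the reference codon.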

      Reviewer #2 (Recommendations For The Authors):

      I make no recommendations as to additional work for the authors. The manuscript is complete. I suggested some things I would like to see in my review, but it's up to them to decide whether they think any of those would further enhance the manuscript.

      However, I do have some pedantic formatting notes:

      - Microliters are variously presented as uL, ul, and µl - it should be µL

      - Similarly, milliliters are presented as ml and ML - it should be mL

      - Also, there should be a space between the number and the unit, e.g. 10 µL

      - Some gene names in the manuscript are not italicized in all instances, e.g., GPA1

      We thank the reviewer for these formatting suggestions; we have made these changes throughout the text.

    1. Reviewer #2 (Public review):

      Summary:

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection differs strongly across cortical layers and subtypes, with targets located throughout the whole brain. This was tested across 6 different mouse lines that expressed a marker in layer 2/3, layer 4, layer 5 (3 subtypes) and layer 6.

      Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types.

      Strengths:

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immunohistochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex.

      This standardization means that comparisons between cell-type projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-area pattern are available.

      This reference dataset and the corresponding analysis code are made available to the research community.

      Weaknesses:

      One major question raised by this dataset is the risk of missing axons during the post-processing step. Following the previous review round, my concerns have been addressed regarding this point.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This is a fine paper that serves to show that light-sheet imaging may be used to provide whole-brain imaging of axonal projections. The data provided suggest that, at this point, the technique provides lower resolution than other techniques. Nonetheless, the technique does provide useful, if not novel, information about particular brain systems.

      Strengths: 

      The manuscript is well written. The introduction provides a clear description of the functional organization of the barrel cortex, which gives the context for using specific Cre-driver lines to map the projections of the main cortical projection types with whole-brain neuroanatomical tracing techniques. The results are also well written, with sufficient detail describing how the techniques were used to obtain the relevant data. Appropriate controls were done, including the identification of whisker fields for viral injections and determination of the laminar pattern of Cre expression. The mapping of the data provides a good way to visualize low-resolution patterns of projections.

      Weaknesses: 

      (1) The results provided are, as stated in the discussion, "largely in agreement with previously reported studies of the major projection targets". However, it must be stated that the study does not "extend current knowledge through the high sensitivity for detecting sparse axons, the high specificity of labeling of genetically defined classes of neurons and the brain wide analysis for assigning axons to detailed brain regions", as all of these have been published in numerous other studies (the Allen connectivity project and related papers, along with others). If anything, the labeling of axons obtained with light-sheet imaging in this study does not provide mapping as detailed as that obtained with other techniques. Some detail is provided on how the raw images are processed to resolve labeled axons, but the images shown in the figures do not demonstrate how well individual axons can be resolved; of particular interest would be labeling in terminal areas such as other cortical areas, striatum and thalamus. As presented, the light-sheet imaging appears to be rather low resolution compared to the many studies that have used viral tracing to look at cortical projections from genetically identified cortical neurons.

      We agree with the reviewer that imaging resolution should be further improved in future studies, as also mentioned in the original manuscript. On P. 17 of the revised manuscript we write “Probably most important for future studies is the need to increase the light-sheet imaging resolution perhaps combined with the use of expansion microscopy to provide brain-wide micron-resolution data (Glaser et al., 2023; Wassie et al., 2019).” However, even at somewhat lower resolution, bright sparse labelling allows individual axonal segments to be traced through machine learning to define axonal skeletons, whose length can be quantified as we do in this study. This methodology highlights sparse wS1 and wS2 innervation of a large number of brain areas, some of which are not typically considered, and our anatomical results might therefore help the analysis of the neuronal circuits underlying various aspects of whisker sensorimotor processing. Despite impressive large-scale projection mapping projects such as the Allen connectivity atlas, cell-type-specific projection map data for the representations of the large posterior whiskers in wS1 and wS2 remain relatively sparse, and our data thus add to a growing body of cell-type-specific projection mapping, with a specific focus on the output connectivity of these whisker-related regions of sensory neocortex.

      In the revised manuscript, we now provide an additional supplementary figure (Figure 1 – figure supplement 2) showing examples of the axonal segmentation from further additional image planes including branching axons in the key innervation regions mentioned by the reviewer, namely “other cortical areas, striatum and thalamus”.

      (2) Amongst the limitations of this study is the inability to resolve axons of passage and terminal fields. This has been done in other studies with viral constructs labeling synaptophysin. This should be mentioned. 

      The reviewer brings up another important point for future methodological improvements to enhance connectivity mapping. Indeed, we already mentioned this in our original submission near the end of the first paragraph under the Limitations and future perspectives section. In the revised manuscript on P. 17, we write “Future studies should also aim to identify neurotransmitter release sites along the axon, which could be achieved by fluorescent labeling of prominent synaptic components, such as synaptophysin-GFP (Li et al., 2010).”

      (3) There is no quantitative analysis of differences between the genetically defined neurons projecting to the striatum, such as the relative area innervated, the density of terminals, or other measures.

      The reviewer raises an interesting question, and in the revised manuscript, we now present a more detailed analysis of cell class-specific axonal projections focusing specifically on the striatum. Following the reviewer’s suggestion, in a new supplementary figure (Figure 7 – figure supplement 1), we now report spatial axonal density maps in the striatum from SSp-bfd and SSs, finding potentially interesting differences comparing the projections of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons. On P. 12 of the revised manuscript, we now write “We also investigated the spatial innervation pattern of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons in the striatum (Figure 7 – figure supplement 1), where we found that axonal density from Rasgrf2-L2/3 neurons in both SSp-bfd and SSs was concentrated in a posterior dorsolateral part of the ipsilateral striatum, whereas Tlx3-L5IT neurons had extensive axonal density across a much larger region of the striatum, including bilateral innervation by SSp-bfd neurons. Striatal innervation by Scnn1a-L4 neurons was intermediate between Rasgrf2-L2/3 and Tlx3-L5IT neurons.” We think the reviewer’s comment has helped reveal further interesting aspects of our data set, and we thank the reviewer.

      (4) Figure 5 is an example of the type of large dataset that can be generated with whole-brain mapping and registration to the Allen CCF, but that provides information of questionable value. Ordering the 50-plus structures by the density of labeling does not say much about the relative input to different types of areas. Subregions of the same functional type (i.e., different visual areas and different motor subregions) are scattered rather than grouped together, which makes it difficult to understand any organizing principles.

      We agree with the reviewer, and fully support the importance of considering subregions within the relatively coarse compartmentalization of the current Allen CCF. In order to provide some further information about connectivity that may help give the reader further insights into the data, we have now added further quantification of cortex-specific axonal density ranked according to functional subregions in a new supplementary figure (Figure 5 – figure supplement 2). 

      (5) The GENSAT Cre-driver lines used must be given their specific line names, not just the gene name, as the GENSAT BAC-Cre lines had multiple lines for each gene, often with very different expression patterns: Rbp4_KL100, Tlx3_PL56, Sim1_KJ18, Ntsr1_GN220.

      In the revised manuscript, we now write out a fuller description of the mouse lines the first time they are mentioned in the Results section on P. 7. The full mouse line names, accession numbers and references were of course already described in the methods section, which remains the case in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection differs strongly across cortical layers and subtypes, with targets located throughout the whole brain. This was tested across 6 different mouse lines that expressed a marker in layer 2/3, layer 4, layer 5 (3 subtypes) and layer 6. Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types.

      Strengths: 

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immunohistochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex. 

      This standardization means that comparisons between cell-type projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-area pattern are available.

      This reference dataset and the corresponding analysis code are made available to the research community. 

      Weaknesses: 

      One major question raised by this dataset is the risk of missing axons during the postprocessing step. Indeed, it appears that the control and training efforts have focused on the risk of false positives (see Figure 1 supplementary panels). The risk of overlooking existing axons in the raw fluorescence data is, however, discussed in the article.

      Based on the data reported in the article, this is more than a risk. In particular, Figure 2 shows an example Rbp4-L5 mouse where axonal spread seems massive in the hippocampus, while there is no mention of this area in the processed projection data for this mouse line.

      In Figure 2, we show the expression of tdTomato in double-transgenic mice in which the Cre-driver lines were crossed with a Cre-dependent reporter mouse expressing cytosolic tdTomato. In addition to the specific labelling of L5PT neurons in the somatosensory cortex, Rbp4-Cre mice also express Cre-recombinase in other brain regions, including the hippocampus. In the reporter mice crossed with Rbp4-Cre mice, tdTomato is expressed in neurons with cell bodies in the hippocampus, as is clearly visualized in Figure 2. Because our axonal labelling is based on localized viral vector expression of tdTomato in SSp-bfd and SSs, the expression of Cre in the hippocampus does not affect our analysis. To clarify this for the reader, in the legend to Figure 2D, we now specifically write “As for panel A, but for Rbp4-L5 neurons. Note strong expression of Cre in neurons with cell bodies located in the hippocampus, which does not affect our analysis of axonal density based on virus injected locally into the neocortex.” Consistent with this observation, the Allen Institute’s ISH data support expression of Rbp4 in neurons of the hippocampus (e.g., https://mouse.brainmap.org/gene/show/19425 and https://mouse.brainmap.org/experiment/show/68632655).

      Similarly, the Ntsr1-L6CT example shows a striking level of fluorescence in the striatum that is not reflected in the amount of axon detected by the algorithms in the subsequent figures. These apparent discrepancies may be due to non-axon-specific fluorescence in the samples. In any case, further analysis of such anatomical areas would be useful to consolidate the valuable dataset provided by the article.

      As pointed out above, Figure 2 shows cytosolic tdTomato fluorescence in transgenic crosses of the Cre-driver mice with Cre-dependent tdTomato reporter mice. For the Ntsr1-Cre x LSL-tdTomato mice, all corticothalamic L6CT neurons from across the entire cortex drive tdTomato expression. The axon of each neuron must traverse the striatum giving rise to fluorescence in the striatum. As discussed above, labelling of synaptic specialisations will be important in future studies to separate travelling axon from innervating axon. However, the overall impact of the axons traversing the striatum is again mitigated in our study by considering the axonal projections from local sparse infections in SSp-bfd and SSs rather than from cortex-wide tdTomato expression.

      Reviewer #3 (Public Review): 

      Summary: 

      The paper offers a systematic and rigorous description of the layer- and sublayer-specific outputs of the somatosensory cortex using a modern toolbox for the analysis of brain connectivity, which combines: 1) layer-specific genetic drivers for conditional viral tracing; 2) whole-brain analyses of axon tracts using tissue clearing and imaging; 3) segmentation and quantification of axons with normalization to the number of transduced neurons; 4) registration of connectivity to a widely used anatomical reference atlas; 5) functional validation of the connectivity using optogenetic approaches in vivo.

      Strengths: 

      Although the connectivity of the somatosensory cortex is already known, precise data are dispersed across different accounts (papers, online resources) using different methods. The present account therefore has the merit of condensing this information into one very precisely documented report. It also brings new insights into the connectivity, such as the precise comparison of layer-specific outputs and of the primary and secondary somatosensory areas. It also shows a topographic organization of the circuits linking the somatosensory and motor cortices. The paper also offers a clear description of the methodology and of a rigorous approach to quantitative anatomy.

      Weaknesses: 

      The weakness relates to the intrinsic limitations of in toto approaches, which currently lack the precision and resolution needed to identify single axons, axon branching or synaptic connectivity. These limitations are identified and discussed by the authors.

      We agree with the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      No additional comment 

      OK

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 8, we don't get to see much raw data, while the diversity of functional response patterns to the primary and supplementary S1 activations is highly intriguing (and this diversity exists, as suggested by the results in Figure 8E, LRPT).

      Can Figure 8C be less blurred? Maybe give more space to individual examples, such as an overlay of the delineations of the activated area across the tested mice? 

      Also, can we have a view on the time dynamics of the functional activation and integration window? 

      Raw data - We have now added a new supplementary figure (Figure 8 – figure supplement 1) to show data from individual mice, as well as plotting the time-course of the evoked jRGECO fluorescence signals in the frontal cortex hotspot. 

      Image blur - Each pixel represents 62.5 × 62.5 µm on the cortical surface. The images in Figure 8B&C were averaged across mice, which causes some additional spatial blurring. However, the most likely explanation for the ‘blurred’ impression is the large overall horizontal extent of the axonal innervation, as well as the likely rapid lateral spread of excitation both at the stimulation area and in the target region, as also indicated, for example, by rapid voltage-sensitive imaging experiments (Ferezou et al., 2007).

      Reviewer #3 (Recommendations For The Authors): 

      At present, the abstract is really centred on the methodology, which is no longer very novel as it has already been described by other groups. In my view, the paper would gain visibility, and be a more useful tool for the community, if amended to better point out the significant results of the study, for instance: i) the layer and sublayer specificity of the outputs, using the listed genetic drivers; ii) the comparison of primary and secondary somatosensory areas; iii) the functional validation. The layer specificity of each Cre-line should be indicated in the abstract.

      We have tried to improve the writing of the abstract along the lines suggested by the reviewer. Specifically, we have now added layer and projection class of the various Cre-lines, and we now also highlight the most obvious differences in the innervation patterns.

      There is some degree of redundancy in the description in the Results section. One suggestion, for an easier flow of reading, would be to join the paragraphs "Laminar characterization of the Cre-lines..." and "Axonal projections...". Start with a description, for each Cre-line, of the laminar localisation of recombination in the somatosensory cortices, followed by the description of outputs from SSp-bfd and SSs. The general description/overview of the outputs could then be summarized as a legend to Figure 5-supplementary 2, which could appear as a main figure.

      Although we agree with the reviewer that there is some level of redundancy in the text, the characterization of the Cre-lines (Figure 2) is quite a different experiment from the viral injections described in other figures, and we therefore prefer to keep these sections separate.

      Other minor points: 

      In the text, indicate the genetic background of the transgenic mouse lines.

      On P. 18, we now indicate that all mice were “back-crossed with C57BL/6 mice”.

      Keep the designation of the areas consistent: S1 appears sometimes as SSp-bfd and sometimes as SSp.

      We thank the reviewer for pointing out the inconsistent nomenclature, which we have now corrected in the revised manuscript. ‘SSp’ remains used on P. 9 and P. 16 of the revised manuscript to indicate a region including SSp-bfd but also extending beyond.

      Figure 1 supplement 2 is not really necessary to show (as the viral tools have previously been validated); this can simply be stated in the text. Conversely, one would like to see a higher-resolution image of the injection sites used for the cell counts underlying the normalization, as this can be pretty tricky.

      In response to the reviewer’s suggestion, we have now added a new supplemental figure to show an example of how cells in the injection site were counted (Figure 1 – figure supplement 3).

      Figure 2: the most important thing here is higher magnification to show the precise laminar localisation of the recombination, rather than the atlas landmarks already shown in Figure 1. This would allow more space for clearer, higher-magnification panels comparing SSs and SSp. The present image hints at some real differences, but these are difficult to appreciate at the current resolution. The legend should also comment on the labelling seen in layer 1 in the Tlx3 and Rbp4 lines. This could be dendritic labelling, but it needs a word of clarification.

      We think both the overview images and the high-resolution images are of value to the reader. Following the reviewer’s comment, in the legends to Figure 2C&D, we have now added text suggesting that the layer 1 fluorescence is likely axonal or dendritic in origin: “Labelling in layer 1 is likely of axonal or dendritic origin, and no cell bodies were labelled in this layer.” In addition, we have added a new supplemental figure which shows the cortical labelling in SSp and SSs in a more magnified view (Figure 2 – figure supplement 1).

      Figure 3: the comparison of the 3 transgenic lines labelling layer 5 and showing sublaminar identities is really interesting in showing the heterogeneity of this layer and possible regional differences. However, the cases shown for illustration for Rbp4 and Tlx3 seem pretty massive in comparison with the other drivers. Maybe cases with smaller injections could be chosen for illustration. 

      Figure 3 shows grand-average axonal density maps across different mice, normalized to the number of neurons in the injection site. Because the maps are normalized per neuron, injection size alone should not inflate them, and the large amount of axon per neuron observed in Rbp4 and Tlx3 mice therefore reflects the long, wide-ranging axons of these neurons compared with other neuronal classes.
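      The logic of per-neuron normalization can be sketched as follows. This minimal Python example assumes that each mouse's axonal density map is simply divided by the number of neurons counted at its injection site and that the normalized maps are then averaged with equal weight across mice; the authors' actual processing may differ, and the function name and numbers are hypothetical. With two mice whose injections differ fourfold in size but whose neurons have the same axonal arbor, the normalized maps coincide, so a larger injection alone does not produce a larger grand-average map.

```python
def per_neuron_grand_average(maps, neuron_counts):
    """Average per-neuron density maps across mice (hypothetical helper).

    maps: one flattened density map (list of floats) per mouse.
    neuron_counts: labelled neurons counted at each mouse's injection site.
    """
    if not maps or len(maps) != len(neuron_counts):
        raise ValueError("need one neuron count per map")
    if any(len(m) != len(maps[0]) for m in maps):
        raise ValueError("maps must have the same number of voxels")
    # Divide each map by its own neuron count (per-neuron density).
    per_neuron = [[v / n for v in m] for m, n in zip(maps, neuron_counts)]
    # Equal-weight mean across mice, voxel by voxel.
    return [sum(col) / len(per_neuron) for col in zip(*per_neuron)]


# Hypothetical example: mouse B has a 4x larger injection than mouse A.
mouse_a = [200.0, 50.0, 0.0]   # 100 labelled neurons
mouse_b = [800.0, 200.0, 0.0]  # 400 labelled neurons
print(per_neuron_grand_average([mouse_a, mouse_b], [100, 400]))  # [2.0, 0.5, 0.0]
```

      Under these assumptions, a cell class appears large in such maps only if each of its neurons carries more axon, which is the point made in the response above.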

      Figure 6A could be a supplementary figure in my view; 6B is clearer. 

      We think both representations are useful, and we think different readers might better appreciate either of the two analyses.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work is potentially useful because it has generated a mineable yield of new candidate immune inhibitory receptors, which can serve both as drug targets and as subjects for further biological investigation. It is noted however that the argument of the work is rather incomplete, in that it does very little to validate the putative new receptors, and merely makes a study of their putative distribution across cell types. Experimental follow-up to demonstrate the claimed properties for the proteins identified, or mining existing experimental data sources on gene expression across tissues to at least show that the pipeline correctly identified genes likely to be specific to immune cells (or something along these lines), would make this work more complete and compelling. 

      We thank the editors for their critical reading and assessment of our manuscript. We acknowledge that the present study is limited by a lack of experimental follow-up. However, we purposely chose to make this pipeline of putative novel inhibitory receptors public at this early stage for our work to be a starting point for further functional investigation of these targets by the scientific community.   

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a new bioinformatics approach identifying several hundred previously unknown inhibitory immunoreceptors. When expressed in immune cells (such as neutrophils, monocytes, and CD8+ and CD4+ T cells), such receptors inhibit the functional activity of these cells. Blocking inhibitory receptors represents a promising therapeutic strategy for cancer treatment.

      As such, this is a high-quality and important bioinformatics study. One general concern is the absence of direct experimental validation of the results. Although the authors bioinformatically identified 51 known receptors, experimental evaluation of at least one (or, better, a few) of the newly identified receptors would, in my opinion, significantly strengthen the presented evidence.

      I will now briefly summarize the results and give my comments.

      First, using sequence comparison analysis, the authors identify a large set of putative receptors based on the presence of immunoreceptor tyrosine-based inhibitory motifs (ITIMs) or immunoreceptor tyrosine-based switch motifs (ITSMs). They further filter the identified set of receptors for the presence of the ITIMs or ITSMs in an intracellular domain of the protein. Second, using AlphaFold structure modeling, the authors select only receptors containing ITIMs/ITSMs in structurally disordered regions. Third, the authors evaluate the gene expression profiles of known and putative receptors in several immune cell types. Fourth, the authors classify putative receptors into functional categories, such as negative feedback receptors, threshold receptors, threshold disinhibition, and threshold-negative feedback. The latter classification was based on the available data from Nat Rev Immunol 2020. Fifth, using publicly available single-cell RNA sequencing data of tumor-infiltrating CD4+ and CD8+ cells from nearly twenty types of cancer, the authors demonstrate that a significant fraction of putative receptors are indeed expressed in these datasets.
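      The motif-based first step summarized above can be illustrated with a small scan. This Python sketch assumes the commonly cited consensus sequences, (I/V/L/S)xYxx(I/V/L) for ITIMs and TxYxx(V/I) for ITSMs, which may differ from the exact definitions used in the authors' pipeline. It takes the intracellular region as a hypothetical (start, end) input, such as one might obtain from a topology predictor, and it does not model the later AlphaFold-based disorder filter. The sequence is a made-up toy example.

```python
import re

# Assumed consensus motifs; "." matches any residue. The lookahead lets
# overlapping occurrences be reported.
MOTIFS = {
    "ITIM": re.compile(r"(?=([ILVS].Y..[ILV]))"),
    "ITSM": re.compile(r"(?=(T.Y..[IV]))"),
}


def find_motifs(seq, intracellular=None):
    """Return sorted (motif_name, start, match) tuples with 1-based starts.

    `intracellular` is an optional 1-based inclusive (start, end) range;
    motifs not lying entirely inside it are dropped.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            start, end = m.start(1) + 1, m.end(1)  # 1-based, inclusive
            if intracellular and not (intracellular[0] <= start and end <= intracellular[1]):
                continue
            hits.append((name, start, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


toy = "MKLIYSALGGTEYTTVAA"  # made-up sequence for illustration
print(find_motifs(toy))
print(find_motifs(toy, intracellular=(10, 18)))
```

      In this toy case the ITIM at residue 3 is discarded once only residues 10 to 18 are treated as intracellular, which mirrors the intracellular-domain filter described in the review.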

      In summary, in my opinion, this is an interesting, important, high-quality bioinformatics work. The manuscript is clearly written and all technical details are carefully explained.

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors. 

      We thank the reviewer for their comments and suggestions.  We acknowledge that looking at co-expressed genes and subsequently at gene ontology enrichment could be an interesting approach to prioritize the inhibitory receptors. However, since there are many ways to approach the results of the gene coexpression networks, which also depend on the cell type and activation status of interest, we have chosen to discuss the implications of these networks in the discussion with the following paragraph, rather than reporting all these different approaches in the paper:

      “To further prioritize inhibitory receptors in immune cell subsets or diseases of interest, gene coexpression networks of putative inhibitory receptors could be assessed. On the one hand, the co-occurrence of putative inhibitory receptors with known inhibitory receptors within a module could be one approach; on the other hand, the presence of putative inhibitory receptors in a different module could suggest that they regulate biological functions other than those regulated by the known receptors. The location of the putative inhibitory receptors in the network could also change depending on the cell type and the activation status of the cell. Additionally, one could examine the co-expression of candidates with other genes within a gene module to infer potential biological function, and co-expression with signalling molecules known to interact with inhibitory receptors, such as Csk, SHP-1, SHP-2 and SHIP1, although these molecules may be regulated more at the post-translational level than at the mRNA level.”

      Reviewer #2 (Public Review):

      Summary:

      The authors developed a bioinformatic pipeline to aid the screening and identification of inhibitory receptors suitable as drug targets. The challenge lies in the large search space and the lack of tools for assessing the likelihood of their inhibitory function. To make progress, the authors used a consensus protein membrane topology and sequence motif prediction tool (TOPCONS), combined with a statistical measure of how likely the motifs are to occur by chance and a machine learning protein structure prediction model (AlphaFold), to greatly narrow the search space. After obtaining a manageable set of 398 high-confidence known and putative inhibitory receptors through this pipeline, the authors then mapped these receptors to different functional categories across different cell types based on their expression in both the resting and activated state. Additionally, by using publicly available pan-cancer scRNA-seq data for tumor-infiltrating T cells, they showed that these receptors are expressed across various cellular subsets.

      Strengths:

      The authors presented sound arguments motivating the need to efficiently screen inhibitory receptors and to identify those that are functional. Key components of the algorithm were presented along with solid justification for why they addressed challenges faced by existing approaches. To name a few:

      • The TOPCONS algorithm was chosen to optimize the prediction of membrane topology.

      • A statistical measure was used to remove potential false positives.

      • AlphaFold is used to filter out putative receptors whose ITIM/ITSMs are predicted with high confidence (and are therefore unlikely to lie in intrinsically disordered regions).

      To examine the receptors identified by this pipeline through a functional lens, the authors proposed examining their expression across various immune cell subsets to assign functional categories. This is a reasonable and appropriate first step for interpreting and understanding how potential drug targets are differentially expressed in some disease contexts.

      Weaknesses:

      The paper has strength in the pipeline it presents, but its weakness, in my opinion, lies in the lack of a concrete demonstration of how this pipeline can be used to at least "rediscover" known targets in a disease-specific manner. For example, the result that both known and putative immune inhibitory receptors are expressed across a wide variety of tumor-infiltrating T-cell subsets is reassuring, but it would have been more informative and illustrative if the authors had demonstrated this in a disease with known targets, as opposed to a pan-cancer context. Additionally, a discussion contrasting the known and putative receptors in the context above would help readers identify use cases of this pipeline suitable for their research. In particular:

      • For known receptors, does the pipeline and the expression analysis above rediscover the known target in the disease of interest?

      • For putative receptors, what do the functional category mapping and the differential expression across various tumor-infiltrating T-cell subsets imply about their potential as therapeutic targets?

      We thank the reviewer for their assessment and comments. The primary purpose of the bioinformatics pipeline was to identify putative inhibitory receptors in a disease-agnostic manner and allow the scientific community to further explore targets in their specific diseases of interest. We performed our pan-cancer expression analysis as a preliminary proof of concept and agree that exploring targets in specific diseases, cancer or otherwise, could be more informative. To validate that we rediscovered known immunotherapeutic targets, we analyzed the expression of known inhibitory receptors on tumor-infiltrating T cells of melanoma patients using the same dataset as figure 3. We find high expression of known therapeutic targets, such as PD-1, in addition to other known inhibitory receptors that are being targeted in clinical trials, one of which is TIGIT. We have added this information to the results section and added the corresponding graph as supplementary figure 5.

      For the putative inhibitory receptors, we believe the functional categorization can assist in selecting targets that are more likely to be successful in a therapeutic context. As we previously proposed in our perspective on functional categorization of inhibitory receptors (Rumpret et al., Nat Rev Immunol, 2020), it might be beneficial to target inhibitory receptors of different functional categories in cancer immunotherapy. Targeting a threshold receptor to lower the threshold for activation together with a negative feedback receptor to lengthen and strengthen the cellular response might therefore be more effective than targeting two receptors of a single functional category. Although we realize that RNA sequencing data of in vitro stimulated immune cells are not identical to data from TILs, we have tried to characterize the functional categories expressed by TILs by extrapolating the per-gene functional categorization defined in figure 2, and added the corresponding graphs as supplementary figure 4. This shows that mainly threshold receptors and some (threshold-)negative feedback receptors are expressed by the different T cell subsets, which would open the possibility of using the proposed therapeutic strategy of targeting different functional categories. However, we acknowledge that this will require further validation of expression patterns in vivo in different cancers and immune cell subsets.

      Reviewer #1 (Recommendations For The Authors):

      One comment/suggestion regarding the methodology of evaluating gene expression profiles of putative receptors: perhaps it might be important to look at clusters of genes that are co-expressed with putative inhibitory receptors.

      See our reply to the suggestion above.

      Reviewer #2 (Recommendations For The Authors):

      Results section

      (a) "Putative ITIM/ITSM-bearing immune inhibitory receptors can be found in the human genome"

      i. Figure 1 could benefit from additional labeling. For example, in B, the grey line indicates 5%, etc. Additionally, in panels B and C, I assume that by "predicted" the authors meant using TOPCONS?

      ii. Figure 1B doesn't seem to be consistent with this sentence "However, for 10 out of 51, we observed ITIM/ITSM sequences in the permutated sequence up to ~25% of the time" [page 2, line 1-3], as all 51 data points in Figure 1B (under "Known" panel) are below the 0.25 horizontal line?

      i. We have adjusted the figure legend to better indicate the information provided in the figures. The predicted genes are all unknown transmembrane candidates that contain an ITIM or ITSM in their intracellular domain, as determined using TOPCONS.

      ii. Due to the nature of permutation testing, there is some variation in the individual likelihood values for each protein sequence. However, as they were generally below 0.25 in any given iteration, we decided to define this value as a threshold for inclusion. 
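      The permutation-based likelihood described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the ITIM consensus (I/V/L/S)xYxx(L/V), the ITSM consensus TxYxx(V/I), the number of permutations, and all function names are assumptions; only the 0.25 inclusion threshold comes from the response.

```python
import random
import re

# Assumed consensus motifs (the manuscript's exact definitions may differ):
#   ITIM: (I/V/L/S)-x-Y-x-x-(L/V)
#   ITSM: T-x-Y-x-x-(V/I)
MOTIF = re.compile(r"[IVLS].Y..[LV]|T.Y..[VI]")
THRESHOLD = 0.25  # inclusion cutoff stated in the response


def has_motif(seq: str) -> bool:
    """True if the sequence contains at least one ITIM or ITSM match."""
    return MOTIF.search(seq) is not None


def motif_likelihood(seq: str, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of random shuffles of `seq` that still contain a motif.

    Shuffling keeps the amino-acid composition but destroys the order,
    so a high value means the motif could easily arise by chance.
    """
    rng = random.Random(seed)
    residues = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residues)
        if has_motif("".join(residues)):
            hits += 1
    return hits / n_perm


def passes_filter(intracellular_seq: str, n_perm: int = 1000, seed: int = 0) -> bool:
    """Keep a candidate only if it has a motif that is unlikely by chance."""
    return has_motif(intracellular_seq) and (
        motif_likelihood(intracellular_seq, n_perm, seed) < THRESHOLD
    )
```

      With a fixed seed the estimate is reproducible; with different seeds the value shifts slightly from run to run, which is the iteration-to-iteration variation the reply refers to.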

      (b) "AlphaFold structure predictions can assist in identifying likely functional ITIM/ITSMs"

      i. Readability would increase if the authors indicated how the pLDDT score is computed and what its range is (between 0 and 100).

      ii. Third paragraph. Can the authors comment on why 80 pLDDT was chosen as the cutoff? The first sentence of this paragraph states "We found that 99 out of 101 ITIM/ITSMs of the 51 known receptors had low confidence score, i.e., less than 80 pLDDT, with an average confidence score of 49.3 pLDDT..." However, it is later stated in the Discussion, page 10, starting Line 11, "We determined a threshold of 80 pLDDT based on the average prediction scores of the ITIM/ITSMs in known inhibitory receptors....". If 99 out of 101 ITIM/ITSMs had pLDDT < 80, then it seems strange that the average of the 101 would be 80 pLDDT, even in the extreme case where the remaining 101 - 99 = 2 ITIM/ITSMs attain the maximum pLDDT score of 100, unless the distribution of those 99 is narrowly centered around 80. A distribution of the pLDDT would help clarify.

      i. The pLDDT scores are computed by AlphaFold as a way to determine how well a specific residue and/or region is expected to be modelled in three-dimensional space. We now refer to the corresponding AlphaFold publications and references therein to clarify this (10.1093/nar/gkab1061, 10.1038/s41586-021-03819-2, 10.1093/bioinformatics/btt473). We have also now included the range (i.e., 0-100) in the text.

      ii. The threshold of 80 pLDDT was chosen as this still encompasses all known inhibitory receptors and was not calculated based on an average of the prediction scores. In this way, we still included ITIM/ITSMs with a relatively high pLDDT, such as those observed in PD-1 and LAIR-1. The previous text ‘average prediction scores of the ITIM/ITSMs in known inhibitory receptors’ referred to the averaging of the confidence score for each of the six amino acids encompassing the ITIM/ITSM into one overall score per ITIM/ITSM. We have adjusted the text to better reflect this.
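      The per-motif scoring described in this reply (averaging the pLDDT of the six residues that make up an ITIM/ITSM into one score and comparing it with the 80 pLDDT cutoff) can be sketched as below. The function names, the 0-based motif start index, and passing per-residue scores as a plain list are assumptions for illustration; in AlphaFold model files the per-residue pLDDT is typically stored in the B-factor column.

```python
from statistics import mean

PLDDT_CUTOFF = 80.0  # threshold stated in the reply (pLDDT ranges 0-100)


def motif_plddt(per_residue_plddt: list[float], start: int, length: int = 6) -> float:
    """Average the per-residue pLDDT over the motif window [start, start + length)."""
    window = per_residue_plddt[start:start + length]
    if len(window) != length:
        raise ValueError("motif window runs past the end of the sequence")
    return mean(window)


def likely_disordered(per_residue_plddt: list[float], start: int, length: int = 6) -> bool:
    """Low-confidence (below-cutoff) motifs are treated as lying in disordered regions."""
    return motif_plddt(per_residue_plddt, start, length) < PLDDT_CUTOFF
```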

      (c) "Putative inhibitory receptors are expressed across immune cell subsets"

      In Figure S2, the last sentence in the caption (relevant for panel C) states "Cell subsets without uniquely expressed putative inhibitory receptors i.e., B cells and T cell, are excluded from the panel for clarity", but B cells and T cells are present in panel C?

      Indeed, but they are only included for the cases where the cell subsets share receptor expression with other immune cell subsets. The B and T cells do not express any unique putative multi-spanning receptors, all receptors are shared with at least one other immune cell subset. 

      (d) "Known and putative inhibitory receptors are expressed on tumour infiltrating T cells"

      i. Missing panel C label in Figure 3 and S3.

      ii. By comparing Figure 3 and S3, it looks to me that there's not a big difference between single-spanning and multi-spanning inhibitory receptors. I wonder if the authors can comment or speculate on this similarity in addition to differences of expression among T-cell subsets. Would the similarities and differences above be explained by cancer type?

      i. Figure 3 and S3 do not contain a panel C, but panel B consists of a lower (CD8+) and an upper (CD4+) subpanel, we have more clearly indicated this in the figure legend in the revised manuscript. 

      ii. While some T cell subsets, such as exhausted CD8+ T cells and CD4+ regulatory T cells, appear to not differ much in their expression of either single- or multi-spanning receptors, we do observe that, for example, effector memory CD4+ T cells or EMRA CD8+ T cells express single-spanning inhibitory receptors to a higher extent than multi-spanning inhibitory receptors. It is possible that these differences and similarities reflect some of the roles multi-spanning inhibitory receptors could play in regulating immune cells, for example in response to chemokines, as many chemokine receptors are multi-spanning proteins. 

      Data and Code availability

      Although the Methods section provides some context for the computational analysis and citations for relevant data, software availability and a data availability statement are lacking.

      We have included a data availability statement covering the data files and code in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive reviews.  Taken together, the comments and suggestions from reviewers made it clear that we needed to focus on improving the clarity of the methods and results.  We have revised the manuscript with that in mind.  In particular, we have restructured the results to make the logic of the manuscript clearer and we have added details to the methods section.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The work of Muller and colleagues concerns the question of where we place our feet when passing uneven terrain, in particular how we trade-off path length against the steepness of each single step. The authors find that paths are chosen that are consistently less steep and deviate from the straight line more than an average random path, suggesting that participants indeed trade-off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth. 

      Strengths: 

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze [17]. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach to understanding locomotion in the wild better. 

      Weaknesses: 

      The manuscript as it stands has several issues with the reporting of the results and the statistics. In particular, it is hard to assess the inter-individual variability, as some of the data are aggregated across individuals, while in other cases only central tendencies (means or medians) are reported without providing measures of variability; this is critical, in particular as N=9 is a rather small sample size. It would also be helpful to see the actual data for some of the information merely described in the text (e.g., the dependence of \Delta H on path length). When reporting statistical analyses, test statistics and degrees of freedom should be given (or other variants that unambiguously describe the analysis).

      There is only one figure (Figure 6) that shows data pooled over subjects, and this is simply to illustrate how the random paths were calculated. The actual paths generated used individual subject data. We don’t draw our conclusions from these histograms; they are instead used to generate bounds for the simulated paths. We have made clear, both in the text and in the figure legends, when we have plotted an example subject. Other plots show the individual subject data. We have given the range of subject medians as well as the standard deviation for data illustrated in Figure (random vs chosen). We have also given the details of the statistical test comparing the flatness of the chosen paths versus the randomly generated paths. We have added two supplemental figures to show individual walker data more directly: (Fig. 14) the per-subject histograms of step parameters, and (Fig. 18) the individual subject distributions for straight path slopes and tortuosity.

      The CNN analysis chosen to link the step data to visual sampling (gaze and depth features) should be motivated more clearly, and it should describe how training and test sets were generated and separated for this analysis.

      We have motivated the CNN analysis and moved it earlier in the manuscript to help clarify the logic of the manuscript. Details of the training and test sets are now provided, and the data have been replotted. The values differ slightly from the original plot after a correction in the code, but the conclusions drawn from this analysis are unchanged. This analysis simply shows that there is information in the depth images from the subject’s perspective that a network can use to learn likely footholds. This motivates the subsequent analysis of path flatness.

      There are also some parts of figures, where it is unclear what is shown or where units are missing. The details are listed in the private review section, as I believe that all of these issues can be fixed in principle without additional experiments. 

      Several of the Figures have been replotted to fix these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript examines how humans walk over uneven terrain using vision to decide where to step. There is a huge lack of evidence about this because the vast majority of locomotion studies have focused on steady, well-controlled conditions, and not on decisions made in the real world. The author team has already made great advances in this topic, but there has been no practical way to map 3D terrain features in naturalistic environments. They have now developed a way to integrate such measurements along with gaze and step tracking, which allows quantitative evaluation of the proposed trade-offs between stepping vertically onto vs. stepping around obstacles, along with how far people look to decide where to step. 

      Strengths: 

      (1) I am impressed by the overarching outlook of the researchers. They seek to understand human decision-making in real-world locomotion tasks, a topic of obvious relevance to the human condition but not often examined in research. The field has been biased toward well-controlled studies, which have scientific advantages but also serious limitations. A well-controlled study may eliminate human decisions and favor steady or periodic motions in laboratory conditions that facilitate reliable and repeatable data collection. The present study discards all of these usually-favorable factors for rather uncontrolled conditions, yet still finds a way to explore real-world behaviors in a quantitative manner. It is an ambitious and forward-thinking approach, used to tackle an ecologically relevant question. 

      (2) There are serious technical challenges to a study of this kind. It is true that there are existing solutions for motion tracking, eye tracking, and most recently, 3D terrain mapping. However, most of the solutions do not have turn-key simplicity and require significant technical expertise. Integrating multiple such solutions is even more challenging. The authors are to be commended on the technical integration here.

      (3) In the absence of prior studies on this issue, it was necessary to invent new analysis methods to go with the new experimental measures. This is non-trivial and places an added burden on the authors to communicate the new methods. It's harder to be at the forefront in the choice of topic, technical experimental techniques, and analysis methods all at once. 

      Weaknesses: 

      (1) I am predisposed to agree with all of the major conclusions, which seem reasonable and likely to be correct. Ignoring that bias, I was confused by much of the analysis. There is an argument that the chosen paths were not random, based on a comparison of probability distributions that I could not understand. There are plots described as "turn probability vs. X" where the axes are unlabeled and the data range above 1. I hope the authors can provide a clearer description to support the findings. This manuscript stands to be cited well as THE evidence for looking ahead to plan steps, but that is only meaningful if others can understand (and ultimately replicate) the evidence. 

      We have rewritten the manuscript with the goal of clarifying the analyses, and we have re-labelled the offending figure.

      (2) I wish a bit more and simpler data could be provided. It is great that step parameter distributions are shown, but I am left wondering how this compares to level walking.  The distributions also seem to use absolute values for slope and direction, for understandable reasons, but that also probably skews the actual distribution. Presumably, there should be (and is) a peak at zero slope and zero direction, but absolute values mean that non-zero steps may appear approximately doubled in frequency, compared to separate positive and negative. I would hope to see actual distributions, which moreover are likely not independent and probably have a covariance structure. The covariance might help with the argument that steps are not random, and might even be an easy way to suggest the trade-off between turning and stepping vertically. This is not to disregard the present use of absolute values but to suggest some basic summary of the data before taking that step. 

      We have replotted the step parameter distributions without absolute values. Unfortunately, the covariation of step parameters (step direction and step slope) is unlikely to help establish this tradeoff. Note that the primary conclusion of the manuscript is that walkers make turns to keep step slope low (when possible). Thus, any correlation that might exist between goal direction and step slope would be difficult to interpret without a direct comparison to possible alternative paths (as we have done in this paper). As such, we do not draw our conclusions from them. We use them primarily to generate plausible random paths for comparison with the chosen paths. We have added two supplementary figures showing the distributions (Fig 15) and covariation (Fig 16) of all the step parameters discussed in the methods.

      (3) Along these same lines, the manuscript could do more to enable others to digest and go further with the approach, and to facilitate interpretability of results. I like the use of a neural network to demonstrate the predictiveness of stepping, but aside from above-chance probability, what else can inform us about what visual data drives that?

      The CNN analysis simply shows that the information is there in the image from the subject’s viewpoint and is used to motivate the subsequent analysis.  As noted above, we have generally tried to improve the clarity of the methods.

      Similarly, the step distributions and height-turn trade-off curves are somewhat opaque and do not make it easy to envision further efforts by others, for example, people who want to model locomotion. For that, clearer (and perhaps) simpler measures would be helpful. 

      We have clarified the description of these plots in the main text and in the methods.  We have also tried to clarify why we made the choices that we did in measuring the height-turn trade-off and why it is necessary in order to make a fair comparison.

      I am absolutely in support of this manuscript and expect it to have a high impact. I do feel that it could benefit from clarification of the analysis and how it supports the conclusions. 

      Reviewer #3 (Public Review): 

      Summary: 

      The systematic way in which path selection is parametrically investigated is the main contribution. 

      Strengths: 

      The authors have developed an impressive workflow to study gait and gaze in natural terrain. 

      Weaknesses: 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just e.g. specific rock arrangements. If the network is overfitting the "features" it uses could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. 

      The CNN analysis has now been moved earlier in the manuscript to help clarify its significance and we have expanded the description of the methods. Briefly, it simply indicates that there is information in the depth structure of the terrain that can be learned by a network. This helps justify the subsequent analyses.  Importantly, the network training and testing sets were separated by terrain to ensure that the model was being tested on “unseen” terrain and avoid the model learning specific arrangements.  This is now clarified in the text.
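      Separating the training and test sets by terrain, as described in this reply, amounts to a grouped split in which every sample from a given terrain lands on the same side. A minimal sketch, assuming each sample is tagged with a hypothetical terrain identifier; the test fraction, seed, and function name are illustrative choices, not the authors' settings.

```python
import random


def split_by_terrain(samples, test_fraction=0.25, seed=0):
    """Split (terrain_id, item) pairs so no terrain appears in both sets.

    Holding out whole terrains means the model is evaluated on terrain it
    has never seen, so it cannot score well by memorising specific rock
    arrangements.
    """
    terrains = sorted({terrain for terrain, _ in samples})
    if len(terrains) < 2:
        raise ValueError("need at least two terrains to hold one out")
    rng = random.Random(seed)
    rng.shuffle(terrains)
    n_test = max(1, round(len(terrains) * test_fraction))
    test_terrains = set(terrains[:n_test])
    train = [s for s in samples if s[0] not in test_terrains]
    test = [s for s in samples if s[0] in test_terrains]
    return train, test
```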

      (2) The use of descriptive terminology should be made systematic. 

      Specifically, the following terms are used without giving a single, clear definition for them: path, step, step location, foot plant, foothold, future foothold, foot location, future foot location, foot position. I think some terms are being used interchangeably. I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. 

      We have made the language more systematic and clarified the definition of each term (see Methods). Path refers to the sequence of 5 steps. Foothold is where the foot was placed in the environment. A step is the transition from one foothold to the next.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent.  The authors discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). That is, it is taken as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by the data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      The abstract has been substantially rewritten.  We have adjusted our language in the introduction/discussion to try to address this concern.

      Recommendations for the authors:

      Reviewing Editor comments 

      You will find a full summary of all 3 reviews below. In addition to these reviews, I'd like to highlight a few points from the discussion among reviewers. 

      All reviewers are in agreement that this study has the potential to be a fundamental study with far-reaching empirical and practical implications. The reviewers also appreciate the technical achievements of this study. 

      At the same time, all reviewers are concerned with the overall lack of clarity in how the results are presented. There are a considerable number of figures that need better labeling, text parts that require clearer definitions, and the description of data collection and analysis (esp. with regard to the CNN) requires more care. Please pay close attention to all comments related to this, as this was the main concern that all reviewers shared. 

      At a more specific level, the reviewers discussed the finding around leg length, and admittedly, found it hard to believe, in short: "extraordinary claims need strong evidence". It would be important to strengthen this analysis by considering possible confounds, and by including a discussion of the degree of conviction. 

      We have weakened the discussion of this finding and provided an additional analysis in a supplemental figure (Figure 17) to help clarify the finding.

      Reviewer #1 (Recommendations For The Authors): 

      First, let me apologize for the long delay with this review. Despite my generally positive evaluation (see public review), I have some concerns about the way the data are presented and questions about methodological details. 

      (1) Representation of results: I find it hard to decipher how much variability arises within an individual and how much across individuals. For example, Figure 7b seems to aggregate across all individuals, while the analysis is (correctly) based on the subject medians.

      Figure 7b showed data from a single subject. This is now clarified.

      It would be good to see the distribution of all individuals (maybe use violin plots for each observer with the true data on one side and the baseline data on the other, or simple histograms for each). To get a feeling for inter-individual and intra-individual variability is crucial, as obviously (see the leg-length analysis) there are larger inter-individual differences and representations like these would be important to appreciate whether there is just a scaling of more or less the same effect or whether there are qualitative differences (especially in the light of N=9 being not a terribly huge sample size). 

      The medians for the individual subjects are now provided with the standard deviations between subjects to indicate the extent of individual differences. Note that the random paths were chosen from the distribution of actual step slopes for that subject as one of the constraints. This makes the random paths statistically similar to the chosen paths, with the differences generated only by the particular visual context. Thus, the test for a difference between chosen and random paths is quite conservative.

      Similarly, seeing \DeltaH plotted as a function of steps in the path as a figure rather than just having the verbal description would also help. 

      To simplify the discussion of our methods/results, we have removed the analyses that examine mean slope as a function of steps. Because of the central limit theorem, the slopes of the chosen paths remain largely unchanged regardless of the choice of path length. The slopes of the simulated paths are always larger, irrespective of the choice of path length.

      (2) Reporting the statistical analyses: This is related to my previous issue: I would appreciate it if the test statistics and degrees-of-freedom of the statistical tests were given along with the p-values, instead of only the p-values. This at some points would also clarify how the statistics were computed exactly (e.g., "All subjects showed comparable difference and the difference in medians evaluated across subjects was highly significant (p<<0.0001).", p.10, is ambiguous to me). 

      Details have been added as requested.

      (3) Why is the lower half ("tortuosity less than the median tortuosity") of paths used as "straight" rather than simply the minimum of all viable paths)?

      The benchmark for a straight path is somewhat arbitrary. Using the lower half rather than the minimum length path is more conservative.
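      As a minimal sketch of the "lower half" benchmark, the code below takes tortuosity as the ratio of total path length to the straight-line distance between the first and last footholds. That definition, the 2D foothold coordinates, and the function names are assumptions for illustration; the manuscript's own definition may differ.

```python
import math
from statistics import median


def path_length(footholds):
    """Sum of straight-line step lengths between consecutive footholds."""
    return sum(math.dist(a, b) for a, b in zip(footholds, footholds[1:]))


def tortuosity(footholds):
    """Path length divided by the start-to-end distance (1.0 = perfectly straight)."""
    chord = math.dist(footholds[0], footholds[-1])
    if chord == 0:
        raise ValueError("first and last footholds coincide")
    return path_length(footholds) / chord


def straight_subset(paths):
    """Keep paths whose tortuosity is below the median tortuosity of all paths."""
    taus = [tortuosity(p) for p in paths]
    cutoff = median(taus)
    return [p for p, t in zip(paths, taus) if t < cutoff]
```

      Taking the lower half rather than the single straightest path keeps a pool of comparison paths instead of one extreme case, which is in line with the reply's point that this choice is more conservative.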

      (4) For the CNN analysis, I failed to understand what was training and what was test set. I understand that the goal is to predict for all pixels whether they are a potential foothold or not, and the AUC is a measure of how well they can be discriminated based on depth information and then this is done for each image and the median over all images taken. But on which data is the CNN trained, and on which is it tested? Is this leave-n-out within the same participant? If so, how do you deal with dependencies between subsequent images? Or is it leave-1-out across participants? If so, this would be more convincing, but again, the same image might appear in training and test. If the authors just want to ask how well depth features can discriminate footholds from non-footholds, I do not see the benefit of a supervised method, which leaves the details of the feature combinations inside a black box. Rather than defining the "negative set" (i.e., the non-foothold pixels) randomly, the simulated paths could also be used, instead. If performance (AUC) gets lower than for random pixels, this would confirm that the choice of parameters to define a "viable path" is well-chosen. 

      This has been clarified as described above.

      Minor issues: 

      (5) A higher tortuosity would also lead a participant to require more steps in total than a lower tortuosity. Could this partly explain the correlation between the leg length and the slope/tortuosity correlation? (Longer legs need fewer steps in total, thus there might be less tradeoff between ΔH and keeping the path straight (i.e., saving steps)). To assess this, you could give the total number of steps per (straight) distance covered for leg length and compare this to a flat surface.

      The calculations are done on an individual subject basis: the first and last step locations are taken from the actual foot placements, and the random paths are generated between those endpoints. As a consequence, the number of steps is held constant for the analysis. We have revised the methods for this analysis to make this clearer.

      (6) As far as I understand, steps happen alternatingly with the two feet. That is, even on a flat surface, one would not reach 0 tortuosity. In other words, does the lateral displacement of the feet play a role (in particular, if paths with even and paths with odd number of steps were to be compared), and if so, is it negligible for the leg-length correlation? 

      All the comparisons here are done for 5 step sequences so this potential issue should not affect the slope of the regression lines or the leg length correlation.

      (7) Is there any way to quantify the quality of the depth estimates? Maybe by taking an actual depth image (e.g., by LIDAR or similar) for a small portion of the terrain and comparing the results to the estimate? If this has been done for similar terrain, can a quantification be given? If errors would be similar to human errors, this would also be interesting for the interpretation of the visual sampling data.

      Unfortunately, we do not have ground-truth depth images from LIDAR. When these data were originally collected, we had not imagined being able to reconstruct the terrain. However, we agree with the reviewers that this would be a good analysis to do, and we plan to collect LIDAR data in future experiments. 

      To provide an assessment of quality for these data in the absence of a ground truth depth image, we have performed an evaluation of the reliability of the terrain reconstruction across repeats of the same terrain both between and within participants.  We have expanded the discussion of these reliability analyses in the results section entitled “Evaluating Terrain Reconstruction”, as well as in the corresponding methods section (see Figure 10).

      (8) The figures are sometimes confusing and a bit sloppy. For example, in Figure 7a, the red, cyan, and green paths are not mentioned in the caption, in Figure 8 units on the axes would be helpful, in Figure 9 it should probably be "tortuosity" where it now states "curviness". 

      These details have been fixed.

      (9) I think the statement "The maximum median AUC of 0.79 indicates that the 0.79 is the median proportion of pixels in the circular..." is not an appropriate characterization of the AUC, as the number of correctly classified pixels will not only depend on the ROC (and thus the AUC), but also on the operating point chosen on the ROC (which is not specified by the AUC alone). I would avoid any complications at this point and just characterize the AUC as a measure of discriminability between footholds and non-footholds based on depth features. 

      This has been fixed.
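      Point (9) can be made concrete with a small sketch. Assuming scores are available for foothold (positive) and non-foothold (negative) pixels, the AUC equals the probability that a randomly chosen foothold pixel scores higher than a randomly chosen non-foothold pixel, with ties counted as one half (the normalized Mann-Whitney U statistic). The proportion of correctly classified pixels, in contrast, depends on the threshold chosen. The scores below are invented for illustration and are not outputs of the CNN used in the study.

```python
def auc_rank(pos_scores, neg_scores):
    """Threshold-free AUC: P(positive score > negative score) + 0.5 * P(tie)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_at(pos_scores, neg_scores, threshold):
    """Fraction of pixels correctly classified at a single operating point."""
    correct = sum(s >= threshold for s in pos_scores) + sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


# Hypothetical scores for foothold and non-foothold pixels.
footholds = [0.9, 0.8, 0.6, 0.4]
others = [0.7, 0.3, 0.2, 0.1]
print(auc_rank(footholds, others))           # 0.875, one number for the whole ROC
print(accuracy_at(footholds, others, 0.35))  # 0.875
print(accuracy_at(footholds, others, 0.85))  # 0.625
```

      The two thresholds give different accuracies while the AUC stays at 0.875, which is why the AUC is best read as a measure of discriminability rather than as a proportion of correctly classified pixels.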

      (10) Ref. [16] is probably the wrong Hart paper (I assume their 2012 Exp. Brain Res. [https://doi.org/10.1007/s00221-012-3254-x] paper is meant at this point) 

      Fixed

      Typos (not checked systematically, just incidental discoveries): 

      (11) "While there substantial overlap" (p.10) 

      (12) "field.." (p.25) 

      (13) "Introduction", "General Discussion" and "Methods" as well as some subheadings are numbered, while the other headings (e.g., Results) are not. 

      Fixed

      Reviewer #2 (Recommendations For The Authors): 

      The major suggestions have been made in the Public Review. The following are either minor comments or go into more detail about the major suggestions. All of these comments are meant to be constructive, not obstructive. 

      Abstract. This is well written, but the main conclusions "Walkers avoid...This trade off is related...5 steps ahead" sound quite qualitative. They could be strengthened by more specificity (NOT p-values), e.g. "positive correlation between the unevenness of the path straight ahead and the probability that people turned off that path." 

      The abstract has been substantially rewritten.

      P. 5 "pinning the head position estimated from the IMU to the Meshroom estimates" sounds like there are two estimates. But it does not sound like both were used. Clarify, e.g. the Meshroom estimate of head position was used in place of IMU? 

      Yes that’s correct.  We have clarified this in the text.

      Figure 5. I was confused by this. First, is a person walking left to right? When the gaze position is shown, where was the eye at the time of that gaze? There are straight lines attached to the blue dots, what do they represent? The caption says gaze is directed further along the path, which made me guess the person is walking right to left, and the line originates at the eye. Except the origins do not lie on or close to the head locations. There's also no scale shown, so maybe I am completely misinterpreting. If the eye locations were connected to gaze locations, it would help to support the finding that people look five steps ahead of where they step. 

      We have updated the figure and clarified the caption to remove these confusions. There was a mistake in the original figure: the yellow markers, described as head locations, actually showed the center of mass, and the choice of projection gave the incorrect impression that the fixations off the path (in blue) were separated from the head.

      The view of the data is now presented so the person is walking left to right and with a projection of the head location (orange), gaze locations (blue or green) and feet (pink).

      Figure 6. As stated in the major comments, the step distributions would be expected to have a covariance structure (in terms of raw data before taking absolute values). It would be helpful to report the covariances (6 numbers). As an example of a simple statistical analysis, a PCA (also based on a data covariance) would show how certain combinations of slope/distance/direction are favored over others. Such information would be a simple way to argue that the data are not completely random, and may even show a height-turn trade-off immediately. (By the way, I am assuming absolute values are used because the slopes and directions are only positive, but it wasn't clear if this was the definition.) A reason why covariances and PCA are helpful is that such data would be helpful to compute a better random walk, generated from dynamics. I believe the argument that steps are not random is not served by showing the different histograms in Figure 7, because I feel the random paths are not fairly produced. A better argument might draw randomly from the same distribution as the data (or drive a dynamical random walk), and compare with actual data. There may be correlations present in the actual data that differ from random. I could be mistaken, because it is difficult or impossible to draw conclusions from distributions of absolute values, or maybe I am only confused. In any case, I suspect other readers will also have difficulty with this section. 

      This has been addressed above in the major comments.
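      The covariance and PCA analysis suggested by Reviewer #2 can be sketched as follows. This is a minimal, dependency-free illustration that assumes each step is summarized by a signed (slope, distance, direction) triple. The 3 × 3 covariance matrix then has the six unique entries the reviewer mentions, and the leading principal component is found by power iteration. The data are synthetic, with a slope/direction trade-off built in, and are not the study's measurements.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


def leading_component(cov, iters=500):
    """First principal component of a symmetric covariance matrix via power iteration."""
    k = len(cov)
    v = [1.0] * k  # assumes the start vector is not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance along the component
    eigval = sum(v[i] * cov[i][j] * v[j] for i in range(k) for j in range(k))
    return v, eigval


# Synthetic signed steps (slope, distance, direction); NOT the study's data.
rng = random.Random(0)
steps = []
for _ in range(200):
    slope = rng.gauss(0.0, 5.0)                     # degrees, signed
    distance = rng.gauss(0.7, 0.05)                 # metres
    direction = -0.8 * slope + rng.gauss(0.0, 2.0)  # degrees, built-in trade-off
    steps.append((slope, distance, direction))

cov = covariance_matrix(steps)
unique = [cov[i][j] for i in range(3) for j in range(i, 3)]  # the six numbers to report
pc1, var1 = leading_component(cov)
print([round(c, 2) for c in unique])
print([round(c, 2) for c in pc1], round(var1, 2))
```

      Opposite signs for the slope and direction loadings in the first component are the kind of signature a height/turn trade-off would leave; whether the real step data show it is an empirical question.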

      p. 9, "average step slope" I think I understand the definition, but I suggest a diagram might be helpful to illustrate this.

      There is a diagram of a single step slope in Figure 6 and a diagram of the average step slope for a path segment in Figure 12.

      Incidentally, the "straight path slope" is not clearly defined. I suspect "straight" is the view from above, i.e. ignoring height changes. 

      Clarified

      p. 11 The tortuosity metric could use a clearer definition. Should I interpret "length of the chosen path relative to a straight path" as the numerator and denominator? Here does "length" also refer to the view from above? Why is tortuosity defined differently from step slope? Couldn't there be an analogue to step slope, except summing absolute values of direction changes? Or an analogue to tortuosity, meaning the length as viewed from the side, divided by the length of the straight path? 

      We followed the literature in defining tortuosity, and we have clarified the definition in the methods. Yes, the length of the chosen path and the length of the straight path are the numerator and denominator, respectively, and length refers to 3D length. We agree that there are many interesting ways to look at the data, but for clarity we have limited the discussion in this paper to a single definition of tortuosity.
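      Under the definition given in this response, tortuosity is the 3D length of the chosen path divided by the length of a straight path. The sketch below assumes the straight path is the straight line between the first and last footholds of the segment; the exact reference path used in the paper is defined in its methods and may differ.

```python
import math


def path_length(footholds):
    """Sum of 3D Euclidean distances between successive footholds."""
    return sum(math.dist(a, b) for a, b in zip(footholds, footholds[1:]))


def tortuosity(footholds):
    """3D length of the chosen path divided by the straight-line distance between its ends."""
    straight = math.dist(footholds[0], footholds[-1])
    if straight == 0.0:
        raise ValueError("first and last footholds coincide")
    return path_length(footholds) / straight


# Six hypothetical footholds (5 steps): x forward, y lateral, z height, in metres.
straight_walk = [(float(i), 0.0, 0.0) for i in range(6)]
alternating = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.0),
               (3.0, 0.3, 0.0), (4.0, 0.0, 0.0), (5.0, 0.3, 0.0)]
print(tortuosity(straight_walk))  # 1.0
print(tortuosity(alternating))    # slightly above 1 although the walker heads straight
```

      The second example relates to point (6) from Reviewer #1: alternating left and right foot placements alone keep tortuosity slightly above 1, even for a walker heading straight ahead. As noted in the response to that point, all comparisons here use 5-step sequences.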

      Figure 8 could use better labeling. On the left, there is a straight path and a more tortuous path, why not report the metrics for these? On the right, there are nine unlabeled plots. The caption says "turn probability vs. straight path slope" but the vertical axis is clearly not a probability. Perhaps the axis is tortuosity? I presume the horizontal axis is a straight path slope in degrees, but this is not explained. Why are there nine plots, is each one a subject? I would prefer to be informed directly instead of guessing. (As a side note, I like the correlations as a function of leg length, it is interesting, even if slightly unbelievable. I go hiking with people quite a bit shorter and quite a lot taller than me, and anecdotally I don't think they differ so much from each other.) 

      We have fixed Figure 8, which shows the average “mean slope” as a function of tortuosity. We have added a supplemental figure showing a scatter plot of the raw data (mean slope vs. tortuosity for each path segment).

      Note that when walking with friends, other factors (e.g., social ones) will contribute to the cost function. As a very short person, my experience is that it is a problem. In any case, the data are the data, whatever the underlying reasons. It does not seem so surprising that people of different heights make different tradeoffs. We know that the preferred gait depends on an individual’s passive dynamics, as described in the paper, and the terrain will change what is energetically optimal, as described in the Darici and Kuo paper.

      Figure 9 presumably shows one data point per subject, but this isn't clear. 

      The correlations are reported per subject, and this has been clarified. 

      p. 13 CNN. I like this analysis, but only sort of. It is convincing that there is SOME sort of systematic decision-making about footholds, better than chance. What it lacks is insight. I wonder what drives people's decisions. As an idle suggestion, the AlexNet (arXiv: Krizhevsky et al.; see also A. Karpathy's ConvNETJS demo with CIFAR-10) showed some convolutional kernels to give an idea of what the layers learned. 

      Further exploration of CNNs would definitely be interesting, but it is outside the scope of the paper. We use it simply to make a modest point, as described above.

      p. 15 What is the definition of stability cost? I understand energy cost, but it is unclear how circuitous paths have a higher stability cost. One possible definition is an energetic cost having to do with going around and turning. But if not an energy cost, what is it? 

      We meant to say that the longer and flatter paths are presumably more stable because of the smaller height changes. You are correct that we can’t say what the stability cost is and we have clarified this in the discussion.

      p. 16 "in other data" is not explained or referenced.

      Deleted 

      p. 10 5 step paths and p. 17 "over the next 5 steps". I feel there is very little information to really support the 5 steps. A p-value only states the significance, not the amount of difference. This could be strengthened by plotting some measures vs. the number of steps ahead. For example, does a CNN looking 1-5 steps ahead predict better than one looking N<5 steps ahead? I am of course inclined to believe the 5 steps, but I do not see/understand strong quantitative evidence here. 

      We have weakened the statements about evidence for planning 5 steps ahead.

      p. 25 CNN. I did not understand the CNN. The list of layers seems incomplete, it only shows four layers. The convolutional-deconvolutional architecture is mentioned as if that is a common term, which I am unfamiliar with but choose to interpret as akin to encoder-decoder. However, the architecture does not seem to have much of a bottleneck (25x25x8 is not greatly smaller than 100x100x4), so what is the driving principle? It's also unclear how the decoder culminates, does it produce some m x m array of probabilities of stepping, where m is some lower dimension than the images? It might be helpful also to illustrate the predictions, for example, show a photo of the terrain view, along with a probability map for that view. I would expect that the reader can immediately say yes, I would likely step THERE but not there. 

      We have clarified the description of the CNN. An illustration is shown in Figure 11.

      Reviewer #3 (Recommendations For The Authors): 

      (This section expands on the points already contained in the Public Review). 

      Major issues 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. A CNN was used on the depth scenes to identify foothold locations in the images. This is the bit of the methods and the results that remains ambiguous, and the authors may need to revisit the methods/results. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just for example specific rock arrangements in the particular place you experimented. Training the network on data from one location and then making it generalize to another location would of course be ideal. Your network probably cannot do this (as far as I can tell this was not tried), and so the meaning of the CNN results cannot really be interpreted. 

      I really like the idea of getting actual retinotopic depth field approximations. But then the question would be: what features in this information are relevant and useful for visual guidance (of foot placement)? But this question is not answered by your method. 

      "If a CNN can predict these locations above chance using depth information, this would indicate that depth features can be used to explain some variation in foothold selection." But there is no analysis of what features they are. If the network is overfitting, they could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. As you say "CNN analysis shows that subject perspective depth features are predictive of foothold locations", well, yes, with 50,000-odd parameters the foothold coordinates can be associated with the 3D pixel maps, but what does this tell us? 

      See previous discussion of these issues.

      It is true that we do not know the precise depth features used. We established that information about height changes was being used, but further work is needed to specify how the visual system does this. This is mentioned in the Discussion.

      You open the introduction with a motivation to understand the visual features guiding path selection, but what features the CNN finds/uses or indeed what features are there is not much discussed. You would need to bolster this, or down-emphasize this aspect in the Introduction if you cannot address it. 

      "These depth image features may or may not overlap with the step slope features shown to be predictive in the previous analysis, although this analysis better approximates how subjects might use such information." I do not think you can say this. It may better approximate the kind of (egocentric) environment the subjects have available, but as it is I do not see how you can say anything about how the subject uses it. (The results on the path selection with respect to the terrain features, the viewpoint-independent allocentric properties of the previous analyses, are enough in themselves!) 

      We have rewritten the section on the CNN to make clearer what it can and cannot do and its role in the manuscript. See previous discussion.

      (2) The use of descriptive terminology should be made systematic. Overall the rest of the methodology is well explained, and the workflow is impressive. However, to interpret the results the introduction and discussion seem to use terminology somewhat inconsistently. You need to dig into the methods to figure out the exact operationalizations, and even then you cannot be quite sure what a particular term refers to. Specifically, you use the following terms without giving a single, clear definition for them (my interpretation in parentheses): 

      foothold (a possible foot plant location where there is an "affordance"? or a foot plant location you actually observe for this individual? or in the sample?) 

      step (foot trajectory between successive step locations) 

      step location (the location where the feet are placed) 

      path (are they lines projected on the ground, or are they sequences of foot plants? The figure suggests lines but you define a path in terms of five steps.) 

      foot plant (occurs when the foot comes in contact with step location?) 

      future foothold (?) 

      foot location (?) 

      future foot location (?) 

      foot position (?) 

      I think some terms are being used interchangeably here? I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. Also, are "gaze location" and "fixation" the same? I.e. is every gaze-ground intersection a "gaze location" (I take it it is not a "fixation", which you define by event identification by speed and acceleration thresholds in the methods)? 

      We have cleaned up the language. A foothold is the location in the terrain representation (mesh) where the foot was placed. A step is the transition from one foothold to the next. A path is a sequence of 5 steps. The lines simply illustrate the path in the figures. A gaze location is the location in the terrain representation where the walker is holding gaze still (the act of fixating). See Muller et al. (2023) for further explanation.
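      The definitions above can be summarized as a small data model. This is a hypothetical sketch: the class names and the use of plain (x, y, z) mesh coordinates are assumptions for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Tuple


@dataclass(frozen=True)
class Foothold:
    """Location in the terrain mesh where a foot was placed (hypothetical coordinates)."""
    x: float
    y: float
    z: float


# A step is the transition from one foothold to the next.
Step = Tuple[Foothold, Foothold]


@dataclass
class Path:
    """A path: a sequence of 5 steps, i.e. 6 consecutive footholds."""
    footholds: List[Foothold]
    N_STEPS: ClassVar[int] = 5

    def __post_init__(self):
        if len(self.footholds) != self.N_STEPS + 1:
            raise ValueError(f"a path needs {self.N_STEPS + 1} footholds")

    def steps(self) -> List[Step]:
        return list(zip(self.footholds, self.footholds[1:]))


@dataclass(frozen=True)
class GazeLocation:
    """Point in the terrain mesh where gaze is held still (a fixation)."""
    x: float
    y: float
    z: float
```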

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent. You discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). Temporal cost (more circuitous route takes longer) and uncertainty (the more step locations you sample the more chance that some of them will not be stable) seem equally reasonable, given the task ecology / the type of environment you are considering. I do not know if there is literature on these in the gait-scene, but even if not then saying you are focusing on just one explanation because that's where there is literature to fall back on would be the thing to do. 

      Also in the abstract and introduction you seem to take some of this "for granted". E.g. you end the abstract saying "are planning routes as well as particular footplants. Such planning ahead allows the minimization of energetic costs. Thus locomotor behavior in natural environments is controlled by decision mechanisms that optimize for multiple factors in the context of well-calibrated sensory and motor internal models". This is too speculative to be in the abstract, in my opinion. That is, you take as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by your data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      We have rewritten the abstract and Discussion with these concerns in mind.

      You should probably also reference: 

      Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 683-703. https://doi.org/10.1037/0096-1523.10.5.683 

      Warren WH Jr, Young DS, Lee DN. Visual control of step length during running over irregular terrain. J Exp Psychol Hum Percept Perform. 1986 Aug;12(3):259-66. doi: 10.1037//0096-1523.12.3.259. PMID: 2943854. 

      We have added these references to the introduction.

      Minor point 

      Related to (2) above, the path selection results are sometimes expressed a bit convolutedly, and the gist can get lost in the technical vocabulary. The generation of alternative "paths" and comparison of their slope and tortuousness parameters show that the participants preferred smaller slope/shorter paths. So, as far as I can tell, what this says is that in rugged terrain people like paths that are as "flat" as possible. This is common sense so hardly surprising. Do not be afraid to say so, and to express the result in plain non-technical terms. That an apple falls from a tree is common sense and hardly surprising. Yet quantifying the phenomenon, and carefully assessing the parameters of the path that the apple takes, turned out to be scientifically valuable - even if the observation itself lacked "novelty". 

      Thanks.  We have tried to clarify the methods/results with this in mind.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

      The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

      Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

      Thank you for your suggestions and your generally positive view of our work. As described below, we have made the statistical improvements that you suggested.

      Statistical comments:

      (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

      We apologize for this error. We have now included nested bootstrapping and permutation tests. We defined an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
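      Forming blocks in this way amounts to finding connected components in a graph whose nodes are authors and whose edges are First Author / Last Author pairings. A minimal sketch using a union-find (disjoint-set) structure is shown below. The (first_author, last_author) input format and the author identifiers are hypothetical, and the actual implementation is not described in the text.

```python
def exchangeability_blocks(papers):
    """Map each author to a block id; authors linked by any First/Last pairing share a block.

    papers: iterable of (first_author, last_author) identifiers (hypothetical format).
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    for first, last in papers:
        root_first, root_last = find(first), find(last)
        if root_first != root_last:
            parent[root_last] = root_first  # merge the two blocks

    return {author: find(author) for author in list(parent)}


papers = [("A", "B"), ("C", "B"), ("D", "E"), ("F", "F")]
blocks = exchangeability_blocks(papers)
print(blocks)  # A, B and C share a block; D and E share another; F is alone
```

      Each article can then be assigned the block of its First Author, which by construction is also the block of its Last Author.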

      We first describe this in the results (page 3, line 110):

      “Importantly, we accounted for the nested structure of the data in bootstrapping and permutation tests by forming co-authorship exchangeability blocks.”

      We also describe this in 4.8 Confidence Intervals (page 21, line 725):

      “Confidence intervals were computed with 1000 iterations of bootstrap resampling at the article level. For example, of the 100,347 articles in the dataset, we resampled articles with replacement and recomputed all results. The 95% confidence interval was reported as the 2.5 and 97.5 percentiles of the bootstrapped values.

      We grouped data into exchangeability blocks to avoid overly narrow confidence intervals or overly optimistic statistical inference. Each exchangeability block comprised any authors who published together as a First Author / Last Author pairing in our dataset. We only considered shared First/Last Author publications because we believe that these authors primarily control self-citations, and otherwise exchangeability blocks would grow too large due to the highly collaborative nature of the field. Furthermore, the exchangeability blocks do not account for co-authorship in other journals or prior to 2000. A distribution of the sizes of exchangeability blocks is presented in Figure S15.”
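      A block-level bootstrap of the kind described above can be sketched as follows. This is an illustration under stated assumptions: whole exchangeability blocks are drawn with replacement and all of their articles are kept together, the statistic is a pooled self-citation rate, and the percentile interval uses simple order statistics. Block ids and counts are hypothetical.

```python
import random


def block_bootstrap_ci(articles, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI that resamples whole exchangeability blocks.

    articles: list of (block_id, record) pairs.
    statistic: function mapping a list of records to a number.
    """
    by_block = {}
    for block_id, record in articles:
        by_block.setdefault(block_id, []).append(record)
    blocks = list(by_block.values())
    rng = random.Random(seed)
    boot = []
    for _ in range(n_boot):
        sample = []
        # Draw as many blocks as exist, with replacement, keeping each block intact.
        for block in rng.choices(blocks, k=len(blocks)):
            sample.extend(block)
        boot.append(statistic(sample))
    boot.sort()
    lower = boot[round(n_boot * alpha / 2)]            # about the 2.5th percentile
    upper = boot[round(n_boot * (1 - alpha / 2)) - 1]  # about the 97.5th percentile
    return lower, upper


def self_citation_rate(records):
    """Pooled rate from (self_citations, references) records."""
    return sum(s for s, _ in records) / sum(r for _, r in records)


# Hypothetical data: 50 blocks, each with 1 to 4 articles of 40 references.
gen = random.Random(1)
articles = [
    (f"block{b}", (gen.randint(0, 5), 40))
    for b in range(50)
    for _ in range(gen.randint(1, 4))
]
print(self_citation_rate([rec for _, rec in articles]))
print(block_bootstrap_ci(articles, self_citation_rate))
```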

      In describing permutation tests, we also write (page 21, line 739):

      “4.9 P values

      P values were computed with permutation testing using 10,000 permutations, with the exception of regression P values and P values from model coefficients. For comparing different fields (e.g., Neuroscience and Psychiatry) and comparing self-citation rates of men and women, the labels were randomly permuted by exchangeability block to obtain null distributions. For comparing self-citation rates between First and Last Authors, the first and last authorship was swapped in 50% of exchangeability blocks.”
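      The First/Last Author comparison can be sketched as a block-wise permutation test: in each permutation, the First and Last Author values are swapped for a randomly chosen half of the exchangeability blocks, and the mean difference is recomputed. The sketch assumes one (first_rate, last_rate) pair per article and uses the common (count + 1) / (permutations + 1) convention for the two-sided P value; neither detail is specified in the text.

```python
import random


def mean_diff(pairs):
    """Mean of (last - first) over article-level (first_rate, last_rate) pairs."""
    return sum(last - first for first, last in pairs) / len(pairs)


def block_swap_test(blocks, n_perm=10_000, seed=0):
    """Permutation test that swaps First/Last values within half of the blocks.

    blocks: list of blocks, each a list of (first_rate, last_rate) pairs.
    Returns (observed difference, two-sided P value).
    """
    rng = random.Random(seed)
    observed = mean_diff([pair for block in blocks for pair in block])
    n_extreme = 0
    for _ in range(n_perm):
        swapped = set(rng.sample(range(len(blocks)), k=len(blocks) // 2))
        pairs = []
        for i, block in enumerate(blocks):
            if i in swapped:
                pairs.extend((last, first) for first, last in block)  # swap labels
            else:
                pairs.extend(block)
        if abs(mean_diff(pairs)) >= abs(observed):
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)
```

      Swapping whole blocks, rather than individual articles, keeps articles from the same co-authorship group together under the null, which is the point of the exchangeability blocks.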

      For modeling, we considered doing a mixed effects model but found difficulties due to computational power. For example, with our previous model, there were hundreds of thousands of levels for the paper random effect, and tens of thousands of levels for the author random effect. Even when subsampling or using packages designed for large datasets (e.g., mgcv’s bam function: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/bam), we found computational difficulties.

      As a result, we switched to modeling results at the paper level (e.g., self-citation count or rate). We found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We updated our description of our models in the Methods section (page 21, line 754):

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 49. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 50 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 49. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 51. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 51. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 49. Across all smooth predictors and models, we ultimately selected between 10 and 20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
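      The one-paper-per-author resampling described above can be illustrated with a short Python sketch. It assumes a hypothetical input format, a list of (author_id, paper_id) pairs for a single author position (First or Last), and uses only the standard library; it is not the authors’ actual pipeline, which was run in R.

```python
import random
from collections import defaultdict


def sample_one_paper_per_author(pairs, seed):
    """Pick one paper at random for every author.

    pairs: iterable of (author_id, paper_id) tuples (hypothetical format).
    Returns a dict mapping author_id -> the single paper_id kept.
    """
    rng = random.Random(seed)  # seeded so each resample is reproducible
    papers_by_author = defaultdict(list)
    for author_id, paper_id in pairs:
        papers_by_author[author_id].append(paper_id)
    # Iterate in sorted author order so the same seed gives the same sample.
    return {
        author_id: rng.choice(papers)
        for author_id, papers in sorted(papers_by_author.items())
    }


def resample(pairs, n_repeats=100):
    """Repeat the subsampling, e.g. 100 times for a sensitivity analysis."""
    pairs = list(pairs)
    return [sample_one_paper_per_author(pairs, seed) for seed in range(n_repeats)]


# Author A has five papers and author B has one, as in the example above.
pairs = [("A", f"paper{i}") for i in range(1, 6)] + [("B", "paper6")]
samples = resample(pairs)
print(len(samples), samples[0])
```

      Each model would then be refit on every subsample; the spread of the resulting estimates across the repeats indicates how sensitive the conclusions are to which paper was kept for each author.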

      The direction of our results largely stayed the same, with the exception of the gender results: after accounting for numerous covariates, men had slightly lower (or equal) self-citation rates compared with women. We therefore also modeled the number of previous papers to explain the discrepancy between our raw data and the modeled gender results. Please find the updated results text below (page 11, line 316):

      “2.9 Exploring effects of covariates with generalized additive models

      Investigating the raw trends and group differences in self-citation rates is important, but several confounding factors may explain some of the differences reported in previous sections. For instance, gender differences in self-citation were previously attributed to men having a greater number of prior papers available to self-cite 7,20,21. As such, covarying for various author- and article-level characteristics can improve the interpretability of self-citation rate trends. To allow for inclusion of author-level characteristics, we only consider First Author and Last Author self-citation in these models.

      We used generalized additive models (GAMs) to model the number and rate of self-citations for First Authors and Last Authors separately. The data were randomly subsampled so that each author only appeared in one paper. The terms of the model included several article characteristics (article year, average time lag between article and all cited articles, document type, number of references, field, journal impact factor, and number of authors), as well as author characteristics (academic age, number of previous papers, gender, and whether their affiliated institution is in a low- and middle-income country). Model performance (adjusted R2) and coefficients for parametric predictors are shown in Table 2. Plots of smooth predictors are presented in Figure 6.

      First, we considered several career and temporal variables. Consistent with prior works 20,21, self-citation rates and counts were higher for authors with a greater number of previous papers. Self-citation counts and rates increased rapidly over the first 25 published papers and more gradually thereafter. Early in the career, increasing academic age was related to greater self-citation. There was a small peak at about five years, followed by a small decrease and a plateau. We found an inverted U-shaped trend for average time lag and self-citations, with self-citations peaking approximately three years after initial publication. In addition, self-citations have generally been decreasing since 2000. The smooth predictors showed larger decreases in the First Author model relative to the Last Author model (Figure 6).

      Then, we considered whether authors were affiliated with an institution in a low- and middle-income country (LMIC). LMIC status was determined by the Organisation for Economic Co-operation and Development. We opted to use LMIC instead of affiliation country or continent to reduce the number of model terms. We found that papers from LMIC institutions had significantly lower self-citation counts (-0.138 for First Authors, -0.184 for Last Authors) and rates (-12.7% for First Authors, -23.7% for Last Authors) compared to non-LMIC institutions. Additional results with affiliation continent are presented in Table S5. Relative to the reference level of Asia, higher self-citations were associated with Africa (in three of four models), the Americas, Europe, and Oceania.

      Among paper characteristics, a greater number of references was associated with higher self-citation counts and lower self-citation rates (Figure 6). Interestingly, self-citations were greater for a small number of authors, though the effect diminished after about five authors. Review articles were associated with lower self-citation counts and rates. No clear trend emerged between self-citations and journal impact factor. In an analysis by field, despite the raw results suggesting that self-citation rates were lower in Neuroscience, GAM-derived self-citations were greater in Neuroscience than in Psychiatry or Neurology.

      Finally, our results aligned with previous findings of nearly equivalent self-citation rates for men and women after including covariates, even showing slightly higher self-citation rates in women. Since raw data showed evidence of a gender difference in self-citation that emerges early in the career but dissipates with seniority, we incorporated two interaction terms: one between gender and academic age and a second between gender and the number of previous papers. Results remained largely unchanged with the interaction terms (Table S6).

      2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and one for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

      Please see our response to point (1) to address random effects. We have also switched to GAMs (see point #3 below) and provided more detail in the methods. Notably, we decided against using author-level random effects due to poor model stability, as there can be as few as one author per group. Instead, we subsampled the dataset such that only one paper appeared from each author.

      (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

      Thank you for raising this point. We have now switched to using generalized additive models (GAMs). GAMs provide a flexible approach to modeling that does not require transformations. We described this in detail in point (1) above and in Methods 4.10 Exploring effects of covariates with generalized additive models (page 21, line 754).
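      For readers less familiar with GAMs, the general model form described in Methods 4.10 can be written roughly as follows. The notation is ours rather than the manuscript’s: Y_i is the outcome for paper i (self-citation count, self-citation rate, or number of previous papers), x_ij are the continuous covariates entered through smooth functions f_j, z_ik are indicator variables for the categorical covariates with linear coefficients beta_k, and phi is a dispersion parameter.

```latex
\begin{aligned}
\log \mu_i &= \beta_0 + \sum_{j} f_j(x_{ij}) + \sum_{k} \beta_k z_{ik},
  \qquad \mu_i = \mathbb{E}[Y_i], \\
\operatorname{Var}(Y_i) &= \phi \, \mu_i^{p}, \qquad p = 1.2 .
\end{aligned}
```

      Because each f_j is estimated from the data as a penalized spline, the shape of each predictor-outcome relationship is not fixed in advance, which is why no inverse hyperbolic sine transform is needed. Setting p = 1 would give a Poisson variance and p = 2 a gamma variance.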

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 48. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 49 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean raised to the power p 48. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 50. DHARMa uses a simulation-based approach to scale the residuals to lie between 0 and 1 50. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Tests for non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but effect sizes were small in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 48. Across all smooth predictors and models, we ultimately selected between 10 and 20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
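      The idea behind DHARMa’s simulation-based residuals can be sketched in a few lines of Python. This is a simplified illustration (a randomized probability integral transform), not DHARMa’s actual implementation; the `simulate` argument is a hypothetical stand-in for drawing a new outcome from the fitted GAM.

```python
import random


def scaled_residuals(observed, simulate, n_sim=250, seed=0):
    """Simulation-based residuals scaled to lie between 0 and 1.

    observed: list of observed outcomes y_i.
    simulate: hypothetical function (i, rng) -> one simulated outcome for
        observation i drawn from the fitted model.
    Under a well-specified model the residuals are roughly Uniform(0, 1).
    """
    rng = random.Random(seed)
    residuals = []
    for i, y in enumerate(observed):
        sims = [simulate(i, rng) for _ in range(n_sim)]
        below = sum(s < y for s in sims)
        ties = sum(s == y for s in sims)
        # Spread tied values uniformly so that discrete outcomes such as
        # zero self-citation counts still yield continuous residuals.
        residuals.append((below + rng.random() * ties) / n_sim)
    return residuals


# Toy check: data and simulations come from the same distribution.
data_rng = random.Random(1)
observed = [data_rng.randint(0, 3) for _ in range(2000)]
res = scaled_residuals(observed, lambda i, rng: rng.randint(0, 3))
print(sum(res) / len(res))  # should be close to 0.5
```

      A histogram or uniform quantile-quantile plot of such residuals is what the DHARMa diagnostics summarize. With samples as large as this dataset, formal uniformity tests can reject even when the visual departure is negligible, as noted above.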

      (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for raising this important point. In the methods section, we describe how the data are missing (page 18, line 623):

      “4.3 Data exclusions and missingness

      Data were excluded based on several criteria: missing covariates, missing citation data, out-of-range values at the citation pair level, and out-of-range values at the article level (Table 3). After downloading the data, our dataset included 157,287 articles and 8,438,733 citations. We excluded any articles with missing covariates (document type, field, year, number of authors, number of references, academic age, number of previous papers, affiliation country, gender, and journal). Of the remaining articles, we dropped any with missing citation data (e.g., when a lack of data made it impossible to determine whether a self-citation was present). Then, we removed citations with unrealistic or extreme values. These included an academic age of less than zero or above 38/44 for First/Last Authors (99th percentile); more than 266/522 papers for First/Last Authors (99th percentile); and a cited year before 1500 or after 2023. Subsequently, we dropped articles with extreme values that could contribute to poor model stability. These included more than 30 authors; fewer than 10 or more than 250 references; and a time lag of more than 17 years. These values were selected to ensure that GAMs were stable and not influenced by a small number of extreme values.

      In addition, we evaluated whether the data were not missing at random (Table S8). Data were more likely to be missing for reviews relative to articles, for Neurology relative to Neuroscience or Psychiatry, in works from Africa relative to the other continents, and for men relative to women. Scopus ID coverage contributed in part to differential missingness. However, our exclusion criteria also contributed. For example, Last Authors with more than 522 papers were excluded to help stabilize our GAMs, and more men than women met this exclusion criterion.”
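      The numeric exclusion thresholds listed above can be expressed as two simple filters. The Python sketch below is illustrative only: the record fields and function names are hypothetical, and only the cutoff values come from the text.

```python
from dataclasses import dataclass

# Citation-pair cutoffs (99th percentiles) for First/Last Authors.
MAX_ACADEMIC_AGE = {"first": 38, "last": 44}
MAX_PREVIOUS_PAPERS = {"first": 266, "last": 522}


@dataclass
class Article:
    """Hypothetical article-level record."""
    n_authors: int
    n_references: int
    time_lag_years: float


def keep_citation(position, academic_age, n_previous_papers, cited_year):
    """Return True if a citation pair passes the citation-level exclusions."""
    return (
        0 <= academic_age <= MAX_ACADEMIC_AGE[position]
        and n_previous_papers <= MAX_PREVIOUS_PAPERS[position]
        and 1500 <= cited_year <= 2023
    )


def keep_article(article):
    """Return True if an article passes the article-level exclusions."""
    return (
        article.n_authors <= 30
        and 10 <= article.n_references <= 250
        and article.time_lag_years <= 17
    )
```

      For instance, a Last Author with 530 previous papers fails `keep_citation("last", ...)` regardless of the other values, which shows how a cutoff chosen for model stability can still remove some groups of authors more often than others.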

      Given this differential missingness, we added the following to the limitations (page 16, line 529):

      “Ninth, data were differentially missing (Table S8) due to Scopus coverage and gender estimation. Differential missingness could bias certain results in the paper, but we hope that the dataset is large enough to reduce any potential biases.”

      Reviewer #2 (Public Review):

      The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

      Overall, this paper is well written, and the breadth of analysis conducted by authors, with various interactions between variables (eg. gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

      Thank you for your review and your generally positive view of our work.

      (1) There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are selected based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why the authors made this choice. It is also unclear how the "top" journals were identified: was it based on the top 5% in terms of impact factor? Or 10%? Or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary material.

      We apologize for the lack of clarity about our selection of journals. We agree that there are limitations to selecting higher impact journals. However, we needed to apply some form of selection in order to make the analysis manageable. For instance, even these 63 journals include over five million citations. We better describe our rationale behind the approach as follows (page 17, line 578):

      “We collected data from the 25 journals with the highest impact factors, based on Web of Science impact factors, in each of Neurology, Neuroscience, and Psychiatry. Some journals appeared in the top 25 list of multiple fields (e.g., both Neurology and Neuroscience), so 63 journals were ultimately included in our analysis. We recognize that limiting the journals to the top 25 in each field also limits the generalizability of the results. However, there are tradeoffs between breadth of journals and depth of information. For example, by limiting the journals to these 63, we were able to look at 21 years of data (2000-2020). In addition, the definition of fields is somewhat arbitrary. By restricting the journals to a set of 63 well-known journals, we ensured that the journals belonged to Neurology, Neuroscience, or Psychiatry research. It is also important to note that the impact factor of these journals has not necessarily always been high. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. To further recognize the effects of impact factor, we decided to include an impact factor term in our models.”

      In addition, we have now provided the 2020 impact factors in Table S1.

      By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

      We agree that this article is not indicative of all neuroscience literature, but rather the top journals. Thus, we have changed the title to: “Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals”. We would also like to note that compared to previous bibliometrics works in neuroscience (Bertolero et al. 2020; Dworkin et al. 2020; Fulvio et al. 2021), this article includes a wider range of data.

      (2) One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

      In our opinion, neuroscience is a broad, interdisciplinary field. Individuals performing neuroscience research may have a neuroscience background. Yet, they may come from many backgrounds, such as physics, mathematics, biology, chemistry, or engineering. As such, we do not believe that it is feasible to characterize whether each author considers themselves a neuroscientist or not. We have added the following to the limitations section (page 16, line 528):

      “Eighth, authors included in this work may not be neurologists, neuroscientists, or psychiatrists. However, they still publish in journals from these fields.”

      (3) When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

      While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

      Thank you for this thoughtful comment. We agree that seniority and prior publication history play an important role in self-citation rates.

      For comparing First/Last Author self-citation rates, we have now included a plot similar to Figure 2a, where self-citation as a percentage of prior publication history is plotted.

      (page 4, line 161): “Analyzing self-citations as a fraction of publication history exhibited a similar trend (Figure S3). Notably, First Authors were more likely than Last Authors to self-cite when normalized by prior publication history.”
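      To make the reviewer’s normalization concrete, one plausible definition is the number of self-citations divided by the number of previous papers. The Python sketch below assumes that definition for illustration; the manuscript’s exact computation for Figure S3 may differ.

```python
def self_citations_per_prior_paper(n_self_citations, n_previous_papers):
    """Self-citations as a percentage of prior publication history.

    Assumed definition for illustration. Returns None for authors with no
    previous papers, who cannot self-cite at all.
    """
    if n_previous_papers == 0:
        return None
    return 100.0 * n_self_citations / n_previous_papers


# A junior author with 2 self-citations and 4 previous papers vs. a senior
# author with 10 self-citations and 100 previous papers.
junior = self_citations_per_prior_paper(2, 4)
senior = self_citations_per_prior_paper(10, 100)
print(junior, senior)  # 50.0 10.0
```

      In raw counts the senior author self-cites more, but relative to publication history the junior author does, which mirrors the First versus Last Author pattern described above.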

      For the geographic analysis, we made two new maps: 1) that of the number of previous papers, and 2) that of the journal impact factor (see response to point #4 below).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation with these covariates in a comprehensive model is necessary.”
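      The country-level comparison above uses Spearman’s rank correlation, which is the Pearson correlation of the ranks, with tied values assigned their average rank. A dependency-free Python sketch follows; the per-country inputs are hypothetical, and in practice `scipy.stats.spearmanr` computes the same coefficient together with a P-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(sxx * syy)


# Hypothetical per-country values: self-citation rate (%) and mean previous papers.
rates = [3.1, 4.5, 2.2, 5.0]
papers = [20, 35, 15, 30]
print(spearman(rates, papers))  # 0.8
```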

      Finally, we included a model term for the number of previous papers (Table 2). We analyzed this both for self-citation counts and self-citation rates and found a strong relationship between publication history and self-citations. We also included the following section where we modeled the number of previous papers for each author (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (4) Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

      We made a map of this in Figure S4 (see our response to point #3 above).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation with these covariates in a comprehensive model is necessary.”

      We also included impact factor as a term in our model. The results suggest that there are still geographic differences (Table 2, Table S5).

      (5) The presence of self-citations is not inherently problematic, and I appreciate the fact that authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that the authors imply an expectation of women's self-citation rates to align with those of men. While this is not explicitly stated, the use of the word "disparity" and the presentation of self-citation as an example of self-promotion in the discussion suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or that men are more inclined to engage in strategic self-citation practices.

      We agree that on the level of an individual self-citation, our study is not useful for determining how related the papers are. Yet, understanding overall trends in self-citation may help to identify differences. Context is important, but large datasets allow us to investigate broad trends. We added the following text to the limitations section (page 16, line 524):

      “In addition, these models do not account for whether a specific citation is appropriate, as some situations may necessitate higher self-citation rates.”

      Reviewer #3 (Public Review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

      Thank you for your review. We attempted to improve the interpretation of results, as described in the following responses.
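      As Reviewer #3 summarizes, a reference counts as a self-citation when the cited work shares an author with the citing paper. A minimal Python sketch of First Author, Last Author, and any-author self-citation rates is below. The `Paper` record is hypothetical, with author IDs standing in for Scopus author identifiers, and the sketch does not attempt the missing-data handling described in the Methods.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record: author IDs in byline order, plus the author IDs
    of each cited work."""
    author_ids: list
    references: list = field(default_factory=list)


def self_citation_rate(paper, position="first"):
    """Fraction of references that cite work by the chosen author(s)."""
    if position == "first":
        citing = {paper.author_ids[0]}
    elif position == "last":
        citing = {paper.author_ids[-1]}
    elif position == "any":
        citing = set(paper.author_ids)
    else:
        raise ValueError(f"unknown position: {position!r}")
    if not paper.references:
        raise ValueError("paper has no references")
    n_self = sum(1 for cited_authors in paper.references if citing & set(cited_authors))
    return n_self / len(paper.references)


paper = Paper(
    author_ids=["a1", "a2", "a3"],
    references=[["a1", "x9"], ["a3"], ["y7"], ["a2", "z4"]],
)
print(self_citation_rate(paper, "first"),
      self_citation_rate(paper, "last"),
      self_citation_rate(paper, "any"))  # 0.25 0.25 0.75
```

      Pooling such values over groups (field, gender, career stage) would give raw group-level rates of the kind the reviewer contrasts with the model-based estimates below.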

      When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to reasons of why they would cite work by others. The "problem", in a sense, is that researchers cite their own work, just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead, the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

      We agree that this dataset is not designed to investigate the downstream effects of self-citations. However, self-citation practices are more likely to be problematic when they differ across specific groups. This work can potentially spark more interest in future longitudinal designs to investigate whether differences in self-citation practices lead to differences in career outcomes, for example. We added the following text to clarify (page 17, line 565):

      “Yet, self-citation practices become problematic when they are different across groups or are used to “game the system.” Future work should investigate the downstream effects of self-citation differences to see whether they impact the career trajectories of certain groups. We hope that this work will help to raise awareness about factors influencing self-citation practices to better inform authors, editors, funding agencies, and institutions in Neurology, Neuroscience, and Psychiatry.”

      Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

      Our focus is on cases where self-citation practices differ across groups. We agree that, on a case-by-case basis, there is no exact threshold above which a self-citation rate is “high.” With a dataset of the current size, evaluating whether each individual self-citation is appropriate is not feasible. If we observe differences in self-citation rates, this may tell us about broad (not individual-level) trends and differences in self-citing practice. If one group self-cites much more than another group, even after controlling for relevant variables such as prior publication history, then the self-citation differences can likely be attributed to differences in self-citation practices/behaviors.

      For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% of women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these various "raw" self-citation rates may, as the authors themselves say, "give the illusion" of self-citation rates, but these are somehow "hidden" by, for instance, career seniority.

      We included numerous covariates in our model. In addition, to address the difference between “raw” and “modeled” self-citation rates, we added the following section (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      Again, the authors do consider this, and "control" for career length, number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more; there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference); the men included in the study simply have longer careers and more publications.

      But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour; this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform authors, editors, funding agencies and institutions", as the authors say? What should they be informed about?

      We apologize for our misuse of language. We now make clearer, as in most previous self-citation papers, that our analysis is NOT causal. Causal datasets do have some benefits in citation research, but a limitation is that they may not cover as wide a range of authors. Furthermore, non-causal correlational studies can still be useful in informing authors, editors, funding agencies, and institutions. Association studies are widely used across various fields to draw non-causal conclusions. We made numerous changes to reduce our causal language.

      Before: “We then developed a probability model of self-citation that controls for numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      After (page 3, line 113): “We then developed a probability model of self-citation that includes numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      Before: “As such, controlling for various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      After (page 11, line 321): “As such, covarying various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      Before: “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after controlling for various confounds, the self-citation rates are higher in Neuroscience.”

      After (page 15, line 468): “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after considering several covariates, the self-citation rates are higher in Neuroscience.”

      We also added the following text to the limitations section (page 16, line 526):

      “Seventh, the analysis presented in this work is not causal. Association studies are advantageous for increasing sample size, but future work could investigate causality in curated datasets.”

      The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

      We believe that our tool is valuable for authors to contextualize their own self-citation rates. For instance, if an author has published hundreds of articles, it is not practical to count the number of self-citations in each. We have added two portions of text to the limitations section:

      (page 16, line 524): “In addition, these models do not account for whether a specific citation is appropriate, though some situations may necessitate higher self-citation rates.”

      (page 16, line 535): “Despite these limitations, we found significant differences in self-citation rates for various groups, and thus we encourage authors to explore their trends in self-citation rates. Self-citation rates that are higher than average are not necessarily wrong, but suggest that authors should further reflect on their current self-citation practices.”

      In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.

      We agree that causal studies provide many benefits. Yet, association studies also provide many benefits. For example, an association study allowed us to analyze a wider range of articles than a causal study would have.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Statistical suggestions:

      (1) To improve statistical inference, nesting should be accounted for in all of the analyses. For example, the logistic regression model using citing/cited pairs should include random effects for article, author, and perhaps subfield, in order for independence of observations to be plausible. Similarly, bootstrapping and permutation would ideally occur at the author level rather than (or in addition to) the paper level.

      Detailed updates addressing these points are in the public review. In short, we found computational challenges with many levels of the random effects (>100,000) and millions of observations at the citation pairs level. As such, we decided to model citation rates and counts by paper. In this case, we found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We repeated the random resampling 100 times (Figure S12). We updated our description of our models in the Methods section (page 21, line 754).
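      The one-paper-per-author resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each paper has already been keyed to a single author ID, and the function names and toy data are hypothetical. How papers with both a First and a Last Author are handled, and how the 100 model fits are combined, is not described here, so model fitting is omitted.

```python
import random
from collections import defaultdict


def one_paper_per_author(papers, rng):
    """Pick one paper at random for every author.

    papers: list of (paper_id, author_id) tuples (hypothetical layout).
    Returns {author_id: paper_id}.
    """
    by_author = defaultdict(list)
    for paper_id, author_id in papers:
        by_author[author_id].append(paper_id)
    return {author: rng.choice(ids) for author, ids in by_author.items()}


def resample(papers, n_iter=100, seed=0):
    """Repeat the one-paper-per-author draw n_iter times (100 in the text)."""
    rng = random.Random(seed)
    return [one_paper_per_author(papers, rng) for _ in range(n_iter)]


# Toy example: Author A has five papers, Author B has one.
papers = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p4", "A"), ("p5", "A"), ("p6", "B")]
subsamples = resample(papers)
# Each subsample would then be fitted separately (fitting not shown).
```

      Because every subsample contains each author exactly once, no single prolific author contributes several correlated observations to one fit.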

      For permutation tests and bootstrapping, we now define an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
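      One way to build such co-authorship blocks, and to bootstrap while respecting them, is sketched below. This is an illustration under assumptions, not the authors' implementation: grouping via union-find over First/Last Author links and drawing whole blocks with replacement are choices made here, and the tuple layout and function names are hypothetical. A permutation test would respect the same blocks; that step is not shown.

```python
import random
from collections import defaultdict


def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def exchangeability_blocks(papers):
    """Group papers whose First/Last Authors are linked by co-authorship.

    papers: list of (paper_id, first_author, last_author) tuples.
    Returns {paper_id: block_id}.
    """
    parent = {}
    for _, first, last in papers:
        parent.setdefault(first, first)
        parent.setdefault(last, last)
        root_first, root_last = _find(parent, first), _find(parent, last)
        if root_first != root_last:
            parent[root_last] = root_first  # merge the two author groups
    return {pid: _find(parent, first) for pid, first, _ in papers}


def block_bootstrap(papers, rng):
    """Draw whole blocks with replacement, so linked papers stay together."""
    blocks = defaultdict(list)
    for pid, block in exchangeability_blocks(papers).items():
        blocks[block].append(pid)
    keys = list(blocks)
    sample = []
    for _ in range(len(keys)):
        sample.extend(blocks[rng.choice(keys)])
    return sample


# A-B and B-C pairings link p1 and p2 into one block; p3 stands alone.
papers = [("p1", "A", "B"), ("p2", "B", "C"), ("p3", "D", "E")]
blocks = exchangeability_blocks(papers)
sample = block_bootstrap(papers, random.Random(1))
```

      Because authors are linked transitively, p1 and p2 end up in the same block even though they share no single First/Last pairing with each other.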

      (2) In general, I am having trouble understanding the structure of the regression models. My current belief is that rows are composed of individual citations from papers' reference lists, with the outcome representing their status as a self-citation or not, and with various citing article and citing author characteristics as predictors. However, the fact that author type is included in the model as a predictor (rather than having a model for FA self-citations and another for LA self-citations) suggests to me that each citation is entered as two separate rows - once noting whether it was a FA self-citation and once noting whether it was an LA self-citation - and then it is run as a single model.

      (2a) If I am correct, the model is unlikely to be producing valid inference. I would recommend breaking this analysis up into two separate models, and including article-, author-, and subfield-level random effects. You could theoretically include a citation-level random effect and keep it as one model, but each 'group' would only have two observations and the model would be fairly unstable as a result.

      (2b) If I am misunderstanding (and even if not), I would encourage you to provide a more detailed description of the dataset structure and the model, perhaps with a table or diagram.

      We split the data into two models and decided to model on the level of a paper (self-citation rate and self-citation count). In addition, we subsampled the dataset such that each author only appears once to avoid misestimation of confidence intervals (see point (1) above). As described in the public review, we have now included much more detail in our Methods section to improve the clarity of our models.

      (3) I would suggest removing the inverse hyperbolic sine transform and replacing it with a more flexible approach to estimating the relationships' shape, like generalized additive models or other spline-based methods to ensure that the chosen method is appropriate - or at the very least checking that it is producing a realistic fit that reflects the underlying shape of the relationships.

      More details are available in the public review, but we now use GAMs throughout the manuscript.

      (4) For the "highly self-citing" analysis, it is unclear why papers in the 15-25% range were dropped rather than including them as their own category in an ordinal model. I might suggest doing the latter, or explaining the decision more fully.

      We previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      (5) It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for your team to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for this suggestion. We added more detailed missingness data to Section 4.3, Data exclusions and missingness. We did find differential missingness and added it to the limitations section. However, certain aspects of this cannot be corrected because the data are just not available (e.g., Scopus coverage issues). Further details are available in the public review.

      Conceptual thoughts:

      (1) I agree with your decision to focus on the second definition of self-citation (self-cites relative to my citations to others' work) rather than the first (self-cites relative to others' citations to my work). But it does seem that the first definition is relevant in the context of gaming citation metrics. For example, someone who writes one paper per year with a reference list of 30% self-citations will have much less of an impact on their H-index than someone who writes 10 papers per year with 10% self-citations. It could be interesting to see how these definitions interact, and whether people who are high on one measure tend to be high on the other.

      We agree this would be interesting to investigate in the future. Unfortunately, our dataset is organized at the level of the paper and thus does not contain information regarding how many times the authors cite a particular work. We hope that we can explore this interaction in the future.
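      To make the reviewer's example concrete, the arithmetic below counts the self-citations each author adds per year under the two scenarios. The 40 references per paper is a hypothetical figure chosen only for illustration, and the function name is invented here.

```python
def self_citations_per_year(papers_per_year, refs_per_paper, self_cite_rate):
    """Self-citations an author adds to the literature each year."""
    return papers_per_year * refs_per_paper * self_cite_rate


# Reviewer's scenarios, with a hypothetical 40 references per paper.
low_output = self_citations_per_year(1, 40, 0.30)    # one paper/year at 30%: about 12
high_output = self_citations_per_year(10, 40, 0.10)  # ten papers/year at 10%: about 40
```

      Despite a rate one third as high, the prolific author generates more than three times as many self-citations, which is why the second definition alone does not capture effects on citation-based metrics.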

      (2) This is entirely speculative, but I wonder whether the increasing rate of LA self-citation relative to FA self-citation is partly due to PIs over-citing their own lab to build up their trainees' citation records and help them succeed in an increasingly competitive job market. This sounds more innocuous than doing it to benefit their own reputation, but it would provide another mechanism through which students from large and well-funded labs get a leg-up in the job market. Might be interesting to explore, though I'm not exactly sure how :)

      This is a very interesting point. We do not have any means to investigate this with the current dataset, but we added it to the discussion (page 14, line 421):

      “A third, more optimistic explanation is that principal investigators (typically Last Authors) are increasingly self-citing their lab’s papers to build up their trainees’ citation records for an increasingly competitive job market.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In regards to point 1 in the public review: In the spirit of transparency, the authors would benefit from providing a rationale for their choice of top journals, and the methodology used to identify them. It would also be valuable to include the impact factor of each journal in the S1 table alongside their names.

      Given the availability and executability of code, it would be useful to see how and if the self-citation trends vary amongst the "low impact" journals (as measured by the IF). This could go in any of the three directions:

      a. If it is found that self-citations are not as prevalent in low impact journals, this could be a great starting point for a conversation around the evaluation of journals based on impact factor, and the role of self-citations in it.

      b. If it is found that self-citations are as prevalent in low impact journals as high impact journals, that just strengthens your results further.

      c. If it is found that self-citations are more prevalent in low impact journals, this would mean your current statistics are a lower bound to the actual problem. This is also intuitive in the sense that high impact journals get more external citations (and more exposure) than low impact journals; as such, authors (and journals) may be less likely to self-cite.

      Expanding the dataset to include many more journals was not feasible. Instead, we included an impact factor term in our models, as detailed in the public review. We found no strong trends in the association between impact factor and self-citation rate/count. Another important note is that these journals were considered “high impact” in 2020, but many had lower impact factors in earlier years. Thus, our modeling allows us to estimate how impact factor is related to self-citations across a wide range of impact factors.

      It is crucial to consider utilizing such a comprehensive database as Scopus, which provides a more thorough list of all journals in Neuroscience, to obtain a more representative sample. Alternatively, other datasets like Microsoft Academic Graph, and OpenAlex offer information on the field of science associated with each paper, enabling a more comprehensive analysis.

      We agree that certain datasets may offer a wider view of the entire field. However, we included a large number of papers and journals relative to previous studies. In addition, Scopus provides a lot of detailed and valuable author-level information. We had to limit our calls to the Scopus API, so we restricted journals by 2020 impact factor.

      (2) In regards to point 2 in the public review: To enhance the accuracy and specificity of the analysis, it would be beneficial to distinguish neuroscientists among the co-authors. This could be accomplished by examining their publication history leading up to the time of publication of the paper, and identify each author's level of engagement and specialization within the field of neuroscience.

      Since the field of neuroscience is largely based on collaborations, we find that it might be impossible to determine who is a neuroscientist. For example, a researcher with a publication history in physics may now be focusing on computational neuroscience research. As such, we feel that our current work, which ensures that the papers belong to neuroscience, is representative of what one may expect in terms of neuroscience research and collaboration.

      (3) In regards to point 3 in the public review: I highly recommend plotting self-citation rate as the number of self-cited papers in the reference list divided by the total number of publications to date at the time of the paper's publication.

      As described in the public review, we have now done this (Figure S3).

      (4) In regards to point 5 in the public review: It would be useful to consider the "quality" of citations to further the discussion on self-citations. For instance, differentiating between self-citations that are perfunctory and superficial from those that are essential for showing developmental work, would be a valuable contribution.

      Other databases may have access to this information, but ours unfortunately does not. We agree that this is an interesting area of work.

      (5) The authors are to be commended for their logistic regression models, as they control for many confounders that were lacking in their earlier descriptive statistics. However, it would be beneficial to rerun the same analysis but on a linear model whereby the outcome variable would be the number of self-citations per author. This would possibly resolve many of the comments mentioned above.

      Thank you for your suggestion. As detailed in the public review, we now model the number of self-citations. This is modeled on the paper level, not the author level, because our dataset was downloaded by paper, not by author.

      Minor suggestions:

      (1) Abstract says one of your findings is: "increasing self-citation rates of First Authors relative to Last Authors". Your results actually show the opposite (see Figure 1b).

      Thank you for catching this error. We corrected it to match the results and discussion in the paper:

      “…increasing self-citation rates of Last Authors relative to First Authors.”

      (2) It might be interesting to compute an average academic age for each paper, and look at self-citation vs average academic age plot.

      We agree that this would be an interesting analysis. However, to limit calls to the API, we collected academic age data only on First and Last Authors.

      (3) It may be interesting to look at the distribution of women in different subfields within neuroscience, and the interaction of those in the context of self-citations.

      Thank you for this interesting suggestion. We added the following analysis (page 9, line 305):

      “Furthermore, we explored topic-by-gender interactions (Figure S10). In short, men and women were relatively equally represented as First Authors, but more men were Last Authors across all topics. Self-citation rates were higher for men across all topics.”

      Reviewer #3 (Recommendations For The Authors):

      - In the abstract, "flaws in citation practices" seems worded rather strongly.

      We respectfully disagree, as previous works have shown significant bias in citation practices. For example, Dworkin et al. (2020) found that neuroscience reference lists tended to under-cite women, even after including various covariates.

      - Links in the references point to (non-accessible) Paperpile references; you would probably want to update this.

      We apologize for the inconvenience and have now removed these links.

      - p 2, l 24: The explanation of ref. (5) seems to be a bit strangely formulated. The point of that article is that works reinforcing a particular belief are more likely to be cited, which *creates* unfounded authority. The unfounded authority itself is hence not part of the citation practices.

      Thank you for catching our misinterpretation. We have now removed this part of the sentence.

      - p 3, l 16: "h indices" or "citations" instead of "h-index".

      We now say “h-indices”.

      - p 5, l 5: how was the manual scoring done?

      We added the following to the caption of Figure S1.

      “Figure S1. Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. 906 articles in total were manually evaluated (10 articles per journal per year from 2000-2020, four articles excluded for very large author list lengths and thus high difficulty of manual scoring). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.”

      - p 5, l 23: Why this specific p-value upper bound of 4e-3? From later in the article, I understand that this stems from the 10,000 bootstrap samples, followed by a Bonferroni correction? Perhaps good to clarify this briefly somewhere.

      Thank you for this suggestion. We now perform Benjamini/Hochberg false discovery rate (FDR) correction, but we added a description of the minimum P value from permutations (page 21, line 748):

      “All P values described in the main text were corrected with the Benjamini/Hochberg (16) false discovery rate (FDR) correction. With 10,000 permutations, the lowest P value after applying FDR correction is P=2.9e-4, which indicates that the true point would be the most extreme in the simulated null distribution.”
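      The two ingredients of this floor can be sketched as follows: the smallest attainable permutation p-value, and the Benjamini/Hochberg step-up adjustment. This is a generic sketch, not the authors' code. The (count + 1) / (n_perm + 1) convention is an assumption, since the text does not state which convention was used, and the adjusted floor also depends on how many tests were run and how they rank, so this does not by itself reproduce the P=2.9e-4 figure. The function names and example p-values are hypothetical.

```python
def min_permutation_p(n_perm):
    """Smallest attainable permutation p-value when the observed statistic
    beats every permuted one, using the (count + 1) / (n_perm + 1) convention."""
    return 1 / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Return Benjamini/Hochberg FDR-adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_floor = min_permutation_p(10_000)  # about 1e-4 before correction
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Here `adj` is approximately `[0.02, 0.04, 0.04, 0.02]`: the two smallest raw p-values share an adjusted value because of the monotonicity step. A permuted p-value can never fall below `p_floor`, which is why the corrected minimum reported in the text is a floor rather than an exact effect size.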

      - Fig. 1, caption: The (a) and (b) labelling here is a bit confusing, because the first sentence suggests both figures portray the same, but do so for different time periods. Perhaps rewrite, so that (a) and (b) are both described in a single sentence, instead of having two different references to (a) and (b).

      Thank you for pointing this out. We fixed the labeling of this caption:

      “Figure 1. Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.”

      - p7, l 9: Regarding "academic age", note that there might be a difference between "age" effects and "cohort" effects. That is, there might be a difference between people with a certain career age who started in 1990 and people with the same career age, but who started in 2000, which would be a "cohort" effect.

      We agree that this is a possible effect and have added it to the limitations (page 16, line 532):

      “Tenth, while we considered academic age, we did not consider cohort effects. Cohort effects would depend on the year in which the individual started their career.”

      - p 7, l 15: "jumps" suggests some sort of sudden or discontinuous transition, I would just say "increases".

      We now say “increases.”

      - Fig. 2: Perhaps it should be made more explicit that this includes only academics with at least 50 papers. Could the authors please clarify whether the same limitation of at least 50 papers also features in other parts of the analysis where academic age is used? This selection could affect the outcomes of the analysis, so its consequences should be carefully considered. One possibility for instance is that it selects people with a short career length who have been exceptionally productive, namely those that have had 50 papers, but only started publishing in 2015 or so. Such exceptionally productive people will feature more highly in the early career part, because they need to be so productive in order to make the cut. For people with a longer career, the 50 papers would be less of a hurdle, and so would select more and less productive people more equally.

      We apologize for the lack of clarity. We did not use this requirement where academic age was used. We mainly applied this requirement when aggregating by country, as we did not want to calculate self-citation rate in a country based on only several papers. We have clarified various data exclusions in our new section 4.3 Data exclusions and missingness.

      - p 8, l 11: The affiliated institution of an author is not static, but rather changes throughout time. Did the authors consider this? If not, please clarify that this refers to only the most recent affiliation (presumably). Authors also often have multiple affiliations. How did the authors deal with this?

      The institution information is at the time of publication for each paper. We added more detail to our description of this on page 19, line 656:

      “For both First and Last Authors, we found the country of their institutional affiliation listed on the publication. In the case of multiple affiliations, the first one listed in Scopus was used.”

      - p 10, l 6: How were these self-citation rates calculated? This is averaged per author (i.e. only considering papers assigned to a particular topic) and then averaged across authors? (Note that in this way, the average of an author with many papers will weigh equally with the average of an author with few papers, which might skew some of the results).

      We calculate it across the entire topic (i.e., do NOT calculate by author first). We updated the description as follows (page 7, line 211):

      “We then computed self-citation rates for each of these topics (Figure 4) as the total number of self-citations in each topic divided by the total number of references in each topic…”
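      The difference between the pooled rate described above and the per-author averaging the reviewer asked about can be seen on a small example. The numbers below are hypothetical and only illustrate how one prolific author moves the two estimates differently; the function names are invented here.

```python
from collections import defaultdict


def pooled_rate(papers):
    """Total self-citations divided by total references across the topic."""
    return sum(s for _, s, _ in papers) / sum(r for _, _, r in papers)


def mean_of_author_rates(papers):
    """Alternative raised by the reviewer: rate per author first, then averaged."""
    totals = defaultdict(lambda: [0, 0])
    for author, n_self, n_refs in papers:
        totals[author][0] += n_self
        totals[author][1] += n_refs
    rates = [s / r for s, r in totals.values()]
    return sum(rates) / len(rates)


# Hypothetical topic: author A has ten papers at 10%, author B one paper at 40%.
papers = [("A", 5, 50)] * 10 + [("B", 20, 50)]
print(pooled_rate(papers))           # 70 / 550, about 0.127
print(mean_of_author_rates(papers))  # (0.1 + 0.4) / 2 = 0.25
```

      The pooled rate weights every reference equally, so prolific authors dominate it; the per-author mean weights every author equally, so a single-paper author counts as much as one with ten papers.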

      - p 13, l 18: Is the academic age analysis here again limited to authors having at least 50 papers?

      This is not limited to at least 50 papers. To clarify, the previous analysis was not limited to authors with 50 papers. It was instead limited to ages in our dataset that had at least 50 data points. For example, if an academic age of 70 only had 20 data points in our dataset, it would have been excluded.

      - Fig. 5: Here, comparing Fig. 5(d) and 5(f) suggests that partly, the self-citation rate differences between men and women, might be the result of the differences in number of papers. That is, the somewhat higher self-citation rate at a given academic age, might be the result of the higher number of papers at that academic age. It seems that this is not directly described in this part of the analysis (although this seems to be the case from the later regression analysis).

      We agree with this idea and have added a new section as follows (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      - Section 2.10. Perhaps the authors could clarify that this analysis takes individual articles as the unit of analysis, not citations.

      We updated all our models to take individual articles as the unit of analysis and have clarified this with more detailed tables.

      - p 18, l 10: "Articles with between 15-25% self-citation rates were discarded." Why?

      We agree that these should not be discarded. However, we previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      - p 20, l 5: "Thus, early-career researchers may be less incentivized to self-promote (e.g., self-cite) for academic gains compared to 20 years ago." How about the possibility that there was less collaboration, so that first authors would be more likely to cite their own papers, whereas with more collaboration, they will less often feature as first author?

      This is an interesting point. We feel that more collaboration would generally lead to even more self-citations, if anything. If an author collaborates more, they are more likely to be on some of the references as a middle author (which by our definition counts toward self-citation rates).

      - p 20, l 15: Here the authors call on authors to avoid excessive self-citations. Of course, there's nothing wrong with calling for that, but earlier the authors were more careful not to label something directly as excessive self-citation. Here, by stating it like this, the authors suggest that they have looked at excessive self-citations.

      We rephrased this as follows:

      Before: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid excessive self-citations.”

      After: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid unnecessary self-citations.”

      - p 22, l 11: Here again, the same critique as p 20, l15 applies.

      We switched “excessively” to “unnecessarily.”

      - p 23, l 12: The authors here criticize ref. (21) for ascertainment bias, namely that they are "including only highly-achieving researchers in the life sciences". But do the authors not do exactly the same thing? That is, they also focus only on the top high-impact journals.

      We included 63 high-impact journals with tens of thousands of authors. In addition, some of these journals were not high-impact at the time of publication. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. This is still a limitation of our work, but we do cover a much broader range of works than the listed reference (though their analysis also has many benefits since it included more detailed information).

      - p 26, l 22-26: It seems that the matching is done quite broadly (matching last names + initials at worst) for self-citations, while later (in section 4.9, p 31, l 9), the authors switch to only matching exact Scopus Author IDs. Why not use the same approach throughout? Or compare the two definitions (narrow / broad).

      Thank you for catching this mistake. We now use the approach of matching Scopus Author IDs throughout.

      - S8: it might be nice to explore open alternatives, such as OpenAlex or OpenAIRE, instead of the closed Scopus database, which requires paid access (which not all institutions have; perhaps that could also be corrected in the GitHub description).

      Thank you for this suggestion. Unfortunately, switching databases would require starting our analysis from the beginning. On our GitHub page, we state: “Please email matthew.rosenblatt@yale.edu if you have trouble running this or do not have institutional access. We can help you run the code and/or run it for you and share your self-citation trends.” We feel that this will allow us to help researchers who may not have institutional access. In addition, we released our aggregated, de-identified (title and paper information removed) data on GitHub for other researchers to use.

    1. Kernel Level Instrumentation. This new method allows to trace the messages exchanged between containers at the kernel level. The experiment has focused on the ZeroMQ library, but the instrumentation process would be the same for all Message-Oriented Middleware and other messaging systems. It does not require any modification of the messaging library or application source code.

      That is, messages between containers are traced at the kernel level, i.e., in the Linux kernel of the host operating system.

    1. Video summary [00:01:29] - [00:28:02]:

      This video presents a meeting for parents of students in première and terminale (the final two years of French upper secondary school) on the technology track, covering various important topics for the school year.

      Highlights:
      + [00:04:15] Introduction and welcome
        * Importance of the technology classes
        * Student successes
        * Goals for the year
      + [00:06:05] Rules of conduct
        * Punctuality
        * Behavior in the hallways
        * Use of mobile phones
      + [00:10:02] Managing lateness and absences
        * Rise in lateness and absences
        * Strategies to improve punctuality
        * Importance of attending class
      + [00:13:02] Preparing for the baccalauréat
        * Importance of continuous assessment (contrôle continu)
        * Strategies for getting good grades
        * Parents' role in monitoring schoolwork
      + [00:19:01] Introduction of the teachers
        * Introduction of the head teachers (professeurs principaux)
        * Teachers' role in monitoring students
        * Importance of communication between parents and teachers

      Video summary [00:28:03] - [00:51:59]:

      This video is an information meeting for parents of première and terminale technology-track students. It covers several important topics concerning schooling, available financial aid, and future study options.

      Highlights:
      + [00:28:03] School services portal and scholarships
        * Access via the Scolarité Services portal
        * Using EduConnect credentials
        * Importance of voting electronically
      + [00:30:01] Carte Jeunesse (youth card) and its benefits
        * Discounts on cinema tickets
        * Support for sports activities
        * Funded by the Région
      + [00:34:02] Guidance and future plans
        * Importance of researching programs
        * Attending the Oraction fair
        * Advice on choosing programs
      + [00:37:00] Pathways after the technology baccalauréat
        * BTS and DUT options
        * Possibility of continuing to an engineering school
        * Importance of preparing one's application carefully
      + [00:45:02] Internat d'excellence (boarding program)
        * Help for struggling students
        * Evening tutoring sessions
        * Importance of mutual help and pooling resources

      Video summary [00:01:29] - [00:28:02]:

      This video presents a meeting for parents of première and terminale technology-track students, covering various aspects of school life and academic expectations.

      Highlights:
      + [00:04:15] Introduction and welcome
        * Live stream on YouTube
        * Importance of the technology classes
        * Pride in academic successes
      + [00:05:23] School rules and conduct
        * Strict rules of conduct
        * Managing timetables and breaks
        * Ban on mobile phones
      + [00:10:50] Lateness and absences
        * Rise in lateness and absences
        * Importance of regular attendance
        * Consequences of strategic absences
      + [00:13:02] Preparing for the baccalauréat
        * Importance of continuous-assessment grades
        * Strategies for getting good grades
        * Parents' role in monitoring schoolwork
      + [00:19:01] Introduction of the head teachers
        * Introduction of the teachers and their classes
        * Importance of collaboration between parents and teachers
        * Encouraging parents to take part in class councils (conseils de classe)

      Video summary [00:28:03] - [00:51:59]:

      This video is an information meeting for parents of première and terminale technology-track students. It covers several important topics concerning schooling, available financial aid, and future study options.

      Highlights:
      + [00:28:03] School services portal and scholarships
        * Access via the ÉduConnect portal
        * Use for parents' electronic voting
        * Access to Mon Bureau Numérique
      + [00:30:01] Carte Jeunesse and its benefits
        * Discounts on cinema tickets
        * Support for UNSS registration
        * Support for school meals
      + [00:34:02] Guidance and future plans
        * Attending the Oraction fair
        * Importance of researching post-bac programs thoroughly
        * Advice on choosing suitable programs
      + [00:40:01] Technology tracks and career prospects
        * Presentation of the STI2D and STL specializations
        * Options for further study after the bac
        * Importance of preparatory classes and BTS programs
      + [00:47:02] Parcoursup and choosing applications (vœux)
        * Parcoursup calendar and steps
        * Importance of discussing choices as a family
        * Advice on preparing one's application file and choices

      Video summary [00:52:00] - [01:04:42]:

      This video covers the crucial steps for première and terminale technology-track students in entering their choices on Parcoursup, the importance of grades and teachers' comments, and prospects for entering the job market after obtaining the bac.

      Highlights:
      + [00:52:00] Entering choices on Parcoursup
        * Importance of entering choices before the deadline
        * Risk of last-minute technical problems
        * Need to think things through beforehand
      + [00:52:42] Results and second chances
        * Possibility of not being admitted to a TSI preparatory class
        * Importance of applying to several programs
        * Teachers' role in building the application
      + [00:54:30] Importance of teachers' comments
        * Impact of positive comments on admissions
        * Consequences of frequent absences
        * Teachers' role in selecting students for BTS programs
      + [00:57:00] Baccalauréat exam dates
        * Dates of the French and philosophy exams
        * Importance of preparing throughout the year
        * Continuous revision and the importance of continuous assessment
      + [01:00:01] Employment and qualification levels
        * Employment rates by qualification level
        * Importance of qualification level for salaries
        * Difference in lifetime earnings

    1. Author response:

      We thank the reviewers for their productive comments on our work. While we have chosen to not revise the manuscript further, we reply to the public reviewer comments here so as to provide clarification on certain points.

      Reviewer #1 (Public Review):

      Summary:

      The aim of the study described in this paper was to test whether visual stimuli that pulse synchronously with the systole phase of the cardiac cycle are suppressed compared with stimuli that pulse in the diastole phase. To this end, the authors employed a binocular rivalry task and used the duration of the perceived image as the metric of interest. The authors predicted that if there was global suppression of the visual stimulus during systole, then dominance durations of the stimulus pulsing synchronously with systole should be shorter than those of the stimulus pulsing in diastole. However, the results observed were the opposite of those predicted. The authors speculate on what this facilitation effect might mean for the baroreceptor suppression hypothesis.

      Strengths:

      This is an interesting and timely study that uses a clever paradigm to test the baroreceptor suppression hypothesis in vision. This is a refreshingly focussed paper with interesting and seemingly counterintuitive results.

      Weaknesses:

      The paper could benefit from a clearer explanation of the predicted results. For readers who are not experts in binocular rivalry, it would be useful to spell out the predicted results. Does pulsing stimuli in this way change dominance durations in such a task? If there is global suppression of visual stimuli, why would this lead to shorter/longer durations in the systole compared to the diastole condition? In addition, the dominance durations in both conditions seem to be longer than one cardiac cycle. If the cardiac cycle modulates duration, it would be interesting to discuss why this occurs on some cycles but not on others. If there is a facilitation effect, why does it only occur on some cycles?

      In general, pulsing stimuli (i.e. moving gratings) show longer dominance durations when in competition with non-pulsing stimuli; in other words, pulses increase the “stimulus strength” of a visual grating (Wade, De Weert & Swanston, 1984). The Baroreceptor Hypothesis predicts global suppression of visual cortex during systole (and not during diastole), so the stimulus strength boost yielded by a pulse should be attenuated during systole. Thus, the stimulus that only pulses during systole would have lower stimulus strength (and thus shorter dominance durations) than that which pulses during diastole; however, we observe the opposite pattern in our data, seemingly contradicting the Baroreceptor Hypothesis.

      In typical binocular rivalry paradigms, dominance durations are biased by stimulus strength, but perception remains bistable such that the stronger stimulus is not necessarily dominant at a given time. We see no reason, then, why switching would have to occur every cycle. The dominance durations we see are quite typical of binocular rivalry paradigms, whereas durations shorter than a cardiac cycle would be rather unusual (Carmel et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      This is a binocular rivalry study that uses electrocardiogram events to modulate visual stimuli in real-time, relative to participants' heartbeats. The main finding is that modulations during the period around when the heart has contracted (systole) increase rivalry dominance durations. This is a really neat result, that demonstrates the link between interoception and vision. I thought the Bayesian mixture modelling was a really smart way to identify cardiac non-perceivers, and the finding that the main result is preserved in this group is compelling. Overall, the study has been conducted to a high standard, is appropriately powered, and reported clearly. I have one suggestion about interpretation, which concerns the explanation of increased dominance durations with reference to contemporary models of binocular rivalry, and a few minor queries. However, I think this paper is a worthwhile addition to the literature.

      The point Reviewer 2 makes with respect to contemporary models of binocular rivalry is important, perhaps more so than its brief statement in this public review suggests. As we already expand upon in our Discussion, the effects of global (neural) inhibition depend on the preexisting role that inhibition plays in a given neural circuit. The original framing of the Baroreceptor Hypothesis describes baroreceptor activity as uniformly impeding sensory processing (Lacey, 1967; Lacey & Lacey, 1978, American Psychologist), which is contradicted by our present results. This account is often interpreted as implying that the effects of baroreceptor activation are inhibitory in terms of neural mechanism (e.g. Rau et al., 1993, Psychophysiology; Edwards et al., 2009, Psychophysiology). Some researchers argue this serves a parallel function to the inhibitory projections from motor to sensory areas during volitional movement, “cancelling” the sensory effects of heartbeats (Van Elk et al., 2014, Biological Psychology).

      However, baroreceptor activity has also been described as introducing noise into sensory processing rather than inhibiting it directly (e.g. Allen et al., 2022, PLoS Computational Biology). Lacey and Lacey’s own account actually seemed to point toward attention as a mediating mechanism (Hahn, 1973, Psychological Bulletin), with the disproportionate focus on cortical inhibition emerging in the literature over time. All this is to say that, while our results seem to falsify the behavioral predictions of the original Baroreceptor Hypothesis, subsequent versions of that hypothesis that describe an inhibitory neural mechanism, rather than an inhibition of perception per se, could potentially still be compatible with our results. This is a topic we plan to explore in future work.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript addresses a question inspired by the Baroreceptor Hypothesis and its links to visual awareness and interoception. Specifically, the reported study aimed to determine if the effects of cardiac contraction (systole) on binocular rivalry (BR) are facilitatory or suppressive. The main experiment, which relies on a technically challenging procedure of presenting stimuli synchronised with the heartbeats of participants, has been conducted with great care, and numerous manipulation checks the authors report convincingly show that the methods they used work as intended. Moreover, the control experiment allows for excluding alternative explanations related to participants being aware of their heartbeats. Therefore, the study convincingly shows the effect of cardiac activity on BR, and this is an important finding. The results, however, do not allow for unambiguously determining if this effect is facilitatory or suppressive (see details below), which renders the study not as informative as it could be.

      While the authors strongly focus on interoception and awareness, this study will be of interest to researchers studying BR as such. Moreover, the code and the data the authors share can facilitate the adoption of their methods in other labs.

      Strengths:

      (1) The study required a complex technical setup and the manuscript both describes it well and demonstrates that it was free from potential technical issues (e.g. in section 3.3. Manipulation check).

      (2) The sophisticated statistical methods the authors used, at least for a non-statistician like me, appear to be well-suited for their purpose. For example, they take into account the characteristics of BR (gamma distributions of dominance durations). Moreover, the authors demonstrate that at least in one case their approach is more conservative than a more basic one (Binomial test) would be.

      (3) Finally, the control experiment, and the analysis it enabled, allow for excluding a multitude of alternative explanations of the main results.

      (4) The authors share all their data and materials, even the code for the experiment.

      (5) The manuscript is well-written. In particular, it introduces the problem and methods in a way that should be easy to understand for readers coming from different research fields.

      Weaknesses:

      (1) The interpretation of the main result in the context of the Baroreceptor Hypothesis is not clear. The manuscript states: "The Baroreceptor Hypothesis would predict that the stimulus entrained to systole would spend more time suppressed and, conversely, less time dominant, as cortical activity would be suppressed each time that stimulus pulses." The manuscript does not specify why this should be the case, and the term 'entrained' is not too helpful here (does it refer to neural entrainment, or to 'being in phase with'?). The manuscript answers this question only implicitly, so, to explain my concern, I try to spell it out here in a slightly simplified form.

      During systole (cardiac contraction), the visual system is less sensitive to external information, so it 'ignores' periods when the systole-synchronised stimulus is at the peak of its pulse. Conversely, the system is more sensitive during diastole, so the stimulus that is at the peak of its pulse then should dominate for longer, because its peaks are synchronised with the periods of the highest sensitivity of the visual system when the information used to resolve the rivalry is sampled from the environment. This idea, while indeed being a clever test of the hypothesis in question, rests on one critical assumption: that the peak of the stimulus pulse (as defined in the manuscript) is the time when the stimulus is the strongest for the visual system. The notion of 'stimulus strength' is widely used in the BR literature (see Brascamp et al., 2015 for a review). It refers to the stimulus property that, simply speaking, determines its tendency to dominate in the BR. The strength of a stimulus is underpinned by its low-level visual properties, such as contrast and spatial frequency content. Coming back to the manuscript, the pulsing of the stimuli affected at least spatial frequency (and likely other low-level properties), and it is unknown if it was in phase with the pulsing of the stimulus strength, or not. If my understanding of the premise of the study is correct, the conclusions drawn by the authors stand only if it was.

      In other words, most likely the strength of one of the stimuli was pulsating in sync with the systole, but it is not clear which stimulus it was. It is possible that, for the visual system, the stimulus meant to pulse in sync with the systole was pulsing strength-wise in phase with the diastole (and the one intended to pulse in sync with the diastole strength-wise pulsed with the systole). If this is the case, the predictions of the Baroreceptor Hypothesis hold, which would change the conclusion of the manuscript.

      We agree with Reviewer 3’s argumentation here. If the pulses decreased, rather than increased, effective stimulus strength, then the present results would indeed be consistent with the Baroreceptor Hypothesis. However, Wade et al. (1984) demonstrated that grating stimuli which pulse in the same manner (i.e. by dynamically varying the spatial frequency of the grating) as in our experiment indeed show increased stimulus strength relative to static stimuli, even if the dynamic stimuli have lower spatial frequency on average (https://doi.org/10.3758/BF03203891).

      We admit our results would be stronger had we included a replication of Wade et al. (1984) in our study, but in light of this previous work, our interpretation is indeed supported.

      (2) Using anaglyph goggles necessitates presenting stimuli of a different colour to each eye. The way in which different colours are presented can impact stimulus strength (e.g. consider that different anaglyph foils can attenuate the light they let through to different degrees). To deal with such effects, at least some studies on BR employed procedures of adjusting the colours for each participant individually (see Papathomas et al., 2004; Patel et al., 2015 and works cited there). While I think that counterbalancing applied in the study excludes the possibility that colour-related effects influenced the results, the effects of interest still could be stronger for one of the coloured foils.

      It is the case that, when we split the data up by eye (and thus by color), we only see statistically significant results for one eye – though the nominal direction of the effect is consistent across both eyes. So it is indeed possible that the effect could be stronger for one of the colored foils, but the present experiment was not designed to be powered to test that cardiac phase-by-color interaction.

      We concur with the Reviewer, however, that our use of counterbalancing excludes color-related effects as an explanation for our main findings.

      (3) Several aspects of the methods (e.g. the stimuli), are not described at the level of detail some readers might be accustomed to. The most important issue here is the task the participants performed. The manuscript says that they pressed a button whenever they experienced a switch in perception, but it is only implied that there were different buttons for each stimulus.

      There were indeed different buttons for each stimulus (i.e. a button to indicate their perception had switched to the red stimulus and another to indicate it had switched to blue). Our full, unmodified experiment code has been made available and is permanently archived (https://doi.org/10.5281/zenodo.10367327), so the full procedure is well documented and can be replicated exactly.

      Brascamp, J. W., Klink, P. C., & Levelt, W. J. M. (2015). The 'laws' of binocular rivalry: 50 years of Levelt's propositions. Vision Research, 109, 20-37. https://doi.org/10.1016/j.visres.2015.02.019

      Papathomas, T. V., Kovács, I., & Conway, T. (2004). Interocular grouping in binocular rivalry: Basic attributes and combinations. In D. Alais & R. Blake (Eds.), Binocular Rivalry (pp. 155-168). MIT Press.

      Patel, V., Stuit, S., & Blake, R. (2015). Individual differences in the temporal dynamics of binocular rivalry and stimulus rivalry. Psychonomic Bulletin and Review, 22(2), 476-482. https://doi.org/10.3758/s13423-014-0695-1

    1. Reviewer #1 (Public review):

      Ellis et al. investigated the functional and topographical organization of visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data (3-18 minutes) is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (e.g., Knapen, 2021) to strengthen our understanding of infant vision during naturalistic contexts and to provide further evidence for the usefulness of movie-based experiments.
      - This study provides novel evidence that functional alignment approaches (specifically, shared response modeling) can be usefully applied to infant fMRI data. Further, code for reproducing such analyses (and others) will be made publicly available.
      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      Weakness:

      - As the authors clearly state, movie-viewing experiments may not work as well as traditional retinotopy tasks; that is, this approach cannot currently be considered a replacement for retinotopy when accurate maps are needed.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case, more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives such as anatomical ROIs or group-defined ROIs).

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously found [23] that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses [27], here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As the reviewer suggested, and as alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to between-participant analyses such as SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
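      The anatomical-alignment benchmark quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released pipeline: it assumes every participant's task-based map has already been resampled to a common surface template as a 1-D array of vertex values, and it obtains a confidence interval on the paired ∆Fisher Z with a percentile bootstrap over participants, which is an assumption about how such an interval could be computed. The function names are hypothetical.

```python
import numpy as np

def loo_average_map(maps: np.ndarray, test_idx: int) -> np.ndarray:
    """Average the template-aligned maps of every participant except test_idx.

    maps: (n_participants, n_vertices), all on the same surface template (assumed).
    """
    keep = np.arange(maps.shape[0]) != test_idx
    return maps[keep].mean(axis=0)

def fisher_z(r):
    """Fisher r-to-z transform, clipped so |r| = 1 does not produce infinities."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def bootstrap_delta_z(r_a, r_b, n_boot=10000, seed=0):
    """Mean paired Fisher Z difference (a - b) and a 95% percentile bootstrap CI.

    r_a, r_b: per-participant correlations for two methods (e.g., anatomical vs. SRM).
    """
    rng = np.random.default_rng(seed)
    d = fisher_z(np.asarray(r_a, float)) - fisher_z(np.asarray(r_b, float))
    # Resample participants with replacement and recompute the mean difference.
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    boot_means = d[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return float(d.mean()), (float(lo), float(hi))
```

      For a held-out participant i, `loo_average_map(maps, i)` would be scored against that participant's own task map exactly as an SRM prediction is, and the resulting per-participant correlations from both methods would be passed to `bootstrap_delta_z`.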

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than on movie-viewing data alone, leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands12, 13, 24 and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25–27. Movies have been useful in awake infant fMRI for studying event segmentation28, functional alignment29, and brain networks30. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas28, but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between31). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across-participant prediction, and enable unique analyses27, 32–34.” Pg. 3–4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      In the discussion, we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres31.” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25–27.” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains25,26,35,42,43.” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents evidence, from a dataset of awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting-state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17

      Note that we do not statistically compare the anatomically aligned maps with the ICA maps. This is because these analyses are not comparable: ICA is run within participants, whereas anatomical alignment is necessarily between participants (using either infants or adults as the reference). Nonetheless, an interested reader can refer to the table reporting the results of anatomical alignment (Table S4) and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test whether movie-based maps are suitable replacements for task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper (that movies evoke fine-grained organization in infant visual cortex) do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which are collected independently from the same infant in the session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      The reviewer’s interpretation may have arisen because the gradients used in the analysis were not clearly described. We now provide this additional description: “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10
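      To make the gradient comparison concrete, here is a minimal Python sketch. It assumes each manually traced line is stored as an ordered array of surface vertex indices (e.g., running from medial to lateral) and that similarity is scored as the Pearson correlation between the values sampled along that line in a component map and in the task-based map. The vertex-index representation, the use of Pearson correlation, and the function names are illustrative assumptions.

```python
import numpy as np

def sample_along_line(surface_map: np.ndarray, line_vertices) -> np.ndarray:
    """Return the map's values at the traced line's vertices, in traced order."""
    return surface_map[np.asarray(line_vertices, dtype=int)]

def gradient_similarity(component_map, task_map, line_vertices) -> float:
    """Pearson correlation of the intensity profiles along one traced line.

    Correlation ignores overall offset and gain, so a component that rises
    along the line in the same way as the task map scores near 1.
    """
    comp = sample_along_line(np.asarray(component_map, float), line_vertices)
    task = sample_along_line(np.asarray(task_map, float), line_vertices)
    return float(np.corrcoef(comp, task)[0, 1])
```

      Note that ICA components have arbitrary sign; this sketch does not address that, so a sign-flipped component would score near -1.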

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map for the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies45.” Pg. 14
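      The freezing step can be illustrated with a simplified, deterministic-SRM-style sketch in Python. It assumes the shared response `S` was learned from the training participants alone and that each participant's weight matrix has orthonormal columns, as in the deterministic SRM formulation; fitting a held-out participant then reduces to an orthogonal Procrustes problem that uses only that participant's movie data. Predicting the held-out map by projecting the training participants' task maps into shared space, averaging, and projecting back is one plausible reading of the procedure, not a transcription of the authors' code, and the function names are hypothetical.

```python
import numpy as np

def fit_new_subject_weights(X_new: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Orthonormal weights mapping a held-out participant into a frozen shared space.

    X_new: (n_voxels, n_timepoints) movie data of the held-out participant only.
    S:     (k, n_timepoints) shared response learned without this participant.
    Solves min ||X_new - W @ S||_F subject to W.T @ W = I (orthogonal Procrustes).
    """
    U, _, Vt = np.linalg.svd(X_new @ S.T, full_matrices=False)
    return U @ Vt  # (n_voxels, k)

def predict_map(train_weights, train_maps, W_new):
    """Project training participants' task maps into shared space, average, map back.

    train_weights: list of (n_voxels_i, k) orthonormal weight matrices.
    train_maps:    list of (n_voxels_i,) task-based maps for the same participants.
    """
    shared = np.mean([W.T @ m for W, m in zip(train_weights, train_maps)], axis=0)
    return W_new @ shared  # predicted map in the held-out participant's voxels
```

      Because neither function receives the held-out participant's retinotopy data, the prediction could in principle be computed before those data exist, which is the independence property described above.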

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10
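      The comparison against random chance mentioned above can be framed as a resampling test. In this minimal Python sketch, which assumes one selected component per participant and a precomputed similarity score for every component (for example, its gradient correlation with the task-based map), the mean score of the hand-picked components is compared with the distribution obtained by picking one component per participant at random. It illustrates the logic only; it is not necessarily the exact test reported in the paper, and the function name is hypothetical.

```python
import numpy as np

def random_selection_p(scores_per_participant, selected_idx, n_iter=10000, seed=0):
    """One-sided test that hand-picked components beat random picks.

    scores_per_participant: list of 1-D arrays, similarity of every component to the task map.
    selected_idx: index of the component chosen (blind to the task map) for each participant.
    Returns the observed mean score and a permutation-style p-value.
    """
    rng = np.random.default_rng(seed)
    observed = np.mean([s[i] for s, i in zip(scores_per_participant, selected_idx)])
    null = np.empty(n_iter)
    for it in range(n_iter):
        # Pick one component uniformly at random from each participant's pool.
        null[it] = np.mean([s[rng.integers(s.size)] for s in scores_per_participant])
    # Add-one correction so the p-value is never exactly zero.
    p = (np.sum(null >= observed) + 1) / (n_iter + 1)
    return float(observed), float(p)
```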

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and that this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in these participants, approximately a year before the ICs were identified. The researcher selecting the components did not use the ground-truth retinotopic maps as a reference, nor did they pay attention to the participant IDs when sorting the ICs. Indeed, they were not trying to find participant-specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process in adults, we find results of comparable strength to those we found in infants, as shown in Figure S3. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect, since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
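      One way to compute a per-participant ranking like the percentiles quoted above is sketched below in Python. It assumes the percentile is the share of that participant's other components whose similarity to the task-based map is lower than the selected component's; the exact definition used in the paper is not stated here, and the function name is hypothetical. Under this definition, 100 would mean the experimenter picked the best-matching component in the pool.

```python
import numpy as np

def selection_percentile(scores, selected: int) -> float:
    """Percentage of a participant's other components that the selected one outscores.

    scores: similarity of every component to the task-based map (1-D array-like).
    selected: index of the component the experimenter chose.
    """
    scores = np.asarray(scores, dtype=float)
    others = np.delete(scores, selected)
    return float(100.0 * np.mean(others < scores[selected]))
```

      For example, with scores `[0.1, 0.5, 0.3, 0.9, 0.2]`, choosing index 2 (score 0.3) outscores two of the four other components, giving a percentile of 50.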

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers, recording BOLD while they watched videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex, that do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the authors have achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics), nor is any validation of the statistical analysis provided. Indeed, what I think the manuscript means, or rather what I understood, may be very far from what the authors want to present, given how obscure the methods and the presentation of the results are.

      In its present form, this reviewer considers that the manuscript needs to be totally rewritten, with the results of each technique presented alongside appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too numerous to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now.

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Moeller et al., 2009), where the underlying organization and neurophysiology are established. However, we have made changes to these methods that differ from their original usage (e.g., we use SRM rather than hyperalignment, meridian mapping rather than traveling-wave retinotopy, and movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are now described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, so that it is easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.
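      The homotopy comparison in point 1 can be sketched as follows in Python. The sketch assumes each visual area's movie time course has already been averaged within each hemisphere and that the same label in the left and right hemispheres marks a homotopic pair. Grouping every non-homotopic pair into a single "other" set is a simplification: the analysis described above further splits these pairs by stream and by within-stream distance. The function name and area labels are hypothetical.

```python
import numpy as np

def homotopy_contrast(left: dict, right: dict) -> float:
    """Mean Fisher-z correlation of homotopic pairs minus that of non-homotopic pairs.

    left, right: map from area label (e.g., 'V1') to a 1-D movie time course
    averaged within that area in the given hemisphere.
    """
    def z(a, b):
        return np.arctanh(np.corrcoef(a, b)[0, 1])

    same, other = [], []
    for label_l, ts_l in left.items():
        for label_r, ts_r in right.items():
            (same if label_l == label_r else other).append(z(ts_l, ts_r))
    return float(np.mean(same) - np.mean(other))
```

      A positive value indicates that an area's activity is more similar to the same area in the opposite hemisphere than to other areas, which is the signature of functional differentiation the analysis tests for.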

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added, as well as the further motivation we have provided for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Thus, because this claim is already not supported by our analyses, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then using those areas in the homotopy analysis of the other half of the data) would substantially reduce the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus, we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186–1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that examines how data quantity contributes to the observed results. We compare the amount of data collected with the magnitude of the same vs. different stream effect (Figure 1B) and of the within-stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment share the same goal (finding a maximally aligned representation between participants based on brain function), but their implementations differ. The application of functional alignment to infants is novel, as is its use with movie data that are relatively short compared with standard adult data. Indeed, this is the most thorough demonstration that SRM, or any functional alignment procedure, can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45, which may prove especially useful for infant fMRI52.” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are less smooth and more fine-grained. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can yield meridian maps made up of multiple independent components (e.g., a component sensitive to horizontal orientations and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find that spatial frequency components are easier to identify than meridian components, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” where appropriate. Nonetheless, we have kept “infants” in some places, particularly where we compare the sample to adults. Adding “and toddlers” there could imply that three samples are being compared, which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI1-6, one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks7.”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting the data into age bins might reveal age effects, we do not think we could perform adequately powered null hypothesis testing of the age trend. Larger samples will be needed to investigate the contribution of age. That said, the data we have reported suggest that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, which suggests there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants to be those aged 0-3 years for the purposes of its conference. Indeed, we have never received this concern across dozens of reviews of previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers”, and to the older individuals in our sample as “toddlers” wherever appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckman, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017). 

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults41; however, they are often not the primary driver of function39. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas (in case motion transients drive consistent activity across areas), then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that, despite these analyses, it would be helpful to see what this pattern looks like in adults, where we know more about the visual field properties and the function of the dorsal and ventral streams. This has been done previously (e.g., Haak & Beckman, 2018), but we have now run these analyses on the adults in our sample, as described in the public review. As with infants, there are reliable differences in homotopy between streams (Figure S1). The MDS results show that the adult data were more complex than the infant data, since they were best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS solution under which the ventral and dorsal streams are again dissociable.

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. As reported elsewhere (Moeller et al., 2009), ICA often extracts only portions of visual maps. Making a full visual map would require combining components into a composite (e.g., a component with a high response in the periphery and another with a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps. However, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual ICA components. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures showing results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, but we release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults with comparable amounts of data as what we collected from infants can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard. 

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity, rather than anatomy, and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27,32-34.” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses conducted here: we use data from a group of participants to predict the neural responses of a held-out participant. For motion to explain consistency between participants, it would need to be time-locked across participants. Even if motion were time-locked during movie watching, it would impair the formation of an adequate model that contains retinotopic information. Thus, motion should only reduce our ability to find a shared response that can be used to predict retinotopic maps. Hence, the results we observed arise despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SMR adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version, and it is shared in the public data release. We have added the detail that the probabilistic atlas is thresholded at 0% so that it is liberally inclusive.

      “We used the occipital mask from the MNI structural atlas63 in standard space (defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe) and used the inverted transform to put it into native functional space.” Pg. 27–28

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. Thus, Massachusetts, in 1786, passed a law similar to the colonial one of which we have spoken. The law of 1786, like the law of 1705, forbids the marriage of any white person with any negro, Indian, or mulatto, and inflicts a penalty of fifty pounds upon any one who shall join them in marriage; and declares all such marriage absolutely null and void, and degrades thus the unhappy issue of the marriage by fixing upon it the stain of bastardy. And this mark of degradation was renewed, and again impressed upon the race, in the careful and deliberate preparation of their revised code published in 1836. This code forbids any person from joining in marriage any white person with any Indian, negro, or mulatto, and subjects the party who shall offend in this respect, to imprisonment, not exceeding six months, in the common jail, or to hard labor, and to a fine of not less than fifty nor more than two hundred dollars; and, like the law of 1786, it declares the marriage to be absolutely null and void. It will be seen that the punishment is increased by the code upon the person who shall marry them, by adding imprisonment to a pecuniary penalty.

      Following up on my previous annotation about the prohibition of marriage between Negroes and white people: in 1786, Massachusetts passed a law similar to the colonial one I annotated above. It declared such marriages null and void and fined anyone who performed them. The revised code of 1836 increased the punishment, adding imprisonment or hard labor to the fine.

    1. Summary: Markdown is a simple, lightweight markup language. Text files with the .md extension can be converted to HTML using this markup. Basic features:

      Heading

      Horizontal Rule


      Paragraphs

      This is text.

      Linebreaks

      This is a line. <br> Another line.

      Lists

      1. item 1

      2. item 2

      Formatting

      bold italics bolditalics

      Block Quotes

      This is a quote

      Code Blocks

      code

      Links

      link

      References

      Here is a citation

      PS: Markdown has no native comment syntax.

s/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.2440.1600.webp","defaultFileMimeType":"image/webp"}],"MobileRenditions":[{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-mobileflat.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-mobileflat.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}]}; EXPLORE document.addEventListener("DOMContentLoaded", function(){ var waitUntilAngularReady = waitUntilAngularReady || {}; (function(randomString) { waitUntilAngularReady[randomString] = setInterval(function() { var scope = angular.element($('.mainContainer')).injector() .get('$rootScope'); if (angular.isDefined(window.angular) && angular.isDefined(scope) && angular.isDefined(scope.recompile)) { clearInterval(waitUntilAngularReady[randomString]); scope.recompile(); } }, 1500); })(Math.random().toString(36).substring(2, 15)+ Math.random().toString(36).substring(2, 15)); }); FIND YOUR BEST CAR RENTAL WITH AVIS     Your corporate discount code is invalid. Please have your travel manager verify the discount code in SAP Concur. We are sorry, the site has not properly responded to your request. Please try again. If the problem persists, please Contact Us .<> Reference Number <> Your Member Benefits Have Been Applied!   |   Start Your Reservation Below. Terms Apply Searching... Please revise your search or click here to browse for a location Keep typing to refine search ... midnight12:30 AM1:00 AM1:30 AM2:00 AM2:30 AM3:00 AM3:30 AM4:00 AM4:30 AM5:00 AM5:30 AM6:00 AM6:30 AM7:00 AM7:30 AM8:00 AM8:30 AM9:00 AM9:30 AM10:00 AM10:30 AM11:00 AM11:30 AMnoon12:30 PM1:00 PM1:30 PM2:00 PM2:30 PM3:00 PM3:30 PM4:00 PM4:30 PM5:00 PM5:30 PM6:00 PM6:30 PM7:00 PM7:30 PM8:00 PM8:30 PM9:00 PM9:30 PM10:00 PM10:30 PM11:00 PM11:30 PM Searching... 
Please revise your search or click here to browse for a location Keep typing to refine search ... midnight12:30 AM1:00 AM1:30 AM2:00 AM2:30 AM3:00 AM3:30 AM4:00 AM4:30 AM5:00 AM5:30 AM6:00 AM6:30 AM7:00 AM7:30 AM8:00 AM8:30 AM9:00 AM9:30 AM10:00 AM10:30 AM11:00 AM11:30 AMnoon12:30 PM1:00 PM1:30 PM2:00 PM2:30 PM3:00 PM3:30 PM4:00 PM4:30 PM5:00 PM5:30 PM6:00 PM6:30 PM7:00 PM7:30 PM8:00 PM8:30 PM9:00 PM9:30 PM10:00 PM10:30 PM11:00 PM11:30 PM Age: 25+24232221201918 Residency: AfghanistanAlbaniaAlgeriaAndorraAngolaAnguillaAntiguaArgentinaArmeniaArubaAustraliaAustriaAzerbaijanBahamasBahrainBangladeshBarbadosBelarusBelgiumBelizeBenin (Peoples Republic of)BermudaBhutanBoliviaBonaireBosniaBotswanaBrazilBruneiBulgariaBurkina FasoBurmaBurundiCameroonCanadaCape Verdi Is.Caroline IslandsCayman IslandsCentral African RepublicChadChileChinaColombiaComoresCongoCongo (Dem. Rep. of the)Cook Islands (Rarotonga)Costa RicaCroatiaCubaCuracao (Netherland Antilles)CyprusCzech RepublicDenmarkDjibouti RepDominicaDominican RepublicEcuadorEgyptEllice IslandsEl SalvadorEquatorial GuineaEstoniaEthiopiaFalkland IslandsFaroe IslandsFiji IslandsFinlandFranceFrench GuianaGabonGambiaGeorgiaGermanyGhanaGibraltarGilbert IslandsGreeceGreenlandGrenadaGuadeloupe(French West Indies)GuamGuatemalaGuineaGuinea-BissauGuyanaHaitiHondurasHong KongHungaryIcelandIndiaIndonesiaIranIraqIreland (Republic)IsraelItalyIvory CoastJamaicaJapanJordanKazakhstanKenyaKhmer RepublicKiribatiKuwaitLaosLatviaLebanonLeichtensteinLesothoLiberiaLibyaLine IslandsLithuaniaLuxembourgMacauMacedonia (Fyrom)MadagascarMalawiMalaysiaMaldive IslandsMaliMaltaMariana IslandsMarshall IslandsMartiniqueMauritaniaMauritiusMexicoMoldovaMongoliaMoroccoMozambiqueNamibiaNauruNepalNew CaledoniaNew ZealandNicaraguaNigerNigeriaNorfolk IslandsNorth KoreaNorwayOman (Sultanate of)PakistanPanamaPapua New GuineaParaguayPeruPhilippinesPhoenix IslandsPolandPortugalPuerto RicoQatarReunion IslandsRomaniaRussian FederationRwandaSabahSaipanSamoa (American)Samoa 
(Western)San MarinoSaudi ArabiaSenegalSerbia & MontenegroSeychellesSierra LeonaSingaporeSlovak RepublicSloveniaSoa TomeSociety IslandsSolomon IslandsSomali Dem RepSouth AfricaSouth KoreaSpainSri LankaSt BarthelemySt EustatiusSt JohnSt Kitts, NevisSt LuciaSt Martin /St MaartenSt VincentSudanSurinameSwazilandSwedenSwitzerlandSyriaTahiti (French Polynesia)TaiwanTanzaniaThailandThe NetherlandsTimorTogoTongaTortola (British Virgin Isl)Trinidad & TobagoTunisiaTurkeyTurks and CaicosUgandaUkraineUnited Arab EmiratesUnited KingdomUruguayU S AUS Virgin Islands (St Croix)US Virgin Islands (St Thomas)VanuatuVenezuelaVietnamYemenZambiaZimbabwe Avis Wizard Number * Discount Codes * Vehicle Type * Enter Wizard Number and Last Name Enter a Discount Code Quantity Member Rates AARP None * Optional Select My Car Continue .modal-upper-half { background-image: url("") !important; } LOG IN TO GET OUR BEST RATES Terms Apply Log In Don't have an account? It's easy and only takes a minute Create an Account var modalID = ""; var enableMemberBenefitsModal = "true"; if (enableMemberBenefitsModal != "" && enableMemberBenefitsModal == "true") { modalID = "show-generic-modal"; } $(".link-details").click(function () { $("html").addClass("intro"); }); $(".close-icon-black").click(function () { $("html").removeClass("intro"); }); /*** Landing Page Modal Starts*****/ if (window.brand == "avis" || window.location.href.includes("avis")) { var isMemberRentalPricesPage = window.location.pathname.includes( "member-rental-prices" ); if (isMemberRentalPricesPage) { var isLoggedIn = window.sessionStorage.getItem("ngStorage-customer") != null; if (isLoggedIn) { var enableLandingPageModal = ""; if (enableLandingPageModal) { modalID = "landing-page-modal-container"; } } } } /*** Landing Page Modal Ends*****/ jQuery(document).ready(function() { setTimeout(function(){ if(sessionStorage.getItem("benefitsmodaldisplay")!= 'true'){ jQuery("body").removeClass('modal-open'); } },3000) }); Your Benefits Have Been 
Applied! Start Your Reservation Below. Terms Apply var Campaign= Campaign || {};Campaign['ZNJ9I_content_dam_avis_na_us_common_locations_FeaturedImage_2x_png']={"Renditions":[{"width":768,"height":504,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.768.504.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.768.504.webp","defaultFileMimeType":"image/webp"},{"width":1536,"height":1008,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.1536.1008.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.1536.1008.webp","defaultFileMimeType":"image/webp"},{"width":375,"height":375,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.375.375.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.375.375.webp","defaultFileMimeType":"image/webp"},{"width":1220,"height":800,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.1220.800.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.1220.800.webp","defaultFileMimeType":"image/png"},{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.750.750.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/png"},{"width":2440,"height":1600,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr:content/renditions/cq5dam.web.2440.1600.png","webpFile":"/content/dam/avis/na/us/common/locations/FeaturedImage@2x.png/jcr
:content/renditions/cq5dam.web.2440.1600.webp","defaultFileMimeType":"image/webp"}],"MobileRenditions":[{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-mobileflat.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-mobileflat.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}]}; document.addEventListener("DOMContentLoaded", function(){ var waitUntilAngularReady = waitUntilAngularReady || {}; (function(randomString) { waitUntilAngularReady[randomString] = setInterval(function() { var scope = angular.element($('.mainContainer')).injector() .get('$rootScope'); if (angular.isDefined(window.angular) && angular.isDefined(scope) && angular.isDefined(scope.recompile)) { clearInterval(waitUntilAngularReady[randomString]); scope.recompile(); } }, 1500); })(Math.random().toString(36).substring(2, 15)+ Math.random().toString(36).substring(2, 15)); });   Select My Car Continue Close Make a Reservation Your corporate discount code is invalid. Please have your travel manager verify the discount code in SAP Concur. Pick-up and Return to same location Searching... Please revise your search or click here to browse for a location Keep typing to refine search ... Pick-up Date midnight12:30 AM1:00 AM1:30 AM2:00 AM2:30 AM3:00 AM3:30 AM4:00 AM4:30 AM5:00 AM5:30 AM6:00 AM6:30 AM7:00 AM7:30 AM8:00 AM8:30 AM9:00 AM9:30 AM10:00 AM10:30 AM11:00 AM11:30 AMnoon12:30 PM1:00 PM1:30 PM2:00 PM2:30 PM3:00 PM3:30 PM4:00 PM4:30 PM5:00 PM5:30 PM6:00 PM6:30 PM7:00 PM7:30 PM8:00 PM8:30 PM9:00 PM9:30 PM10:00 PM10:30 PM11:00 PM11:30 PM Pick-up Time Searching... Please revise your search or click here to browse for a location Keep typing to refine search ... 
Return Date midnight12:30 AM1:00 AM1:30 AM2:00 AM2:30 AM3:00 AM3:30 AM4:00 AM4:30 AM5:00 AM5:30 AM6:00 AM6:30 AM7:00 AM7:30 AM8:00 AM8:30 AM9:00 AM9:30 AM10:00 AM10:30 AM11:00 AM11:30 AMnoon12:30 PM1:00 PM1:30 PM2:00 PM2:30 PM3:00 PM3:30 PM4:00 PM4:30 PM5:00 PM5:30 PM6:00 PM6:30 PM7:00 PM7:30 PM8:00 PM8:30 PM9:00 PM9:30 PM10:00 PM10:30 PM11:00 PM11:30 PM Return Time Renter's age is 25 or over 25+24232221201918 Age AfghanistanAlbaniaAlgeriaAndorraAngolaAnguillaAntiguaArgentinaArmeniaArubaAustraliaAustriaAzerbaijanBahamasBahrainBangladeshBarbadosBelarusBelgiumBelizeBenin (Peoples Republic of)BermudaBhutanBoliviaBonaireBosniaBotswanaBrazilBruneiBulgariaBurkina FasoBurmaBurundiCameroonCanadaCape Verdi Is.Caroline IslandsCayman IslandsCentral African RepublicChadChileChinaColombiaComoresCongoCongo (Dem. Rep. of the)Cook Islands (Rarotonga)Costa RicaCroatiaCubaCuracao (Netherland Antilles)CyprusCzech RepublicDenmarkDjibouti RepDominicaDominican RepublicEcuadorEgyptEllice IslandsEl SalvadorEquatorial GuineaEstoniaEthiopiaFalkland IslandsFaroe IslandsFiji IslandsFinlandFranceFrench GuianaGabonGambiaGeorgiaGermanyGhanaGibraltarGilbert IslandsGreeceGreenlandGrenadaGuadeloupe(French West Indies)GuamGuatemalaGuineaGuinea-BissauGuyanaHaitiHondurasHong KongHungaryIcelandIndiaIndonesiaIranIraqIreland (Republic)IsraelItalyIvory CoastJamaicaJapanJordanKazakhstanKenyaKhmer RepublicKiribatiKuwaitLaosLatviaLebanonLeichtensteinLesothoLiberiaLibyaLine IslandsLithuaniaLuxembourgMacauMacedonia (Fyrom)MadagascarMalawiMalaysiaMaldive IslandsMaliMaltaMariana IslandsMarshall IslandsMartiniqueMauritaniaMauritiusMexicoMoldovaMongoliaMoroccoMozambiqueNamibiaNauruNepalNew CaledoniaNew ZealandNicaraguaNigerNigeriaNorfolk IslandsNorth KoreaNorwayOman (Sultanate of)PakistanPanamaPapua New GuineaParaguayPeruPhilippinesPhoenix IslandsPolandPortugalPuerto RicoQatarReunion IslandsRomaniaRussian FederationRwandaSabahSaipanSamoa (American)Samoa (Western)San MarinoSaudi ArabiaSenegalSerbia & 
MontenegroSeychellesSierra LeonaSingaporeSlovak RepublicSloveniaSoa TomeSociety IslandsSolomon IslandsSomali Dem RepSouth AfricaSouth KoreaSpainSri LankaSt BarthelemySt EustatiusSt JohnSt Kitts, NevisSt LuciaSt Martin /St MaartenSt VincentSudanSurinameSwazilandSwedenSwitzerlandSyriaTahiti (French Polynesia)TaiwanTanzaniaThailandThe NetherlandsTimorTogoTongaTortola (British Virgin Isl)Trinidad & TobagoTunisiaTurkeyTurks and CaicosUgandaUkraineUnited Arab EmiratesUnited KingdomUruguayU S AUS Virgin Islands (St Croix)US Virgin Islands (St Thomas)VanuatuVenezuelaVietnamYemenZambiaZimbabwe Residency I have a Wizard Number I have a discount code Quantity Member Rates AARPNone Your discount code is invalid, Learn Why? Your discount code is invalid, Learn Why? Select My Car Close msg.res.rateinfo Your Rate Code cannot be used for this reservation due to following reason(s): Rate Code requires minimum length of 5 days Close Age Providing your age allows us to give you a more accurate rental estimate. Restrictions and fees may apply for underage driver's. Close Country Providing your country allows us to give you a more accurate rental estimate. Close Do you have an Avis Wizard Number? Select the Avis Wizard Number option to enter both your Avis Wizard Number and Last Name Close Do you have a Discount Code? Select the Discount Code option to enter an AWD (Avis Worldwide Discount), Coupon Code or Rate Code Close Coupon Count These are the available options that may be redeemed. Close Reserve Reserve your bookings in One click. COUPON INFO Your coupon number cannot be used for this reservation due to following reason(s): The coupon code entered is not valid. Coupon codes are seven characters, four letters followed by three numbers. 
err.bcd.bcdFormat .session-timeout .close-btn{position:absolute;right:10px;top:15px}.modal-close-btn .modal-title{display:block;margin:0}.session-inner-wrap{padding:0 10px}.session-info-text{text-align:center}.session-timer{border:1px solid #c1c1c1;padding:5px;width:60px;background:#f2f2f2;margin:12px auto 24px}.session-info-text h4{font-size:1.2em}.session-timeout .btn{text-align:center;margin-bottom:15px;width:100%;display:block;font-size:1.2em}.session-timeout .modal-header .close-btn img{display:block}.btn-redirect{background:#d4002a;color:#fff;border:0;padding:10px 25px;font-size:1.2em;width:100%}@media only screen and (min-width:768px){.session-timeout .modal-dialog{width:485px}.session-timeout .modal-title{display:none}.session-timeout .modal-body{padding:20px 20px 20px}.session-timeout .btn-primary-avis{padding:10px 15px;min-width:200px}.session-inner-wrap{padding:0 10px 0}.session-hdr-info{margin-bottom:20px}.session-info-text{text-align:left;float:left;width:75%}.session-info-text h4{font-size:1.25em;margin-bottom:7px}.session-timer{float:right;margin:17px auto 18px}.session-btn-info .btn-primary-avis{float:right;display:inline-block;width:auto}.session-btn-info .btn-primary-avis:first-child{float:left}} Some customers have reported problems when using this Operating System/Browser. If you are unable to complete a reservation, please try an alternate Operating System or Browser. We apologize for the inconvenience and we are working to fix all issues shortly. 
.session-timeout .close-btn{position:absolute;right:10px;top:15px}.modal-close-btn .modal-title{display:block;margin:0}.session-inner-wrap{padding:0 10px}.session-info-text{text-align:center}.session-timer{border:1px solid #c1c1c1;padding:5px;width:60px;background:#f2f2f2;margin:12px auto 24px}.session-info-text h4{font-size:1.2em}.session-timeout .btn{text-align:center;margin-bottom:15px;width:100%;display:block;font-size:1.2em}.session-timeout .modal-header .close-btn img{display:block}.btn-redirect{background:#d4002a;color:#fff;border:0;padding:10px 25px;font-size:1.2em;width:100%}@media only screen and (min-width:768px){.session-timeout .modal-dialog{width:485px}.session-timeout .modal-title{display:none}.session-timeout .modal-body{padding:20px 20px 20px}.session-timeout .btn-primary-avis{padding:10px 15px;min-width:200px}.session-inner-wrap{padding:0 10px 0}.session-hdr-info{margin-bottom:20px}.session-info-text{text-align:left;float:left;width:75%}.session-info-text h4{font-size:1.25em;margin-bottom:7px}.session-timer{float:right;margin:17px auto 18px}.session-btn-info .btn-primary-avis{float:right;display:inline-block;width:auto}.session-btn-info .btn-primary-avis:first-child{float:left}} Account Locked We are sorry, the maximum number of attempts has been reached. For your security your account has been locked. 
To unlock your account, please click on the link we sent to your email, if (window.ContextHub && ContextHub.SegmentEngine) { ContextHubJQ(function() { ContextHub.eventing.on(ContextHub.Constants.EVENT_TEASER_LOADED, function(event, data){ data.data.forEach(function(evData) { if (evData.key === "_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic\u002Dpromo1_text") { $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); } }); }); ContextHub.SegmentEngine.PageInteraction.Teaser({ locationId: '_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic\u002Dpromo1_text', variants: [{"path":"/en/home/default","name":"default","title":"Default","campaignName":"","thumbnail":"/en/home.thumb.png","url":"/en/home/_jcr_content/reservation/dynamic-promo1/text.default.html","campaignPriority":0,"tags":[]}], strategy: '', trackingURL: null }); // Make the targeted content visible if no teasers were loaded after 5s setTimeout(function(){ $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); }, 5000); }); } else { $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); } Fall Sale: Free Upgrade + Up to 35% off Pay Now.  
var Campaign= Campaign || {};Campaign['Z38BR_content_dam_avis_na_us_common_offers_avis_hp_fallsale22_2560x500_jpg']={"Renditions":[{"width":768,"height":504,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.768.504.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.768.504.webp","defaultFileMimeType":"image/webp"},{"width":1536,"height":1008,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1536.1008.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1536.1008.webp","defaultFileMimeType":"image/webp"},{"width":375,"height":375,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.375.375.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.375.375.webp","defaultFileMimeType":"image/jpg"},{"width":1220,"height":500,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1220.500.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1220.500.webp","defaultFileMimeType":"image/jpg"},{"width":2440,"height":1000,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.2440.1000.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.2440.1000.webp","defaultFileMimeType":"image/jpg"},{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/n
a/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}],"MobileRenditions":[{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}]}; FALL SALE: FREE UPGRADE Plus, get up to 35% OFF when you pay now. BOOK NOW

      Keeping a website clutter-free also makes it understandable to people who may have difficulty processing visual information. This website is neatly arranged, and its simple language makes it easy to read.

    2. FIND YOUR BEST CAR RENTAL WITH AVIS EXPLORE LOG IN TO GET OUR BEST RATES Make a Reservation Pick-up and Return to same location Pick-up Date Pick-up Time Return Date Return Time Age Residency I have a Wizard Number I have a discount code Select My Car Account Locked We are sorry, the maximum number of attempts has been reached. For your security your account has been locked.
To unlock your account, please click on the link we sent to your email, if (window.ContextHub && ContextHub.SegmentEngine) { ContextHubJQ(function() { ContextHub.eventing.on(ContextHub.Constants.EVENT_TEASER_LOADED, function(event, data){ data.data.forEach(function(evData) { if (evData.key === "_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic\u002Dpromo1_text") { $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); } }); }); ContextHub.SegmentEngine.PageInteraction.Teaser({ locationId: '_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic\u002Dpromo1_text', variants: [{"path":"/en/home/default","name":"default","title":"Default","campaignName":"","thumbnail":"/en/home.thumb.png","url":"/en/home/_jcr_content/reservation/dynamic-promo1/text.default.html","campaignPriority":0,"tags":[]}], strategy: '', trackingURL: null }); // Make the targeted content visible if no teasers were loaded after 5s setTimeout(function(){ $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); }, 5000); }); } else { $CQ("#_content_avis_na_us_en_US_home_jcr_content_reservation_dynamic-promo1_text").css('visibility', 'visible'); } Fall Sale: Free Upgrade + Up to 35% off Pay Now.  
var Campaign= Campaign || {};Campaign['Z38BR_content_dam_avis_na_us_common_offers_avis_hp_fallsale22_2560x500_jpg']={"Renditions":[{"width":768,"height":504,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.768.504.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.768.504.webp","defaultFileMimeType":"image/webp"},{"width":1536,"height":1008,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1536.1008.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1536.1008.webp","defaultFileMimeType":"image/webp"},{"width":375,"height":375,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.375.375.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.375.375.webp","defaultFileMimeType":"image/jpg"},{"width":1220,"height":500,"dpr":1,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1220.500.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.1220.500.webp","defaultFileMimeType":"image/jpg"},{"width":2440,"height":1000,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.2440.1000.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.2440.1000.webp","defaultFileMimeType":"image/jpg"},{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/n
a/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}],"MobileRenditions":[{"width":750,"height":750,"dpr":2,"defaultFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.jpg","webpFile":"/content/dam/avis/na/us/common/offers/avis-hp-fallsale22-2560x500.jpg/jcr:content/renditions/cq5dam.web.750.750.webp","defaultFileMimeType":"image/jpg"}]}; FALL SALE: FREE UPGRADE Plus, get up to 35% OFF when you pay now. BOOK NOW

      Keeping a website clutter-free also makes it understandable to people who may have difficulty processing visual information. The website is neatly arranged and uses simple language, which also makes it readable.

    3. FIND YOUR BEST CAR RENTAL WITH AVIS

      Avis' website passed the colour contrast test of the accessibility checker Silktide. According to the test description, the contrast ratio must be at least 4.5:1 for normal text and at least 3:1 for large text. The lowest contrast ratio on the website is 5.48:1, which shows that the website meets this contrast requirement for perceivability.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study used root tips from tea seedlings grown semi-hydroponically. The authors followed a sequence of steps and drew partial conclusions at each one.

      Initially, protoplasts obtained from root tips were processed for scRNA-seq on the 10x Genomics platform. The sequencing data were pre-filtered at both the cell and gene levels, leaving 10,435 cells. These cells were then classified into eight clusters using the t-SNE algorithm. The study assigned cell types through protein sequence similarity analysis of homologs of cell-type marker genes. To ensure accuracy, the analysis used marker genes validated in previous scRNA-seq studies and in the model plant Arabidopsis thaliana. The cluster annotation was confirmed by in situ RT-PCR. This methodology provided comprehensive insight into cellular differentiation in the sampled tissue. Clusters 1 to 8 were classified as xylem, epidermal, stem cell niche, cortex/endodermal, root cap, cambium, phloem, and pericycle cells, respectively.

      Then, the authors performed a pseudo-time analysis to validate the cell cluster annotation by examining the differentiation pathways of the root cells. Lastly, they created a differentiation heatmap from the xylem and epidermal cells and identified the biological functions associated with the highly expressed genes.

      After thoroughly analysing the scRNA-seq data, the researchers examined cell heterogeneity in nitrate and ammonium uptake, transport, and nitrogen assimilation into amino acids. The scRNA-seq data were validated by in situ RT-PCR. This allowed the glutamine and alanine biosynthetic enzymes to be localised across the cell clusters. It also confirmed that both pathways constitute the primary amino acid metabolism in the root. This investigation was necessary because these processes are central to theanine biosynthesis: theanine is synthesised from glutamine and alanine-derived ethylamine.

      Afterwards, the authors analysed the cell-specific expression patterns of the theanine biosynthesis genes using the same molecular tools. They concluded that theanine biosynthesis is more enriched than glutamine biosynthesis in cluster 8 (pericycle cells) (Lines 271-272). However, Line 250 states that the genes responsible for glutamine biosynthesis were most highly expressed in Clusters 1, 3, 4, 6, and 8, which makes this conclusion unclear.

      Thank you for your interest in and feedback on the paper. We have made revisions to the manuscript as per your suggestions. We would like to emphasize that the precursors of theanine biosynthesis are alanine-derived ethylamine and glutamate, not glutamine. Furthermore, in terms of the intermediates, only ethylamine is specific to the theanine biosynthetic pathway, as glutamate is the primary product of nitrogen assimilation and serves as a precursor for the biosynthesis of amino acids, proteins, chlorophyll, and many secondary metabolites.

      In this study, we observed high expression of genes encoding enzymes of the glutamate biosynthetic pathway (CsGOGATs and CsGDHs) across all 8 clusters, with particularly strong expression in clusters 1, 3, 4, 6, and 8 (Figure 4D and 5B). However, the gene encoding CsTSI, which catalyzes theanine biosynthesis from glutamate and ethylamine, was more enriched in cluster 8 (Figure 5B and 5C). Therefore, we concluded that theanine biosynthesis was more enriched in cluster 8, whereas glutamate biosynthesis was broadly active across clusters 1, 3, 4, 6, and 8.

      The regulation of theanine biosynthesis by the MYB transcription factor family is well-established. In particular, CsMYB6, a transcription factor expressed specifically in roots, has been shown to promote theanine biosynthesis by binding to the promoter of the TSI gene responsible for theanine synthesis. However, their findings indicate that CsMYB6 is expressed in Cluster 3 (SCN), Cluster 6 (cambium cells), and Cluster 1 (xylem cells) but not in Cluster 8 (pericycle cells), which is known for its high expression of CsTSI. Similarly, their scRNA-seq data indicated that CsMYB40 and CsHHO3, which activate and repress CsAlaDC expression, respectively, were not highly expressed in Cluster 1 (the cell cluster with high CsAlaDC expression). Based on these findings, the authors hypothesised that transcription factors and their target genes are not necessarily highly expressed in the same cells. Nonetheless, additional evidence is essential to substantiate this presumption.

      Thank you for your advice. We fully agree that additional evidence is essential to support the presumption that transcription factors and target genes are not always highly expressed in the same cells. Therefore, in this study, we identified another transcription factor, CsLBD37, which was shown to negatively regulate CsAlaDC expression in response to nitrogen levels. Consistent with our presumption, CsLBD37 expression was not enriched in cluster 1, where CsAlaDC expression was primarily enriched (Figure 5B and 6D; Line 365).

      To find further supporting evidence, we also analyzed the expression of several transcription factors and their target genes in the model plant Arabidopsis, using published single-cell RNA-seq data (Ryu et al., 2019; Wendrich et al., 2020; Zhang et al., 2019; Denyer et al., 2019; Jean-Baptiste et al., 2019; Shulse et al., 2019; Shahan et al., 2022) and databases (Root Cell Atlas, https://rootcellatlas.org/; BAR, https://bar.utoronto.ca/#GeneExpressionAndProteinTools). As in tea plants, the cell types in which the regulators were highly expressed did not always match those in which their target genes were highly expressed. For example, AtARF7 and AtARF19 were highly expressed in the cortex and stele, respectively, whereas their target genes AtLBD16 and AtLBD29 were highly expressed in endodermal cells (Okushima et al., 2007; Supplemental figure 8B and 8C; Line 312-325 and Line 525-526). Likewise, AtPHR1 was highly expressed in root epidermal and pericycle cells, but its target gene AtF3’H was highly expressed in the cortex and AtRALF23 in xylem cells (Liu et al., 2022; Tang et al., 2022; Supplemental figure 8B and 8C; Line 322-327 and Line 527-530).

      At the same time, we note in the discussion that we cannot rule out the possibility that transcription factors regulate their target genes within the same cell type, with both being highly expressed. One reason is that these theanine-associated genes are promiscuous: they have many target genes and regulate multiple biological processes in tea plants. We have shown only that high expression in the same cell type is not a necessary condition (Line 534-554). We strongly agree with the reviewer that more evidence is needed to test this model in the future.

      Reference:

      Denyer, T. et al. (2019). Spatiotemporal developmental trajectories in the Arabidopsis root revealed using high-throughput single-cell RNA sequencing. Dev Cell. 48:840-852.e5.

      Liu, Z. et al. (2022). PHR1 positively regulates phosphate starvation-induced anthocyanin accumulation through direct upregulation of genes F3'H and LDOX in Arabidopsis. Planta. 256:42.

      Okushima, Y. et al. (2007). ARF7 and ARF19 regulate lateral root formation via direct activation of LBD/ASL genes in Arabidopsis. Plant Cell. 19:118-30.

      Ryu, K. H., Huang, L., Kang, H. M. & Schiefelbein, J. (2019). Single-cell RNA sequencing resolves molecular relationships among individual plant cells. Plant Physiol. 179:1444-1456.

      Shahan, R. et al. (2022). A single-cell Arabidopsis root atlas reveals developmental trajectories in wild-type and cell identity mutants. Dev Cell. 57:543-560.e9.

      Shulse, C. et al. (2019). High-throughput single-cell transcriptome profiling of plant cell types. Cell Rep. 27:2241-2247.e4.

      Tang, J. et al. (2022). Plant immunity suppression via PHR1-RALF-FERONIA shapes the root microbiome to alleviate phosphate starvation. EMBO J. 41:e109102.

      Wendrich, J.R., et al. (2020). Vascular transcription factors guide plant epidermal responses to limiting phosphate conditions. Science. 370:eaay4970.

      Zhang, T. et al. (2019). A single-cell RNA sequencing profiles the developmental landscape of Arabidopsis root. Mol Plant. 12:648-660.

      Lastly, the authors discovered a novel transcription factor of the Lateral Organ Boundaries Domain (LBD) family, CsLBD37, that can co-regulate theanine synthesis and lateral root development. The authors observed that CsLBD37 localises to the nucleus and can repress the activity of the CsAlaDC promoter. To investigate this mechanism further, they tested whether CsLBD37 inhibits CsAlaDC expression in vivo, by transiently silencing CsLBD37 in tea seedlings through antisense oligonucleotide interference and by overexpressing it in transgenic hairy roots. Based on their findings, the authors hypothesise that CsLBD37 regulates CsAlaDC expression to modulate the synthesis of ethylamine and theanine.

      Additionally, the available literature suggests that the transcription factors belonging to the Lateral Organ Boundaries Domain (LBD) family play a crucial role in regulating the development of lateral roots and secondary root growth. Considering this, they confirmed that pericycle cells exhibit a higher expression of CsLBD37. A recent experiment revealed that overexpression of CsLBD37 in transgenic Arabidopsis thaliana plants led to fewer lateral roots than the wild type. From this observation, the researchers concluded that CsLBD37 regulates lateral root development in tea plants. I respectfully submit that the current conclusion may require additional research before it can be considered definitive.

      Further efforts should be made to investigate the signalling mechanisms that govern CsLBD37 expression to arrive at a more comprehensive understanding of this process. In Arabidopsis lateral root founder cells, the establishment of asymmetry is regulated by LBD16/ASL18 and related LBD/ASL proteins, as well as by the AUXIN RESPONSE FACTORs ARF7 and ARF19, which act by activating plant-specific transcriptional regulators such as LBD16/ASL18 (Goh et al., 2012, https://doi.org/10.1242/dev.071928). In contrast, other downstream LBD homologues regulated by cytokinin signalling play a role in secondary root growth (Ye et al., 2021, https://doi.org/10.1016/j.cub.2021.05.036). It is imperative to elucidate the hormonal regulation of CsLBD37 expression to gain a comprehensive understanding of its involvement in this morphogenic process.

      We are very grateful for your valuable suggestions and fully agree with you. In an earlier study, we also observed a link between theanine metabolism, hormone metabolism and root development (Chen et al., 2022), but there is still insufficient evidence to fully characterize these links. The current study focused on cell-specific theanine biosynthesis, transport and regulation, and we identified that CsLBD37 negatively regulates theanine biosynthesis. However, the upstream regulatory mechanism of CsLBD37 has not been addressed in this study. How CsLBD37 is regulated during root development is a pertinent question for future investigation. We have included the following additional discussion in the revised manuscript: “Besides, it has been reported that LBD family TFs were regulated by, or interacted with, regulators of hormone pathways (e.g., ARFs) to regulate the process of root morphogenesis (Goh et al., 2012; Ye et al., 2021). Based on these findings, we speculated that CsLBD37 is likely regulated by, or interacts with, other proteins to form a complex to regulate root development or theanine biosynthesis.” (Line 573-576). At the same time, we revised the text “These results provided support for a model in which CsLBD37 plays a role in regulating lateral root development in tea plants” to “These findings suggested that CsLBD37 may play a role in regulating lateral root development in tea plant roots” (Line 401-402).

      Reference:

      Chen, T. et al. (2022). Theanine, a tea plant specific non-proteinogenic amino acid, is involved in the regulation of lateral root development in response to nitrogen status. Hortic. Res. 10:uhac267.

      Goh, T., Joi, S., Mimura, T. & Fukaki, H. (2012). The establishment of asymmetry in Arabidopsis lateral root founder cells is regulated by LBD16/ASL18 and related LBD/ASL proteins. Development 139:883-893.

      Ye, L. et al. (2021). Cytokinins initiate secondary growth in the Arabidopsis root through a set of LBD genes. Curr. Biol. 31:3365-3373.e3367.

      Strength:

      The manuscript showcases significant dedication and hard work, resulting in valuable insights that serve as a fundamental basis for generating knowledge. The authors skillfully integrated various tools available for this type of study and meticulously presented and illustrated every step involved in the survey. The overall quality of the work is exceptional, and it would be a valuable addition to any academic or professional setting.

      Weaknesses:

      In its current form, the article presents certain weaknesses that need to be addressed to improve its overall quality. Specifically, the authors' conclusions appear to have been drawn in haste without sufficient experimental data and a comprehensive discussion of the entire plant. It is strongly advised that the authors devote additional effort to resolving the abovementioned issues to bolster the article's credibility and dependability. This will ensure that the article is of the highest quality, providing readers with reliable and trustworthy information.

      Thank you for your feedback. We acknowledge that our experiments and data require further improvement. Currently, the genetic transformation of the tea plant remains a challenge, making it difficult to obtain sufficient in vivo evidence. Despite this situation, we have made every effort to obtain support for our conclusions based on the current situation and available technology. Indeed, additional studies will be performed once the impediment associated with genetic transformation of the tea plant has been resolved.

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Lin et al. present a comprehensive single-cell analysis of tea plant roots. They measured the transcriptomes of 10,435 cells from tea plant root tips, leading to the identification and annotation of 8 distinct cell clusters using marker genes. Through this dataset, they delved into the cell-type-specific expression profiles of genes crucial for the biosynthesis, transport, and storage of theanine, revealing potential multicellular compartmentalization in theanine biosynthesis pathways. Furthermore, their findings highlight CsLBD37 as a novel transcription factor with dual regulatory roles in both theanine biosynthesis and lateral root development.

      Strengths:

      This manuscript provides the first single-cell dataset from tea plant roots. It also enables detailed analysis of the specific expression patterns of genes involved in theanine biosynthesis. Some of these expression patterns were further validated by in situ RT-PCR. Additionally, their analysis identified a role for a novel TF gene, CsLBD37, in regulating theanine biosynthesis.

      Weaknesses:

      Several issues need to be addressed:

      (1) The annotation of single-cell clusters (1-8) in Figure 2 could benefit from further improvement. Currently, the authors utilize several key genes, such as CsAAP1, CsLHW, CsWAT1, CsIRX9, CsWOX5, CsGL3, and CsSCR, to annotate cell types. However, it is notable that some of these genes are expressed in only a limited number of cells within their respective clusters, such as CsAAP1, CsLHW, CsGL3, CsIRX9, and CsWOX5. It would be advisable to utilize other marker genes expressed in a higher percentage of cells or employ a combination of multiple marker genes for more accurate annotation.

      Thank you for your comments. In this study, we first utilized classical marker genes, such as CsWAT1 and CsPP2, to annotate cell types. The expression patterns of these marker genes were confirmed using in situ RT-PCR. Additionally, a combination of multiple marker genes was employed for cell type annotation. We also analyzed the top 10 cluster-enriched genes in each cluster, and the expression of their homologs in Arabidopsis, Populus, etc., as a reference for cluster annotation (Figure 2D; Supplemental Figures 2-6; Supplemental data 3). Subsequently, differentiation trajectories of root cells were analyzed by pseudo-time analysis; these trajectories aligned well with the cell type annotation, and together these combined methods further support the reliability of our annotations.

      (2) Figure 3 could enhance clarity by displaying the trajectory of cell differentiation atop the UMAP, similar to the examples demonstrated by Monocle 3.

      Thanks for this advice. We have added the trajectory of cell differentiation atop the UMAP in revised Supplemental Figure 7 (Line 185).

      (3) The identification of CsLBD37 primarily relies on bulk RNA-seq data. The manuscript could benefit from elaborating on the role of the single-cell dataset in this context.

      Thanks for your comments. In this study, we determined that CsTSI was highly expressed in cluster 8, but its regulator CsMYB6 was highly expressed in cluster 3, cluster 6 and cluster 1 (Line 301-304). Thus, target genes and their regulators seem not to always be highly expressed in the same cell cluster. A similar situation was also observed in terms of CsAlaDC transcriptional regulation (Line 305-311). Based on these findings, we hypothesized that, for the regulation of theanine biosynthesis, it is not necessary for transcription factors and target genes to always be highly expressed in the same cells. Thus, taking the transcriptional regulation of CsAlaDC as an example, we next analyzed the TFs that were co-expressed with CsAlaDC to test this notion. We used scRNA-seq data to screen for genes that were not highly co-expressed with CsAlaDC, such as CsLBD37, to test our hypothesis (Line 338-340 and Line 365).

      (4) The manuscript's conclusions predominantly rely on the expression patterns of key genes. This reliance might stem from the inherent challenges of tea research, which often faces limitations in exploring molecular mechanisms due to the lack of suitable genetic and molecular methods. The authors may consider discussing this point further in the discussion section.

      Thanks for your suggestions and we totally agree. We discussed this point in the discussion section, “In some non-model plants, including tea, transgenic technologies are not currently available and, hence, we used in situ RNA hybridization to establish the location(s) for gene expression. In some studies, isolation of different cell types was combined with q-RT-PCR to detect cell-type marker gene expression (Wang et al., 2022). However, this approach has two limitations in that it cannot display the gene location directly and has only low resolution”, “After numerous trials, we were able to optimize in situ RT-PCR assays (detailed in the Methods), which enabled a cell-specific characterization of gene expression in tea plant root cells, prior to establishing a genetic transformation system for tea…we note the challenge associated with weak calling of homologous marker genes…” (Line 431-444).

      Reviewer #3 (Public Review):

      Summary:

      Lin et al. performed a scRNA-seq-based study of tea plant roots, as an example, to elucidate the biosynthesis and regulatory processes for theanine, a root-specific secondary metabolite, and established the first map of tea roots comprising 8 cell clusters. Their findings contribute to deepening our understanding of the regulation of the synthesis of important flavor substances in tea plant roots. They have presented some innovative ideas.

      It is notable that the authors - based on single-cell analysis results - proposed that TFs and target genes are not necessarily always highly expressed in the same cells. Many of the important TFs they previously identified, along with their target genes (CsTSI or CsAlaDC), were not found in the same cell cluster. Therefore, they proposed a model in which the theanine biosynthesis pathway occurs via multicellular compartmentation and does not require high co-expression levels of transcription factors and their target genes within the same cell cluster. Since it is not known whether the theanine content is absolutely high in the cell cluster 1 containing a high CsAlaDC expression level (due to the lack of cell cluster theanine content determination, which may be a current technical challenge), it is difficult to determine whether this non-coexpressing cell cluster 1 is a precise regulatory mechanism for inhibiting theanine content in plants.

      Thank you for your comments. We concur with your assessment that the spatial distribution of theanine accumulation may affect the expression of these genes. However, as you note, technical limitations currently prevent us from verifying the distribution of theanine at the level of individual root cell types. The spatial distribution of theanine in roots can be affected by transport processes, so the cell types in which theanine accumulates likely do not correspond exactly to those in which it is synthesized (Line 491-493). In the future, we will work to characterize the spatial distribution of theanine using techniques such as spatial metabolomics and mass spectrometry imaging (Line 582-586).

      In fact, there are a small number of cells where TFs and CsAlaDC are simultaneously highly expressed, but the quantity is insufficient to form a separate cluster. However, these few cells may be sufficient to meet the current demands for theanine synthesis. This possibility may better align with some previous experiments and validation results in this study. Moreover, I feel that under normal conditions, plants may not mobilize a large number of cells to synthesize a particular substance. Perhaps, cell cluster 1 is actually a type of cell that inhibits the synthesis of theanine, aiming to prevent excessive theanine production? I do not oppose the model proposed by the author, but I feel there is a possibility as I mentioned. If it seems reasonable, the author may consider adding it to an appropriate position in the discussion.

      Thanks a lot for your suggestion. We agree that tea plant roots likely have mechanisms to prevent excessive theanine production. We have improved our discussion according to your suggestion.

      Theanine is the most abundant free amino acid in the tea plant, accounting for 1-2% of leaf dry weight (Line 62-63), and can reach 4-6% in the root, accounting for 60-80% of total free amino acids (Yang et al., 2020). Theanine biosynthesis therefore requires root cells to invest substantial resources and energy, so it needs to be controlled by a series of regulatory mechanisms that function as a “brake”. In a previous study, we suggested that CsMYB40 and CsHHO3 bind to the CsAlaDC promoter to regulate theanine synthesis at the transcriptional level, in “accelerator” or “brake” mode, to maintain stable theanine synthesis (Guo et al., 2022). At the posttranslational level, CsTSI and CsAlaDC are modified by ubiquitination, which is probably involved in the degradation of these proteins in response to N levels (Wang et al., 2021). In the current study, we discovered a novel “brake” in the form of spatial separation. The differential expression of AlaDC and TSI suggests that ethylamine and theanine are synthesized in different cell types, separating the synthetic precursor from the product and resulting in multicellular compartmentation of metabolites (Line 270-280). On the one hand, compartmentalization may effectively prevent interference between secondary metabolic pathways; on the other hand, it could also serve as a means of metabolic regulation to avoid either excessive or inhibited theanine synthesis (Line 483-488).

      Reference

      Guo, J. et al. (2022). Potential “accelerator” and “brake” regulation of theanine biosynthesis in tea plant (Camellia sinensis). Hortic. Res. 9:uhac169.

      Yang, T. et al. (2020). Transcriptional regulation of amino acid metabolism in response to nitrogen deficiency and nitrogen forms in tea plant root (Camellia sinensis L.). Sci. Rep. 10:6868.

      Wang, Y. et al. (2021). Nitrogen-Regulated Theanine and Flavonoid Biosynthesis in Tea Plant Roots: Protein-Level Regulation Revealed by Multiomics Analyses. J Agric Food Chem. 69:10002-10016.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) The dataset, including the raw sequencing data and the processed *.Rdata files, should be deposited in a public database for accessibility and reproducibility.

      Thanks for your comments and advice. The raw data and processed files have been submitted to the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE267845 (Line 763-764).

      (2) Providing the code for the primary analysis steps in a publicly accessible location would facilitate others in replicating the analysis.

      Thank you for your comment. Unfortunately, we have been unable to obtain permission to publicly release a portion of the primary analysis code because its intellectual property belongs to OE Corporation.

      (3) Enhancements in the writing of the manuscript are recommended for improved clarity and coherence.

      Thanks. We revised our writing to improve the manuscript clarity and coherence.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for revisions:

      (1) The Introduction and Discussion contain too many paragraphs; some consist of a single sentence. I suggest merging all the sentences in the Introduction into three large paragraphs. For example, lines 30-57 could become the first paragraph, lines 58-87 the second, and lines 88-106 the third; the authors can merge them as the content dictates. The Discussion should also be divided into several paragraphs according to focus, and it may be appropriate to give each paragraph a title.

      Thank you for your comments and suggestions. We have merged several paragraphs and added a title to each paragraph in the Discussion section (“Cell cluster annotation of non-transgenic plants” in line 428; “Nitrogen metabolism and transport of tea plant root at the single cell level” in line 445; “Multicellular compartmentation of theanine metabolism and transport” in line 469; “The regulation of theanine biosynthesis at the single cell level” in line 517; “Cross-talk between theanine metabolism and root development” in line 554).

      (2) Tea is a food, while the tea tree is a plant. It should be "tea plant root" instead of "tea root"; I suggest revising this throughout the text.

      Thanks. We corrected “tea root” to “tea plant root” in this manuscript.

      (3) Lines 35-43, this sentence is too long, suggest each example should be one sentence.

      Thanks. We revised this sentence into short sentences. We changed this part to “Root-synthesized flavonoids regulate root tip growth through affecting auxin transport and metabolism (Santelia et al., 2008; Wan et al., 2018). Legume roots secrete flavonoids as signaling agents to attract symbiotic bacteria, such as Rhizobium for nitrogen fixation (Hartman et al., 2017). In Abies nordmanniana, volatile organic compounds (e.g., propanal, γ-nonalactone, and dimethyl disulfide) function to recruit certain bacteria or fungi, such as Paenibacillus. Paenibacillus sp. S37 produces high quantities of indole-3-acetic acid that can then promote plant root growth (Garcia-Lemos et al., 2020; Schulz-Bohm et al., 2018).” (Line 35-42)

      (4) Line 510 is missing a reference.

      Thank you - we have added the reference in the revised manuscript (Line 549 and Line 840-842).

    1. Reviewer #1 (Public review):

      Summary:

      In their paper, Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar video-radiography of markers implanted in the tongue. Their findings indicate that most units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which resulted in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors used a video-radiography method to capture the 3D kinematics of tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. Moreover, they employed a nerve-blocking procedure to block sensory feedback. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      Aside from the last part of the results section, the majority of the analyses in this paper focus on single units. I understand the need to characterize the number of single units that directly code for external variables like movement direction, especially for less-studied areas like the orofacial part of the sensory-motor cortex. However, as a field, our decade-long experience in the arm region of sensory-motor cortices suggests that many of the idiosyncratic behaviors of single units can be better understood when neural activity is studied at the level of the population state space. By doing so, for the arm region, we were able to explain why units have "mixed selectivity" for external variables, why the tuning of units changes between the planning and execution phases of the movement, why activity in the planning phase does not lead to undesired muscle activity, etc. See Gallego et al. (2017), Vyas et al. (2020), and Churchland and Shenoy (2024) for reviews. Therefore, I believe investigating the dynamics of population activity in orofacial regions can similarly help the reader go beyond the peculiarities of single units and, more broadly, inform us whether the principles found in the arm region generalize to other segments of the sensory-motor cortex.
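
      As a rough illustration of the population-level view suggested above, the sketch below projects population activity onto its leading principal component, one of the simplest state-space methods. It assumes a time × neuron matrix of binned firing rates; the function names and the power-iteration approach are illustrative choices, not the authors' analysis or the specific methods of the cited reviews.

```python
import math
import random


def center_columns(data):
    """Subtract each neuron's mean firing rate (column mean) from a time x neuron matrix."""
    n_rows = len(data)
    means = [sum(row[j] for row in data) / n_rows for j in range(len(data[0]))]
    return [[v - m for v, m in zip(row, means)] for row in data]


def covariance(data):
    """Neuron x neuron sample covariance of an already-centred time x neuron matrix."""
    n_rows, n_cols = len(data), len(data[0])
    return [[sum(row[i] * row[j] for row in data) / (n_rows - 1) for j in range(n_cols)]
            for i in range(n_cols)]


def top_component(cov, n_iter=200, seed=0):
    """Leading eigenvector of a covariance matrix via power iteration (unit length)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(data, component):
    """Population trajectory along one state-space axis: one value per time bin."""
    return [sum(x * c for x, c in zip(row, component)) for row in data]
```

      Further axes could be obtained by deflating the covariance matrix or by a full eigendecomposition; plotting several such projections over time gives the low-dimensional trajectories the reviewer refers to.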

      Further, for the nerve-blocking experiments, the authors demonstrate that the lack of sensory feedback severely alters how the movement is executed at the level of behavior and neural activity. However, I had a hard time interpreting these results since any change in neural activity after blocking the orofacial nerves could be due to either the lack of the sensory signal or, as the authors suggest, due to the NHPs executing a different movement to compensate for the lack of sensory information or the combination of both of these factors. Hence, it would be helpful to know if the authors have any hint in the data that can tease apart these factors. For example, analyzing a subset of nerve-blocked trials that have similar kinematics to the control.
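
      The reviewer's suggestion of analysing only nerve-blocked trials whose kinematics resemble control trials could be sketched as a nearest-neighbour filter. This assumes each trial has already been summarised as a fixed-length kinematic feature vector (for example, hypothetical tongue-tip displacements along x, y and z); the feature choice, the Euclidean distance, and the threshold are illustrative assumptions, not part of the reviewed study.

```python
import math


def nearest_control_distance(block_features, control_features):
    """Euclidean distance from one nerve-block trial to its closest control trial."""
    return min(math.dist(block_features, c) for c in control_features)


def match_trials(block_trials, control_trials, max_distance):
    """Indices of nerve-block trials whose kinematics lie near at least one control trial.

    block_trials, control_trials: lists of equal-length feature tuples
    (hypothetical per-trial kinematic summaries).
    max_distance: matching threshold, in the same units as the features.
    """
    kept = []
    for i, feats in enumerate(block_trials):
        if nearest_control_distance(feats, control_trials) <= max_distance:
            kept.append(i)
    return kept
```

      For example, with control trials `[(1.0, 0.0, 0.5), (0.8, 0.1, 0.4)]` and nerve-block trials `[(0.9, 0.05, 0.45), (3.0, 2.0, 1.0)]`, `match_trials(block, control, 0.3)` returns `[0]`. Matching here is with replacement; a stricter one-to-one matching would prevent a single control trial from anchoring many nerve-block trials.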

    2. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray-based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. Using linear regressions, they characterize the tuning properties and distributions of the recorded population during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties, and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data.

      Weaknesses:

      However, this paper has a number of weaknesses in the analysis of this data.

      It is unclear how reliable the neural responses to the stimuli are. The trial-by-trial variability of the neural firing rates is not reported, so it is unclear whether the methods used to establish that a neuron is modulated and tuned to a direction are susceptible to spurious correlations. The authors do not use shuffling or bootstrapping tests to determine the robustness of their fits or of the neurons' 'preferred direction'. This weakness colors the rest of the paper.
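
      A shuffle test of the kind requested could look like the sketch below. It assumes a linear (cosine-like) tuning model, rate = b0 + b·d, where d is the unit 3D movement direction on each trial; it uses R² as the test statistic and defines the preferred direction as b/|b|. These choices, the shuffle count, and all function names are illustrative assumptions rather than the authors' method.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_tuning(directions, rates):
    """Least-squares fit of rate = b0 + bx*dx + by*dy + bz*dz; returns (coefs, r_squared)."""
    rows = [[1.0, dx, dy, dz] for dx, dy, dz in directions]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(4)]
    beta = solve_linear(xtx, xty)
    mean = sum(rates) / len(rates)
    ss_tot = sum((y - mean) ** 2 for y in rates)
    ss_res = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, rates))
    return beta, 1.0 - ss_res / ss_tot


def preferred_direction(beta):
    """Unit vector along the fitted direction weights (bx, by, bz)."""
    norm = math.sqrt(sum(b * b for b in beta[1:]))
    return tuple(b / norm for b in beta[1:])


def permutation_test(directions, rates, n_shuffles=1000, seed=0):
    """Observed R^2 and its p-value against a null built by shuffling rates across trials."""
    rng = random.Random(seed)
    _, observed = fit_tuning(directions, rates)
    shuffled = list(rates)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the trial-wise link between direction and rate
        if fit_tuning(directions, shuffled)[1] >= observed:
            exceed += 1
    # +1 correction so the p-value is never exactly zero
    return observed, (exceed + 1) / (n_shuffles + 1)
```

      A bootstrap over trials (resampling trial indices with replacement and refitting with `fit_tuning`) could be layered on in the same way to put confidence intervals on the preferred direction.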

      The authors compare the tuning properties during feeding to those during licking but focus only on the tongue tip. However, the two behaviors also differ in their engagement of the jaw muscles. Thus, many of the differences observed between the two 'tasks' might have very little to do with an alteration in the properties of the neural code and more to do with differences in the movements involved. Many of the neurons are likely correlated with both jaw and tongue movements; this complicates the interpretation and raises the possibility that the differences in tuning properties across tasks are trivial.

      The population analyses for decoding are rudimentary and provide very coarse estimates (left, center, or right). It is also unclear what the major takeaways from the population decoding analyses are. The reduced classification accuracy during feeding could very well be a consequence of linear models being unable to account for the complexity of feeding movements, whereas licking movements are 'simpler' and thus better accounted for.

      The nature of the nerve block, and which sensory pathways it affects, is unclear. The trigeminal nerve contains many different sensory afferents; is there a characterization of how effectively the nerve impulses are being blocked? Have the authors confirmed or characterized the strength of their inactivation or block? I was unable to find any electrophysiological evidence characterizing the perturbation.

      Overall, while this paper provides a descriptive account of the observed neural correlations and their alteration by perturbation, a synthesis of the observed changes and some insight into neural processing of tongue kinematics would strengthen this paper.

    1. The point of GPL licenses is to protect the user of the software, not the developer. If you want "protection" as a developer, use MIT (disclaimer of warranty). GPL "infects" other parts of a system to combat a work-around which was used to violate the software freedom of the user, by firewalling sections of GPL'ed code from the rest of the system. If you don't care about your users' software freedom in the first place, then (L)GPL is the wrong choice.
      • goal: protect user rights/freedoms
      • non-goal: protect developer rights/freedoms
    1. I don't expect everyone to read every single line of the code for a project they are trying to use; that isn't very reasonable. What I do see, though, is that a lot of developers have a mental barrier to actually opening up the source code for the project they are trying to use. They will read the documentation, run the tests, use the example code, but when they are faced with a problem that could be solved through a one- or two-line change in the source code, they shut down completely. The point is that you shouldn't be afraid to jump into the source code. Even if you don't fully understand the source code, in many cases you should be able to isolate your issue to a specific block. If you can reference this block (or its line numbers) when opening your support request, it will help the author better understand your problem.
    2. On many occasions, I've opened requests for support in the form of a GitHub pull request. This way, I am telling the author: I have found a potential problem with your library, here is how I fixed it for my circumstance, here is the code I used for reference. You get extra internet points if you open the pull request with: "I don't expect this pull request to get merged, but I wanted to show you what I did".
    1. When the insight arrived, I didn’t notice the connection to the trail I’d laid on the preceding pages. My experience was of making no progress, and then, finally, making some. In hindsight, I can see that I had been making plenty of progress over those weeks; I just couldn’t tell at the time. I suspect this is pretty common in my work. So, “I feel like I’m not making progress” is probably not a good local heuristic for guiding my work. Alternately, the lesson might be that I need to become more sensitive to the many subtler flavors of progress in this kind of work

      This rings true. The friction, the struggle, is the work, at least when it comes to my knowledge work. Interestingly, when the jump happens I tend to frame it as an escape, a way of fleeing forward. When I got stuck in a major research project in 2020, the key insight that unlocked it was a gasp of desperation more than a bolt of lightning. Colleagues immediately told me it was the key, but to me it felt like using a cheat code. In hindsight, I think it was the best possible outcome, but that original sense of escape remains.

  3. qccmass-my.sharepoint.com
      Now, that’s something, ain’t it? Code meshing use the way people already speak and write and help them be more rhetorically effective. It do include teaching some punctuation rules, attention to meaning and word choice, and various kinds of sentence structures and some standard English

      This is a summary of what Young believes, and I agree with him because it will help many writers from different backgrounds to be efficient.

    1. Foam is an open-source alternative to Roam Research and Obsidian, and it is built on the Git version control system and the Visual Studio Code editor.

      for - notetaking software - Obsidian - Roam Research - open source alternative to - Foam

      notetaking software - Obsidian - Roam Research - open source alternative to - Foam - Microsoft owns Github and Foam is served from Github

      to - Foam - https://hyp.is/Pf6tKnXBEe-rkdcD0hmZGA/foambubble.github.io/foam/

      Video summary [00:00:00] - [00:19:03]:

      This talk by Flavien Chervet explores the impact of artificial intelligence (AI) on society, focusing on generative systems and recent technological transformations.

      Highlights:
      + [00:00:07] Introduction and objectives
        * Introduction of Flavien Chervet
        * Importance of practical demonstrations
        * Use of AI-generated art
      + [00:01:54] Evolution of AI
        * History of AI since the 1950s
        * Development of machine learning in the 2000s
        * The importance of learning for AI
      + [00:09:00] Practical applications of AI
        * Use of AI in various fields
        * Examples of machine learning and deep learning
        * Impact on industry and medicine
      + [00:13:45] The AI revolution and disruption
        * Falling barriers to entry for AI
        * Socio-economic disruption caused by AI
        * Comparison with other emerging technologies
      + [00:14:47] Transformer technology
        * Introduction of Transformers by Google in 2017
        * Impact on machine translation
        * Semantic understanding and future implications

      Video summary [00:19:06] - [00:38:53]:

      This video explores advances in artificial intelligence (AI) and their implications, focusing on foundation models and their ability to generate data and understand language.

      Highlights:
      + [00:19:06] Dark knowledge and digital intelligence
        * Emergence of reasoning and creative abilities
        * Importance of language understanding
        * Difference between human and digital intelligence
      + [00:20:29] Foundation models and Transformers
        * Inversion of traditional deep learning
        * Training on general-purpose tasks
        * Varied applications of foundation models
      + [00:24:47] Pre-training and generativity
        * Pre-training on general-purpose tasks
        * Ability to generate new examples
        * Example of the artistic experiment with Rembrandt
      + [00:28:39] Creativity and generative AI
        * AI actively participating in creation
        * Impact on culture and civilization
        * Demonstration of human-machine co-creativity
      + [00:31:01] Prompt engineering and interacting with AI
        * Importance of phrasing requests well
        * Techniques for obtaining creative answers
        * Examples of prompts for designing jewelry

      Video summary [00:38:56] - [01:00:24]:

      This part of the video explores the use of artificial intelligence (AI) in various fields, including music creation, code generation, and robotics. Flavien Chervet demonstrates how AI can simplify complex tasks and improve the user experience.

      Highlights:
      + [00:39:36] Semantic image recognition
        * Background removal
        * Time saved compared with Photoshop
        * Instant use of the image
      + [00:40:29] Music generation with Suno
        * Creating unique music for jewelry
        * Using ChatGPT for the prompts
        * Generating lyrics and music
      + [00:44:00] Building web pages with ChatGPT
        * Generating HTML code
        * Integrating music and images
        * Simplifying the creation process
      + [00:48:00] Multimodality and robotics
        * Unification of AI systems
        * Progress in robotics thanks to AI
        * Examples of advanced robots
      + [00:55:00] Agency of AI systems
        * Robust world models
        * Tool use by AI
        * Impact on work and society

    1. Latticework uses a similar pane-aware interaction

      This pane awareness is what seemed to clash with another plugin I run, at least as of February. I notice their code repo still warns about clashes with other plugins and recommends running it in a separate vault with no other plugins.

    1. One way to better secure Internet communications is to use cryptographically verifiable Primitives and data structures inside Messages and in support of messaging protocols. Cryptographically verifiable Primitives provide essential building blocks for zero-trust computing and networking architectures. Traditionally, Cryptographic Primitives, including but not limited to digests, salts, seeds (private keys), public keys, and digital signatures, have been largely represented in some binary encoding. This limits their usability in domains or protocols that are human-centric or equivalently that only support ASCII text-printable characters RFC20. These domains include source code, documents, system logs, audit logs, legally defensible archives, Ricardian contracts, and human-readable text documents of many types [RFC4627].

      Security depends on cryptographically verifiable primitives. Cryptography is native to binary, which makes text-based, human-readable protocols (like JSON) awkward.
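      A minimal sketch of the binary-to-text step this note is about: a digest or public key is opaque binary, but encoding it with the URL-safe base64 alphabet of RFC 4648 (section 5) yields printable ASCII that can sit inside JSON, logs, or source code. This is an illustration under the assumption that plain unpadded base64url is enough to make the point; it omits whatever type codes or framing the quoted specification may define, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* URL-safe base64 alphabet from RFC 4648, section 5. */
static const char B64URL[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode len bytes as unpadded base64url text. `out` needs room for
 * 4 * ceil(len / 3) + 1 chars; returns the number of chars written. */
static size_t b64url_encode(const uint8_t *in, size_t len, char *out)
{
    assert(out != NULL && (in != NULL || len == 0));
    size_t i = 0, o = 0;
    /* Each full 3-byte group becomes 4 six-bit symbols. */
    for (; i + 2 < len; i += 3) {
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i + 1] << 8 | in[i + 2];
        out[o++] = B64URL[(v >> 18) & 63];
        out[o++] = B64URL[(v >> 12) & 63];
        out[o++] = B64URL[(v >> 6) & 63];
        out[o++] = B64URL[v & 63];
    }
    /* A trailing 1 or 2 bytes become 2 or 3 symbols (no '=' padding). */
    size_t rem = len - i;
    if (rem > 0) {
        uint32_t v = (uint32_t)in[i] << 16;
        if (rem == 2)
            v |= (uint32_t)in[i + 1] << 8;
        out[o++] = B64URL[(v >> 18) & 63];
        out[o++] = B64URL[(v >> 12) & 63];
        if (rem == 2)
            out[o++] = B64URL[(v >> 6) & 63];
    }
    out[o] = '\0';
    return o;
}
```

      For example, the two bytes 0xfb 0xff encode to "-_8", where standard base64 would give "+/8="; the URL-safe alphabet avoids characters that need escaping in URLs and file names.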

    2. This limits their usability in domains or protocols that are human-centric or equivalently that only support ASCII text-printable characters RFC20. These domains include source code, documents, system logs, audit logs, legally defensible archives, Ricardian contracts, and human-readable text documents of many types [RFC4627].

      OK, this confirms that "text domain" means human-readable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Deletion of the hrp2 and hrp3 loci in P. falciparum poses an immediate public health threat. This manuscript provides a more complete understanding of the dynamic nature with which these deletions are generated. By delving into the likely mechanisms behind their generation, the authors also provide interesting insight into general Plasmodium biology that can inform our broader understanding of the parasite's genomic evolution.

      Strengths:

      The sub-telomeric regions of P. falciparum (where hrp2 and hrp3 are located) are notoriously difficult to study with short-read sequence data. The authors take an appropriate, targeted approach toward studying the loci of interest, which includes read-depth analysis and local haplotype reconstruction. They additionally use both long-read and short-read data to validate their major findings. There is an extensive set of supplementary plots, which helps clarify several aspects of the data.

      Weaknesses:

      In this first version, there are a few factors that hinder a full assessment of the robustness and replicability of the results.

      Reviewer #1 (Recommendations For The Authors):

      Reviewer comment: First, a number of the analyses lack basic details in the methods; for instance, one must visit the authors' personal website to find some of the tools used.

      We have extensively updated the methods to clarify which tools were used and how they were run. All code and results for the analyses have been deposited in Zenodo at https://doi.org/10.5281/zenodo.12167687.

      Reviewer comment: Second, there are several tricky methodological points that are not fully documented. Read depths are treated (and plotted) discretely as 0/1/2 without any discussion of how thresholds were used and determined.

      We have added to the methods section the full details of how read depth was handled, including rounding normalized coverage to the nearest integer for visualizations. To ensure that only highly confident deleted strains were analyzed, normalized coverage of 0.1 or more was rounded to 1 instead of 0. Samples were considered for potential genomic deletion if they had zero coverage after rounding on chromosome 8 from 1,375,557 to 1,387,982 for pfhrp2, on chromosome 13 from 2,841,776 to 2,844,785 for pfhrp3, and on chromosome 11 from 1,991,347 to 2,003,328. These coordinates were chosen after visual inspection of samples with any zero coverage within the genomic region of pfhrp2/3.
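      The rounding rule above can be sketched as follows, assuming per-window normalized coverage values are already available as an array (the array layout, window size, and function names are hypothetical; the response does not describe them). Values below 0.1 round to 0, values of 0.1 or more never round below 1, and a region is flagged as a potential deletion only if every window in it rounds to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round a normalized coverage value for calling: below 0.1 counts as 0,
 * otherwise round to the nearest integer but never below 1. */
static int round_coverage(double norm_cov)
{
    assert(norm_cov >= 0.0);
    if (norm_cov < 0.1)
        return 0;
    int r = (int)(norm_cov + 0.5); /* round half up; valid for non-negative values */
    return r < 1 ? 1 : r;
}

/* Hypothetical layout: cov[i] is the normalized coverage of window i and
 * [first, last) are the windows covering one gene region (e.g. pfhrp2). */
static bool region_deleted(const double *cov, size_t first, size_t last)
{
    if (first >= last)
        return false; /* an empty region is never called deleted */
    for (size_t i = first; i < last; i++)
        if (round_coverage(cov[i]) != 0)
            return false;
    return true;
}
```

      Because a window at, say, 0.3x coverage still rounds to 1, a single weakly covered window is enough to keep a region from being called deleted, which keeps the call conservative in the way the response describes.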

      Reviewer comment: For read mapping to standard vs hybrid chromosomes, there is no documentation on how assignments were made if partially ambiguous or how final sample calls were determined when some reads were discordant. There is no mention of how missing data were handled. Without this, it is difficult to know when conclusions were based on analyses that were more quantitative (for instance, using pre-determined read thresholds) or more subjective (with patterns being extracted visually).

      We have updated several parts of the methods section to explicitly state which thresholds and analysis pipelines were used, making our documentation clearer. When mapping long reads to the hybrid versus standard chromosomes, reads spanning the duplicated region were required to extend 50 bp upstream and downstream of the region. These flanking regions are significantly different between chromosomes 11 and 13, so requiring spanning reads to map to them prevented multi-mapping reads. Reads that started within the duplicated region were allowed to map to both the hybrid and standard chromosomes for visualization in Figure 4. Importantly, for both HB3 and SD01, no reads spanned from the duplicated region into chromosome 13, showing a complete lack of reads containing the portion of chromosome 13 that follows the duplicated region. None of the other isolates had any spanning reads across the hybrid chromosomes. Deletion calls were based on initial visualization of pfhrp2/3 and then on read thresholds (see the response above for details).
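      A minimal sketch of the spanning-read criterion described above, assuming 0-based, half-open reference coordinates for both the read alignment and the duplicated region (the struct and function names are hypothetical, and the actual pipeline may implement the flank check differently). A read that starts inside the duplicated region fails the test, because without the flanks, which differ between chromosomes 11 and 13, its placement would be ambiguous.

```c
#include <assert.h>
#include <stdbool.h>

/* Flank length required on each side of the duplicated region, per the text. */
#define FLANK_BP 50L

/* Hypothetical coordinate type: 0-based, half-open [start, end) on one reference. */
struct interval {
    long start;
    long end;
};

/* True if the alignment covers the whole duplicated region plus at least
 * `flank` bases of unique sequence both upstream and downstream. */
static bool spans_with_flanks(struct interval aln, struct interval dup, long flank)
{
    assert(aln.start <= aln.end && dup.start <= dup.end && flank >= 0);
    return aln.start <= dup.start - flank && aln.end >= dup.end + flank;
}
```

      With hypothetical coordinates, for a duplicated region at [1000, 16200) an alignment over [950, 16250) just passes, while one over [960, 16300) fails on the upstream flank.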

      Reviewer comment: Third, while a new method is employed for local haplotype reconstruction (PathWeaver), the manuscript does not include details on this approach or benchmarking data with which to evaluate its performance and understand any potential artifacts.

      We have added an analysis based on biallelic SNPs for comparison with the PathWeaver results; it produced similar results, which helps validate PathWeaver. The PathWeaver manuscript is in preparation.

      Reviewer #2 (Public Review):

      This work investigates the mechanisms, patterns, and geographical distribution of pfhrp2 and pfhrp3 deletions in Plasmodium falciparum. Rapid diagnostic tests (RDTs) detect P. falciparum histidine-rich protein 2 (PfHRP2) and its paralog PfHRP3 located in subtelomeric regions. However, laboratory and field isolates with deletions of pfhrp2 and pfhrp3 that can escape diagnosis by RDTs are spreading in some regions of Africa. They find that pfhrp2 deletions are less common and likely occur through chromosomal breakage with subsequent telomeric healing. Pfhrp3 deletions are more common and show three distinct patterns: loss of chromosome 13 from pfhrp3 to the telomere with evidence of telomere healing at breakpoint (Asia; Pattern 13-); duplication of a chromosome 5 segment containing pfhrp1 on chromosome 13 through non-allelic homologous recombination (NAHR) (Asia; Pattern 13-5++); and the most common pattern, duplication of a chromosome 11 segment on chromosome 13 through NAHR (Americas/Africa; Pattern 13-11++). The loss of these genes impacts the sensitivity of RDTs, and knowing these patterns and geographic distribution makes it possible to make better decisions for malaria control.

      Reviewer #3 (Public Review):

      Summary:

      The study provides a detailed analysis of the chromosomal rearrangements related to the deletions of histidine-rich protein 2 (pfhrp2) and pfhrp3 genes in P. falciparum that have clinical significance since malaria rapid diagnostic tests detect these parasite proteins. A large number of publicly available short sequence reads for the whole genome of the parasite were analyzed, and data on coverage and discordant mapping allowed the authors to identify deletions, duplications, and chromosomal rearrangements related to pfhrp3 deletions. Long-read sequences showed support for the presence of a normal chromosome 11 and a hybrid 13-11 chromosome lacking pfhrp3 in some of the pfhrp3-deleted parasites. The findings support that these translocations have repeatedly occurred in natural populations. The authors discuss the implications of these findings and how they do or do not support previous hypotheses on the emergence of these deletions and the possible selective pressures involved.

      Strengths:

      The genomic regions where these genes are located are challenging to study since they are highly repetitive and paralogous, and the use of long-read sequencing made it possible to span the duplicated regions, supporting the identification of the hybrid 13-11 chromosome.

      All publicly available whole-genome sequences of the malaria parasite from around the world were analysed which allowed an overview of the worldwide variability, even though this analysis is biased by the availability of sequences, as the authors recognize.

      Despite the reduced sample size, the detailed analysis of haplotypes and identification of the location of breakpoints gives support to a single origin event for the 13-5++ parasites.

      The analysis of haplotype variation across the duplicated chromosome-11 segment identified breakpoints at varied locations that support multiple translocation events in natural populations. The authors suggest these translocations may be occurring at high frequency in meiosis in natural populations but are strongly selected against in most circumstances, which remains to be tested.

      Weaknesses:

      Reviewer comment: Relying on sequence data publicly available, that were collected based on diagnostic test positivity and that are limited by sequencing availability, limits the interpretation of the occurrence and relative frequency of the deletions.

      However, we have uncovered more mechanisms for hrp2 deletion than previously detected (involving MDR1). In addition, SEA and South American parasites are likely detected by microscopy, as RDTs were never introduced there due to the presence of the deletions.

      Reviewer comment: In the discussion, caution is needed when identifying the least common and most common mechanisms and their geographical associations. The identification of only one type of deletion pattern for Pfhrp2 may be related to these biases.

      We added a section in the Discussion on the limitations of our study, which states the following, “Limitations of this study include the use of publicly available sequencing data that were collected often based on positive rapid diagnostic tests, which limits our interpretation of the occurrence and relative frequency of these deletions. This could introduce regional biases due to different diagnostic methods as well as limit the full range of deletion mechanisms, particularly pfhrp2.”

      Reviewer comment: The specific objectives of the study are not stated clearly, and it is sometimes difficult to know which findings are new to this study. Is it the first study analyzing all the worldwide available sequences? Is it the first one to do long-read sequencing to span the entire duplicated region?

      In the Introduction, we added, “The objectives of this study were to determine the pfhrp3 deletion patterns along with their geographical associations and sequence and assemble the chromosomes containing the deletions using long-read sequencing.”

      We also added in the Discussion, “To the best of our knowledge, no prior studies have performed long-read sequencing to definitively span and assemble the entire segmental duplication involved in the deletions.”

      Reviewer comment: Another aspect that should be explained in the introduction is that there was previous information about the association of the deletions to patterns found in chromosomes 5 and 11. In the short-read sequences results, it is not clear if these chromosomes were analysed because of the associations found in this study (and no associations were found to putative duplications or deletions in other chromosomes), or if they were specifically included in the analysis because of the previous information (and the other chromosomes were not analysed).

      The former is correct. Chromosomes 5 and 11 were analyzed due to the associations found in this study, not from prior information. We have added the following sentence in the Results: “As a result of our short-read analysis demonstrating these three patterns and discordant reads between the chromosomes involved, chromosomes 5, 11, and 13 were further examined. No other chromosomes had associated discordant reads or changes in read coverage. ”

      Reviewer comment: An interesting statement in the discussion is that existing pfhrp3 deletions in a low-transmission environment may provide a genetic background on which less frequent pfhrp2 deletion events can occur. Does it mean that the occurrence of pfhrp3 deletions would favor the pfhrp2 deletion events? How, and is there any evidence for that?

      We should have stated more explicitly that selection would be better able to act on the now doubly deleted parasite than on a parasite with HRP3 still intact and weakly detectable by RDTs. Since fully RDT-negative parasites require a two-hit mechanism, where both pfhrp2 and pfhrp3 need to be deleted, and since there appear to be more mechanisms and drivers for pfhrp3 deletions, this would create a population of parasites with one hit already, requiring only the additional hit of a pfhrp2 deletion to become RDT negative. So the point being made in the discussion is not that the pfhrp3 deletion would favor pfhrp2 deletion, but rather that a population with one hit is already circulating, which would make it more likely that the less frequent pfhrp2 deletion results in a dual-deleted and therefore RDT-negative parasite. The discussion has been modified to the following to make this point clearer: “In the setting of RDT use in a low-transmission environment, a pfhrp2 deletion occurring in the context of an existing pfhrp3 deletion may be more strongly selected for compared to pfhrp2 deletion occurring alone still detectable by RDTs.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer comment: In the text, clonal propagation is the proposed hypothesis for the presence of near-identical copies of the chromosome 11 duplicated region. Even among the parasites showing variation between chromosomes, Figure 5 shows 3 haplotype groups with multiple sample members, which is also suggestive that these are highly related parasites. In addition to confirming COI status, it would be straightforward to calculate the genome-wide relatedness between/among parasites belonging to the same haplotype group. The assumption is that they are clones or highly related. A different finding would require more thought into potential genomic artifacts driving the pattern.

      Thank you for this helpful suggestion. We confirmed the COI of each sample using THE REAL McCOIL. Six samples were not monoclonal, and we removed them from the downstream analysis to eliminate any contribution of polyclonal samples to the haplotype analysis. Then, using hmmIBD on whole-genome biallelic SNPs, we determined the whole-genome relatedness between the parasites. The haplotype groups do appear clonal, though there appear to be several clonal groups within the larger clusters 01 (n=28) and 03 (n=12). Combined with the variation seen within the 15.2 kb region on chromosomes 11/13, this suggests that different events led to the same duplicated chromosome 11.

      Reviewer comment: By way of validating the PathWeaver results, it could be useful to use another comparator method on the samples that are COI=1 or 2.

      We have added an analysis based on biallelic SNPs to compare to the PathWeaver results, which produced similar results to help validate the PathWeaver results. We continued to use PathWeaver (Hathaway, in preparation), which is better able to detect variation relative to standard GATK4 analyses due to the refined local alignments from assembled haplotypes.

      Questions regarding Methods:

      Reviewer comment: Were any metrics of genome quality factored into sample selection?

      Yes, samples were removed if they had less than 5x median whole-genome coverage. Additionally, several subsets of sWGA samples were removed based on visual inspection. These details have been added to the methods section.

      Reviewer comment: How were polyclonal samples treated to ensure they did not produce analysis artifacts?

      The read-depth analysis required zero coverage across the regions of pfhrp2/pfhrp3, which meant that most of the samples analyzed were monoclonal (or polyclonal infections of only deleted strains). We have now used THE REAL McCOIL on whole-genome SNPs to determine COIs. Six samples were identified as polyclonal; we removed them from the analysis and updated the manuscript. Their removal did not significantly impact the results or conclusions.

      Reviewer comment: How was local realignment of short-read data performed? Was this step informed by the conserved, non-paralogous genomic regions, or were these only used for downstream variant analysis?

      No local realignment of short-read data was performed. The analysis was either read depth or de novo assembly from reads from specific regions. Regarding the de novo assembly, variant calls were replaced by complete local haplotypes, and a region was typed based on the haplotype called for the region.

      Reviewer comment: For read-depth estimation, what cutoffs were used to classify windows as deletion, WT, or duplication? How much variability was present in the data? The plot legends imply a continuous scale, but in reality, only 3 discrete colors are used (0, 1, 2), so these must represent the data after rounding.

      These have been added to the manuscript. See response to Reviewer #1 questions #2 and #3 above

      Reviewer comment: Similarly, what thresholds were used for mapping the long-reads? In Fig S21, it appears there is a high proportion of discordant reads.

      Long reads were mapped using minimap2 with default settings. Figure S21 shows mappings to 3D7 chromosome 11 and the hybrid 3D7 13-11 chromosome; the sequence of the duplicated region (blue bar underneath) is identical in both, so reads are expected to map to both. The significance of this figure and Figure 4 lies in the number of long reads that span the whole chr11/13 duplicated region connecting 3D7 chromosome 11 and the hybrid, showing that there are reads that start with chromosome 13 sequence and end with chromosome 11 sequence, and in the lack of reads that span from the duplicated region into 3D7 chromosome 13.

      Reviewer comment: The section on the mdr1 breakpoints is too vague.

      We have updated the methods section to be more explicit about how these breakpoints were determined.

      Reviewer comment: I assume that the "Homologous Genomic Structure" section of the Methods is the nucmer analysis that was alluded to in the Results? As with other sections, this needs more information on exact methods and tools.

      We have now updated the methods section to include exactly how the nucmer commands were run.

      Smaller comments:

      Reviewer comment: Introduction sub-header: "Precise *pfhrp2* and..."

      We have corrected the sub-header.

      Reviewer comment: Results (p.5) cite Table S4 instead of S3

      We have corrected this to Table S3.

      Reviewer comment: Results (p.5) "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions." This sentence makes it sound like they are 3 mutually exclusive categories. I'd suggest a rewording like "We identified 27 parasites with pfhrp2 deletion and 172 with pfhrp3 deletion. Of these, 21 contained both deletions."

      We have re-worded this sentence to the following: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern.”

      Reviewer comment: The annotations used for the deletions differ between the text and the figures. It would be easier for the reader to harmonize the two if these matched.

      The figures have been updated to reflect the annotations of the text.

      Reviewer comment: Figure numbering does not match the order they are first referenced in the text

      The figure numbers have been updated to match the order in which they are first referenced.

      Reviewer comment: Results (p. 8) there is no Table S4

      This has been changed to Table S3.

      Reviewer comment: Results (p.8) mention a genome-wide nucmer analysis, but I couldn't find these results. The referenced figure is for the duplicated region only.

      We have added a supplemental table with the nucmer results and updated the text to point to it and to the correct figure.

      Reviewer comment: Discussion typo: "Here, we used publicly available short-read and long-read *short-read sequencing data* from..."

      This was not a typo, as we used publicly available PacBio long-read data and then generated new Nanopore long-read data. However, we did clarify this in the sentence.

      Reviewer #2 (Recommendations For The Authors):

      Introduction

      Reviewer comment: "(...) suggesting the genes have important infections in normal infections and their loss is selected against". The word "infections" is in place of "role", etc.

      We have changed the word accordingly.

      Results

      Reviewer comment: In the section "Pfhrp2 and pfhrp3 deletions in the global P. falciparum genomic dataset" it is mentioned the number of parasites with each deletion and where it is more common. "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions." and "Across all regions, pfhrp3 deletions were more common than pfhrp2 deletions; specifically, pfhrp3 deletions and pfhrp2 deletions were present in Africa in 43 and 12, Asia in 53 and 4, and South America in 76 and 11 parasites." It is not clear where the 21 parasites with both pfhrp2 and pfhrp3 deletions are located.

      We have specified the following in the Results section: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern”

      Reviewer comment: "It should be noted that these numbers are not accurate measures of prevalence given that most WGS specimens have been collected based on RDT positivity." This, combined with the fact that subtelomeric regions are difficult to sequence and assemble, means these numbers are underestimated. I believe this should be stressed more in the text.

      We have added the following sentence, “Furthermore, subtelomeric regions are difficult to sequence and assemble, meaning these numbers may be significantly underestimated.”

      Reviewer comment: In the section "Pattern 13-11++ breakpoint occurs in a segmental duplication of ribosomal genes on chromosomes 11 and 13", Figures 2a and 2b should be mentioned in the text instead of just Figure 2.

      We have specified Figures 2a and 2b in the text now.

      Figures and Tables:

      Reviewer comment: Figure 2: I believe the color scale for percentage of identity is unnecessary given that the goal is to show that the paralogs are highly similar, and not that there is a significant difference between 0.99 and 0.998.

      We updated the color scale to show the number of variants between segments (which ranges from 55 to 133) rather than percent identity, so that it represents a more discrete quantity than the difference between 0.99 and 0.998.

      Reviewer comment: Adjust Figure 2b and the size of supplementary figure legends. Supplementary Figures 5-15: the legends are hard to read.

      All legends have been adjusted to be much more readable.

      Reviewer #3 (Recommendations For The Authors):

      Some minor suggestions:

      Reviewer comment: The order of the figures should follow the flow of the text, for example, Figure 5 appears in the text between Figure 1 and Figure 2.

      We have reordered the figures according to the order in which they appear in the text.

      Reviewer comment: Page 3 - "deleted parasites" - better to use: pfhrp2/3-deleted parasites.

      We have edited this accordingly.

      Reviewer comment: Define the acronyms the first time they are used, e.g. SEA.

      We have defined the acronyms accordingly.

      Reviewer comment: In the figures where pfmdr1 appears, indicate the correspondence to the full name of the gene that appears in the legend (multidrug resistance protein 1).

      Legends updated.

      Reviewer comment: Page 5 - Table S4 is missing.

      We apologize for our typo. There is no Table S4. We meant to refer to Table S3, which has been updated accordingly.

      Reviewer comment: Page 5 - "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions" - is it "and 21..." OR "from which, 21..."?

      We have reworded the sentence to the following: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern.”

      Reviewer comment: Page 5 - "most WGS specimens have been collected based on RDT positivity." - explain better which tests are done - to detect pfhrp2, pfhrp3 or both?

      Co-occurrence is not detected?

      We used all publicly available WGS data that spanned over 30 studies, and the exact details of what RDTs were used are not readily available to fully answer this question. Though the exact details of RDTs are not known, this does not affect the deletion patterns found in the genomic data but does limit the ability to comment on how this affects prevalence. We have updated the manuscript to the following to be more explicit that we don’t have the full details: “It should be noted that these numbers are not accurate measures of prevalence, given that the publicly available WGS specimens utilized in this analysis come from locations and time periods that commonly used RDT positivity for collection”

      Reviewer comment: Supplementary Figure 1 - Legend for "Pattern" - what is the white?

      The “Pattern” refers to the pfhrp3 deletion pattern, with white indicating no pfhrp3 deletion. The annotation title has been changed to “pfhrp3- Pattern” to make this clearer, and the following was added to the legend: “Of the 6 parasites without HRP3 deletion (marked as white in pfhrp3- Pattern column for having no pfhrp3 deletion),...”

      Reviewer comment: Supplementary Figure 8 - explain the haplotype rank. How was it obtained?

      The haplotype rank is based on the prevalence of the haplotype. To clarify this, the following has been added to the caption: “Each column contains the haplotypes for that genomic region colored by the haplotype prevalence rank (more prevalent have a lower rank number, with most prevalent having rank 1) at that window/column. Colors are by frequency rank of the haplotypes (most prevalent haplotypes have rank 1 and are colored red, 2nd most prevalent haplotypes are rank 2 and colored orange, and so forth). Shared colors between columns do not mean they are the same haplotype. If the column is black, there is no variation at that genomic window.”
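
      The per-window ranking described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each sample's haplotype in a window is given as a string, and the function names and the tie-breaking rule (first appearance wins) are hypothetical.

```python
from collections import Counter


def rank_haplotypes(window_haplotypes):
    """Rank the haplotypes observed in one genomic window by prevalence.

    Returns {haplotype: rank}, where rank 1 is the most prevalent haplotype.
    Ties are broken by order of first appearance (an assumption).
    """
    counts = Counter(window_haplotypes)
    # most_common() sorts by count, descending; equal counts keep first-seen order
    ordered = [hap for hap, _ in counts.most_common()]
    return {hap: rank for rank, hap in enumerate(ordered, start=1)}


def is_invariant(window_haplotypes):
    """True when a window has a single haplotype (drawn black in the figure)."""
    return len(set(window_haplotypes)) <= 1


samples = ["AC", "AT", "AC", "GT", "AC", "AT"]
print(rank_haplotypes(samples))  # {'AC': 1, 'AT': 2, 'GT': 3}
```

      Because ranks are recomputed independently for each window, rank 1 in two adjacent columns need not be the same haplotype, which is why the caption warns that shared colors do not imply shared haplotypes.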

      Reviewer comment: Figure 1 - Pattern in legend appears 11++13- but in text it is always referenced as 13-11++

      The figure legend has been updated to match the annotation used in the text.

      Reviewer comment: Page 6 - pattern 13- is which one(s) in Figure 1?

      This refers to pattern 13- with TARE1 sequence detected. The text has been updated to “(pattern 13-TARE1)”, and the legend of Figure 1 has been updated so that these statements match more closely.

      Reviewer comment: Page 7 - states "The 21 parasites with pattern 13-" and refers to Supplementary Figure 3 which presents "50 parasites with deletion pattern 13-". I believe this is pattern 13- unassociated with other rearrangements but it should be made clear in the text and legend of the supplementary figure.

      Thank you, you are correct. The manuscript has been updated in two locations for better clarity. The text has been updated to be “The 20 parasites with pattern 13-TARE1 without associated other chromosome rearrangements had deletions of the core genome averaging 19kb (range: 11-31kb). Of these 13-TARE1 deletions, 19 out of 20 had detectable TARE1 (pattern 13-TARE) adjacent to the breakpoint, consistent with telomere healing.” The Supplemental Figure 3 legend has been updated to “for the 48 parasites with pfhrp3 deletions not associated with pattern 13-11++”

      Reviewer comment: Supplementary figure 25 - "regions containing the pfhrp genes (lighter blue bars below chromosomes 11 and 13)" - the light blue bars are shown below chromosome 8 and 13; what is the difference between yellow and pink bars (telomere associates repetitive elements in the truncated legend)?

      The yellow bars are associated with the telomere-associated repetitive element 3 and the pink bars are telomere-associated repetitive element 1. To add clarity the legend has been updated to be “The yellow (TARE3) and pink (TARE1) bars on the bottom of the chromosomes represent the telomere-associated repetitive elements found at the end of chromosomes.”

      Reviewer comment: It would be helpful to have a positioning scale in the figures.

      Most plots have x- and y-axes labeled with genomic positions, which serve as a positioning scale, so we opted not to add more to the figures to keep them uncluttered. Other plots show regions in genomic order but with relative positioning, which prevents the use of a positioning scale; we tried to clarify this by adding more detail to the captions of these figures.

      Reviewer comment: Legend of Figure 6 - The last paragraph seems to be out of place

      We have deleted the last sentence in the legend of Figure 6 accordingly.

    1. players can literally write the rules and behavior of decentralized applications, and therefore, any Smart Assembly created in the game

      It seems that the protocol of a smart object is given through the Solidity code.

      Protocol code as contract code.

    1. Reviewer #3 (Public review):

      In this manuscript, Rossato and colleagues present a method for real-time decoding of EMG into putative single motor units. Their manuscript details a variety of decision points in their code and data collection pipeline that lead to a final result of recording on the order of ~10 putative motor units per muscle in human males. Overall the manuscript is highly restricted in its potential utility but may be of interest to aficionados. For those outside the field of human or nonhuman primate EMG, these methods will be of limited interest.

      Comment on revised version

      The revised manuscript has thoroughly and responsively addressed the concerns and suggestions raised in the first review. I think the method will be of use to the field and fits well within the purview of eLife's publications on methods development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this useful study, a solid machine learning approach based on a broad set of systems to predict the R2 relaxation rates of residues in intrinsically disordered proteins (IDPs) is described. The ability to predict the patterns of R2 will be helpful to guide experimental studies of IDPs. A potential weakness is that the predicted R2 values may include both fast and slow motions, thus the predictions provide only limited new physical insights into the nature of the relevant protein dynamics.

      Fast motions are less sequence-dependent (e.g., as shown by R1). Hence the sequence-dependent part of R2 singles out slow motion.

      Public Reviews:

      Reviewer #1 (Public Review):

      Solution state 15N backbone NMR relaxation from proteins reports on the reorientational properties of the N-H bonds distributed throughout the peptide chain. This information is crucial to understanding the motions of intrinsically disordered proteins and as such has focussed the attention of many researchers over the last 20-30 years, experimentally, analytically, and through numerical simulation.

      This manuscript proposes an empirical approach to the prediction of transverse 15N relaxation rates, using a simple formula that is parameterised against a set of 45 proteins. Relaxation rates measured under a wide range of experimental conditions are combined to optimize residue-specific parameters such that they reproduce the overall shape of the relaxation profile. The purely empirical study essentially ignores NMR relaxation theory, which is unfortunate, because it is likely that more insight could have been derived if theoretical aspects had been considered at any level of detail.

      NMR relaxation theory is very valuable in particular regarding motions on different timescales. However, it has very little to say about the sequence dependence of slow motions, which is the focus of our work.

      Despite some novel aspects, in particular the diversity of the relaxation data sets, the residue-specific parameters do not provide much new insight beyond earlier work that has also noted that sidechain bulkiness correlates with the profile of R2 in disordered proteins.

      The novel insight from our work is that R2 can mostly be predicted based on the local sequence.

      Nevertheless, the manuscript provides an interesting statistical analysis of a diverse set of deposited transverse relaxation rates that could be useful to the community.

      Thank you!

      Crucially, and somewhat in contradiction to the authors stated aims in the introduction, I do not feel that the article delivers real insight into the nature of IDP dynamics. Related to this, I have difficulty understanding how an approximate prediction of the overall trend of expected transverse relaxation rates will be of further use to scientists working on IDPs. We already know where the secondary structural elements are (from 13C chemical shifts which are essential for backbone assignment) and the necessary 'scaling' of the profile to match experimental data actually contains a lot of the information that researchers seek.

      Again, the novel insight is that slow motions that dictate the sequence dependence of R2 can mostly be predicted based on the local sequence. The scaling factor may contain useful information but does not tell us anything about the sequence dependence of IDP dynamics.

      This reviewer brings up a lot of valuable points, clearly from an NMR spectroscopist’s perspective. The emphasis of our paper is somewhat different from that perspective. For example, we were interested in whether tertiary contacts make significant contributions to R2, as sometimes claimed. Our results show that, in general, they do not; instead local contacts dominate the sequence dependence of R2.

      (1) The introduction is confusing, mixing different contributions to R2 as if they emanated from the same physics, which is not necessarily true. 15N transverse relaxation is said to report on 'slower' dynamics from 10s of nanoseconds up to 1 microsecond. Semi-classical Redfield theory shows that transverse relaxation is sensitive to both adiabatic and non-adiabatic terms, due to spin state transitions induced by stochastic motions, and dephasing of coherence due to local field changes, again induced by stochastic motions. These are faster than the relaxation limit dictated by the angular correlation function. Beyond this, exchange effects can also contribute to measured R2. The extent and timescale limit of this contribution depends on the particular pulse sequence used to measure the relaxation. The differences in the pulse sequences used could be presented, and the implications of these differences for the accuracy of the predictive algorithm discussed.

      Indeed pulse sequences affect the measured R2 values. We make the modest assumption that such experimental idiosyncrasy would not corrupt the sequence dependence of IDP dynamics. As for exchange effects, our expectation is that the current SeqDYN may not do well for R2s where slow exchange plays a dominant role in generating sequence dependence, as tertiary contacts would be prominent in those cases; we now present one such case (new Fig. S5).

      (2) Previous authors have noted the correlation between observed transverse relaxation rates and amino acid sidechain bulkiness. Apart from repeating this observation and optimizing an apparently bulkiness-related parameter on the basis of R2 profiles, I am not clear what more we learn, or what can be derived from such an analysis. If one can possibly identify a motif of secondary structure because raised R2 values in a helix, for example, are missed from the prediction, surely the authors would know about the helix anyway, because they will have assigned the 13C backbone resonances, from which helical propensity can be readily calculated.

      We think that a sequence-based method that is demonstrated to predict well R2 values from expensive NMR experiments is significant. That pi-pi and cation-pi interactions are prominent features of local contacts and may seed tertiary contacts and mediate inter-chain contacts that drive phase separation is a valuable insight.

      (3) Transverse relaxation rates in IDPs are often measured to a precision of 0.1 s⁻¹ or less. This level of precision is achieved because the line-shapes of the resonances are very narrow and high resolution and sensitivity are commonly achievable. The predicted relaxation rates, even when uniform scaling is applied to optimize agreement, often differ from the measured values by 10 or 20 times the measurement accuracy. There are no experimental errors in the figures. These are essential and should be shown for ease of comparison between experiment and prediction.

      Again, our focus is not the precision of the absolute R2 values, but rather the sequence dependence of R2.

      (4) The impact of structured elements on the dynamic properties of IDPs tethered to them is very well studied in the literature. Slower motions are also increased when, for example the unfolded domain binds a partner, because of the increased slow correlation time. The ad hoc 'helical boosting' proposed by the authors seems to have the opposite effect. When the helical rates are higher, the other rates are significantly reduced. I guess that this is simply a scaling problem. This highlights the limitation of scaling the rates in the secondary structural element by the same value as the rest of the protein, because the timescales of the motion are very different in these regions. In fact the scaling applied by the authors contains very important information. It is also not correct to compare the RMSD of the proposed method with MD, when MD has not applied a 'scaling'. This scaling contains all the information about relative importance of different components to the motion and their timescales, and here it is simply applied and not further analysed.

      Actually, applying the boost factor achieves the effect of a different scaling factor for the secondary structure element than for the rest of the protein.

      Regarding comparing RMSEs of SeqDYN and MD, it is true that SeqDYN applies a scaling factor whereas MD does not. However, even if we apply scaling to MD results it will not change the basic conclusion that “SeqDYN is very competitive against MD in predicting R2, but without the significant computational cost.”
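
      One way to put MD on the same footing as SeqDYN is to fit a single scale factor to the MD profile before computing the RMSE. The sketch below assumes that factor is chosen by least squares, which has the closed form λ = Σ p·e / Σ p² (setting the derivative of Σ(λp − e)² to zero). The function names are hypothetical, and this is an illustration of the idea rather than the procedure used in the manuscript.

```python
import math


def best_scale(pred, expt):
    """Least-squares scale factor lam minimizing sum((lam * p - e) ** 2)."""
    num = sum(p * e for p, e in zip(pred, expt))
    den = sum(p * p for p in pred)
    if den == 0:
        raise ValueError("predicted profile is all zeros")
    return num / den


def scaled_rmse(pred, expt):
    """Return (lam, RMSE) after applying the optimal uniform scale to pred."""
    if len(pred) != len(expt) or not pred:
        raise ValueError("need equal-length, non-empty profiles")
    lam = best_scale(pred, expt)
    sq = [(lam * p - e) ** 2 for p, e in zip(pred, expt)]
    return lam, math.sqrt(sum(sq) / len(sq))


lam, err = scaled_rmse([1.0, 1.0], [1.0, 3.0])
print(lam, err)  # 2.0 1.0
```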

      (5) Generally, uniform scaling of all values by the same number is a serious oversimplification. Motions occur on all timescales, and they give rise to different contributions to transverse relaxation. It is not possible to describe IDP relaxation in terms of one single motion. Detailed studies over more than 30 years have demonstrated that more than one component to the autocorrelation function is essential in order to account for motions on different timescales in denatured, partially disordered or intrinsically unfolded states. If one could 'scale' everything by the same number, this would imply that only one timescale of motion were important and that all others could be neglected, and this at every site in the protein. This is not expected to be the case, and in fact in the examples shown by the authors it is also never the case. There are always regions where the predicted rates are very different from experiment (with respect to experimental error), presumably because local dynamics are occurring on different timescales to the majority of the molecule. These observations contain useful information, and the observation that a single scaling works quite well probably tells us that one component of the motion is dominant, but not universally. This could be discussed.

      The reviewer appears to equate a single scaling factor with a single type of motion -- this is not correct. A single scaling factor just means that we factor out effects (e.g., temperature or magnetic field) that are uniform across the IDP sequence.

      (6) With respect to the accuracy of the prediction, discussion about molecular detail such as pi-pi interactions and phase separation propensity is possibly a little speculative.

      It is speculative; we now add more support to this speculation (p. 18 and new Fig. S6).

      (7) The authors often declare that the prediction reproduces the experimental data. The comparisons with experimental data need to be presented in terms of the chi2 per residue, using the experimentally measured precision which as mentioned, is often very high.
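
      For reference, the per-residue χ² the reviewer asks for is conventionally defined as below, where σ_n is the experimental uncertainty of residue n, N the number of residues with data, and λ the uniform scale factor applied to the prediction (notation assumed here, not taken from the manuscript):

$$
\frac{\chi^2}{N} = \frac{1}{N}\sum_{n=1}^{N}\left(\frac{\lambda\,R_2^{\mathrm{pred}}(n) - R_2^{\mathrm{exp}}(n)}{\sigma_n}\right)^2
$$

      Values near 1 mean the residuals are consistent with measurement error; a residue deviating by 10 to 20 times σ_n, as described in comment (3), would contribute roughly 100 to 400 to the sum.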

      Again, our interest is the sequence dependence of R2, not the absolute R2 value and its measurement precision.

      Reviewer #2 (Public Review):

      Qin, Sanbo and Zhou, Huan-Xiang created a model, SeqDYN, to predict nuclear magnetic resonance (NMR) spin relaxation spectra of intrinsically disordered proteins (IDPs), based primarily on amino acid sequence. To fit NMR data, SeqDYN uses 21 parameters, 20 that correspond to each amino acid, and a sequence correlation length for interactions. The model demonstrates that local sequence features impact the dynamics of the IDP, as SeqDYN performs better than a one residue predictor, despite having similar numbers of parameters. SeqDYN is trained using 45 IDP sequences and is retrained using both leave-one-out cross validation and five-fold cross validation, ensuring the model's robustness. While SeqDYN can provide reasonably accurate predictions in many cases, the authors note that improvements can be made by incorporating secondary structure predictions, especially for alpha-helices that exceed the correlation length of the model. The authors apply SeqDYN to study nine IDPs and a denatured ordered protein, demonstrating its predictive power. The model can be easily accessed via the website mentioned in the text.

      While the conclusions of the paper are primarily supported by the data, there are some points that could be extended or clarified.

      (1) The authors state that the model includes 21 parameters. However, they exclude a free parameter that acts as a scaling factor and is necessary to fit the experimental data (lambda). As a result, SeqDYN does not predict the spectrum from the sequence de novo, but requires a one-parameter fit. The authors mention that this factor is necessary due to non-sequence-dependent factors such as the temperature and magnetic field strength used in the experiment.

      Given these considerations, would it be possible to predict what this scaling factor should be based on such factors?

      There are still too few data to make such a prediction.

      (2) The authors mention that the Lorentzian functional form fits the data better than a Gaussian functional form, but do not present these results.

      We tested the different functional forms at the early stage of the method development. The improvement of the Lorentzian over the Gaussian was slight and we simply decided on the Lorentzian and did not go back and do a systematic analysis.
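
      To make the ingredients discussed above concrete (20 amino-acid parameters q, a correlation length L, a Lorentzian weight in sequence separation, and a uniform scale factor λ), here is a hedged sketch of one model of this kind. The multiplicative form R2(n) = λ ∏_m q(a_m)^w(n−m), the function names, and any parameter values are assumptions for illustration only; this is not the published SeqDYN formula or its fitted parameters.

```python
import math


def lorentzian(s, L):
    """Weight vs. sequence separation s: 1 at s = 0, 1/2 at |s| = L."""
    return 1.0 / (1.0 + (s / L) ** 2)


def gaussian(s, L):
    """Alternative kernel for comparison: 1 at s = 0, exp(-1/2) at |s| = L."""
    return math.exp(-0.5 * (s / L) ** 2)


def predict_r2(seq, q, L, lam, kernel=lorentzian):
    """Hypothetical sequence-based R2 profile (illustration only).

    seq    one-letter amino-acid sequence
    q      dict mapping each residue type to a positive parameter
    L      correlation length along the sequence
    lam    uniform scale factor (absorbs temperature, field strength, ...)
    """
    profile = []
    for n in range(len(seq)):
        # Each residue m contributes log q, damped by its distance from n
        log_r = sum(kernel(n - m, L) * math.log(q[aa]) for m, aa in enumerate(seq))
        profile.append(lam * math.exp(log_r))
    return profile
```

      In a model of this form, setting q = 1 for every residue gives a flat profile at λ, so all sequence dependence comes from q and L while λ only sets the overall level. The Lorentzian's heavier tails let residues beyond L still contribute noticeably, whereas the Gaussian weight falls off much faster.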

      (3) The authors mention that they conducted five-fold cross validation to determine if differences between amino acid parameters are statistically significant. While two pairs are mentioned in the text, there are 190 possible pairs, and it would be informative to more rigorously examine the differences between all such pairs.

      We now present t-test results for other pairs in new Fig. S3.

      Reviewer #3 (Public Review):

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

      Our method predicts the sequence dependence of R2, which is dominated by slower dynamics.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Should make sure to define abbreviations such as NMR and SeqDYN.

      We now spell out NMR at first use. SeqDYN is the name of our method and is not an abbreviation.

      (2) The authors do not mention how the curves in Figure 2A are calculated.

      As we stated in the figure caption, these curves are drawn to guide the eye.

      (3) May be interesting to explore how the model parameters (q) correlate with different measures of hydrophobicity (especially those derived for IDPs like Urry). This may point to a relationship between amino acid interactions and amino acid dynamics

      We now present the correlation between q and a stickiness parameter refined by Tesei et al. (new ref 45) and used for predicting phase separation equilibrium (new Fig. S6).

      (4) The authors demonstrate that secondary structure cannot be fully accounted for by their model. They make a correction for extended alpha-helices, but the strength of this correction seems to only be based on one sequence. Would a more rigorous secondary structure correction further improve the model and perhaps allow its transferability to ordered proteins?

      We have four test cases (Figs. 4E, F and 5H, I). However, we doubt that the SeqDYN method will be transferable to ordered proteins.

      Reviewer #3 (Recommendations For The Authors):

      Changes that could strengthen the manuscript substantially.

      (1) The authors do not really define what they mean by dynamics, but given that they train and benchmark on R2 measurements, they directly probe whatever goes into the measured R2. Using a direct measurement is a strength since it makes it clear what they are predicting. It also, however, makes it difficult to interpret. This is made clear in the text when the authors, for example, write "𝑅2 is the one most affected by slower dynamics (10s of ns to 1 μs and beyond)." First, with the "and beyond" it could literally mean anything. Second, the "normal" R2 rate is limited to motions up to the (local) "tumbling/reorganization" time (which is much faster), so any slow motions that go into R2 would be what one would normally call "exchange". The authors should thus make it clearer what exactly it is they are probing. In the end, this also depends on the origin of the experimental data, and whether the "R2" measurements are exchange-free or not. This may be a mixture, which hampers interpretations and which may also explain some of the rescaling that needs to be done.

      We now remove “and beyond”, and also raise the possibility that R2 measurements based on 15N relaxation may have relatively small exchange contributions (p. 17).

      (2) Related to the above, the authors might consider comparing their predictions to the relaxation experiments from Kriwacki and colleagues on a fragment of p27. In that work, the authors used dispersion experiments to probe the dynamics on different timescales. The authors would here be able to compare both to the intrinsic R2 rates (when slow motions are pulsed away) as well as the effective R2 rates (which would be the most common measurement). This would help shed light on (at least in one case) which type of R2 the prediction model captures. https://doi.org/10.1021/jacs.7b01380

      We now report this comparison in new Fig. S5 and discuss its implications (p. 17-18).

      (3) In some cases, disagreement between prediction and experiments is suggested to be due to differences in temperature, and hence is used as an argument for the rescaling done. Here, the authors use a factor of 2.0 to explain a difference between 278K and 298K, and a factor of 2.4 to explain the difference between 288K and 298K. It would be surprising if the temperature effect from 288K->298K is larger than from 278K->298K. Does this not suggest that the differences come as much from other sources?

      Note that the scaling factors 2.0 and 2.4 were obtained on two different IDPs. It is most likely that different IDPs have different scaling factors for temperature change. As a simple model, the tumbling time for a spherical particle scales with viscosity and the particle volume; correspondingly the scaling factor for temperature change should be greater for a larger particle than for a smaller particle.

      (4) The authors find (as have others before) aromatic residues to be common at/near R2 peaks. They suggest this to be indicative of pi-pi interactions. Could this not be other types of interactions, since these residues are also "just" more hydrophobic? Also, can the authors rule out that the increased R2 rates near aromatic residues are due not to increased dynamics but simply to increased Rex terms arising from greater fluctuations in the chemical shifts near these residues (due to the large ring-current effects)?

      We noted both pi-pi and cation-pi as possible interactions that raise R2. There can be other interactions involving aromatic residues, but it’s unlikely to be only hydrophobic as Arg is also in the high-q end. For the same reason, a ring-current based explanation would be inadequate.

      (5) The authors write: "We found that, by filtering PsiPred (http://bioinf.cs.ucl.ac.uk/psipred) (35) helix propensity scores (𝑝,-.) with a very high cutoff of 0.99, the surviving helix predictions usually correspond well with residues identified by NMR as having high helix propensities." It would be good to show the evidence for this in the paper, and quantify this statement.

      The cases of most interest are the ones with long predicted helices, of which there are only 3 in the training set. For Sev-NT and CBP-ID4, we already summarize the NMR data for helix identification in the first paragraph of Results; the third case is KRS-NT, which we discuss on p. 14.
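
      The helix correction discussed in these responses (PsiPred helix propensities filtered at a 0.99 cutoff, then a boost of the predicted R2 in the surviving helical residues) might look roughly like the sketch below. Only the 0.99 cutoff comes from the quoted text; the boost value, the minimum helix length, the inclusive comparison, and the function name are hypothetical.

```python
def helix_boost(r2, p_helix, cutoff=0.99, boost=1.5, min_len=1):
    """Return a copy of r2 with residues in confidently predicted helices scaled.

    cutoff   helix-propensity threshold (0.99 in the quoted text)
    boost    multiplicative boost for helical residues (hypothetical value)
    min_len  shortest run of helical residues that gets boosted (hypothetical)
    """
    if len(r2) != len(p_helix):
        raise ValueError("profiles must have equal length")
    out = list(r2)
    n = len(r2)
    i = 0
    while i < n:
        if p_helix[i] >= cutoff:
            j = i
            while j < n and p_helix[j] >= cutoff:
                j += 1
            # residues i..j-1 form one contiguous predicted helix
            if j - i >= min_len:
                for k in range(i, j):
                    out[k] *= boost
            i = j
        else:
            i += 1
    return out
```

      If a single scale factor λ is then applied to the whole profile, helical residues are effectively scaled by λ times the boost and the rest by λ alone, which is the sense in which the boost acts as a separate scaling factor for the secondary-structure element.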

      (6) When analysing the nine test proteins, it would be very useful for the reader to get a number for the average accuracy on the nine proteins and a corresponding number for the training proteins. The numbers are maybe there, but hard to find/compare. This would be important so that one can understand how well the model works on the training vs testing data.

      We now present the mean RMSE comparison on p. 14.

      (7) The authors write: "The 𝑞 parameters, while introduced here to characterize the propensities of amino acids to participate in local interactions, appear to correlate with the tendencies of amino acids to drive liquid-liquid phase separation." It would be good to show this data and quantify this.

      We now list supporting data on p. 18 and present new Fig. S6 for further support.

      (8) It is great that the authors have made a webservice available for easy access to the work. They should in my opinion also make the training code and data available, as well as the final trained model. Here it would also be useful to show the results from the use of a Gaussian that was also tested, and also state whether this model was discarded before or after examining the testing data.

      We have listed the IDP characteristics and sequences in Tables S1 and S2. We’re unsure whether we can disseminate the experimental R2 data without the permission of the original authors. As for the Gaussian function, as stated above, it was abandoned at an early stage, before examining the testing data.

      Changes that would also be useful

      (1) The authors should make it clearer what they predict and what they don't. They mention transient helix formation and various contacts, but there isn't a one-to-one relationship between these structural features and R2 rates. Hence, they should make it clearer that they don't predict secondary structure and that an increased R2 rate may be indicative of many different structural/dynamical features on many different time scales.

      We clearly state that we apply a helix boost after the regular SeqDYN prediction.

      (2) The authors write "Instead, dynamics has emerged as a crucial link between sequence and function for IDPs" and cite their own work (reference 1) as reference for this statement. As far as I can see, that work does not study function of IDPs. Maybe the authors could cite additional work showing that the dynamics (time scales) affects function of IDPs beyond "just" structure? Otherwise, the functional consequences are not clear. Maybe the authors mean that R2 rates are indicative of (residual) structure, but that is not quite the same. Also, even in that case, there are likely more appropriate references.

      Ref. 1 summarized a number of scenarios where dynamics is related to function.

      (3) The authors might want to look at some of the older literature on interpreting NMR relaxation rates and consider whether some of it is worth citing.

      Fitting/understanding R2 profiles https://doi.org/10.1021/bi020381o https://doi.org/10.1007/s10858-006-9026-9

      MD simulations and comparisons to R2 rates without ad hoc reweighting (in addition to the papers from the authors themselves). https://doi.org/10.1021/ja710366c https://doi.org/10.1021/ja209931w

      The R2 data for the two unfolded proteins are very helpful! We now present the comparison of these data to SeqDYN prediction in Fig. 6C, D. The MD papers are superseded by more recent studies (e.g., refs. 1 and 14).

      There are more like these.

      (4) In the analysis of unfolded lysozyme, I assume that the authors are treating the methylated cysteines (which are used in the experiments) simply as cysteine. If that is the case, the authors should ideally mention this specifically.

      Treatment of methylated cysteines is now stated in the Fig. 6 caption.

      (5) The authors write "Pro has an excessively low ms𝑅2 [with data from only two IDPs (32, 33)], but that is due to the absence of an amide proton." It would be useful to explain why lacking an amide proton gives rise to low 15N R2 rates.

      That assertion originated from ref. 32.

      (6) When applying the model, the authors predict msR2 and then compare to experimental R2 by rescaling with a factor gamma. It would be good to make it clearer whether this parameter is always fitted to the experiments in all the comparisons. It would be useful to list the fitted gamma values for all the proteins (e.g. in Table S1).

      We already give a summary of the scaling factors (“For 39 of the 45 IDPs, Υ values fall in the range of 0.8 to 2.0 s–1”, p. 10).
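      The rescaling by Υ and the training-versus-test accuracy comparison raised in these comments can be illustrated as follows. This sketch assumes experimental R2 is modeled as Υ × msR2 with a single Υ per protein fitted by least squares through the origin, and that accuracy is summarized as the per-protein RMSE averaged over a set of proteins. The manuscript may determine Υ differently, and all function names here are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def fit_scale(ms_r2: Sequence[float], r2_exp: Sequence[float]) -> float:
    """Least-squares scale through the origin (assumed fitting choice):
    minimizes sum((r2_exp - upsilon * ms_r2) ** 2)."""
    num = sum(x * y for x, y in zip(ms_r2, r2_exp))
    den = sum(x * x for x in ms_r2)
    return num / den


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def scaled_rmse(ms_r2: Sequence[float],
                r2_exp: Sequence[float]) -> Tuple[float, float]:
    """Fit upsilon for one protein, then return (upsilon, RMSE of scaled prediction)."""
    upsilon = fit_scale(ms_r2, r2_exp)
    return upsilon, rmse([upsilon * x for x in ms_r2], r2_exp)


def mean_rmse(proteins: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average per-protein RMSE, e.g. over the training set or the test set."""
    values = [scaled_rmse(ms, exp)[1] for ms, exp in proteins]
    return sum(values) / len(values)
```

Calling `mean_rmse` once on the training proteins and once on the test proteins gives the pair of numbers Reviewer 2 asked for in point (6).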

      (7) p. 14 "nineth" -> "ninth"

      Corrected

    2. Reviewer #3 (Public review):

      The revised manuscript adds some new relevant analyses. It still, however, is unclear which timescales of motions the method refers to and there is confusion about whether the model can predict "slower motions". While the authors answer some of my points, others are left unanswered. That is of course the authors' prerogative, and readers will in any case be able to read the reviewer comments. I am not sure it is productive to add further comments at this point.

      Below are my comments from the first round of review:

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

    1. For the NMNH exhibition Genome: Unlocking Life's Code, the volunteer corps had the opportunity to have their own genomes sequenced and to express personal thoughts regarding the complex ethical and social questions that direct-to-consumer sequencing raises for them and their families. In other settings, teens have been specifically recruited to talk with other teens about relevant exhibit issues, and bilingual volunteers have been brought in to talk with visitors in their own language, prompting conversations and questions that might not otherwise emerge.

      Do you currently view natural history museums as spaces of dialogue? How might introducing conversations and community involvement like this influence the definition of what a museum and natural history are?


    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Strengths: 

      The paper clearly presents the resource, including the testing of candidate enhancers identified from various insects in Drosophila. This cross-species analysis, and the inherent suggestion that training datasets generated in flies can predict a cis-regulatory activity in distant insects, is interesting. While I cannot be sure this approach will prevail in the future, for example over approaches that leverage the prediction of TF binding motifs, the SCRMshaw tool is certainly useful and worth consideration for the large community of genome scientists working on insects. 

      We thank the reviewer for the positive comments, and would just like to point out that we agree: while we cannot of course know if other methods will overtake SCRMshaw for enhancer prediction—we assume they will, at some point (although motif-based approaches have not fared as well in the past)—for now, SCRMshaw provides strong performance and is a useful part of the current toolkit.

      Weaknesses: 

      While the authors made the effort to provide access to the SCRMshaw annotations via the REDfly database, the usefulness of this resource is somewhat limited at the moment. First, it is possible to generate tables of annotated elements with coordinates, but it would be more useful to allow downloads of the 33 genome annotations in GFF (or equivalent) format, with SCRMshaw predictions appearing as a new feature. Also, I should note that, unlike for most species, the annotations for some species seem to have issues in the current REDfly implementation. For example, Vcar and Jcoen return empty results. 

      We have addressed these weaknesses in several ways:

      (1) We have created GFF versions of the SCRMshaw predictions and provide them standalone and also merged into the available annotation GFFs for each of the 33 species.

      (2) We have made these GFF files, and also the original SCRMshaw output files, available for download in a Dryad repository linked to the publication (https://doi.org/10.5061/dryad.3j9kd51t0).

      (3) We have added the inadvertently omitted species to the REDfly/SCRMshaw database.

      We agree that the database functions are still somewhat limited, but note that database development is ongoing and we expect functionality to increase over time. In the meantime, the Dryad repository ensures that all results reported in this paper are directly available.

      Reviewer #2 (Public Review): 

      Summary: 

      … Upon identification of predicted enhancer regions, the authors perform post-processing step filtering and identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene. …

      We respectfully point out a small misunderstanding here on the part of the reviewer. We stress that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences, i.e., they are not “based on the proximity of an orthologous target gene.” Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We have added language on page 8 (lines 189ff) to make this more clear in the text.

      Weaknesses:

      This work provides predicted enhancer annotations across many insect species, with reporter gene analysis being conducted on selected regions to test the predictions. However, the code for the SCRMshaw analysis pipeline used in this work is not made available, making reproducibility of this work difficult. Additionally, while the authors claim the predicted enhancers are available within the REDfly database, the predicted enhancer coordinates are currently not downloadable as Supplementary Material or from a linked resource. 

      We have placed all the code for this paper into a GitHub repository “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife) to address this concern. As described in our response to Reviewer 1, above, all results are now available in multiple formats in a linked Dryad repository in addition to the REDfly/SCRMshaw database.

      The authors do not validate or benchmark the application of SCRMshaw against other published methods, nor do they seek to apply SCRMshaw under a variety of conditions to confirm the robustness of the returned predicted enhancers across species. Since SCRMshaw relies on an established k-mer enrichment of the training loci, its performance is presumably highly sensitive to the selection of training regions as well as the statistical power of the given k-mer counts. The authors do not justify their selection of training regions by which they perform predictions. 

      Our objective in this study was not to provide proof-of-principle for the SCRMshaw method, as we have established the efficacy of the approach at this point in several previous publications. Rather, the objective here was to make use of SCRMshaw to provide an annotation resource for insect regulatory genomics. Note that the training regions we used here are the same as those we have used in earlier work. Naturally, we performed various assessments to establish that the method was working here, but we make no claims in this work about SCRMshaw’s relative efficiency compared to other methods. Some of our prior publications include assessments of the sort the reviewer references, which suggest that SCRMshaw is at least comparable to other enhancer discovery approaches. We note that benchmarking of such methods is in fact extremely complicated due to the fact that there are no established true positive/true negative data sets against which to benchmark (we have explored this in Asma et al. 2019 BMC Bioinformatics).

      While there is an attempt made to report and validate the annotated predicted enhancers using previously published data and tools, the validation lacks the depth to conclude with confidence that the predicted set of regions across each species is of high quality. In vivo reporter assays were conducted to anecdotally confirm the validity of a few selected regions experimentally, but even these results are difficult to interpret. There is no large-scale attempt to assess the conservation of enhancer function across all annotated species. 

      We respectfully disagree that there is insufficient validation. We bring several different lines of evidence to bear suggesting that our results fall into the accuracy range—roughly 75%—established both here and in previous work. We are also clear about the fact that these are predictions only and need to be viewed as such (e.g. line 638). Although “large-scale” in vivo validation assays would certainly be both interesting and worthwhile, the necessary resources for such an assessment places it beyond our present capability.

      Lastly, it is suggested that predicted regions are derived from the shared presence of sequence features such as transcription factor binding motifs, detected through k-mer enrichment via SCRMshaw. This assumption has not been examined, although there are public motif discovery tools that would be appropriate to discover whether SCRMshaw is assigning predicted regions based on previously understood motif grammar, or due to other sequence patterns captured by k-mer count distributions. Understanding the sequence-derived nature of what drives predictions is within the scope of this work and would boost confidence in the predicted enhancers, even if it is limited to a few training examples for the sake of clarity of interpretation. 

      Again, we respectfully disagree that “this assumption has not been examined.” Although we did not undertake this analysis here, we have in the past, where we have shown that known TFBS motifs can be recovered from sets of SCRMshaw predictions (e.g., Kazemian et al. 2014 Genome Biology and Evolution). We return to this point when we address the Comments to Authors, below.

      Reviewer #3 (Public Review): 

      Weaknesses:  

      The rates of predicted true positive enhancer identification vary widely across the genomes included here based on the simulations and comparison to datasets of accessible chromatin in a manner that doesn't map neatly onto phylogenetic distance. At this point, it is unclear why these patterns may arise, although this may become more clear as regulatory annotation is undertaken for more genomes. 

      We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.

      Functional assessment of predicted enhancers was performed through reporter gene assays primarily in Drosophila melanogaster imaginal discs, a system amenable to transgenics. Unfortunately, this mode of canonical imaginal disc development is only representative of a subset of all holometabolous insects; therefore, it is difficult to interpret reporter gene expression in a fly imaginal disc as evidence of a true positive enhancer that would be active in its native species whose adult appendages develop differently through the larval stage (for example, Coleopteran and Lepidopteran legs). However, the reporter gene assays from other tissues do offer strong evidence of true positive enhancer detection, and constraints on transgenic experiments in other systems mean that this approach is the best available. 

      Please see an extensive discussion of this point in our response to Reviewer 3, below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Major Concerns: 

      (1) While the GitHub source code for SCRMshaw is provided, the authors do not provide a repository of manuscript-specific code and scripts for readers. This is a barrier to reproducibility, and the code used to perform the analysis should be made available. Additionally, links to available scripts do not work; see Line 690. Post-processing scripts point to a general lab folder, but again, no specific analysis or code is sourced for the work in this specific manuscript (e.g. Line 637). 

      As noted above, we have corrected this oversight and established a specific GitHub repository for this manuscript “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife). 

      (2) On lines 479-488, there is a discussion about the annotations being provided on REDfly, though no link is provided. 

      We have included a link in the text at this point (now line 515).

      Additionally, for transparency, it would be valuable to provide in Supplementary Table 1 the genomic coordinates of the original training sets in addition to their identity. 

      These coordinates have been added to Supplementary Table 1 as suggested.

      Also, it is suggested to provide genomic coordinates of the predicted enhancers for each training set across all species, perhaps with a column denoting a linked ID of one genomic coordinate in a species to another species (i.e. if there is a linked region found from D. melanogaster to J. coenia, labeling this column in both coordinate sets as blastoderm.mapping1_region1). Providing these annotations directly in the work enhances the transparency of the results. 

      We are unsure exactly what the reviewer means here by “a linked region.” It is critical to understanding our approach to recognize that the genome sequences have diverged to the point where there is no alignment of non-coding regions possible. Thus there is no way to directly “link” coordinates of a predicted enhancer from one species to those of a predicted enhancer in another species. The coordinates for each prediction are available on a per-species basis either through the database or in the files now available in the linked Dryad repository; these can be filtered for results from a specific training set. The database will allow users to select all results for a given orthologous locus, from any subset of species. More complex searches will continue to become available as we improve functionality of the database, an ongoing project in collaboration with the REDfly team.

      (3) Figure 2B: It is unclear what this figure shows. Are the No Fly Orthologs false positives, Orthology pipeline issues, or interesting biology? 

      We have clarified this in the Figure 2 legend. “No Mapped Fly Orthologs” indicates that our orthology mapping pipeline did not identify clear D. melanogaster orthologs. For any given gene, this could reflect either a true lack of a respective ortholog, or failure of our procedure to accurately identify an existing ortholog.

      (4) SCRMshaw appears to be a versatile tool, previously published in a variety of works. However, in this manuscript, there is little discussion of the sensitivity of SCRMshaw to different initial parameters, how the selection of training loci can impact outcomes, or how SCRMshaw k-mer discovery methods compare to other similar tools.

      - This paper would be strengthened by addressing this weakness. Some specific suggestions below: 

      In order to strengthen confidence that SCRMshaw is a reliable predictor of enhancer regions in other species, it is suggested that you benchmark against other k-mer-derived methods to assign enhancers, such as gkm-SVM developed by the Beer Lab in 2016 (https://www.beerlab.org/gkmsvm/, https://www.biorxiv.org/content/10.1101/2023.10.06.561128v1). 

      We have established the effectiveness of SCRMshaw as an enhancer discovery method in previous work, and the main goal of this study was to make use of the established method to annotate numerous insect genomes as a community resource. Our claim here is that SCRMshaw works well for this purpose; we do not attempt a strong claim about whether other approaches may work equally well or marginally better (although we do not believe this is the case, based on prior work). Benchmarking enhancer discovery is challenging, as we point out in Asma et al. 2019 (BMC Bioinformatics), and, while important, best left for a dedicated comprehensive study. A major problem is that there are no independent objective “truth” sets for enhancers from the various species we interrogate here. Thus, while we could also run, e.g., gkm-SVM, what criteria would we use to establish which method had better accuracy for a given species? Note that the work from Beer’s lab took advantage of the ability to match human-mouse orthologous (or syntenic) regions and available open-chromatin data to assess whether conserved enhancers were discovered, but this is not possible given the degree of divergence, limited synteny, and relative lack of additional data for the insect genomes we are annotating.

      - In Table S1, we see that 7-146 regions are used as training sets, which is a huge variety. Does an increase in training set size provide a greater "rate of return" for predicted regions? Is the opposite true? Addressing this question would allow readers to understand if they wish to use SCRMshaw, a reasonable scope for their own training region selections. 

      - Within a training set, does subsampling provide the same outcomes in terms of prediction rates? There is no exploration of how "brittle" the training sets are, and whether the generalized k-mer count distributions that are established in a training set are consistent across randomly selected subgroups. Performing this analysis would raise confidence in the method applied and the resulting annotations. 

      These are interesting and important questions, but again we feel they are beyond the scope of this particular study, which is focused primarily on using SCRMshaw and not on optimizing various search parameters. That said, this is of course something we have investigated, although as with other aspects of enhancer discovery, the absence of a true gold standard enhancer set makes evaluation difficult. We have not found a clear correlation between training set size and performance beyond the very general finding that performance appears to be best when training set size is moderate, e.g. 20-40 initial enhancers. We suspect that larger training sets often contain too many members that don’t fit the core regulatory model and thus add noise, whereas sets that are too small may not contain enough signal for best performance (although small sets can still be useful, especially if used in an iterative cycle; see Weinstein et al. 2023 PLoS Genetics). However, establishing this rigorously is highly challenging given the limitations with assessing true and false positive rates at scale.

      (5) In Figure 2C, when plotting hexMCD, IMM, pacRC, and then the merged set, it is unclear whether the score-specific bar allows coordinate redundancy, though this is implied. What might be more useful is a revision of this plot where the hexMCD-, IMM-, and pacRC-specific loci are plotted, with the merged set alongside as is currently reported. This would give the reader a clearer understanding of the variability between these scoring methods and why this variability occurs. 

      We have added the breakdowns between IMM, hexMCD, and pacRC in Supplementary Table S2, and made more complete reference to this in the text (lines 682ff). Both the database and the data files in the Dryad repository allow exploration of the overlap between the different methods and contain both separate and merged (for overlap and redundancy) results.

      Additionally, there is no information in the Methods section of these three SCRMshaw scores and what they represent, even colloquially. While SCRMshaw has been applied in several papers previously, it would help with scientific clarity to describe in a sentence or two what each score is meant to represent and why one is different from another. 

      We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.

      (6) When describing results in Figure 2, an important question arises: "Is there an anti-correlation between the number of predicted regions and evolutionary distance?" This would be an expected result that could complement Figure 4's point that shared orthology across 16 species is rarer than across 10 species. Visualizing and adding this to Figure 2 or Figure 4 would be a powerful statement that would boost confidence in the returned predicted enhancers and/or orthologous regions. 

      This is an important question and one in which we are very interested. Unfortunately, we do not have sufficient data at this time to address this with proper statistical rigor. As we remarked above in response to Reviewer 3, “We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.”

      (7) In Figure 3, the authors seek to convey that SCRMshaw predicts enhancer regions that are mapped nearby one another, across different loci widths, and that this occurrence of nearby predicted regions occurs more than a randomly selected control. This is presumably meant to validate that SCRMshaw is not providing predictions with low specificity, but rather to highlight the possibility that SCRMshaw is identifying groups of shadow enhancers. However, these plots are extremely difficult to decipher and do not strongly support the claims due to the low resolution and difficult interpretability of the boxplot interquartile distributions.

      Additionally, as the majority of predicted regions are around 750 bp, how does that address locus groups of <1000 bp? This suggests that predicted regions are overlapping, and therefore cannot be meaningfully interpreted as shadow enhancers. This plot should either be moved to the supplements or reworked to more effectively convey the point that "SCRMshaw is detecting predicted regions that are proximal to one another and that this proximity is not due to chance". 

      - A suggestion to rework this plot is to change this instead to a bar plot, where the y-axis instead represents "number of predictions with at least 2 predicted regions proximal to one another" divided by "total number of predictions", separating bar color by simulated/observed values. The x-axis grouping can remain the same. Because this plot is a broad generalization of the statement you're trying to make above, knowing whether a few loci have 2 versus 4 proximal predicted enhancers doesn't enhance your point. 
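      The reviewer's proposed summary statistic (the fraction of loci with at least two proximal predictions, per locus-size bin) could be computed along these lines. Each locus is assumed to be summarized as a (length in bp, number of predicted regions) pair; the bin edges used are placeholders rather than the bins in Figure 3, and the same hypothetical function would be applied separately to the observed and simulated sets.

```python
import bisect
from typing import Iterable, List, Tuple


def fraction_with_multiple(loci: Iterable[Tuple[int, int]],
                           bin_edges: List[int]) -> List[float]:
    """loci: (locus_length_bp, n_predicted_regions) pairs.
    Bin i covers bin_edges[i] <= length < bin_edges[i + 1]; loci outside all
    bins are ignored. Returns, per bin, the fraction of loci with >= 2
    predictions (NaN for an empty bin)."""
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    multi = [0] * n_bins
    for length, n_pred in loci:
        i = bisect.bisect_right(bin_edges, length) - 1
        if 0 <= i < n_bins:
            totals[i] += 1
            if n_pred >= 2:
                multi[i] += 1
    return [m / t if t else float("nan") for m, t in zip(multi, totals)]
```

Plotting the observed and simulated fractions side by side per bin would give the bar chart described above.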

      We agree with the reviewer that these are not the clearest plots, and thank them for the suggestions regarding revision. We tried many variations on visualizing these complex data, including those suggested by the reviewer, and have concluded that despite their weaknesses, these plots are still the best visualization. The main problem is that the observed data cluster heavily around zero, so that the box plots are very squat and mainly only the outlier large values are observed. The key point, however, is that the expected values almost never give values much greater than one, so that the observed outlier points are the only points seen in the upper ranges of the y-axis. This is true across the three species, across the bins of locus sizes, and across training sets (averaged into the box plots). The reviewer is correct as well about the bins where locus size is < 1000. However, inspection of the data shows that this is not a large concern, as very few data points lie in this range and we never see multiple predicted enhancers there. Thus we believe while not the prettiest of graphs, Figure 3 does effectively support the claims made in the text. In keeping with our view that it is preferable to have data in the main paper whenever possible, we choose to keep the figure in place rather than move it to the Supplement.

      - Label the species for the reader's understanding of each subplot on the plot. 

      We apologize for this oversight and have now labeled each plot with its relevant species.

      (8) SCRMshaw operates on k-mer count distributions compared to a genomic background across different species, allowing it to assign predicted regions without prior knowledge of an organism's cis-regulatory sequences. This is powerful and boosts the versatility of the method. However, understanding the cis-regulatory origins of the kinds of k-mers that drive the detection of orthologous regions across species is crucial and absolutely within the scope of the paper, particularly for the justification of the provided annotations. Is SCRMshaw making use of enriched motifs within the training region set to assign regions in other species? One would presume so, but it is necessary to show this. There are many readily available motif discovery tools that require little up-front knowledge and little to no use of a CLI, such as the MEME Suite (https://meme-suite.org/meme/tools/meme). It is highly recommended that, even for a few well-understood training sets (e.g. mesoderm.mapping1, dorsal_ectoderm.mapping1), the authors assess motif enrichment within the original sequence set and then check whether these motif enrichments are reflected in the predicted enhancers. As the evolutionary distance between D. melanogaster and the species of interest increases, do the enriched motifs become sparser? Is a key motif lost? These are the kinds of questions that will allow readers to understand how these annotations are assigned, as well as boost confidence in their usage. 

      This is a very important point and a subject of significant interest to us. We have demonstrated in earlier work (e.g., Kazemian et al. 2014 Genome Biol. Evol.) that SCRMshaw-predicted enhancers do contain expected TFBS motifs, across multiple species—and that even an overall arrangement of sites is sometimes conserved. Thus we have previously answered, in part, the reviewer’s question. 

      What we also learned from our previous work is that filtering out relevant motifs from the noise inherent in motif-finding is both arduous and challenging. As the reviewer is no doubt aware, while using motif discovery tools is simple, interpreting the output is much less so. In response to the reviewer’s comments, we revisited this issue with data from a small sample of training sets. We can discover motifs; we can see that the motif profiles are different between different training sets; and we can observe the presence of expected motifs based on the activity profile of the enhancers (e.g., Single-minded binding sites in our mesectoderm/midline training and result data). However, to do this cleanly and with appropriate statistical rigor is beyond what we feel would be practical for this paper. We hope to return to this important question in the future when we have a larger and phylogenetically more evenly-distributed set of species, and the time and resources to address it appropriately.

      (9) Figures 5-7 need to have better descriptions. 

      We have added to the Figure 6 and 7 legends in response to this comment; please note as well that there is substantial detail provided in the text. If there are specific aspects of the figures that are not clear or which lack sufficient description, we are happy to make additional changes.

      Minor Concerns 

      (1) In Figure 1A, it is implied that "k-mer count distributions" are actually only "5-mer count distributions". However, in the published documentation of SCRMshaw, it is suggested that k-mers between 1-6 bp are involved in establishing sequence distributions. Please add a justification for the selection of these criteria. It would be helpful to understand the implications of using up to a 3-mer versus a 12-mer when assessing k-mer counts using SCRMshaw.

      We have clarified in the Figure 1 legend that this is just an example, and that k-mers of different sizes are used in the IMM method; we have also expanded the description of the basic method in the Methods section. To be clear, the hexMCD sub-method is 6-mer based (5th-order Markov chain), as is pacRC, while the IMM method considers Markov chains of orders 0-5.
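
      To illustrate what scoring with a 6-mer-based (5th-order) Markov chain involves, here is a minimal Python sketch. It is not SCRMshaw's code: the function names, the add-one pseudocount, and the simple log-odds of a training model against a background model are illustrative assumptions, and the interpolation over orders 0-5 used by the IMM sub-method is not reproduced.

```python
import math
from collections import Counter

ALPHABET = set("ACGT")


def kmer_counts(seqs, k):
    """Count overlapping k-mers, skipping any k-mer containing a non-ACGT base."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= ALPHABET:
                counts[kmer] += 1
    return counts


def markov_log_prob(seq, counts, k, pseudo=1.0):
    """Log-probability of seq under an order-(k-1) Markov chain estimated from
    k-mer counts, i.e. P(base | preceding k-1 bases). The first k-1 bases of
    seq are not scored."""
    context_totals = Counter()
    for kmer, c in counts.items():
        context_totals[kmer[:-1]] += c
    seq = seq.upper()
    logp = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p = (counts[kmer] + pseudo) / (context_totals[kmer[:-1]] + 4 * pseudo)
        logp += math.log(p)
    return logp


def log_odds(seq, training, background, k=6):
    """Positive when seq looks more like the training set than the background."""
    return (markov_log_prob(seq, kmer_counts(training, k), k)
            - markov_log_prob(seq, kmer_counts(background, k), k))


training = ["ACGTACGTACGTACGT"]
background = ["TTTTGGGGCCCCAAAA"]
print(log_odds("ACGTACGTACGT", training, background) > 0)  # True
```

      Raising k lets the model capture longer sequence words but requires many more counts per context to estimate reliably, which is one reason to cap the order rather than, for example, count 12-mers.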

      (2) Control the y-axis to remove white space from Figure 2D. 

      We have amended the figure as suggested.

      Additionally, expand in the manuscript on expected results from SCRMshaw. Given training regions of 750 bp, is the expectation that you return predicted enhancers of the same length? This is not explicitly stated, only a description of outliers. 

      The scoring is not dependent on the length of the training sequences, and there is no direct expectation of predicted enhancer length. Scores are calculated on 10-bp intervals, and a peak-calling algorithm is used to determine the endpoints of each prediction based on where the scores drop below a cutoff value. Thus there is no explicit minimum prediction length beyond the smallest possible length of 10 bp. That said, the initial scoring takes place over a 500-bp sequence window (for reasons of computational efficiency), which biases prediction lengths away from the smaller end of the possible range. We correct for this in part by reducing scores below a certain threshold to zero, to prevent multiple low-scoring regions from combining to give a low but positive score over a long interval. Indeed, we found that in the original version of SCRMshawHD (Asma et al. 2019), multiple low-scoring but above-threshold intervals would get concatenated together in broad peaks, leading to an unrealistically large average prediction length. In the version used here, described in Supplementary Figure S6, low-scoring windows are now first reset to zero and a new threshold is calculated before overlapping scores are summed. This helps to prevent the broad peak problem, and we find that it results in a median prediction length of ~750 bp, more in line with expected enhancer sizes.
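
      The reset-then-sum idea can be sketched as follows. This is a hypothetical Python illustration, not the SCRMshawHD implementation: `floor` stands in for the threshold below which window scores are reset to zero, `cutoff` for the recalculated peak-calling threshold (whose derivation is not reproduced here), and windows are assumed to start every 10 bp. The toy example uses 30-bp windows instead of 500-bp ones to keep the numbers small.

```python
def call_peaks(window_scores, window_len=500, step=10, floor=0.0, cutoff=0.0):
    """window_scores[i] is the score of the window starting at i * step bp.

    Scores at or below `floor` are reset to zero, the remaining scores of all
    windows overlapping each step-sized bin are summed, and peaks are the
    maximal runs of bins whose summed score exceeds `cutoff`.
    Returns half-open (start_bp, end_bp) intervals.
    """
    bins_per_window = window_len // step
    n_bins = len(window_scores) + bins_per_window - 1
    summed = [0.0] * n_bins
    for i, score in enumerate(window_scores):
        kept = score if score > floor else 0.0  # reset low-scoring windows
        for b in range(i, i + bins_per_window):
            summed[b] += kept
    peaks, start = [], None
    for b, value in enumerate(summed):
        if value > cutoff and start is None:
            start = b
        elif value <= cutoff and start is not None:
            peaks.append((start * step, b * step))
            start = None
    if start is not None:
        peaks.append((start * step, n_bins * step))
    return peaks


scores = [0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1]
print(call_peaks(scores, window_len=30, step=10, floor=1.0))  # [(20, 50)]
print(call_peaks(scores, window_len=30, step=10, floor=0.0))  # [(0, 90)]
```

      With the floor applied, only the strong window survives and yields a single 30-bp peak; without it, every weakly positive window contributes and the whole region merges into one broad peak, which is the behaviour described above for the original version.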

      Reviewer #3 (Recommendations For The Authors): 

      Line 161: Given that the SCRMshaw HD method is the basis for the pipeline, the methodology deserves at least an "in brief" recapitulation in this manuscript. 

      As we remark in our response to Reviewer 2, above, “We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.” 

      Line 219: Throughout the reporting of the results, there appeared to be a bit of inconsistency/potential typos regarding whether threshold or exact P values were reported. In lines 219, 222, 265, 696, and 811, the reported values seem to clearly be thresholds (< a standard cutoff), while in lines 291, 293, 297, and 300, values appear to be exact but are reported as thresholds (<).

      This is not an error but rather reflects two different types of analysis. The predictions per locus (originally lines 219, 222, etc.) are evaluated using an empirical P-value based on 1000 permutations. As such, they are thresholded at 1/1000. The overlap with open chromatin regions, on the other hand, is based on a z-score, with the P-values taken from a standard conversion of z-scores to P-values.
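
      The difference between the two reporting styles can be made concrete with a short Python sketch. The function names are hypothetical, and the one-sided (upper-tail) conversion of z-scores is an assumption for illustration; the point is only that a permutation-based P-value cannot resolve anything below 1/n, whereas a z-score conversion yields an exact value.

```python
import math


def empirical_p(observed, null_stats):
    """Empirical one-sided P-value from permutation statistics. With n
    permutations the smallest resolvable value is 1/n, so a result that no
    permutation reaches is reported as a bound ("P < 1/n")."""
    n = len(null_stats)
    hits = sum(1 for x in null_stats if x >= observed)
    if hits == 0:
        return f"P < {1 / n:g}"
    return f"P = {hits / n:g}"


def z_to_p_upper(z):
    """Upper-tail P-value for a standard-normal z-score, P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(empirical_p(42, [0] * 1000))             # P < 0.001
print(empirical_p(42, [50] * 50 + [0] * 950))  # P = 0.05
print(round(z_to_p_upper(3.0), 5))             # 0.00135
```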

      Page 13/Table 2: At face value, it seems surprising that the overlap between Dmel SCRMshaw predictions with open chromatin is so much smaller than the overlap between predictions and open chromatin in other species, both in raw % (Tcas, D plexippus, H. himera) and fold enrichment (Tcas), given that the training sets for SCRMshaw are all derived from Dmel data. The discussion here does not touch on this aspect of the results, and the interpretation of this approach, in general, would be strengthened if the authors could comment on potential reasons why this pattern may be arising here, or at least acknowledge that this is an open question.

      There are many variables at play here, as the data are from different species, from different tissues, and from different methods. Thus we think it is difficult to read too much into the precise results from these comparisons—the main take-home is really just that there is a significant amount of overlap. In acknowledgment of this, we have slightly modified the text in this section so that it now notes (line 302ff): “These comparisons are imperfect, as the tissues used to obtain the chromatin data do not precisely correspond to the training sequences used for SCRMshaw, and the data were obtained using a variety of methods.”

      Lines 318-329: The inferences from the reporter gene assay deserve a more nuanced treatment than they are given here. The important nuance that was not addressed by the discussion here is that the imaginal disc mode of development in Drosophila is not broadly representative of the development of larval/adult epithelial tissues across Holometabola; thus, inference of a true positive validation becomes complicated in cases where predicted enhancers from a species were tested and shown to drive expression in a fly imaginal disc for which the native species has no direct disc counterpart. For example, in line 388 a Tcas enhancer is reported to drive expression in the eye-antennal disc, and in lines 404 and 423 additional Tcas enhancers were reported to drive expression in the leg discs; however, Tribolium larvae do not possess antennal discs or leg discs set aside during embryogenesis in the sense that flies do. Instead, the homologous epithelial tissues form larval antennae and larval legs external to the body wall that are actively used at this life stage and are starkly different in morphology from an internally invaginated epithelial disc that will directly give rise to adult tissues in subsequent molts. Is the interpretation of an expression pattern driven in a fly disc as a true positive really as straightforward as it was presented here, when in the native species the expression pattern driven by the enhancer in question would be in the context of an extremely different tissue morphology? That said, I understand and am deeply sympathetic to the constraints on the authors in performing transgenic experiments outside of the model fly; but these divergent modes of development across Holometabola deserve a mention and nuance in the interpretation here.

      This is indeed a very important point, and we greatly appreciate Reviewer 3 pointing out this caveat when interpreting the outcomes of our cross-species reporter assay. Reviewer 3 is correct that the imaginal disc mode of adult tissue (i.e. imaginal) development found in Diptera does not represent the imaginal development across Holometabola. 

      In fact, imaginal development is quite diverse among Holometabola. For instance, larval leg and antennal cells appear to directly develop into the adult legs and antennae in Coleoptera (i.e. primordial imaginal cells function as larval appendage cells), while some cells within the larval legs and antennae are set aside during larval development specifically for adult appendages in Lepidopteran species (i.e. imaginal cells exist within the larval appendages but do not contribute to the formation of larval appendages). In contrast, almost the entire set of cells that develop into adult epithelia is set aside as imaginal discs during embryogenesis in Diptera. Furthermore, the imaginal disc mode of development appears to have evolved independently in Hymenoptera. Therefore, determining how imaginal primordial tissues correspond to each other among Holometabola has been a challenging task and a topic of high interest within the evo-devo and entomology communities.

      Nevertheless, despite these differences in mode of imaginal development, decades of evo-devo studies suggest that the gene regulatory networks (GRNs) operating in imaginal primordial tissues appear to be fairly well conserved among holometabolan species (for example, see Tomoyasu et al. 2009 regarding wing development and Angelini et al. 2012 regarding leg development between flies and beetles). These outcomes imply that a significant portion of the transcriptional landscape might be conserved across different modes of imaginal development. Therefore, an enhancer functioning in the Tribolium larval leg tissue (which also functions as adult leg primordium) could be active even in the leg imaginal disc of Drosophila, if the trans factors essential for the activation of the enhancer are conserved between the two imaginal tissues. 

      That being said, we fully expect there to be both false negative and false positive results in our cross-species reporter assay. We are optimistic about the biological relevance of the positive outcomes of our cross-species reporter assay, especially when the enhancer activity recapitulates the expression of the corresponding gene in Drosophila (for example, Am_ex Fig6B and Tc_hth Fig7B). Nonetheless, the biological relevance of these enhancer activities needs to be further verified in the native species through reporter assays, enhancer knock-outs, or similar experiments.

      In recognition of the Reviewer’s important point, we added the following caveat in our Discussion (lines 549-553): “Furthermore, the unique imaginal disc mode of adult epithelial development in D. melanogaster might have prevented some enhancers of other species from working properly in D. melanogaster imaginal discs, likely producing additional false negative results. Evaluating enhancer activities in the native species will allow us to address the degree of false negatives produced by the cross-species setting.” We moreover mention this caveat in the Results section when we first introduce the reporter assays (line 342).

      Line 580: This is the first time that the weakness of the closest-gene pairing approach is mentioned. This deserves mention earlier in the manuscript, as unfortunately, this is one of the major bottlenecks to this and any other approaches to investigating enhancer function. Could the authors address this earlier, perhaps pages 7-8, and provide citations for current understanding in the field of how often closest-gene pairing approaches correctly match enhancers to target genes? 

      We have added text as suggested on p.7-8 acknowledging the shortcomings of the closest-gene approach. We also clarify at the end of that section (lines 173-181) that target gene assignments, while useful for interpretation, have no bearing on the enhancer predictions themselves (which are generated prior to the target gene assignment steps).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editorial team for a thoughtful and constructive assessment. We appreciate all comments, and we try our best to respond appropriately to every reviewer’s queries below. It appears to us that one main concern was the appropriate modelling of the complex and rich structure of confounding variables in our movie task.

      One recent approach fits large feature vectors that include confounding variables alongside the variable(s) of interest to the activity of each voxel in the brain, in order to disentangle the contributions of each variable to the total recorded brain response. While these encoding models have yielded some interesting results, they have two major drawbacks which make them unsuitable for our purposes (as we explain in more detail below): first, by fitting large vectors to individual voxels, they tend to overestimate effect size; second, they are very ineffective at unveiling group-level effects due to high variability between subjects. Another approach that addresses at least the second of these concerns is “inter-subject correlation”. In this technique brain responses are recorded from multiple subjects while they are presented with natural stimuli. For each brain area, response time courses from different subjects are correlated to determine whether the responses are similar across subjects. Our “peak and valley” analysis is a special case of this analysis technique, as we explain in the manuscript and below.

      For estimating individual-level brain activation, we opted for an approach that adapts a classical method of analysing brain data (convolution) to naturalistic settings. Amplitude modulated deconvolution extends classical brain analysis tools in several ways to handle naturalistic data:

      (1) The method does not assume a fixed hemodynamic response function (HRF). Instead, it estimates the HRF over a specified time window from the data, allowing it to vary in amplitude based on the stimulus. This flexibility is crucial for naturalistic stimuli, where the timing and nature of brain responses can vary widely. 

      (2) The method only models the modulation of the amplitude of the HRF above its average with respect to the intensity or characteristics of the stimulus. 

      (3) By allowing variation in the response amplitude, non-linear relationships between the stimulus and brain-response can be captured. 
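
      Points (1) and (2) can be illustrated with a minimal Python sketch of an amplitude-modulated design matrix using a finite impulse response (FIR) basis, i.e. one free coefficient per post-stimulus scan. This is a simplified toy, not the pipeline used in the study: the function name, the FIR basis, onsets given as integer scan indices, and the single modulator are assumptions chosen for clarity.

```python
def am_fir_design(n_scans, events, n_lags):
    """Build an amplitude-modulated FIR design matrix.

    events: list of (onset_scan, modulator) pairs, with onsets in scan units.
    Columns 0..n_lags-1 model the average response at each post-stimulus lag,
    so no fixed HRF shape is assumed. Columns n_lags..2*n_lags-1 carry the
    mean-centred modulator, so they only capture changes in response
    amplitude above or below that average response.
    """
    mean_mod = sum(m for _, m in events) / len(events)
    X = [[0.0] * (2 * n_lags) for _ in range(n_scans)]
    for onset, mod in events:
        for lag in range(n_lags):
            t = onset + lag
            if t < n_scans:
                X[t][lag] += 1.0
                X[t][n_lags + lag] += mod - mean_mod
    return X


for row in am_fir_design(5, [(0, 1.0), (2, 3.0)], n_lags=2):
    print(row)
```

      Fitting such a matrix by least squares to a voxel's time series would give an estimated response shape (first block of coefficients) and how its amplitude scales with the stimulus property (second block).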

      It is true that amplitude modulated deconvolution does not come without its flaws; for example, including more than a few nuisance regressors becomes computationally very costly. Getting to grips with naturalistic data (especially with fMRI recordings) continues to be an active area of research and presents a new and exciting challenge. We hope that this response, together with the additional analyses and controls performed, can convince reviewers and editors that the evidence presented for the visual-context-dependent recruitment of brain areas for abstract and concrete conceptual processing is not incomplete.

      Overview of Additional Analyses and Controls Performed by the Authors:

      (1) Individual-Level Peaks and Valleys Analysis (Supplementary Material, Figures S3, S4, and S5)

      (2) Test of non-linear correlations of BOLD responses related to features used in the Peak and Valley Analysis (Supplementary Material, Figures S6, S7)

      (3) Comparison of Psycholinguistic Variables Surprisal and Semantic Diversity between groups of words analysed (no significant differences found)  

      (4) Comparison of Visual Variables Optical Flow, Colour Saturation, and Spatial Frequency for 2s Context Window between groups of words analysed (no significant differences found)

      These controls are in addition to the five low-level nuisance regressors included in our model, which are luminance, loudness, duration, word frequency, and speaking rate (calculated as the number of phonemes divided by duration) associated with each analysed word. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Peaks and Valleys Analysis: 

      (1) Doesn't this method assume that the features used to describe each word, like valence or arousal, will be linearly different for the peaks and valleys? What about non-linear interactions between the features and how they might modulate the response? 

      Within-subject variability in BOLD response delays is typically about 1 second at most (Neumann et al., 2003). As individual words are presented briefly (a few hundred ms at most) and the BOLD response to these stimuli falls within that window (1 s/TR), any non-linear interactions between word features and a participant’s BOLD response within that window are unlikely to significantly affect the detection of peaks and valleys.

      To quantitatively address the concern that non-linear modulations could manifest outside of that window, we include a new analysis in Figure S6, which compares the average BOLD responses of each participant in each cluster and each combination of features. It shows that very few of all possible comparisons differ significantly from each other (~5,000 combinations of features were significantly different out of ~130,000 comparisons between BOLD responses to features, i.e. 3.85%), suggesting that there are no relevant non-linear interactions between features. For a full list of the most non-linearly interacting features, see Figure S7.

      (2) Doesn't it also assume that the response to a word is infinitesimal and not spread across time? How does the chosen time window of analysis interact with the HRF? From the main figures and Figures S2-S3 there seem to be differences based on the timelag. 

      The Peak and Valley (P&V) method does not assume that the response to a word is infinitesimal or confined to an instantaneous moment. The units of analysis (words) fall within one TR, as they are at most a few hundred ms long; for this reason, we are looking at one TR only. The response of each voxel at that TR will be influenced by the word of interest, as well as all other words that have been uttered within the 1 s TR, and the multimodal features of the video stimulus that fall within that timeframe. So, in our P&V, we are not looking for an instantaneous response but rather changes in the BOLD signal that correspond to the presence of linguistic features within the stimuli.

      The chosen time window of analysis interacts with the hemodynamic response function (HRF) in the following way: the HRF unfolds over several seconds, typically peaking around 5-6 seconds after stimulus onset and returning to baseline within 20-30 seconds (Handwerker et al., 2004).

      Our P&V is designed to match these dynamics of fMRI data with the timing of word stimuli. We apply different lags (4s, 5s, and 6s) to account for the delayed nature of the HRF, ensuring that we capture the brain's response to the stimuli as it unfolds over time, rather than assuming an immediate or infinitesimal effect. We find that the P&V yields our expected results for a 5s and a 6s lag, but not a 4s lag. This is in line with literature suggesting that the HRF for a given stimulus peaks around 5-6s after stimulus onset (Handwerker et al., 2004). As we are looking at very short stimuli (a few hundred ms), it makes sense that the distribution of features would change significantly with different lags. The fact that we find converging results for both a 5s and a 6s lag suggests that the delay is somewhere between 5s and 6s. However, there is no way of testing this hypothesis at the temporal resolution of our brain data (1 TR).
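
      How a lag maps word onsets onto TRs can be sketched in a few lines of Python. This is a hypothetical illustration only: it assumes a 1 s TR, defines peaks and valleys simply as local maxima and minima of a cluster's averaged time course, and omits the subsequent step of comparing how often highly rated words fall at peaks versus valleys.

```python
def tr_index(onset_s, lag_s, tr_s=1.0):
    """Index of the TR in which the lagged response to a word falls."""
    return int((onset_s + lag_s) // tr_s)


def peak_or_valley(signal, i):
    """Classify TR i as a local maximum ('peak'), a local minimum ('valley'),
    or neither (None). TRs at the edges or outside the series return None."""
    if 0 < i < len(signal) - 1:
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]:
            return "peak"
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]:
            return "valley"
    return None


def label_words(onsets_s, signal, lag_s=5.0, tr_s=1.0):
    """Label each word onset by where its lagged TR falls in the time course."""
    return [peak_or_valley(signal, tr_index(t, lag_s, tr_s)) for t in onsets_s]


signal = [0, 0, 0, 0, 0, 0, 1, 0, -1, 0]
print(label_words([1.2, 3.5, 0.0], signal))  # ['peak', 'valley', None]
```

      Changing `lag_s` from 5.0 to 4.0 or 6.0 shifts every word onto a different TR, which is why the set of words landing on peaks and valleys, and hence the feature distributions, can differ between lags.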

      (3) Were the group-averaged responses used for this analysis? 

      Yes, the response for each cluster was averaged across participants. We now report a participant-level overview of the Peak and Valley analysis (lagged at 5s), with results similar to the main analysis, in the supplementary material (see Figures S3, S4, and S5).

      (4) Why don't the other terms identified in Figure 5 show any correspondence to the expected categories? What does this mean? Can the authors also situate their results with respect to prior findings as well as visualize how stable these results are at the individual voxel or participant level? It would also be useful to visualize example time courses that demonstrate the peaks and valleys. 

      The terms identified in Figure 5 are sensorimotor and affective features from the combined Lancaster and Brysbaert norms. As for the main P&V analysis, we only recorded a cluster as processing a given feature (or term) when there were significantly more instances of words highly rated in that dimension occurring at peaks rather than valleys in the HRF. For some features/terms, there were never significantly more words highly rated on that dimension occurring at peaks compared to valleys, which is why some terms identified in Figure 5 do not show any significant clusters. We have now also clarified this in the figure caption.

      We situate the method in previous literature in lines 289-296. In essence, it is a variant of the well-known method called “reverse correlation”, first detailed in Hasson et al., 2004 (reference from the manuscript) and later adapted to a peak and valley analysis in Skipper et al., 2009 (reference from the manuscript).

      We now present a more fine-grained characterisation of each cluster on an individual participant level in the supplementary material. We doubt that it would be useful to present an actual example time-course as it would only represent a fraction of over one hundred thousand analysed time-series. We do already present an exemplary time-course to demonstrate the method in Figure 1. 

      Estimating contextual situatedness: 

      (1) Doesn't this limit the analyses to "visual" contexts only? And more so, frequently recognized visual objects? 

      Yes, it was the point of this analysis to focus on visual context only, and it may be true that conducting the analysis in this way results in limiting it to objects that are frequently recognized by visual convolutional neural networks. However, the state-of-the-art strength of visual CNNs in recognising many different types of objects has been attested in several ways (He et al., 2015). Therefore, it is unlikely that the use of CNNs would bias the analysis towards any specific “frequently recognised” objects. 

      (2) The measure of situatedness is the cosine similarity of GloVe vectors that depend on word co-occurrence while the vectors themselves represent objects isolated by the visual recognition models. Expectedly, "science" and the label "book" or "animal" and the label "dog" will be close. But can the authors provide examples of context displacement? I wonder if this just picks up on instances where the identified object in the scene is unrelated to the word. How do the authors ensure that it is a displacement of context as opposed to the two words just being unrelated? This also has a consequence on deciding the temporal cutoff for consideration (2 seconds). 

      The cosine similarity is between the GloVe vectors of the word (that is situated or displaced) and the words referring to the objects identified by the visual recognition model. Therefore, the correlation is between more than just two vectors and both correlated representations depend on co-occurrence. The cosine similarity value reported is not from a comparison between GloVe vectors and vectors that are (visual) representations of objects from the visual recognition model. 

      A word is displaced if all the identified object-words in the defined context window (2s before word onset) are unrelated to the word (see lines 105-110 (p. 5), lines 371-380 (pp. 15-16), and the Figure 2 caption). Thus, a word is considered to be displaced only if all identified objects in the scene (not just two, as claimed by the reviewer) are unrelated to the word. Given a context of 60 frames and an average of 5 identified objects per frame (i.e. an average candidate set of 300 objects that could be related) per word, the bar for “displacement” is set high. We provide some further considerations justifying the context window below in our responses to reviewers 2 and 3.
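
      The displacement criterion can be expressed as a short Python sketch. The toy 2-dimensional vectors stand in for GloVe embeddings, and the similarity `threshold` is a hypothetical parameter whose value is not restated here; for the 2s window described above, `frames` would hold 60 entries.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def is_displaced(word_vec, frames, threshold):
    """A word counts as displaced only if every object label detected in every
    frame of its context window is unrelated to it (similarity below threshold).
    frames: one list of object-label vectors per video frame.
    Note: a window with no detected objects is vacuously treated as displaced."""
    return all(cosine(word_vec, obj) < threshold
               for frame in frames for obj in frame)


word = [1.0, 0.0]
frames = [[[0.0, 1.0]], [[0.0, 1.0], [-1.0, 0.0]]]
print(is_displaced(word, frames, threshold=0.5))                   # True
print(is_displaced(word, frames + [[[1.0, 0.1]]], threshold=0.5))  # False
```

      Because a single related object anywhere in the window is enough to make a word count as situated, the check becomes stricter as the number of candidate objects grows.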

      (3) While the introduction motivated the problem of context situatedness purely linguistically, the actual methods look at the relationship between recognized objects in the visual scene and the words. Can word surprisal or another language-based metric be used in place of the visual labeling? Also, it is not clear how the process identified in (2) above would come up with a high situatedness score for abstract concepts like "truth". 

      We disagree with the reviewer that the introduction motivated the problem of context situatedness purely linguistically, as we explicitly consider visual context in the abstract as well as the introduction. Examples in text include lines 71-74 and lines 105-115. This is also reflected in the cited studies that use visual context, including Kalenine et al., 2014; Hoffmann et al., 2013; Yee & Thompson-Schill, 2016; Hsu et al., 2011. However, we appreciate the importance of being very clear about this point, so we added various mentions of this fact at the beginning of the introduction to avoid confusion.

      We know that prior linguistic context (e.g. measured by surprisal) does affect processing. The point of the analysis was to use a non-language-based metric of visual context to understand how this affects conceptual representation in naturalistic settings. Therefore, it is not clear to us why replacing this with a language-based metric such as surprisal would be an adequate substitution. However, the reviewer is correct that we did not control for the influence of prior context. We obtained surprisal values for each of our words but could not find any significant differences between conditions and therefore did not include this factor in the analyses conducted. For considerations of differences in surprisal between each of the analysed sets of words, see the supplementary material.

      The method would yield a high score of contextual situatedness for abstract concepts if there were objects in the scene whose GloVe embeddings have a small cosine distance to the GloVe embedding of that abstract word (e.g., “truth” and “book”). We believe this comment from the reviewer is rooted in a misconception of our method. They seem to think we compared GloVe vectors for the spoken word with vectors from a visual recognition model directly (in which case it is true that there would be a concern about how an abstract concept like “truth” could have a high situatedness). Apart from the fact that there would be concerns about the comparability of vectors derived from GloVe and a visual recognition model more generally, this present concern is unwarranted in our case, as we are comparing GloVe embeddings.

      (4) It is a bit hard to see the overlapping regions in Figures 6A-C. Would it be possible to show pairs instead of triples? Like "abstract across context" vs. "abstract displaced"? Without that, and given (2) above, the results are not yet clear. Moreover, what happens in the "overlapping" regions of Figure 3? 

      To make this clearer, we introduced the contrasts (abstract situated vs displaced and concrete situated vs displaced) that were previously in the supplementary materials in the main text (now Figure 6, this was also requested by reviewer 2). We now show the overlap between the abstract situated (from the contrast in Figure 6) with concrete across context and the overlap between concrete displaced (from the contrast in Figure 6) with abstract across context separately in Figure 7. 

      The overlapping regions of Figure 3 indicate that both concrete and abstract concepts are processed in these regions (though at different time-points). We explain why this is a result of our deconvolution analysis on page 23:  

      “Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame. In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus.”

      Miscellaneous comments: 

      (1) In Figure 3, it is surprising that the "concrete-only" regions dominate the angular gyrus and we see an overrepresentation of this category over "abstract-only". Can the authors place their findings in the context of other studies? 

      The Angular Gyrus (AG) is hypothesised to be a general semantic hub; therefore it is not surprising that it should be active for general conceptual processing (and there is some overlap activation in posterior regions). We now situate our results in a wider range of previous findings in the results section under “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010; Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). This overlap arises because we performed general linear tests for the abstract/concrete contrast at each of the 20 time points in our group analysis. Consequently, overlap means that activity in these regions is modulated by both concrete and abstract word processing, but at different time points. In particular, we find that activity modulation associated with abstractness generally unfolds over a longer timeframe (for a comparison of significant timing differences, see Figure S9). In the frontal, parietal, and temporal lobes, this overlap was primarily in the left IFG, AG, and STG, respectively. The left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval, and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and consistent with the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hoffman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”
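      The per-time-point logic behind the overlap ("both") label can be illustrated with a minimal Python sketch. It assumes a hypothetical list of voxelwise concrete-minus-abstract contrast statistics, one per post-onset lag (20 lags), and a single illustrative threshold; the cluster-level inference used in a real group analysis is not modelled, and `classify_voxel` is a hypothetical helper. Because one statistic cannot exceed a positive threshold and fall below its negative at the same lag, a voxel labelled "both" is necessarily significant in the two directions at different lags.

```python
from typing import Sequence


def classify_voxel(contrast_by_lag: Sequence[float], threshold: float) -> str:
    """Label a voxel from its concrete-minus-abstract statistics at each lag.

    Hypothetical helper: positive values mean concrete > abstract, negative
    values mean abstract > concrete. 'both' is returned only when the two
    directions pass the threshold, which can only happen at different lags.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    concrete_lags = [i for i, t in enumerate(contrast_by_lag) if t > threshold]
    abstract_lags = [i for i, t in enumerate(contrast_by_lag) if t < -threshold]
    if concrete_lags and abstract_lags:
        return "both"
    if concrete_lags:
        return "concrete"
    if abstract_lags:
        return "abstract"
    return "none"


# Illustrative voxel: concrete > abstract at early lags, abstract > concrete later.
n_lags = 20
voxel = [4.0 if lag < 6 else (-4.0 if lag >= 10 else 0.0) for lag in range(n_lags)]
print(classify_voxel(voxel, threshold=3.0))  # both
```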

      The finding that concrete concepts activate more brain voxels than abstract concepts is generally aligned with existing research, which often reports more extensive brain activation for concrete than for abstract words. This is primarily attributed to the richer sensory and perceptual associations of concrete concepts; see, for example, Binder et al., 2005 (Figure 2 in that paper). Similarly, a recent meta-analysis by Bucur & Pagano (2021) consistently found wider activation networks for the “concrete > abstract” contrast than for the “abstract > concrete” contrast.

      (2) The following line (Pg 21) regarding the necessary differences in time for the two categories was not clear. How does this fall out from the analysis method? 

      - Both categories overlap **(though necessarily at different time points)** in regions typically associated with word processing - 

      This is answered in our response above to point (4) in the reviewer’s comments. We now also provide more information on the temporal differences in the supplementary material (Figure S9). 

      Reviewer #2 (Public Review):

      The critical contrasts needed to test the key hypothesis are not presented or not presented in full within the core text. To test whether abstract processing changes when in a situated context, the situated abstract condition would first need to be compared with the displaced abstract condition as in Supplementary Figure 6. Then to test whether this change makes the result closer to the processing of concrete words, this result should be compared to the concrete result. The correlations shown in Figure 6 in the main text are not focused on the differences in activity between the situated and displaced words or comparing the correlation of these two conditions with the other (concrete/abstract) condition. As such they cannot provide conclusive evidence as to whether the context is changing the processing of concrete/abstract words to be closer to the other condition. Additionally, it should be considered whether any effects reflect the current visual processing only or more general sensory processing. 

      The reviewer identifies the critical contrast as follows:

      “The situated abstract condition would first need to be contrasted with the displaced abstract condition. Then, these results should be compared to the concrete result.” 

      We can confirm that this is indeed what we did, and we believe the reviewer’s confusion stems from a lack of clarity on our part. We have now made various clarifications on this point in the manuscript, and we have changed the figures to make clear that our results are indeed based on the contrasts identified by this reviewer as the essential ones.

      Figure 6 in the main text now shows the contrast between situated and displaced conditions for abstract and concrete words (as requested by the reviewer; this was previously Figure S7 in the supplementary material). To compare the results of this contrast with conceptual processing across context, we use cosine similarity and report these results in the text. We also show the overlap between the conditions of interest (abstract situated x concrete across context; concrete displaced x abstract across context) in a new figure (Figure 7) to bring out the spatial distribution of overlap more clearly.
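      Cosine similarity between two statistical maps, and a binary overlap of the kind shown in a figure like Figure 7, can be sketched as follows. The sketch assumes each map is a flat list of voxel values in a common space and that positive values indicate the effect of interest; the threshold, the toy values, and the helper names are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def cosine_similarity(map_a: Sequence[float], map_b: Sequence[float]) -> float:
    """Cosine of the angle between two voxelwise maps in the same space."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must contain the same number of voxels")
    dot = sum(a * b for a, b in zip(map_a, map_b))
    norm_a = math.sqrt(sum(a * a for a in map_a))
    norm_b = math.sqrt(sum(b * b for b in map_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero map")
    return dot / (norm_a * norm_b)


def overlap_mask(map_a: Sequence[float], map_b: Sequence[float],
                 threshold: float) -> List[bool]:
    """True where both maps exceed the (illustrative) threshold."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must contain the same number of voxels")
    return [a > threshold and b > threshold for a, b in zip(map_a, map_b)]


# Toy five-voxel maps standing in for, e.g., 'abstract situated' and
# 'concrete across context'.
situated_abstract = [3.5, 0.2, 4.1, -1.0, 0.0]
concrete_overall = [3.1, 2.8, 0.4, -0.5, 3.3]
print(round(cosine_similarity(situated_abstract, concrete_overall), 3))
print(overlap_mask(situated_abstract, concrete_overall, threshold=3.0))
# [True, False, False, False, False]
```

      Cosine similarity ignores overall scale, so two maps with the same spatial pattern but different effect magnitudes still score 1.0; the thresholded mask, by contrast, only reports where both effects are present.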

      We also discuss the extent to which these effects reflect current visual processing only or more general sensory processing in lines 863–875 (pp. 33–34):

      “In considering the impact of visual context on the neural encoding of concepts generally, it is furthermore essential to recognize that the mechanisms observed may extend beyond visual processing to encompass more general sensory processing mechanisms. The human brain is adept at integrating information across sensory modalities to form coherent conceptual representations, a process that is critical for navigating the multimodal nature of real-world experiences (Barsalou, 2008; Smith & Kosslyn, 2007). While our findings highlight the role of visual context in modulating the neural representation of abstract and concrete words, similar effects may be observed in contexts that engage other sensory modalities. For instance, auditory contexts that provide relevant sound cues for certain concepts could potentially influence their neural representation in a manner akin to the visual contexts examined in this study. Future research could explore how different sensory contexts, individually or in combination, contribute to the dynamic neural encoding of concepts, further elucidating the multimodal foundation of semantic processing.”

      Overall, the study would benefit from being situated in the literature more, including a) a more general understanding of the areas involved in semantic processing (including areas proposed to be involved across different sensory modalities and for verbal and nonverbal stimuli), and b) other differences between abstract and concrete words and whether they can explain the current findings, including other psycholinguistic variables which could be included in the model and the concept of semantic diversity (Hoffman et al.,). It would also be useful to consider whether difficulty effects (or processing effort) could explain some of the regional differences between abstract and concrete words (e.g., the language areas may simply require more of the same processing not more linguistic processing due to their greater reliance on word co-occurrence). Similarly, the findings are not considered in relation to prior comparisons of abstract and concrete words at the level of specific brain regions. 

      We now present an overview of the areas involved in semantic processing (across different sensory modalities for verbal and nonverbal stimuli) when we first present our results (section: “Conceptual Processing Across Context”).

      We examined surprisal as a potential confound and found no significant differences between any of the sets of words analysed. Mean surprisal is 22.19 bits for concrete words and 21.86 bits for abstract words. Mean surprisal is 21.98 bits for situated concrete words, 22.02 bits for displaced concrete words, 22.10 bits for situated abstract words, and 22.25 bits for displaced abstract words. We also calculated the semantic diversity of all sets of words and found no significant differences between the sets. The mean values for each condition are: abstract_high (2.02); abstract_low (1.95); concrete_high (1.88); concrete_low (2.19); abstract_original (1.96); concrete_original (1.92). Hence, processing effort related to differences in predictability (surprisal) or to greater semantic diversity cannot explain our findings.
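      Both control measures can be made concrete with a short sketch. Surprisal is the negative base-2 logarithm of a word's conditional probability, so a surprisal of 22 bits corresponds to a probability of 2^-22 (about 2.4 × 10^-7). Semantic diversity is computed here, following the general idea of Hoffman and colleagues' measure, as the negative logarithm of the mean pairwise cosine similarity between vectors representing the contexts in which a word occurs. The base-10 logarithm, the toy context vectors, and the function names are assumptions for illustration, and the language model that supplies the word probabilities is not specified in this section.

```python
import math
from itertools import combinations
from typing import Sequence


def surprisal_bits(probability: float) -> float:
    """Surprisal in bits: -log2 P(word | preceding context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom == 0.0:
        raise ValueError("context vectors must be non-zero")
    return dot / denom


def semantic_diversity(context_vectors: Sequence[Sequence[float]]) -> float:
    """Negative log10 of the mean pairwise cosine between a word's contexts.

    The log base (10) is an assumption made for this sketch.
    """
    if len(context_vectors) < 2:
        raise ValueError("need at least two contexts")
    sims = [_cosine(a, b) for a, b in combinations(context_vectors, 2)]
    mean_sim = sum(sims) / len(sims)
    if mean_sim <= 0.0:
        raise ValueError("mean cosine must be positive to take its logarithm")
    return -math.log10(mean_sim)


print(surprisal_bits(0.25))        # 2.0
print(surprisal_bits(2.0 ** -22))  # 22.0
# Toy contexts: two identical, one orthogonal -> mean cosine 1/3.
print(round(semantic_diversity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]), 3))  # 0.477
```

      Higher semantic diversity means the contexts in which a word appears are, on average, less similar to one another.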

      We submit that difficulty effects do not explain the activation found for conceptual processing, because we included word frequency in our model as a nuisance regressor and found no significant differences in surprisal between the word sets. Previous work shows that surprisal (Hale, 2001) and word frequency (Brysbaert & New, 2009) are good controls for processing difficulty.

      Finally, we added considerations of prior findings comparing abstract and concrete words at the level of specific brain regions to the discussion (section: Conceptual Processing Across Context). 

      The authors use multiple methods to provide a post hoc interpretation of the areas identified as more involved in concrete, abstract, or both (at different times) words. These are designed to reduce the interpretation bias and improve interpretation, yet they may not successfully do so. These methods do give some evidence that sensory areas are more involved in concrete word processing. However, they are still open to interpretation bias as it is not clear whether all the evidence is consistent with the hypotheses or if this is the best interpretation of individual regions' involvement. This is because the hypotheses are provided at the level of 'sensory' and 'language' areas without further clarification and areas and terms found are simply interpreted as fitting these definitions. For instance, the right IFG is interpreted as a motor area, and therefore sensory as predicted, and the term 'autobiographical memory' is argued to be interoceptive. Language is associated with the 'both' cluster, not the abstract cluster, when abstract > concrete is expected to engage language more. The areas identified for both vs. abstract > concrete are distinguished in the Discussion through the description as semantic vs. language areas, but it is not clear how these are different or defined. Auditory areas appear to be included in the sensory prediction at times and not at others. When they are excluded, the rationale for this is not given. Overall, it is not clear whether all these areas and terms are expected and support the hypotheses. It should be possible to specify specific sensory areas where concrete and abstract words are predicted to be different based on a) prior comparisons and/or b) the known locations of sensory areas. Similarly, language or semantic areas could be identified using masks from NeuroSynth or traditional meta-analyses. A language network is presented in Supplementary Figure 7 but not interpreted, and its source is not given.

      The “language network” was extracted from Neurosynth and projected onto the “overlap” activation map with AFNI. We now specify this in the figure caption.

      Alternatively, there could be a greater interpretation of different possible explanations of the regions found with a more comprehensive assessment of the literature. The function of individual regions and the explanation of why many of these areas are interpreted as sensory or language areas are only considered in the Discussion when it could inform whether the hypotheses have been evidenced in the results section. 

      As requested by the reviewer, we have added an extended discussion of this to the Results, in the section “Conceptual Processing Across Contexts”.

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe) and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Although the cerebellum is traditionally associated with motor function, some studies also implicate it in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation was greater for abstract than for concrete words in the following regions (Figure 3, blue). In the frontal lobes, this included the right anterior cingulate gyrus and the lateral and medial aspects of the superior frontal gyrus. Given their involvement in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices and in the orbital gyrus, regions that are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and the integration of sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the bilateral STS, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity for abstract modulation was greater bilaterally in the anterior thalamus and nucleus accumbens, and in the left amygdala. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, which is relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010; Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). This overlap arises because we performed general linear tests for the abstract/concrete contrast at each of the 20 time points in our group analysis. Consequently, overlap means that activity in these regions is modulated by both concrete and abstract word processing, but at different time points. In particular, we find that activity modulation associated with abstractness generally unfolds over a longer timeframe (for a comparison of significant timing differences, see Figure S9). In the frontal, parietal, and temporal lobes, this overlap was primarily in the left IFG, AG, and STG, respectively. The left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval, and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and consistent with the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hoffman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      Additionally, these methods attempt to interpret all the clusters found for each contrast in the same way when they may have different roles (e.g., relate to different senses). This is a particular issue for the peaks and valleys method which assesses whether a significantly larger number of clusters is associated with each sensory term for the abstract, concrete, or both conditions than the other conditions. The number of clusters does not seem to be the right measure to compare. Clusters differ in size so the number of clusters does not represent the area within the brain well. Nor is it clear that many brain regions should respond to each sensory term, and not just one per term (whether that is V1 or the entire occipital lobe, for instance). The number of clusters is therefore somewhat arbitrary. This is further complicated by the assessment across 20 time points and the inclusion of the 'both' categories. It would seem more appropriate to see whether each abstract and concrete cluster could be associated with each different sensory term and then summarise these findings rather than assess the number of abstract or concrete clusters found for each independent sensory term. In general, the rationale for the methods used should be provided (including the peak and valley method instead of other possible options e.g., linear regression). 

      We have included an assessment of whether each abstract and concrete cluster could be associated with each sensory term, and we summarise these findings at the participant level in the supplementary material (Figures S3, S4, and S5).

      Rationales for the amplitude modulated deconvolution are now provided on page 10 (specifically the first paragraph under “Deconvolution Analysis” in the Methods section), and for the peaks and valleys (P&V) method on pages 13–15 (under “Peaks and Valley”, particularly the first paragraph, in the Methods section).

      The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach that allows parametric variation within naturalistic stimuli, which is a potential strength of the study. This measure appears to vary little between objects that are present (e.g., animal or room), and those that are strongly (e.g., monitor) or weakly related (e.g., science). Additional information validating this measure may be useful, as would consideration of the range of values and whether the split between situated (c > 0.6) and displaced words (c < 0.4) is sufficient.  

      The main validation of our measure of contextual situatedness derives from the high accuracy and reliability of CNNs in object detection and recognition tasks, as demonstrated in numerous benchmarks and real-world applications. 

      One reason for the low variability in our measure of contextual situatedness is that we compared the GloVe vector of each word of interest with the average GloVe vector of all object-words referring to objects present in 56 frames (~300 objects on average). As a result, much of the variability in similarity between individual object-words and the word of interest is averaged out. Despite this low variability, we considered this the more conservative approach, because even a small difference in the averaged measure (e.g., 0.4 vs. 0.6) reflects a strong difference on average across the ~300 objects per context window. The split therefore ensures a sufficient distinction between words that are strongly related to their visual context and those that are not, which in turn allows us to properly investigate the impact of contextual relevance on conceptual processing.
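      The situatedness measure described above can be sketched as the cosine similarity between a word's embedding and the mean embedding of the objects detected in the preceding context window, followed by the c > 0.6 / c < 0.4 split. The two-dimensional toy vectors stand in for GloVe embeddings, which are not loaded here; the function names are hypothetical, and the sketch assumes that words falling between the two thresholds are excluded from both conditions.

```python
import math
from typing import List, Optional, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (e.g. object embeddings)."""
    if not vectors:
        raise ValueError("need at least one object vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / denom


def situatedness(word_vec: Sequence[float],
                 object_vecs: Sequence[Sequence[float]]) -> float:
    """Cosine between a word embedding and the mean embedding of objects in view."""
    return cosine(word_vec, mean_vector(object_vecs))


def label_context(c: float, high: float = 0.6, low: float = 0.4) -> Optional[str]:
    """'situated' above `high`, 'displaced' below `low`, else None (excluded)."""
    if c > high:
        return "situated"
    if c < low:
        return "displaced"
    return None


# Toy 2-D "embeddings"; real GloVe vectors have hundreds of dimensions.
objects = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
for word in ([0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]):
    c = situatedness(word, objects)
    print(word, round(c, 3), label_context(c))
```

      Averaging the object vectors first is what compresses the range of values: a single highly related object moves the mean only slightly when many unrelated objects are also in view.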

      Finally, the study assessed the relation of spoken concrete or abstract words to brain activity at different time points. The visual scene was always assessed using the 2 seconds before the word, while the neural effects of the word were assessed every second after the presentation for 20 seconds. This could be a strength of the study, however almost no temporal information was provided. The clusters shown have different timings, but this information is not presented in any way. Giving more temporal information in the results could help to both validate this approach and show when these areas are involved in abstract or concrete word processing. 

      We provide more information on the temporal differences of when clusters are involved in processing concrete and abstract concepts in the supplementary material (Figure S9) and refer to this information where relevant in the Methods and Results sections. 

      Additionally, no rationale was given for this long timeframe which is far greater than the time needed to process the word, and long after the presence of the visual context assessed (and therefore ignores the present visual context). 

      The 20-second timeframe for our deconvolution analysis is justified by several considerations. First, the hemodynamic response function (HRF) is known to vary both across individuals and across brain regions. To accommodate this variability and capture the full breadth of the HRF, including its rise, peak, and return to baseline, a longer timeframe is often necessary. The 20-second window ensures that we do not prematurely truncate the HRF, which could lead to inaccurate estimates of neural activity related to word processing. Second, unlike model-based approaches that assume a canonical HRF shape, our deconvolution analysis does not impose a predefined form on the HRF but reconstructs it from the data, and a longer timeframe is advantageous for estimating the true HRF. Finally, the 'Csplin' function used in our analysis provides a flexible set of basis functions for deconvolution, allowing a more fine-grained and precise estimation of the HRF across this extended timeframe. The 'Csplin' function interpolates smoothly between time points, which is particularly advantageous for capturing the nuances of the HRF as it unfolds over a longer timeframe.
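      Amplitude-modulated deconvolution over a 20-second window can be sketched as follows. This is a swapped-in simplification: the analysis described here uses cubic-spline ('Csplin') basis functions, whereas the sketch uses piecewise-linear tents because they are easier to write out. It assumes a TR of 1 s, one knot per second across the window, and hypothetical function names. Each column is the sum, over word onsets, of a shifted tent weighted by the mean-centred modulator value, so these columns capture modulation beyond the average response; unmodulated columns for the average response would be built the same way with every weight set to 1. Fitting the columns to voxel time series (e.g., by least squares) is omitted.

```python
from typing import List, Sequence


def tent(x: float) -> float:
    """Unit tent: 1 at x = 0, falling linearly to 0 at |x| >= 1."""
    return max(0.0, 1.0 - abs(x))


def am_tent_regressors(
    onsets: Sequence[float],
    modulators: Sequence[float],
    n_scans: int,
    tr: float = 1.0,
    window: float = 20.0,
    knot_spacing: float = 1.0,
) -> List[List[float]]:
    """Amplitude-modulated tent-basis regressors, one column per knot.

    column_j[s] = sum_k (m_k - mean(m)) * tent((s*tr - onset_k - knot_j) / spacing)
    """
    if len(onsets) != len(modulators) or not onsets:
        raise ValueError("need one modulator value per onset")
    mean_m = sum(modulators) / len(modulators)
    centred = [m - mean_m for m in modulators]  # modulation beyond the average
    n_knots = int(round(window / knot_spacing)) + 1  # knots at 0, 1, ..., 20 s
    columns = []
    for j in range(n_knots):
        knot = j * knot_spacing
        column = []
        for s in range(n_scans):
            t = s * tr
            column.append(sum(
                w * tent((t - onset - knot) / knot_spacing)
                for onset, w in zip(onsets, centred)
            ))
        columns.append(column)
    return columns


# Two hypothetical word onsets (s) with modulator values, e.g. a rating.
cols = am_tent_regressors(onsets=[0.0, 10.0], modulators=[2.0, 0.0], n_scans=40)
print(len(cols), len(cols[0]))  # 21 40
print(cols[0][0], cols[0][10])  # 1.0 -1.0
```

      Estimating one coefficient per knot, rather than scaling a fixed canonical HRF, is what lets the response shape be read off the data; a longer window simply adds more knots.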

      Although we use a 20-second timeframe for the deconvolution analysis to capture the full HRF, the analysis is still time-locked to the onset of each visual stimulus. This ensures that the initial stages of the HRF are directly tied to the moment the word is presented, thus incorporating the immediate visual context. We furthermore include variables that represent aspects of the visual context at the time of word presentation in our models (e.g., luminance) and control for motion (optical flow), colour saturation, and spatial frequency of the immediate visual context.

      Reviewer #3 (Public Review):

      The context measure is interesting, but I'm not convinced that it's capturing what the authors intended. In analysing the neural response to a single word, the authors are presuming that they have isolated the window in which that concept is processed and the observed activation corresponds to the neural representation of that word given the prior context. I question to what extent this assumption holds true in a narrative when co-articulation blurs the boundaries between words and when rapid context integration is occurring. 

      We appreciate the reviewer's critical perspective on the contextual measure employed in our study. We agree that the dynamic and continuous nature of narrative comprehension poses challenges for isolating the neural response to individual words. However, the use of an amplitude modulated deconvolution analysis, particularly with the CSPLIN function, is a methodological choice to specifically address these challenges. Deconvolution allows us to estimate the hemodynamic response function (HRF) without assuming its canonical shape, capturing nuances in the BOLD signal that may reflect the integration of rapid contextual shifts (considering only modulation beyond the average BOLD response). The CSPLIN function further refines this approach by offering a flexible basis set for modelling the HRF and by providing a detailed temporal resolution that can adapt to the variance in individual responses.

      Our choice of a 20-second window is informed by the need to encompass not just the immediate response to a word but also the extended integration of the contextual information. This is consistent with evidence indicating that the brain integrates information over longer timescales when processing language in context (Hasson et al., 2015). The neural representation of a word is not a static snapshot but a dynamic process that evolves with the unfolding narrative. 

      Further, the authors define context based on the preceding visual information. I'm not sure that this is a strong manipulation of the narrative context, although I agree that it captures some of the local context. It is maybe not surprising that if a word, abstract or concrete, has a strong association with the preceding visual information then activation in the occipital cortex is observed. I also wonder if the effects being captured have less to do with concrete and abstract concepts and more to do with the specific context the displaced condition captures during a multimodal viewing paradigm. If the visual information is less related to the verbal content, the viewer might process those narrative moments differently regardless of whether the subsequent word is concrete or abstract. I think the claims could be tailored to focus less generally on context and more specifically on how visually presented objects, which contribute to the ongoing context of a multimodal narrative, influence the subsequent processing of abstract and concrete concepts.

      The context measure, though admittedly a simplification, is designed to capture the local visual context preceding word presentation. By using high-confidence visual recognition models, we ensure that the visual information is reliably extracted and reflects objects that have a strong likelihood of influencing the processing of subsequent words. We acknowledge that this does not capture the full richness of narrative context; however, it provides a quantifiable and consistent measure of the immediate visual environment, which is an important aspect of context in naturalistic language comprehension.

      With regard to the effects observed in the occipital cortex, we posit that while some activation might be attributable to the visual features of the narrative, our findings also reflect the influence of these features on conceptual processing. This is because our analysis only examines the modulation of the HRF amplitude beyond the average response (and therefore also beyond the average visual response) when contrasting conditions of high and low visual-contextual association, with important (audio-visual) control variables included in the model.

      Lastly, we concur that both concrete and abstract words are processed within a multimodal narrative, which could influence their neural representation. We believe our approach captures a meaningful aspect of this processing, and we have refined our claims to specify the influence of visually presented objects on the processing of abstract and concrete concepts, rather than making broader assertions about multimodal context. We also highlight several other signals (e.g. auditory) that could influence processing. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The approach taken here requires a lot of manual variable selection and seems a bit roundabout. Why not build an encoding model that can predict the BOLD time course of each voxel in a participant from the feature-of-interest like valence etc. and then analyze if (1) certain features better predict activity in a specific region (2) the predicted responses/regression parameters are more positive (peaks) or more negative (valleys) for certain features in a specific brain region (3) maybe even use contextual features use a large language model and then per word (like "truth") analyze where the predicted responses diverge based on the associated context. This seems like a simpler approach than having multiple stages of analysis. 

      It is not clear to us why an encoding model would be more suitable for answering the question at hand (especially given that we have tried to address concerns about non-linear relationships between variables). On the contrary, fitting a regression model to each individual voxel has several drawbacks. First, encoding models are prone to overestimating effect sizes (Naselaris et al., 2011). Second, encoding models are not good at explaining group-level effects due to high variability between individual participants (Turner et al., 2018). We would also like to point out that an encoding model using features of a text-based LLM would not address the visual context question unless the LLM was multimodal. Multimodal LLMs are, however, a very recent development in artificial intelligence, and models like LLaMA (adapter), Google’s Gemini, etc. are not truly multimodal in the sense that would be useful for this study, because they are first trained on text and later injected with visual data. This relates to our concern that the reviewer may not have appreciated that we are interested in the purely visual context of words, not their linguistic context.

      (2) In multiple analyses, a subset of the selected words is sampled to create a balanced set between the abstract and concrete categories. Do the authors show standard deviation across these sets? 

      For the subset of words used in the context-based analyses, we give mean ratings of concreteness, log frequency and length and conduct a t-test to show that these variables are not significantly different between the sets. We also included the psycholinguistic control variables surprisal and semantic diversity, as well as the visual variables motion (optical flow), colour saturation and spatial frequency.  
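      The response mentions t-tests for the matched word sets without specifying the variant. As an illustration only, Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library; the choice of Welch's test, the toy values, and the function name are assumptions, and a p-value would additionally require the t distribution (e.g., from a statistics package), which the sketch leaves out.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se2_b = variance(b) / len(b)  # squared standard error of the mean, group b
    se2 = se2_a + se2_b
    if se2 == 0.0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-word values for two matched sets (e.g. log frequency).
set_a = [1.0, 2.0, 3.0, 4.0, 5.0]
set_b = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(set_a, set_b))  # (-1.0, 8.0)
```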

      Reviewer #2 (Recommendations For The Authors):

      Figures S3-5 are central to the argument and should be in the main text (potentially combined).  

      These have been added to the main text.

      S5 says the top 3 terms are DMN (and not semantic control), but the text suggests the r value is higher for 'semantic control' than 'DMN'? 

      Fixed this in the text; the caption now reads:

      “This was confirmed by using the neurosynth decoder on the unthresholded brain image - top keywords were “Semantic Control” and “DMN”.”

      Fig. S7 is very hard to see due to the use of grey on grey. Not used for great effect in the final sentence, but should be used to help interpret areas in the results section (if useful). It has not been specified how the 'language network' has been identified/defined here. 

      We altered the contrast in the figure to make boundaries more visible and specified how the language network was identified in the figure caption. 

      In the Results, 'This showed that concrete produced more modulation than abstract modulation in the frontal lobes' should refer to 'parts of' or 'some of' the frontal lobes, as this isn't true overall.

      Fixed this in the text.  

      There are some grammatical errors and lack of clarity in the context comparison section of the results. 

      Fixed these in the text.

      Reviewer #3 (Recommendations For The Authors):

      •  The analysis code should be shared on the github page prior to peer review.  

      The code is now shared under: https://github.com/ViktorKewenig/Naturalistic_Encoding_Concepts

      •  At several points throughout the methods section, information was referred to that had not yet been described. Reordering the presentation of this information would greatly improve interpretability. A couple of examples of this are provided below. 

      Deconvolution Analysis: the use of amplitude modulation regression was introduced prior to a discussion of using the TENT function to estimate the shape of the HRF. This was then followed by a discussion of the general benefits of amplitude modulation. Only after these paragraphs are the modulators/model structure described. Moving this information to the beginning of the section would make the analysis clearer from the onset. 

      Fixed this in the text.

      Peak and Valley Analysis: the hypotheses regarding the sensory-motor features and experiential features are provided prior to describing how these features were extracted from the data (e.g., using the Lancaster norms). 

      Fixed this in the text.

      •  The justification for and description of the IRF approach seems overdone considering the timing differences are not analyzed further or discussed. 

      We now present a further discussion of timing differences in the supplementary material.

      •  The need and suitability of the cluster simulation method as implemented were not clear. The resulting maps were thresholded at 9 different p values and then combined, and an arbitrary cluster threshold of 20 voxels was then applied. Why not use the standard approach of selecting the significance threshold and corresponding cluster size threshold from the ClustSim table? 

      We extracted the original clusters at 9 different p values with the corresponding cluster size from the ClustSim table, then only included clusters that were bigger than 20 voxels.  

      •  Why was the center of mass used instead of the peak voxel? 

      Peak voxel analysis can be sensitive to noise and may not reliably represent the region's activation pattern, especially in naturalistic imaging data, where signal fluctuations are more variable and outliers more frequent. The centre of mass provides a more stable and representative measure of the underlying neural activity. Another reason for using the centre of mass is that it better represents the anatomical distribution of the data, especially in large clusters (more than 100 voxels), where peak voxels are often located at the periphery.
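      To illustrate the robustness argument above, here is a minimal pure-Python sketch (not the authors' code), assuming a cluster is given as a list of voxel coordinates with non-negative statistic values; the function names `peak_voxel` and `center_of_mass` and the toy cluster are hypothetical. A single noisy spike at the edge of a one-dimensional toy cluster moves the peak voxel to the periphery, while the intensity-weighted centre of mass shifts only slightly.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def peak_voxel(coords: List[Coord], values: List[float]) -> Coord:
    """Return the coordinate of the single highest-valued voxel."""
    best = max(range(len(values)), key=lambda k: values[k])
    return coords[best]


def center_of_mass(coords: List[Coord], values: List[float]) -> Coord:
    """Return the intensity-weighted mean coordinate of the cluster.

    Assumes all values are non-negative (e.g. a positive-only cluster of a
    statistic map), so the weights sum to a positive number.
    """
    total = sum(values)

    def weighted_mean(axis: int) -> float:
        return sum(c[axis] * v for c, v in zip(coords, values)) / total

    return (weighted_mean(0), weighted_mean(1), weighted_mean(2))


# Hypothetical toy cluster: 11 voxels along x, smooth response peaking at x = 5.
coords = [(float(x), 0.0, 0.0) for x in range(11)]
clean = [6.0 - abs(x - 5) for x in range(11)]

# Same cluster with one noisy outlier at the edge (x = 10).
noisy = list(clean)
noisy[10] += 9.0

print(peak_voxel(coords, clean), center_of_mass(coords, clean))
# (5.0, 0.0, 0.0) (5.0, 0.0, 0.0)
print(peak_voxel(coords, noisy), center_of_mass(coords, noisy))
# (10.0, 0.0, 0.0) (6.0, 0.0, 0.0)
```

      In this toy case the peak jumps five voxels to the cluster edge, whereas the centre of mass moves by one voxel, because every voxel in the cluster contributes to it in proportion to its value.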

      • Figure 1 seems to reference a different Figure 1 that shows the abstract, concrete, and overlap clusters of activity (currently Figure 3). 

      Fixed this in the text.

      • Table S1 seems to have the "Touch" dimension repeated twice with different statistics reported. 

      Fixed this in the text; the second mention of the dimension “touch” was wrong.

      • It appears from the supplemental files that the Peaks and Valley analysis produces different results at different lag times. This might be expected but it's not clear why the results presented in the main text were chosen over those in the supplemental materials. 

      The results in the main text were chosen over those in the supplementary material because the HRF is generally reported to peak about 5 s after stimulus onset. We added this rationale to the “2. Peak and Valley Analysis” subsection of the Methods section.

      References (in order of appearance) 

      (1) Neumann J, Lohmann G, Zysset S, von Cramon DY. Within-subject variability of BOLD response dynamics. Neuroimage. 2003 Jul;19(3):784-96. doi: 10.1016/s1053-8119(03)00177-0. PMID: 12880807.

      (2) Handwerker DA, Ollinger JM, D'Esposito M. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage. 2004 Apr;21(4):1639-51. doi: 10.1016/j.neuroimage.2003.11.029. PMID: 15050587.

      (3) Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci. 2005 Jun;17(6):905-17. doi: 10.1162/0898929054021102. PMID: 16021798.

      (4) Bucur, M., Papagno, C. An ALE meta-analytical review of the neural correlates of abstract and concrete words. Sci Rep 11, 15727 (2021). https://doi.org/10.1038/s41598-021-94506-9

      (5) Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies (NAACL '01). Association for Computational Linguistics, USA, 1–8. https://doi.org/10.3115/1073336.1073357

      (6) Brysbaert, M., New, B. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41, 977–990 (2009). https://doi.org/10.3758/BRM.41.4.977

      (7) Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical Activity During Natural Vision. Science, 303(5664), 6.

      (8) Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. Neuroimage. 2011 May 15;56(2):400-10. doi: 10.1016/j.neuroimage.2010.07.073. Epub 2010 Aug 4. PMID: 20691790; PMCID: PMC3037423.

      (9) Turner BO, Paul EJ, Miller MB, Barbey AK. Small sample sizes reduce the replicability of task-based fMRI studies. Commun Biol. 2018 Jun 7;1:62. doi: 10.1038/s42003-018-0073-z. PMID: 30271944; PMCID: PMC6123695.

      (10) He, K., Zhang, Y., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv preprint. https://doi.org/10.48550/arXiv.1512.03385

      (11) Hasson, U., & Egidi, G. (2015). What are naturalistic comprehension paradigms teaching us about language? In R. M. Willems (Ed.), Cognitive neuroscience of natural language use (pp. 228–255). Cambridge University Press. https://doi.org/10.1017/CBO9781107323667.011

    1. Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury (TBI), induced by the LFPI procedure, on CA1 at the network level. The major findings seem to be that TBI reduces theta and gamma power in CA1, reduces phase-amplitude coupling between the theta and gamma bands, and disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level; however, further investigation of how behavioral and brain states relate to the observed effects would help clarify the interpretations for readers.

      Strengths:

      In this study, the authors were able to combine behavioral verification of the TBI model with laminar electrophysiological recordings of the CA1 region to reveal network-level anomalies, both in the temporal coordination of network-level oscillations and in the firing of interneurons. Indeed, the findings may help future studies to better understand TBI functionally and/or to refine therapies for it.

      Weaknesses:

      The discoveries made in the paper and their broad interpretations would be strengthened by further characterization and comparison of brain and behavioral states during both immobility and movement. Brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" lack sufficient evidence to determine whether the disruptions were related to information encoding and retrieval or were due to sensory-motor and/or behavioral deficits that could also occur after TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and, in some cases, to gamma oscillations. So, it is important to disentangle differences in behavioral variables from the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might suffer from these confounds. On one hand, the observed reduction in entrainment might be due to decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or unreliable oscillations in LFP bands such as theta and gamma, rather than to a memory encoding- and retrieval-related disruption of spike synchrony to these rhythms. On the other hand, it may simply be due to reduced neuronal excitability, particularly in the behavioral and brain state in which the effects were observed, rather than to a disrupted temporal code. Hence, further investigations dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

    2. Author response:

      We would like to thank the editors and reviewers for their constructive feedback, and we look forward to addressing their comments in the revised manuscript. We also appreciate the acknowledgment that the use of laminar electrodes in awake-behaving animals is an important advancement for the TBI community, and that our results provide a potential physiological link between coalescing TBI pathologies and cognitive deficits. We believe that integrating the reviewer comments will help to make our analyses even more rigorous and will improve the overall manuscript. Please find comments related to specific concerns raised in the public review below:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion… It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments.

      Previous mechanistic and physiological studies suggested interneuronal dysfunction following TBI, which we hypothesized would disrupt the oscillatory dynamics underlying temporal coding (single-unit entrainment to theta/gamma, phase precession, and phase-amplitude coupling). These are known to support hippocampal-dependent learning and memory tasks such as the Morris Water Maze. While we did not record during a goal-directed behavioral task, the goal of recording in a familiar and a novel environment was to assess remapping across these environments. Unfortunately, occupancy in the two environments was not high enough to rigorously characterize place cell specificity and phase precession, or to investigate remapping, although putative place cells were identified. Despite this shortcoming, we were still able to confirm that the spike timing of interneurons relative to hippocampal oscillations was disrupted, which we believe underlies the massive reduction in theta-gamma phase-amplitude coupling we report. This opens the door to more strongly hypothesis-driven, mechanistic studies (e.g., closed-loop stimulation) that alter the spike timing of interneurons relative to theta phase and could potentially rescue these effects on phase-amplitude coupling and behavior.

      The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported… There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units.

      The number of rats used for the spatial working memory task was reported in the text and in the figure legend where the statistics were reported, but we will ensure that the statistics are more completely reported by including relevant statistical results and parameters beyond the test used and the p-value. Additionally, we will report the number of units recorded per animal.

      Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

      The spatial working memory deficit that we report in the Morris Water Maze is not a novel finding and has been demonstrated numerous times in this TBI model. Our goal in including this was to increase the rigor of the study by verifying this deficit in our hands at the injury level used for these physiology experiments. The dissociation between spatial working memory deficits and other motor, motivational, or sensory deficits from TBI in the Morris Water Maze (e.g. swim speed and escape latency with visible platforms) has been well characterized in this TBI model at many injury levels including more severe injuries than those used in this study. We will address this in the Discussion as it is an important point.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      We agree that there is a broadband downward shift in power following TBI especially in the pyramidal cell layer. We will include a normalization of the power spectra in order to specifically compare the theta and gamma bands between sham and injured rats and include discussion about the broadband decrease.
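      One common way to separate a broadband power shift from band-selective changes is to express band power as a fraction of total power over a reference range. The sketch below is illustrative only and is not the normalization the authors have committed to; the band edges (theta 6-10 Hz, reference 1-100 Hz), the toy 1/f spectra and the function names are all assumptions. Scaling a spectrum uniformly by 0.5 halves absolute theta power but leaves relative theta power unchanged.

```python
from typing import List, Tuple


def band_power(freqs: List[float], psd: List[float], lo: float, hi: float) -> float:
    """Sum PSD bins with lo <= f < hi (rectangle rule; assumes uniform bin spacing)."""
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi)


def relative_band_power(
    freqs: List[float],
    psd: List[float],
    band: Tuple[float, float],
    reference: Tuple[float, float] = (1.0, 100.0),
) -> float:
    """Band power divided by total power in the (hypothetical) reference range."""
    return band_power(freqs, psd, *band) / band_power(freqs, psd, *reference)


THETA = (6.0, 10.0)  # hypothetical band edges, chosen for illustration only

freqs = [float(f) for f in range(1, 100)]
# Toy "sham" spectrum: 1/f background plus a bump in the theta band.
sham = [1.0 / f + (0.5 if THETA[0] <= f < THETA[1] else 0.0) for f in freqs]
# Toy "injured" spectrum: same shape, uniformly scaled down (a pure broadband drop).
injured = [0.5 * p for p in sham]

# Absolute theta power halves...
print(band_power(freqs, sham, *THETA), band_power(freqs, injured, *THETA))
# ...but relative theta power is the same for both spectra.
print(relative_band_power(freqs, sham, THETA), relative_band_power(freqs, injured, THETA))
```

      Under this kind of normalization, a band-selective deficit would appear as a lower relative theta fraction in injured animals, whereas a purely broadband decrease would not.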

      The discoveries made in the paper and their broad interpretations would be strengthened by further characterization and comparison of brain and behavioral states during both immobility and movement. Brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" lack sufficient evidence to determine whether the disruptions were related to information encoding and retrieval or were due to sensory-motor and/or behavioral deficits that could also occur after TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and, in some cases, to gamma oscillations. So, it is important to disentangle differences in behavioral variables from the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might suffer from these confounds. On one hand, the observed reduction in entrainment might be due to decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or unreliable oscillations in LFP bands such as theta and gamma, rather than to a memory encoding- and retrieval-related disruption of spike synchrony to these rhythms. On the other hand, it may simply be due to reduced neuronal excitability, particularly in the behavioral and brain state in which the effects were observed, rather than to a disrupted temporal code. Hence, further investigations dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We agree that changes in hippocampal physiology that we report could arise due to disrupted inputs from TBI, and this study is inherently limited due to recording exclusively from CA1. We chose to record from the hippocampus due to its importance for learning and memory, and its vulnerability in TBI. Future studies will investigate how hippocampal afferents are affected by injury, and we hope that the layer-specific changes we report will help to inform which inputs may be preferentially disrupted. Importantly, these inputs along with local processing within the hippocampus change drastically depending on the behavior of the animal. We will more rigorously assess movement and the behavioral state of the rats when comparing physiological properties, especially the firing rates reported in Figure 3.

    1. CodeSandbox is a cloud development platform that empowers developers to code, collaborate and ship projects of any size from any device in record time.

      .is.a - online, in-browser Node development environment

    1. CodeSandbox is a cloud development platform that empowers developers to code, collaborate and ship projects of any size from any device in record time

      .is.a - in browser node code development platform

    1. Social workers treat each person in a caring and respectful fashion, mindful of individual differences and cultural and ethnic diversity. Social workers promote clients’ socially responsible self-determination. Social workers seek to enhance clients’ capacity and opportunity to change and to address their own needs. Social workers are cognizant of their dual responsibility to clients and to the broader society. They seek to resolve conflicts between clients’ interests and the broader society’s interests in a socially responsible manner consistent with the values, ethical principles, and ethical standards of the profession.

      Structural inequality and power imbalances raise quite a few questions for me, especially when it comes to personal biases. How can we check those at the door and acknowledge the way we navigate our roles as social workers? I think it would be helpful if the Code of Ethics went into more detail about what these imbalances may mean and the subtle forms they may take.

    2. Social workers treat each person in a caring and respectful fashion, mindful of individual differences and cultural and ethnic diversity. Social workers promote clients’ socially responsible self-determination. Social workers seek to enhance clients’ capacity and opportunity to change and to address their own needs.

      A2: This component of the Code of Ethics particularly resonates with me because I witness it in my field practicum placement at an immigration center. Among the staff on the team, I am the only one who is not of Hispanic ethnicity and who speaks Spanish as a second language. I am grateful that I can assist individuals of ethnicities other than my own and listen to their experiences, from their concerns to their success stories. I think it is important for social workers to have field experience in places different from what they are used to, so that they can immerse themselves in other populations. This way, social workers can empathize with a population's situation rather than saying "I understand" without fully understanding. However, because white individuals hold a great deal of power in our society today, I understand that I must be aware of my privileges as a white woman. I believe the Code of Ethics should highlight the privileges individuals have and how to address them effectively with clients in accordance with the Code; for example, a white social worker using active listening techniques when hearing the plights of a Hispanic client and addressing their needs.

    3. Social workers should keep apprised of emerging technological developments that may be used in social work practice and how various ethical standards apply to them. Professional self-care is paramount for competent and ethical social work practice.

      A3- With the advancement of technology, it is much easier to contact the people you need to get in touch with. For my self-care with technology, I will not answer work emails or calls outside of office hours. Additionally, I have Googled my own name and made sure that anything connected to me was appropriate and decent for the public eye. I don't have any personal social media, but if I did, I would make everything private and limit what I post about family and friends. When in the office, I make sure my phone is on Do Not Disturb and that all of my clients' information is kept confidential. I also ensure that when I take supplemental notes, I use code names for clients instead of their real names.

    1. Social workers should provide services and represent themselves as competent only within the boundaries of their education, training, license, certification, consultation received, supervised experience, or other relevant professional experience.

      This Code of Ethics section directly relates to my field work experience at an immigration center. I am currently being trained to inform clients that I am new to the facility and can only provide services I am knowledgeable about; here, that is strictly referrals and scripted phone calls. I expect, and am eager, to learn more as I go, but this is what I have been asked to do so far.

    2. Social workers should avoid communication with clients using technology (such as social networking sites, online chat, e-mail, text messages, telephone, and video) for personal or non-work-related purposes.

      A3: This segment of the Code of Ethics suggests that social workers should not engage in any personal contact with their clients over social media, email, or phone. When a client and a social worker connect on social media, it acts as a bridge from client to social worker: the client becomes aware of the social worker's private life. This should be avoided, so it is important that social workers establish effective boundaries. In the inSocialWork podcast, Allan Barsky makes the same point. When using social media as a social worker, I would keep my personal accounts private and use a profile photo that makes it difficult to identify me. If I wanted to use social media professionally, I would use it as a one-way positive affirmations page with client comments and DMs disabled.

    3. (c) Social workers should protect the confidentiality of all information obtained in the course of professional service, except for compelling professional reasons. The general expectation that social workers will keep information confidential does not apply when disclosure is necessary to prevent serious, foreseeable, and imminent harm to a client or others. In all instances, social workers should disclose the least amount of confidential information necessary to achieve the desired purpose; only information that is directly relevant to the purpose for which the disclosure is made should be revealed.

      **RE: Practicum situation discussed in training:** I am working with chaplains at the Spiritual Care Center of a hospital. The work often involves acute trauma situations in which I would be the only chaplain on duty. They gave an example of a situation where I am called on to assist a patient who has been in a bad car accident and has a serious injury, but is coherent enough to communicate. The patient happens to belong to my church, and no one else on duty can assist. I would use this code (in addition to code 1.07(a), which was already highlighted by another student) to inform the patient that I will keep their information confidential and not mention anything to the church. However, should the patient request support from their congregation (perhaps they'd like a visit from their minister or other members, or would like members to provide volunteer services such as meals when they return home), the patient may give their consent for me to let the congregation know. I would make sure I had written consent before informing fellow church members or anyone else. This Code of Ethics section ensures and respects the client's right to privacy and gives them self-determination over whom they inform about their injury.

    4. Social workers should respect clients’ right to privacy. Social workers should not solicit private information from or about clients except for compelling professional reasons. Once private information is shared, standards of confidentiality apply.

      This part of the NASW Code of Ethics is relevant to my field work, where staff emphasize how important it is to respect students' and clients' privacy. When attending a meeting, you must consent and sign a form so that there is a record of who attended in case an issue arises.

    1. Reviewer #1 (Public review):

      Summary:

      This paper presents a data processing pipeline to discover causal interactions from time-lapse imaging data, and convincingly illustrates it on a challenging application: the analysis of tumor-on-chip ecosystem data.

      The core of the discovery module is the authors' original tMIIC method, which is shown in the supplementary material to compare favourably to two state-of-the-art methods on synthetic temporal data from a 15-node network.

      Strengths:

      This paper tackles the problem of learning causal interactions from temporal data, which is an open problem in the presence of latent variables.

      The core of the authors' method, tMIIC, is nicely presented in connection with Granger-Schreiber causality and with the novel graphical conditions used to infer latent variables, which are based on a theorem about transfer entropy.

      tMIIC compares favourably to PC and PCMCI+ methods using different kernels on synthetic datasets generated from a network of 15 nodes.

      A full application to tumor-on-chip cellular ecosystem data, including cancer cells, immune cells, cancer-associated fibroblasts, endothelial cells and anticancer drugs, gives convincing inference results with respect to both known and novel effects between these components and their contacts.

      The code and dataset are available online for the reproducibility of the results.

      Weaknesses:

      The references to "state-of-the-art methods" concerning the inference of causal networks should be more precise by giving citations in the main text, and better discussed in general terms, both in the first section and in the section of presentation of CausalXtract. It is only in the legend of the figures of the supplementary material that we get information.

      Of course, comparisons on one's own synthetic datasets can always be criticized, but this is rather due to the absence of a common benchmark, and I would recommend that the authors explicitly propose their datasets as a benchmark for the community.

    2. Reviewer #2 (Public review):

      Summary:

      The authors propose a methodology to perform causal (temporal) discovery. The approach appears to be robust and is tested in two different scenarios: one with live-cell imaging data, and another with synthetic (mathematically defined) time series data. They compare the performance of their method against another well-known method using metrics such as F-score, precision and recall.

      Strengths:

      Performance and robustness; the text is clear and concise; the authors provide the code for review.

      Weaknesses:

      One concern could be the applicability of the method in other areas, such as climate or economics. Public data are available for those areas, and it might be interesting to test how the method performs on this kind of data.

    1. With growth in the use of communication technology in various aspects of social work practice, social workers need to be aware of the unique challenges that may arise in relation to the maintenance of confidentiality, informed consent, professional boundaries, professional competence, record keeping, and other ethical considerations.

      This guideline overlaps with NASW Code of Ethics Section 1.07: (a) Respect for privacy: social workers must be vigilant about informing clients of the risks and benefits of technology such as telehealth platforms; (c) Limits of confidentiality: follow the HIPAA requirements; and (g) Dual relationships: social workers should maintain clear and appropriate professional boundaries and avoid dual relationships. An example of this would be taking care not to keep certain social media accounts public, and not "friending" or accepting friend requests from a client. Furthermore, it is imperative for a social worker to receive continuous tech training and stay up to date in this rapidly advancing environment, especially in the area of AI. It is also important to stay apprised of your organization's social media policies and any guideline updates from NASW. Can you think of any other sections of the Code of Ethics that are affected? Any other suggestions on how to uphold these values and standards?

    1. 42.6

      Error code description:

      Faulty water flow or flow detection on valve Y14 clean to control box.

      Condition for error detection:

      Valve Y14 clean to the control box is actuated, but the CDS sensor measures no/insufficient water flow.

      Error area:

      • Water supply,
      • Control cover,
      • Water flow detection

      Relevant causes/components:

      • Water supply
      • Rinse pipe/control cover
      • Solenoid valve block with CDS sensor
      • Electrical connection to components
      • Incorrect CDS pulse saved
    2. 42.4

      Error code description:

      Faulty water flow or flow detection on valve Y4 to the care container.

      Condition for error detection:

      Valve Y4 to the care container is actuated, but the CDS sensor measures no/insufficient water flow.

      Error area:

      • Water supply
      • Care container + supply line
      • Water flow detection (CDS)

      Relevant causes/components:

      • Water supply
      • Care container + supply line
      • Solenoid valve block with CDS sensor
      • Electrical connection to components
      • Incorrect CDS pulse saved (should be 1350)
    3. 42.5

      Error code description:

      Malfunction of climate control.

      Condition for error detection:

      Insufficient air flow through the Clima valve.

      Error area:

      Clima control

      Relevant causes/components:

      • Blocked air inlet
      • Electrical connection to Clima valve
      • 12 V DC supply failure from A10
      • Connection to X11 on A10 input/output PCB
    4. 42.2

      Error code description:

      Faulty water flow or flow detection on valve Y2 to the control box.

      Condition for error detection:

      Valve Y2 to the control box is actuated, but the CDS sensor measures no/insufficient water flow.

      Error area:

      • Water supply
      • Control box cover
      • Water flow detection (CDS)

      Relevant causes/components:

      • Water supply
      • Control nozzle / control cover
      • Solenoid valve block with CDS sensor
      • Electrical connection to components
      • Incorrect CDS pulse saved (should be 1350)
    5. 42.3

      Error code description:

      Faulty water flow or flow detection on valve Y3 to moistening.

      Condition for error detection:

      Valve Y3 for moistening is actuated, but the CDS sensor measures no/insufficient water flow.

      Error area:

      • Water supply (isolated?)
      • Moistening (nozzle blocked?)
      • Water flow detection (CDS)

      Relevant causes/components:

      • Water supply
      • Moistening pipe / moistening nozzle / plastic pipe
      • Solenoid valve block with CDS sensor
      • Electrical connection to components
      • Incorrect CDS pulse saved (should be 1350)

    6. 42.1

      Error code description:

      Malfunction when filling the steam generator.

      Condition for error detection:

      Level electrode does not detect a full water level and the CDS sensor has also detected no/insufficient flow even though filling of the steam generator is anticipated.

      Error area:

      • Water supply
      • Fill steam generator
      • Water flow detection
      • Level detection

      Relevant causes/components:

      • Water supply
      • Solenoid valve block with CDS sensor
      • Steam generator pressure hose
      • Electrical connection to components
      • Level electrode
      • Steam generator reference volume
      • Incorrect CDS pulse saved
    7. 28.4

      Error code description:

      The temperature limit of +2°C [36°F] at temperature sensor B5 in the steam generator has been undershot, and only hot-air operation is possible. This serves as frost protection, because the water lines may be frozen.

      Condition for error detection:

      A temperature of less than +2°C [36°F] was measured by temperature sensor B5 in the steam generator and the unit can only be operated with all functions again when the measured temperature rises above +4°C [39°F].

      Error area: Ambient temperature too cold.

      Relevant causes/components:

      Ambient temperature of the unit below +2°C [36°F]
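      The +2 °C lock-out and the +4 °C release described above form a hysteresis band. Below is a minimal C sketch of that logic. Only the two thresholds come from the error description. The type and function names are hypothetical, and this is not the unit's firmware.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Thresholds from the 28.4 description; everything else is illustrative. */
      #define FROST_LOCK_BELOW_C    2.0  /* enter hot-air-only mode below +2 °C */
      #define FROST_RELEASE_ABOVE_C 4.0  /* leave it only above +4 °C */

      /* Hypothetical state: true while only hot-air operation is allowed. */
      typedef struct {
          bool hot_air_only;
      } frost_guard;

      /* Feed one reading from sensor B5 and return the resulting state.
       * Between +2 °C and +4 °C the previous state is kept (hysteresis). */
      bool frost_guard_update(frost_guard *g, double b5_celsius)
      {
          assert(g != NULL);
          if (b5_celsius < FROST_LOCK_BELOW_C)
              g->hot_air_only = true;
          else if (b5_celsius > FROST_RELEASE_ABOVE_C)
              g->hot_air_only = false;
          return g->hot_air_only;
      }
      ```

      Because the state is held between the two thresholds, a reading that hovers around +2 °C cannot make the unit flip back and forth between modes.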

    8. 20

      Error code description:

      An incorrect temperature value is measured on one of the temperature sensors B1, B2, B4, B5, B9, B10, B15.

      (B1 - Cooking cabinet) Service 20.1

      (B2 - Control box) Service 20.2

      (B4 - Humidity) Service 20.4

      (B5 - Steam generator) Service 20.8

      (B9 - Cooking cabinet bottom - Floor unit) Service 20.16

      (B10 - iCombi Pro / iCombi Classic XS - Installation area) Service 20.32

      (B15 - iCombi Pro - Autodose) Service 20.64

      For details of the condition for error detection, the error area, and the relevant causes/components, see the Troubleshooting manual, pages 13 to 20.
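      The Service sub-codes listed above run from 20.1 through 20.64 in powers of two. That pattern suggests a bitmask, in which several faulty sensors could be reported as one sum (for example, 20.9 for B1 plus B5). This is an ASSUMPTION that the manual excerpt here does not confirm. The C sketch below decodes a sub-code under that assumption. The table and function names are hypothetical.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Sub-code bit -> sensor, from the error 20 list. Treating the values
       * as combinable bits is an assumption of this sketch. */
      typedef struct {
          unsigned bit;
          const char *sensor;
      } sensor_bit;

      static const sensor_bit kSensorBits[] = {
          {1u,  "B1 cooking cabinet"},
          {2u,  "B2 control box"},
          {4u,  "B4 humidity"},
          {8u,  "B5 steam generator"},
          {16u, "B9 cooking cabinet bottom (floor unit)"},
          {32u, "B10 installation area"},
          {64u, "B15 Autodose"},
      };

      /* Store up to max_out names of sensors whose bit is set in code;
       * return how many were stored. */
      size_t decode_service20(unsigned code, const char *out[], size_t max_out)
      {
          size_t n = 0;
          assert(out != NULL || max_out == 0);
          for (size_t i = 0; i < sizeof kSensorBits / sizeof kSensorBits[0]; i++) {
              if ((code & kSensorBits[i].bit) && n < max_out)
                  out[n++] = kSensorBits[i].sensor;
          }
          return n;
      }
      ```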

    9. 20.8

      Error code description:

      An incorrect temperature value is measured at temperature sensor B5 steam generator.

      Condition for error detection:

      The combi-steamer checks the temperature of temperature sensor B5 every second (in all modes).

      As soon as an implausible temperature value is measured on the temperature sensor B5 steam generator, a service error is triggered.

      Error area:

      Temperature sensor B5 steam generator, electrical connection

      Relevant causes/components:

      • Temperature sensor B5 steam generator defective.
      • Electrical connection to components faulty
    1. static inline unsigned int calc_slab_order(unsigned int size,
                unsigned int min_order, unsigned int max_order,
                unsigned int fract_leftover)
       {
           unsigned int order;

           for (order = min_order; order <= max_order; order++) {
               unsigned int slab_size = (unsigned int)PAGE_SIZE << order;
               unsigned int rem;

               rem = slab_size % size;

               if (rem <= slab_size / fract_leftover)
                   break;
           }

           return order;
       }

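      Code to choose how many pages to allocate for a new slab (the page order): it picks the smallest order between min_order and max_order for which the space left over after packing objects of `size` bytes is at most 1/fract_leftover of the slab.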
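      The sketch below copies the loop into user space so the selection rule can be seen in action outside the kernel. It assumes 4 KiB pages; in the kernel, PAGE_SIZE depends on the architecture. slab_waste() is a hypothetical helper added only to show the leftover bytes.

      ```c
      #include <assert.h>

      /* ASSUMPTION: 4 KiB pages; the kernel's PAGE_SIZE is per-architecture. */
      #define PAGE_SIZE 4096u

      /* Same loop as the kernel's calc_slab_order(): smallest order in
       * [min_order, max_order] whose leftover is <= slab_size / fract_leftover.
       * If none qualifies, the loop exits with order == max_order + 1. */
      static inline unsigned int calc_slab_order(unsigned int size,
                                                 unsigned int min_order,
                                                 unsigned int max_order,
                                                 unsigned int fract_leftover)
      {
          unsigned int order;

          assert(size > 0 && fract_leftover > 0);
          for (order = min_order; order <= max_order; order++) {
              unsigned int slab_size = PAGE_SIZE << order;
              unsigned int rem = slab_size % size;

              if (rem <= slab_size / fract_leftover)
                  break;
          }
          return order;
      }

      /* Hypothetical helper (not in the kernel): unused bytes at this order. */
      static unsigned int slab_waste(unsigned int size, unsigned int order)
      {
          return (PAGE_SIZE << order) % size;
      }
      ```

      Take 700-byte objects with fract_leftover = 16 as an example. Order 0 leaves 596 of 4096 bytes unused, which exceeds the 4096/16 = 256 limit. Order 1 leaves 492 of 8192 bytes, within the 8192/16 = 512 limit, so order 1 is chosen. If no order up to max_order qualifies, the function returns max_order + 1, which a caller must check for.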

    1. Reviewer #1 (Public review):

      The manuscript introduces a valuable and innovative non-AI computational method for segmenting noisy grayscale images, with a particular focus on identifying immunostained potassium ion channel clusters.

      Strengths:

      (1) Applicability and Usability: The method is exceptionally accessible to biologists and researchers without advanced computational expertise. It offers a highly practical alternative to AI-based methods, which often require significant training data and computational resources, making it an excellent choice for a broader range of laboratories.

      (2) Proof-of-Concept: The manuscript provides compelling evidence through multiple experiments, showcasing the method's superior performance over traditional threshold-based techniques, particularly in noisy environments. The dual immuno-electron microscopy experiments further reinforce the robustness and effectiveness of this approach.

      (3) Clarity and Methodology: The manuscript is exceptionally well-written, with clear and concise descriptions that effectively highlight the method's advantages. The detailed figures and comprehensive references greatly enhance the manuscript's credibility and strongly support the claims made.

      Weaknesses:

      The manuscript does not include comparisons with more advanced segmentation techniques, particularly those based on artificial intelligence. While the authors have provided a rationale for this decision, including such comparisons could have enriched the discussion and offered additional insights. Additionally, there are some concerns about the computational demands of the method, especially when applied to large-scale or 3D image analysis. Although the authors have shared some computational data, further optimization or practical recommendations would enhance the method's utility. Initially, the manuscript lacked a data and code availability statement, which could have limited the method's accessibility. However, this issue has since been resolved, with the code now being made available to the community. Lastly, while the findings related to Kv4.2 in the thalamus are noteworthy, they might achieve even greater impact if presented in a separate paper. Nevertheless, the authors have chosen to retain these results within the current manuscript to strengthen the overall narrative and relevance.

      We appreciate that the authors have provided thorough explanations for their original choices. These justifications offer a clearer understanding of their approach and the reasons behind the presentation of the data.

      Conclusion:

      The revised manuscript successfully addresses the majority of the reviewers' concerns, presenting a strong case for the proposed segmentation method. The method's ease of use for non-experts in AI, combined with its proven effectiveness in proof-of-concept experiments, positions it as a valuable addition to the field. While the manuscript could benefit from incorporating comparisons with more advanced segmentation methods and offering a more detailed discussion of computational requirements, it remains a robust contribution. The decision to include the Kv4.2 findings within the paper is well-justified by the authors, though these results could potentially have an even greater impact if published separately.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method that applies Local Moran's method, which determines whether the value of a data point or pixel is randomly distributed among all values, to differentiate pixel clusters from background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that Local Moran's segmentation is superior to the threshold-based segmentation methods commonly used to analyze confocal images in neuroanatomical studies.
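      For readers unfamiliar with the statistic, here is a minimal C sketch of Local Moran's I for a single pixel. Three choices are assumptions made for illustration: the 8-pixel neighbourhood, the row-standardised weights, and the whole-image mean and variance. The authors' MATLAB tool may define weights, normalisation, and significance testing differently.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Local Moran's I for pixel (r, c) of a rows x cols image (row-major).
       * ASSUMPTIONS: 8-neighbour contiguity, row-standardised weights,
       * global mean and variance over the whole image. */
      double local_moran(const double *img, size_t rows, size_t cols,
                         size_t r, size_t c)
      {
          assert(img != NULL && r < rows && c < cols);
          size_t n = rows * cols;

          double mean = 0.0;
          for (size_t i = 0; i < n; i++)
              mean += img[i];
          mean /= (double)n;

          /* m2 = sum of squared deviations / n */
          double m2 = 0.0;
          for (size_t i = 0; i < n; i++)
              m2 += (img[i] - mean) * (img[i] - mean);
          m2 /= (double)n;

          /* Sum of deviations of the in-bounds neighbours (spatial lag). */
          double lag = 0.0;
          int k = 0;
          for (int dr = -1; dr <= 1; dr++) {
              for (int dc = -1; dc <= 1; dc++) {
                  long rr = (long)r + dr;
                  long cc = (long)c + dc;
                  if ((dr == 0 && dc == 0) || rr < 0 || cc < 0 ||
                      rr >= (long)rows || cc >= (long)cols)
                      continue;
                  lag += img[rr * (long)cols + cc] - mean;
                  k++;
              }
          }

          /* A flat (or 1x1) image has no spatial structure to measure. */
          if (m2 == 0.0 || k == 0)
              return 0.0;

          /* I_i = (z_i / m2) * mean neighbour deviation */
          return (img[r * cols + c] - mean) / m2 * (lag / k);
      }
      ```

      A positive value means the pixel resembles its neighbours, as in a bright or dark cluster. A negative value marks a pixel that differs from its surroundings. Turning this into a segmentation also needs a significance test, often done by permutation, which the sketch omits.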

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy.

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computational tool that would let other researchers implement the method easily. Without a user-friendly tool, such as an ImageJ plugin or code, other investigators' use of the method developed by David et al. may remain limited.

      This weakness is eliminated in the revision, which now provides the approach as a MATLAB tool.

    1. Machine learning

      A branch of AI that allows computers to adapt and learn from data. It improves their performance on tasks without requiring complex code or being explicitly programmed to do so.

    1. Social workers should not engage in any dual or multiple relationships with supervisees in which there is a risk of exploitation of or potential harm to the supervisee, including dual relationships that may arise while using social networking sites or other electronic media.

      This part of the code of ethics raises concerns about how I would handle certain situations, because it can be difficult to set boundaries on social media in some settings. As a social worker, I would set clear boundaries around social media use between the client and me, but if those boundaries were breached, I am not sure how I would handle the situation. Social media is open to everyone, so it can be difficult to make sure that all of your accounts and posts are closed off to the public.

    1. This can be incredibly frustrating for developers. In my own experience, the person in the worst position is the developer brought in to clean up another developer’s mess. It’s now your responsibility not only to convince management that they need to slow down to give you time to fix things (which will stall sales), but also to architect everything, orchestrate the rollout, and coordinate with sales goals and marketing. Oh, and let’s not forget actually producing the code to resolve the underlying issues. It can, at times, be an insurmountable problem. A developer in that situation has to wear a lot of hats. They need to be:

      ● An advocate to management and, by extension, the C-suite.
      ● A project manager.
      ● A marketer, to understand the features and desired functionality both now and down the road, and to make the product simpler to sell with defined pipelines and marketable features.
      ● A decision maker, willing to make tough calls with regard to the future compatibility of the services, how they interact, and what third-party tools they might need to integrate with to ensure the rectified code will be usable for the foreseeable future.

      Last but not least, they need to be a good developer to fix the mess. If you employ a developer who can manage all those responsibilities as well as their day job, I guarantee you aren’t paying them enough, or they’re already looking somewhere else.

      developer solving a bug

    1. Code for algorithms and figures is available at https://github.com/ronboger/conformal-protein-retrieval/.

      Thanks for providing the code! It helped me better understand some of the examples in the paper.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The authors of the manuscript entitled "A conserved fungal Knr4/Smi1 protein is vital for maintaining cell wall integrity and host plant pathogenesis" used a weighted gene co-expression network to identify Fusarium graminearum genes highly expressed during early symptomless infection of wheat. Based on its sequence and previous studies, the authors selected FgKnr4 from the early symptomless Fusarium modules. The characterization of knockout strains revealed a role in morphogenesis, growth, cell wall stress tolerance, and virulence in F. graminearum and in the phylogenetically distant fungus Zymoseptoria tritici.

      The methods are properly described and the statistical analyses are reasonable, so reproducibility is possible. The RNA-seq dataset is already published, and the authors provided a repository with the code used to create the co-expression network. However, I have the following questions:

      • Why were the reads mapped only to the high-confidence maize transcripts rather than to the full genome, as was done for Fusarium graminearum? (I have never analyzed a plant transcriptome.)
      • The regular output of DESeq are TPMs, how did the authors obtain the FPKM used in the analysis?
      • Do the authors have a Southern blot to prove the location and number of insertions in the Zymoseptoria tritici mutant and complemented strains?
      • Boxplots and bar graphs should have the same format. In Figures 5B and 5F and Supplementary Figure 6.3, the authors show the distribution of samples, but this is lacking in Figure 3B and in all bar graphs.
      • Line 247 FGRAMPH1_0T23707 should be FGRAMPH1_01T23707
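      Here is some background on the FPKM question. FPKM scales each gene's read count by both its length and the sample's total mapped reads. TPM instead divides by gene length first, then rescales so the values in each sample sum to one million. The C sketch below uses these textbook definitions. It is not necessarily how the authors computed their values.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* FPKM = count * 1e9 / (gene length in bp * total mapped reads). */
      double fpkm(double count, double length_bp, double total_mapped)
      {
          assert(length_bp > 0 && total_mapped > 0);
          return count * 1e9 / (length_bp * total_mapped);
      }

      /* TPM for n genes in one sample: reads per kilobase, then rescaled so
       * the sample sums to 1e6. Results are written to out[0..n). */
      void tpm(const double *counts, const double *length_bp, size_t n,
               double *out)
      {
          double total_rpk = 0.0;
          assert(counts != NULL && length_bp != NULL && out != NULL);
          for (size_t i = 0; i < n; i++) {
              out[i] = counts[i] / (length_bp[i] / 1e3);
              total_rpk += out[i];
          }
          for (size_t i = 0; i < n; i++)
              out[i] = total_rpk > 0 ? out[i] / total_rpk * 1e6 : 0.0;
      }
      ```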

      Referees cross-commenting

      I agree with Reviewer 1 that the order in which the figures are called in the text is confusing. Regarding Figures 5C-D, I am no expert in the field, so I can only say that they do not appear to have been edited.

      I agree with Reviewer 1 that data on DON mycotoxin production in infected tissues are needed to support the statement in lines 272-273.

      I agree with Reviewer 2 that the criteria used to exclude genes from the final selection list should be explained.

      Significance

      The study showed, once again, that a weighted gene co-expression network is a great method for identifying new genes of interest regardless of the organism or condition, even though it is not yet widely used in the fungal pathogen field. The study showed that functions identified in a WGCN module from a pathogen have their opposite in the host module. The authors go beyond theory and demonstrate the effect of the most highly expressed gene during the early symptomless stage of infection in maize and wheat fungal pathogens.

      Fungal pathogen, RNA-seq, metabolic models, metabolism, comparative genomics

    1. Reviewer #2 (Public Review):

      In this manuscript, Yang et al. present a modeling framework to understand the pattern of response biases and variance observed in delayed-response orientation estimation tasks. They combine a series of modeling approaches to show that coupled sensory-memory networks are in a better position than single-area models to support experimentally observed delay-dependent response bias and variance in cardinal compared to oblique orientations. These errors can emerge from a population-code approach that implements efficient coding and Bayesian inference principles and is coupled to a memory module that introduces random maintenance errors. A biological implementation of such operation is found when coupling two neural network modules, a sensory module with connectivity inhomogeneities that reflect environment priors, and a memory module with strong homogeneous connectivity that sustains continuous ring attractor function. Comparison with single-network solutions that combine both connectivity inhomogeneities and memory attractors shows that two-area models can more easily reproduce the patterns of errors observed experimentally.

      Strengths:

      The model provides an integration of two modeling approaches to the computational bases of behavioral biases: one based on Bayesian and efficient coding principles, and one based on attractor dynamics. These two perspectives are not usually integrated consistently in existing studies, which this manuscript beautifully achieves. This is a conceptual advancement, especially because it brings together the perceptual and memory components of common laboratory tasks.

      The proposed two-area model provides a biologically plausible implementation of efficient coding and Bayesian inference principles, which interact seamlessly with a memory buffer to produce a complex pattern of delay-dependent response errors. No previous model had achieved this.

      Weaknesses:

      The correspondence between the various computational models is not clearly shown, because network function is illustrated with different representations for different models. In particular, the Bayesian model of Figure 2 is illustrated with population responses for different stimuli and delays, while the attractor models of Figures 3 and 4 are illustrated with neuronal tuning curves but not population activity.

      The proposed model has stronger feedback than feedforward connections between the sensory and memory modules (J_f = 0.1 and J_b = 0.25). This is not the common assumption when thinking about hierarchical processing in the brain. The manuscript argues that error patterns remain similar as long as the product of J_f and J_b is constant, so it is unclear why the authors preferred this network example as opposed to one with J_b = 0.1 and J_f = 0.25.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper reports a number of somewhat disparate findings on a set of colorectal tumours and infiltrating T-cells. The main finding is a machine-learning tool that combines two previous state-of-the-art tools, for MHC binding prediction and T-cell binding prediction, to predict immunogenicity. This is then applied to a small set of neoantigens, and there is a small-scale validation of the prediction at the end.

      Strengths:

      The prediction of immunogenic neoepitopes is an important and unresolved question.

      Weaknesses:

      The paper contains a lot of extraneous material not relevant to the main claim. Conversely, it lacks important detail on the major claim.

      (1) The analysis of T cell repertoire in Figure 2 seems irrelevant to the rest of the paper. As far as I could ascertain, this data is not used further.

      We appreciate the reviewer for their valuable feedback. We concur with the reviewer's observation that the analysis of the TCR repertoire in Figure 2 should be moved to the supplementary section. We have moved Figures 2B to 2F to Supplementary Figure 2.

      However, the analysis of TCR profiles is still presented in Figure 2, as it plays a pivotal role in the process of neoantigen selection. This is because the TCR profiles of eight (out of 28) patients were used for neoantigen prediction. We have added the following sentences to the results section to explain the importance of TCR profiling: “Furthermore, characterizing T cell receptors (TCRs) can complement efforts to predict immunogenicity.” (Results, Lines 311-312, Page 11)

      (2) The key claim of the paper rests on the performance of the ML algorithm combining NETMHC and pmtNET. In turn, this depends on the selection of peptides for training. I am unclear about how the negative peptides were selected. Are they peptides from the same databases as the immunogenic peptides, but randomised for MHC? It seems as though there will be a lot of overlap between the peptides used for testing the combined algorithm and the peptides used for training MHCNet and pmtMHC. If this is so, and depending on the choice of negative peptides, it is surely expected that the tools perform better on immunogenic than on non-immunogenic peptides in Figure 3. I don't fully understand panel G, but there seems to be very little difference between the TCR ranking and the combined ranking. Why does including the TCR ranking have such a deleterious effect on sensitivity?

      We thank the reviewer for their valuable feedback. We believe that by 'MHCNet' and 'pmtMHC' the reviewer means the NetMHCpan and pMTnet tools, respectively. First, the negative peptides, which were excluded from PRIME (1), were not randomized with MHC (HLA-I) but only with TCR. Second, the positive peptides selected for our combined algorithm were drawn from several databases, such as 10X Genomics, McPAS, VDJdb, IEDB, and TBAdb, whereas NetMHCpan uses peptides from the IEDB database and pMTnet was trained on a dataset entirely different from ours. Therefore, there is little overlap between our training data and the training datasets of NetMHCpan and pMTnet. Thus, the better performance of our tool is not due to overlapping training datasets with these tools or to the selection of negative peptides.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
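      The 70/30 split described above can be sketched as one random permutation cut into two slices. That makes the discovery and validation sets disjoint by construction. The response does not say how the authors randomised the split. The function name, the seed, and the small linear congruential generator below are hypothetical stand-ins.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Small deterministic PRNG (64-bit LCG with Knuth's MMIX constants),
       * used only to keep the sketch self-contained. */
      static unsigned long long lcg_next(unsigned long long *state)
      {
          *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
          return *state >> 33;
      }

      /* Fill idx[0..n) with a Fisher-Yates shuffle of 0..n-1 and return k:
       * idx[0..k) is the discovery set (70%), idx[k..n) the validation set.
       * Both are slices of one permutation, so they cannot overlap.
       * (The modulo step is slightly biased; acceptable for a sketch.) */
      size_t split_70_30(size_t *idx, size_t n, unsigned long long seed)
      {
          assert(idx != NULL || n == 0);
          for (size_t i = 0; i < n; i++)
              idx[i] = i;
          for (size_t i = n; i > 1; i--) {
              size_t j = (size_t)(lcg_next(&seed) % i);
              size_t tmp = idx[i - 1];
              idx[i - 1] = idx[j];
              idx[j] = tmp;
          }
          return (n * 7) / 10;
      }
      ```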

      Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively. The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
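      A short sketch shows why a plain AND loses sensitivity. A peptide missed by either criterion is also missed by the combination. So the AND rule's sensitivity can never exceed that of the weaker criterion alone. The struct and the cut-offs are hypothetical. So is the assumption that lower scores are better for both ranks.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Scores for one peptide. "Lower is better" for both ranks is an
       * ASSUMPTION of this sketch. */
      typedef struct {
          double hla_percentile;  /* e.g. a NetMHCpan percentile rank */
          double tcr_rank;        /* e.g. a pMTnet rank */
          bool immunogenic;       /* ground-truth label */
      } peptide;

      /* Basic "AND" rule: positive only if both hypothetical cut-offs pass. */
      static bool and_rule(const peptide *p, double hla_cut, double tcr_cut)
      {
          return p->hla_percentile <= hla_cut && p->tcr_rank <= tcr_cut;
      }

      /* Sensitivity = true positives / all immunogenic peptides. */
      double and_rule_sensitivity(const peptide *ps, size_t n,
                                  double hla_cut, double tcr_cut)
      {
          size_t tp = 0, pos = 0;
          assert(ps != NULL || n == 0);
          for (size_t i = 0; i < n; i++) {
              if (!ps[i].immunogenic)
                  continue;
              pos++;
              if (and_rule(&ps[i], hla_cut, tcr_cut))
                  tp++;
          }
          return pos ? (double)tp / (double)pos : 0.0;
      }
      ```

      For example, suppose three of four immunogenic peptides pass the HLA cut-off but only one passes both cut-offs. Sensitivity then drops from 0.75 with the HLA criterion alone to 0.25 with the AND rule.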

      (3) The key validation of the model is Figure 5. In 4 patients, the authors report that 6 out of 21 neoantigen peptides give interferon responses > 2-fold above background. Using NETMHC alone (I presume the tool was used to rank peptides according to binding to the respective HLAs in each individual, but this is not clear) identified 2; using the combined tool identified 4. I don't think this is significant by any measure. I don't understand the score shown in panel E, but I don't think it alters the underlying statistic.

      Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      In conclusion, the paper demonstrates that combining MHCNET and pmtMHC results in a modest increase in the ability to discriminate 'immunogenic' from 'non-immunogenic' peptide; however, the strength of this claim is difficult to evaluate without more knowledge about the negative peptides. The experimental validation of this approach in the context of CRC is not convincing.

      Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen selection for CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      Weaknesses:

      (1) While multiple advanced tools and algorithms are used, the study could benefit from a more detailed explanation of the rationale behind algorithm choice and parameter settings, ensuring reproducibility and transparency.

      We thank the reviewer for their comment. We have revised the explanation regarding the rationale behind algorithm choice and parameter settings as follows: “We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with 'k' set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using 9 folds for training and 1 fold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model's ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model's performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features.” (Method, lines 225-234, page 8).
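      The evaluation loop described above (10 folds, mean ROC AUC) can be sketched in C as follows. This is not the authors' code. It assumes the dataset has already been shuffled and assigns folds round-robin. It takes hypothetical out-of-fold scores as input instead of training XGB. It computes ROC AUC through the equivalent Mann-Whitney pairwise statistic.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Round-robin fold for sample i of an already-shuffled dataset. */
      size_t fold_of(size_t i, size_t k)
      {
          assert(k > 0);
          return i % k;
      }

      /* ROC AUC as the fraction of (positive, negative) pairs in which the
       * positive scores higher, counting ties as half. O(n^2). */
      double roc_auc(const double *score, const bool *label, size_t n)
      {
          double wins = 0.0;
          size_t pairs = 0;
          assert((score != NULL && label != NULL) || n == 0);
          for (size_t i = 0; i < n; i++) {
              if (!label[i])
                  continue;
              for (size_t j = 0; j < n; j++) {
                  if (label[j])
                      continue;
                  pairs++;
                  if (score[i] > score[j])
                      wins += 1.0;
                  else if (score[i] == score[j])
                      wins += 0.5;
              }
          }
          return pairs ? wins / (double)pairs : 0.5;
      }

      /* Mean validation AUC over k folds. oof_score[i] is assumed to come from
       * a model trained on the other k-1 folds; buf_s/buf_l are caller-owned
       * scratch arrays of length n. */
      double cv_mean_auc(const double *oof_score, const bool *label, size_t n,
                         size_t k, double *buf_s, bool *buf_l)
      {
          double sum = 0.0;
          assert(k > 0 && buf_s != NULL && buf_l != NULL);
          for (size_t f = 0; f < k; f++) {
              size_t m = 0;
              for (size_t i = 0; i < n; i++) {
                  if (fold_of(i, k) != f)
                      continue;
                  buf_s[m] = oof_score[i];
                  buf_l[m] = label[i];
                  m++;
              }
              sum += roc_auc(buf_s, buf_l, m);
          }
          return sum / (double)k;
      }
      ```

      With k = 10, every sample is held out exactly once. So the mean of the ten per-fold AUCs is computed from scores the model did not see during training.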

      (2) While pHLA-TCR binding displayed higher specificity, its lower sensitivity compared to pHLA binding suggests a trade-off between the two measures. Optimizing the balance between sensitivity and specificity could be crucial for the practical application of these predictions in clinical settings.

      We appreciate the reviewer's suggestion. Due to the limited availability of patient blood samples and time constraints for validation, we have chosen to prioritize high specificity and positive predictive value to enhance the selection of neoantigens.

      (3) The experimental validation was performed on a limited number of patients (four), which might affect the generalizability of the findings. Increasing the number of patients for validation could provide a more comprehensive assessment of the model's performance.

      This has been addressed earlier. Here, we restate it as follows: Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      This study presents a new approach of combining two measurements (pHLA binding and pHLA-TCR binding) in order to refine predictions of which patient mutations are likely presented to and recognized by the immune system. Improving such predictions would play an important role in making personalized anti-cancer vaccinations more effective.

      Strengths:

      The study combines data from the pre-existing tools pVACseq and pMTNet and applies them to a CRC patient population, which the authors show may improve the chance of identifying immunogenic, cancer-derived neoepitopes. Making the collected datasets publicly available would expand beyond the current datasets, which typically describe Caucasian patients.

      Weaknesses:

      It is unclear whether the NetMHCpan and pMTNet tools used by the authors are entirely independent, as they appear to have been trained on overlapping datasets, which may explain their similar scores. The pHLA-TCR score seems to be driving the effects, but this is not discussed in detail.

      The HLA percentile from NetMHCpan and the TCR ranking from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. Additionally, we partitioned the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%), ensuring no overlap between the training and testing datasets.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).

      Due to sample constraints, the authors were only able to do a limited amount of experimental validation to support their model; this raises questions as to how generalizable the presented results are. It would be desirable to use statistical thresholds to justify cutoffs in ELISPOT data.

      We chose a 2-fold cutoff for ELISPOT, following the recommendation of Moodie et al. (2), which provides standardized criteria for defining positive responses in ELISPOT assays. The study presents revised criteria based on a comprehensive analysis of data from multiple studies, aiming to improve the precision and consistency of immune response measurements across various applications.
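
      A minimal sketch of a 2-fold ELISPOT positivity rule, assuming the cutoff is applied to the ratio of mean spot counts in peptide-stimulated wells over negative-control wells. The function name and the handling of a zero background are hypothetical; whether the authors also apply background subtraction or a minimum spot count is not stated here, so the sketch omits both.

```python
from statistics import mean


def elispot_positive(peptide_wells, control_wells, fold_cutoff=2.0):
    """Flag a peptide as positive when its mean spot count is at least
    fold_cutoff times the mean of the negative-control wells (assumed rule)."""
    control = mean(control_wells)
    if control == 0:
        # Arbitrary choice for this sketch: any spots over a zero background
        # count as positive, avoiding a division by zero.
        return mean(peptide_wells) > 0
    return mean(peptide_wells) / control >= fold_cutoff


print(elispot_positive([42, 38, 40], [18, 22, 20]))  # True (fold change 2.0)
print(elispot_positive([25, 30, 35], [18, 22, 20]))  # False (fold change 1.5)
```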

      Some of the TCR repertoire metrics presented in Figure 2 are incorrectly described as independent variables and do not meaningfully contribute to the paper. The TCR repertoires may have benefitted from deeper sequencing coverage, as many TCRs appear to be supported only by a single read.

      We appreciate the reviewer’s feedback. We have moved Figures 2B through 2F to Supplementary Figure 2. We agree with the reviewer that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. The TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite the variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them.
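
      To make the UMI argument concrete: under the rule described above, a clonotype observed once at the UMI level is still backed by at least three raw reads. The sketch below assumes reads arrive as hypothetical (UMI, CDR3) pairs; the CDR3 strings are illustrative only, and real pipelines typically also merge UMIs that differ by sequencing errors, which this sketch does not do.

```python
from collections import Counter


def collapse_umis(reads, min_reads_per_umi=3):
    """Collapse raw (umi, cdr3) reads into UMI-level molecules and count
    clonotypes by the number of supporting UMIs.

    reads: iterable of (umi, cdr3) tuples, one per raw sequencing read.
    """
    reads_per_molecule = Counter(reads)
    # Keep only molecules (UMI + CDR3 pairs) backed by enough raw reads.
    kept = [mol for mol, n in reads_per_molecule.items() if n >= min_reads_per_umi]
    return Counter(cdr3 for _umi, cdr3 in kept)


reads = (
    [("AACG", "CASSLGQETQYF")] * 4    # 4 reads -> kept
    + [("TTGA", "CASSLGQETQYF")] * 3  # 3 reads -> kept
    + [("GGCT", "CASRPGTDTQYF")] * 2  # 2 reads -> dropped
)
print(collapse_umis(reads))  # Counter({'CASSLGQETQYF': 2})
```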

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Please open source the raw and processed data, code, and software output (NetMHCpan, pMTNet), which are important to verify the results.

      NetMHCpan and pMTNet are publicly available software tools (3, 4). In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model).

      (2) Comparison with more state-of-the-art neoantigen prediction models could provide a more comprehensive view of the combined model's performance relative to the current field.

      To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (5), NeoPeptide (6), dbPepNeo (7), TANTIGEN (8), and TSNAdb (9), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10X Genomics Chromium Single Cell Immune Profiling (10-13). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (14). To address the reviewer's inquiry, we have incorporated these results in Supplementary Table 6.

      (3) While the combined model shows a positive overall rank coverage score, indicating improved ranking accuracy, the scores are relatively low. Further refinement of the model or the inclusion of additional predictive features might enhance the ranking accuracy.

      We appreciate the reviewer’s suggestion. The RankCoverageScore provides an objective evaluation of the rank results derived from the final peptide list generated by the two tools. The combined model achieved a higher RankCoverageScore than pMTNet, indicating its superior ability to identify immunogenic peptides compared to existing in silico tools. To provide a more comprehensive assessment, we included four additional validated samples to recalculate the rank coverage score. The results demonstrate a notable difference between NetMHCpan and the combined model (-0.37 and 0.04, respectively). We have incorporated these findings into Supplementary Figure 6 to address the reviewer's question. Additionally, we have modified Figure 5E to present a simplified demonstration of the superior performance of the combined model compared to NetMHCpan.

      (4) Collect more public data and fine-tune the model. Then you will get a SOTA model for neoantigen selection. I strongly recommend you write Python scripts and open source.

      We thank the reviewer for their feedback. We have made the raw and processed data, as well as the model, available on GitHub. Additionally, we have gathered more public data and conducted evaluations to assess its efficiency compared to other methods. You can find the repository here: https://github.com/QuynhPham1220/Combined-model.

      Reviewer #3 (Recommendations For The Authors):

      The Methods section seems good, though HLA calling is more accurate using arcasHLA than OptiType. This would be difficult to correct as OptiType is integrated into pVACtools.

      We chose OptiType for its exceptional accuracy, surpassing 99%, in identifying HLA-I alleles from RNA-Seq data. This decision was informed by a recent extensive benchmarking study that evaluated its performance against "gold-standard" HLA genotyping data, as described by Li et al. (15). Furthermore, we tested both tools on the same RNA-Seq data from FFPE samples, and the allele-calling accuracy of OptiType was superior to that of arcasHLA. To address the reviewer's question, we have included these results in Supplementary Table 2, along with the reference supporting this decision (Method, line 200, page 07).

      I am not sufficiently expert in machine learning to assess this part of the methods.

      TCR beta repertoire analysis of biopsy is highly variable; though my expertise lies largely in sequencing using the 10X genomics platform, typically one sees multiple RNAs per cell. Seeing the majority of TCRs supported by only a single read suggests either problems with RNA capture (particularly in this case where the recovered RNA was split to allow both RNAseq and targeted TCR seq) or that the TCR library was not sequenced deeply enough. I'd like to have seen rarefaction plots of TCR repertoire diversity vs the number of reads to ensure that sufficiently deep sequencing was performed.
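
      The rarefaction analysis requested here can be sketched as subsampling reads at increasing depths and counting unique clonotypes; a curve that flattens at the highest depths suggests saturation. This is a minimal illustration on a toy library, not the pipeline used in the manuscript, and it draws a single subsample per depth, whereas published rarefaction curves usually average over many draws.

```python
import random


def rarefaction_curve(clonotype_per_read, depths, seed=0):
    """Return (depth, observed unique clonotypes) pairs by subsampling
    reads without replacement at each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(clonotype_per_read):
            break  # cannot subsample deeper than the library itself
        sample = rng.sample(clonotype_per_read, depth)
        curve.append((depth, len(set(sample))))
    return curve


# Toy library: 3 expanded clones plus 50 singletons (225 reads in total).
library = ["A"] * 100 + ["B"] * 50 + ["C"] * 25 + [f"S{i}" for i in range(50)]
for depth, richness in rarefaction_curve(library, [10, 50, 100, 200, 225]):
    print(depth, richness)
```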

      We appreciate the suggestions provided by the reviewer. We agree that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. In addition, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them. We have added rarefaction plots of TCR repertoire diversity versus the number of reads in Figure 2C, and the main text has been updated accordingly (lines 329-335).

      In order to support the authors' conclusions that MSI-H tumors have fewer TCR clonotypes than MSS tumors (Figure S2a) I would have liked to see Figure 2a annotated so that it was easy to distinguish which patient was in which group, as well as the rarefaction plots suggested above, to be sure that the difference represented a real difference between samples and not technical variance (which might occur due to only 4 samples being in the MSI-H group).

      We thank the reviewer for their recommendation. It is worth noting that the MSI-H group is smaller than the MSS group, which is consistent with the prevalence of MSI-H tumors in colorectal cancer (typically around 15%) reported in previous studies (16, 17). To clarify this point further, we have included rarefaction plots illustrating TCR repertoire diversity versus the number of reads in Supplementary Figure 3 (line 339). Additionally, MSI-H and MSS samples are now labeled accordingly.

      The authors write: "in accordance with prior investigations, we identified an inverse relationship between TCR clonality and the Shannon index (Supplementary Figure S1)" >> Shannon index is a measure of TCR clonality, not an independent variable. The authors may have meant TCR repertoire richness (the absolute number of TCRs), and the Shannon index (a measure of how many unique TCRs are present in the index).
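
      The reviewer's point that the Shannon index and clonality are not independent can be seen from one common convention, assumed here because the manuscript's exact formula is not quoted in this exchange: clonality = 1 - H / ln(R), where H is the Shannon index and R is the number of unique clonotypes (richness). For a fixed richness, clonality then falls as H rises, by construction. The function names below are illustrative.

```python
import math


def shannon_index(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over clonotype frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts):
    """Clonality as 1 - H / ln(richness), i.e. 1 - Pielou's evenness (assumed convention)."""
    richness = sum(1 for c in counts if c > 0)
    if richness <= 1:
        return 1.0  # a single clonotype is treated as maximally clonal
    return 1 - shannon_index(counts) / math.log(richness)


even = [10, 10, 10, 10]  # perfectly even repertoire: clonality close to 0
skewed = [97, 1, 1, 1]   # one hyperexpanded clone: clonality close to 1
print(clonality(even), clonality(skewed))
```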

      We thank the reviewer for their comment regarding the correlation between the number of TCRs and the Shannon index. We have revised the figure to illustrate the relationship between the number of TCRs and the Shannon index, and we have relocated it to Figure 2B.

      The authors continue: "As anticipated, we identified only 58 distinct V (Figure 2C) and 13 distinct J segments (Figure 2D), that collectively generated 184,396 clones across the 27 tumor tissue samples, underscoring the conservation of these segments (Figure 2C & D)" >> it is not clear to me what point the authors are making: it is well known that TCR V and J genes are largely shared between Caucasian populations (https://pubmed.ncbi.nlm.nih.gov/10810226/), and though IMGT lists additional forms of these genes, many are quite rare and are typically not included in the reference sequences used by repertoire analysis software. I would clarify the language in this section to avoid the impression that patient repertoires are only using a restricted set of J genes.

      We thank the reviewer for their feedback. We have revised the sentence as follows: "As anticipated, we identified 59 distinct V segments (Supplementary Figure 2C) and 13 distinct J segments (Supplementary Figure 2D), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Supplementary Figure 2C & D)." (Result, lines 354-356, page 12)

      As a result, I would suggest moving Figure 2, with the exception of 2A, into the supplementary material. I would have been more interested in a plot showing the distribution of TCRs by frequency, i.e., what proportion of clones are hyperexpanded, moderately expanded, etc. This would be a better measure of the likely immune responses.

      We thank the reviewer for their comment. With the exception of Figure 2A, we have relocated Figures 2B through 2F to Supplementary Figure 2.

      The authors write "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic peptides (Supplementary Table 3)" >> The authors mean to refer to Table S4.

      We appreciate the reviewer's feedback. We have revised the sentence as follows: "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary Table 5)" (lines 368-370).

      The authors write "As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank < 2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p< 0.00001)" >> this is not surprising, as tools such as NetMHCpan are trained on databases of immunogenic peptides, and thus it is likely that these aren't independent measures (in https://academic.oup.com/nar/article/48/W1/W449/5837056 the authors state that "The training data have been vastly extended by accumulating MHC BA and EL data from the public domain. In particular, EL data were extended to include MA data"). In the pMTNet paper it is stated that pMTNet encoded pMHC information using "the exact data that were used to train the netMHCpan model" >> While I am not sufficiently expert to review details on machine learning training models, it would seem that the pHLA scores from NetMHCpan and pMTNet may not be independent, which would explain the concordance in scores that the authors describe in Figures 3B and 3D. I would invite the authors to comment on this.

      The HLA percentiles from NetMHCpan and TCR rankings from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. NetMHCpan is trained to predict peptide-MHC class I interactions by integrating binding affinity and MS eluted ligand data, using a second output neuron in the NNAlign approach. This setup produces scores for both binding affinity and ligand elution. In contrast, pMTNet predicts TCR binding specificity of class I pMHCs through three steps:

      (1) Training a numeric embedding of pMHCs (class I only) to numerically represent protein sequences of antigens and MHCs.

      (2) Training an embedding of TCR sequences using stacked auto-encoders to numerically encode TCR sequence text strings.

      (3) Creating a deep neural network combining these two embeddings to integrate knowledge from TCRs, antigenic peptide sequences, and MHC alleles. Fine-tuning is employed to finalize the prediction model for TCR-pMHC pairing.

      Therefore, pHLA scores from NetMHCpan and pMTNet are independent. Furthermore, Figures 3B and 3D do not show concordance in scores, as there was no equivalence in the percentage of immunogenic and non-immunogenic peptides in the two groups (≥2 HLA percentile and ≥2 TCR percentile).

      Many of the authors of this paper were also authors of the epiTCR paper, would this not have been a better choice of tool for assessing pHLA-TCR binding than pMTNet?

      When we started this project, EpiTCR had not been completed. Therefore, we chose pMTNet, which had demonstrated good performance and high accuracy at that time. Validation of EpiTCR's performance is an ongoing project that will use immunogenic assays (ELISpot and single-cell sequencing) to assess the prediction and ranking of neoantigens. This study is also mentioned in the discussion: "Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation." (lines 532-535).

      In Figure 3G it would appear that the pHLA-TCR score is driving the interaction, could the authors comment on this?

      We sincerely thank the reviewer for this valuable feedback. Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively.

      The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
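
      To illustrate the basic "AND" rule described above, the sketch below keeps a peptide only if it passes both cutoffs. It assumes that lower values mean stronger predictions for both the NetMHCpan percentile rank and the pMTNet TCR rank, and it borrows the < 2 thresholds mentioned elsewhere in this exchange; the Candidate class and peptide names are hypothetical. A strong binder with a weak TCR rank (or the reverse) is discarded outright, which is the rigidity that a learned combination of the two scores is meant to avoid.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    hla_percentile: float  # NetMHCpan percentile rank (assumed: lower = stronger binding)
    tcr_percentile: float  # pMTNet TCR rank (assumed: lower = stronger predicted pairing)


def and_filter(candidates, hla_cutoff=2.0, tcr_cutoff=2.0):
    """Keep only candidates that pass BOTH thresholds (the simple 'AND' rule)."""
    return [
        c for c in candidates
        if c.hla_percentile < hla_cutoff and c.tcr_percentile < tcr_cutoff
    ]


candidates = [
    Candidate("PEP1", 0.5, 1.2),   # passes both cutoffs
    Candidate("PEP2", 0.8, 15.0),  # strong binder, weak TCR rank: dropped
    Candidate("PEP3", 7.5, 0.9),   # weak binder, strong TCR rank: dropped
]
print([c.peptide for c in and_filter(candidates)])  # ['PEP1']
```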

      In Figure 4A I would invite the authors to comment on how they chose the sample sizes they did for the discovery and validation datasets: the numbers seem rather random. I would question whether a training dataset in which 20% of the peptides are immunogenic accurately represents the case in patients, where I believe immunogenic peptides are less frequent (as in Figure 5).

      We aimed to maximize the number of experimentally validated immunogenic peptides, including those from viruses, with only a small percentage from tumors available for training. This limitation is inherent in the field. However, our ultimate objective is to develop a tool capable of accurately predicting peptide immunogenicity irrespective of their source. Therefore, the current percentage of immunogenic peptides may not accurately reflect real-world patient cases, but this is not crucial to our development goals.

      For Figure 5C, I would invite the authors to consider adding a statistical test to justify the cutoff at 2-fold enrichment.

      Thank you for your feedback. Instead of conducting a statistical test, we have implemented standardized cutoffs as defined in the cited study (2). This research introduces refined criteria for identifying positive responses in ELISPOT assays through a comprehensive analysis of data from multiple studies. These criteria aim to improve the accuracy and consistency of immune response measurements across various applications. The reference to this study has been properly incorporated into the manuscript (Method, line 281, page 10).

      Minor points:

      "paired white blood cells" >> use "paired Peripheral Blood Mononuclear Cells".

      We appreciate the reviewer for the feedback. We agree with the reviewer's observation. The sentence has been revised as follows: "Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient's HLA-I allele profile and the gene expression levels of mutated genes." (Introduction, lines 55-58, page 2).

      "while RNA sequencing determines the patient's HLA-I allele profile and gene expression levels of mutated genes." >> RNA sequencing covers both the mutant and reference form of the gene, allowing assessment of variant allele frequency.

      "the current approach's impact on patient outcomes remains limited due to the scarcity of effective immunogenic neoantigens identified for each patient" >> Clearer language would have been preferred here, as different tumor types have different mutational loads.

      We thank the reviewer for their valuable feedback. We agree with the reviewer's observation. The passage has been revised accordingly: “The current approach's impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens.” (Introduction, lines 62-64, page 3).

      References

      (1) J. Schmidt et al., Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting. Cell Rep Med 2, 100194 (2021).

      (2) Z. Moodie et al., Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 59, 1489-1501 (2010).

      (3) V. Jurtz et al., NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360-3368 (2017).

      (4) T. Lu et al., Deep learning-based prediction of the T cell receptor-antigen binding specificity. Nat Mach Intell 3, 864-875 (2021).

      (5) J. Xia et al., NEPdb: A Database of T-Cell Experimentally-Validated Neoantigens and Pan-Cancer Predicted Neoepitopes for Cancer Immunotherapy. Front Immunol 12, 644637 (2021).

      (6) W. J. Zhou et al., NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. Database (Oxford) 2019 (2019).

      (7) X. Tan et al., dbPepNeo: a manually curated database for human tumor neoantigen peptides. Database (Oxford) 2020 (2020).

      (8) G. Zhang, L. Chitkushev, L. R. Olsen, D. B. Keskin, V. Brusic, TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC Bioinformatics 22, 40 (2021).

      (9) J. Wu et al., TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis. Genomics Proteomics Bioinformatics 16, 276-282 (2018).

      (10) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2.

      (11) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2.

      (12) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2.

      (13) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2.

      (14) A. Montemurro et al., NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRalpha and beta sequence data. Commun Biol 4, 1060 (2021).

      (15) G. Li et al., Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 16, eade2886 (2024).

      (16) Z. Gatalica, S. Vranic, J. Xiu, J. Swensen, S. Reddy, High microsatellite instability (MSI-H) colorectal carcinoma: a brief review of predictive biomarkers in the era of personalized medicine. Fam Cancer 15, 405-412 (2016).

      (17) N. Mulet-Margalef et al., Challenges and Therapeutic Opportunities in the dMMR/MSI-H Colorectal Cancer Landscape. Cancers (Basel) 15 (2023).

    1. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths:

      The AVN package overlaps with several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses:

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows.

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.
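
      The reviewer's comparison can be illustrated with a toy one-dimensional example: both the earth mover's distance and the (squared) maximum mean discrepancy compare two samples of embedded syllables and are zero when the samples coincide. This is a minimal sketch, not AVN's or Goffinet et al.'s implementation; the 1-D values stand in for latent coordinates, the sorted-difference shortcut for EMD holds only for equal-size 1-D samples, and the Gaussian kernel bandwidth is an arbitrary assumption.

```python
import math


def emd_1d(xs, ys):
    """Earth mover's (Wasserstein-1) distance between two equal-size 1-D
    samples: the mean absolute difference between sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd2_gaussian(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


tutor = [0.0, 1.0, 2.0, 3.0]
pupil = [0.5, 1.5, 2.5, 3.5]  # same shape, shifted by 0.5
print(emd_1d(tutor, pupil))  # 0.5
print(mmd2_gaussian(tutor, tutor) == 0.0, mmd2_gaussian(tutor, pupil) > 0)  # True True
```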

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

    2. Reviewer #3 (Public Review):

      Summary:

      The authors invent song and syllable discrimination tasks, which they use to train deep networks. They then use these networks as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%.

      Strengths:

      (1) Robust Validation and Generalizability: The authors demonstrate good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use: Because it is fully automated, the method is straightforward to apply and should pose barely any barrier to adoption by other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations, released as public datasets, are potentially very valuable.

      Weaknesses:

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. Along similar lines, the authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e., discriminating same-syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next-word or masked-word prediction; why are these not used here?

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training? Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g., how many hours does it take to annotate a hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, i.e., whether they are treated as two syllable types or one.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method.

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

    3. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets. 

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure. 

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper around existing techniques. For instance, the first section covers syllable segmentation: they compared WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed best and included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claim to have developed a 'new' visualization, the syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al, 2020, so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature). 

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously been used only to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis. 

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.), without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables are plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There does not appear to be any similar visualization in Cohen et al. 2020. 
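      The three processing steps described above (index-based x axis, alignment on the first motif syllable, grouping of identical sequences) can be sketched as follows. This is a hypothetical helper written for illustration, not AVN's implementation; the function name, the use of single-character labels, and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def syntax_raster_rows(bouts, motif_start):
    """Arrange bouts (lists of syllable labels) into rows for a syntax raster.

    Hypothetical helper, not AVN's implementation:
    - columns are syllable indices, not onset times;
    - each bout is left-padded so its first `motif_start` syllable lands in one column;
    - bouts with identical sequences are adjacent, most frequent sequence first.
    """
    aligned = []
    for bout in bouts:
        # Index of the first motif syllable; a bout without it is aligned
        # as if the motif started at its first syllable.
        offset = bout.index(motif_start) if motif_start in bout else 0
        aligned.append((offset, tuple(bout)))

    max_offset = max(offset for offset, _ in aligned)
    counts = Counter(seq for _, seq in aligned)

    # Sort by descending frequency, then by the sequence itself so ties are deterministic.
    aligned.sort(key=lambda item: (-counts[item[1]], item[1]))

    # None marks an empty cell introduced by the alignment padding.
    return [[None] * (max_offset - offset) + list(seq) for offset, seq in aligned]


# "i" stands for an introductory note; "a" is assumed to be the first motif syllable.
bouts = [list("iiabc"), list("iabc"), list("iabd"), list("iiabc")]
for row in syntax_raster_rows(bouts, motif_start="a"):
    print(row)
```

      Because timing is discarded, every bout places syllable "a" in the same column regardless of how many introductory notes precede it, and the two identical bouts end up in adjacent rows.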

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as input: each triplet is essentially three spectrograms stacked together (anchor, positive, negative). The anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram from finch A with the same label as the anchor; and the negative is a spectrogram unrelated to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring that the embedding distance between anchor and positive is smaller than that between anchor and negative by a certain margin. Based on the embedding, they then used the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In that paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song. 

      We thank the reviewer for this suggestion, and plan to include a comparison of the triplet loss embedding space to the VAE space for song similarity comparisons in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field. 

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding. 

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver. 

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods. 

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing and matching of different spectrogram generation, segmentation, annotation, etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers with very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases. 

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users can still use AVN that way.  

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches: 

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions.

      (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-andmaximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods. 

      We recognize the similarities between these approaches, and we plan to include in the revised manuscript a comparison in which triplet loss embeddings are also compared using MMD, and VAE embeddings are compared using both MMD and EMD. Thank you for this suggestion.  
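      To make the EMD/MMD comparison discussed above concrete, the sketch below computes both discrepancies between two small sets of syllable embeddings. It is an illustrative sketch under stated assumptions, not AVN's or Goffinet et al.'s code: the points are 2-D stand-ins for the 8-dimensional embeddings, the Gaussian kernel bandwidth `sigma=1.0` is an arbitrary assumption, and the EMD routine brute-forces permutations, which only works for tiny equal-size samples (a realistic implementation would use an assignment or optimal transport solver, e.g. `scipy.optimize.linear_sum_assignment`).

```python
import itertools
import math


def emd_equal_size(xs, ys):
    """Earth mover's distance between two equally sized point clouds with uniform weights.

    With uniform weights and equal sample sizes, an optimal transport plan can be
    taken to be a one-to-one matching, so EMD is the minimum mean matching cost.
    Brute force over permutations: only for tiny illustrative samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    best = min(
        sum(math.dist(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


def mmd2_rbf(xs, ys, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy with a Gaussian kernel."""
    def k(a, b):
        return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


# Hypothetical tutor and pupil embeddings; the pupil set is the tutor set shifted by 1 unit.
tutor = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pupil = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(emd_equal_size(tutor, pupil), mmd2_rbf(tutor, pupil))
```

      Both quantities are zero for identical syllable distributions and grow as the pupil's distribution moves away from the tutor's; they differ in how distance is measured (a transport cost in the embedding's own units for EMD, a kernel mean-embedding distance whose scale depends on `sigma` for MMD).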

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability. 

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term. 

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field. 

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies. 

      (3) Automation and Ease of Use. Because it is fully automated, the method is straightforward to apply and should pose hardly any barrier to adoption by other labs. 

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations, released as public datasets, are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. 

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble those of deaf birds, who lack both a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We will revise the original manuscript to make this clearer. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      There appears to be some misunderstanding regarding our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and provide some additional explanation in the manuscript. First, we are not training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function depends on the relative distances between embeddings of syllables with the same or different labels, not on the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied to its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (a same vs. different category classifier) and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower-dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021. 
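      The distinction drawn above, a loss on relative distances rather than a same/different classifier, is visible in the loss itself. The sketch below shows one common hinge-style form of the triplet loss; it is illustrative only and not the AVN training code. The margin of 0.2 is a hypothetical value, the points are 2-D rather than 8-D for readability, and some formulations (including Schroff et al. 2015) use squared rather than plain Euclidean distances.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    Zero once the anchor is closer to the same-label positive than to the
    different-label negative by at least `margin`; no same/different
    classification is ever made.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(triplets, margin=0.2):
    """Average loss over a batch of (anchor, positive, negative) embeddings."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# Hypothetical embeddings: in the first triplet the negative is already far enough away,
# in the second it violates the margin and contributes to the loss.
satisfied = ((0.0, 0.0), (0.1, 0.0), (1.0, 0.0))
violated = ((0.0, 0.0), (0.1, 0.0), (0.2, 0.0))
print(mean_triplet_loss([satisfied, violated]))
```

      Only triplets whose negative sits within the margin produce a gradient, so training reshapes the geometry of the space rather than fitting a decision boundary; the resulting embeddings can then be passed to a distribution comparison such as EMD.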

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation.  This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we plan to include a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.  

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks, but then the authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, there appears to be some misunderstanding of our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research. 

      Also, in solid machine learning work, diverse methods are usually compared against each other to identify their relative strengths. The paper contains almost none of this; e.g., the authors examined only one clustering method (HDBSCAN). 

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding), as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods, as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.  

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird. 

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently from the F1 scores we report here, and they are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach, where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared directly. 

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report are likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and nonstationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see supplemental figure 2b), but still very high precision scores (supplemental figure 2a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Fig 3c) or syllable duration entropy (supplemental figure 7a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
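      The asymmetry described above (consistent merging lowers recall while precision stays high) can be reproduced with a toy onset-matching score. This is a sketch under stated assumptions, not the F1 definition used by AVN or by Gu et al. 2023: the 10 ms tolerance, greedy nearest-onset matching, and the onset times are all hypothetical.

```python
def onset_scores(predicted, reference, tolerance=0.01):
    """Precision, recall and F1 for syllable onsets (in seconds).

    Each predicted onset is greedily matched to the nearest still-unmatched
    reference onset within `tolerance`; each reference onset can be matched once.
    """
    unmatched = sorted(reference)
    true_pos = 0
    for onset in sorted(predicted):
        candidates = [ref for ref in unmatched if abs(ref - onset) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda ref: abs(ref - onset)))
            true_pos += 1
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Ground truth splits an ambiguous element into two syllables (onsets 0.25 and 0.28 s);
# the prediction merges them into one syllable starting at 0.25 s.
reference = [0.10, 0.25, 0.28, 0.50]
predicted = [0.10, 0.25, 0.50]
print(onset_scores(predicted, reference))
```

      Every predicted onset is correct, so precision is perfect, but the missing onset of the second half of the split element counts as a false negative and pulls recall, and hence F1, down.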

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) and be willing to dedicate the time and resources to manually labeling a subset of recordings from each of their birds, we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.  

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well. 

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well on birds from Rockefeller as on birds from UTSW, if not better. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or Asia, but we would be happy to validate AVN on such datasets should they become available. Furthermore, there is no evidence of dramatic drift in the zebra finch vocal repertoire between continents that would necessitate such additional validation. We did apply AVN to recordings shared with us by the Wada lab in Japan. Although we did not have manual annotations for this dataset (which would have allowed validation of our segmentation and labeling methods), visual inspection of the resulting annotations suggested accuracy comparable to that on the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. For example, how many hours does it take to annotate a hundred song motifs? How much would AVN's performance improve if the network were retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology and is outside the scope of our current efforts.  

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method. 

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human annotations for each individual bird being analyzed using AVN was never the goal of our pipeline, would require significant changes to AVN’s design, and is outside the scope of this manuscript.  

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, that is, whether they are treated as two syllable types or one.

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#SyllableRepetitions
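      Once repeated syllables share a cluster label, repetition can be read directly from the label sequence. The sketch below uses one plausible definition: the fraction of a syllable type's outgoing transitions that return to the same type. This definition and the example bout string are assumptions for illustration. They are not necessarily the feature computed by AVN's syntax module, which is documented at the link above.

```python
from collections import Counter


def repetition_rate_by_type(bout):
    """For each syllable label, return the fraction of its outgoing transitions
    (within one bout) that go back to the same label. This definition is a
    hypothetical stand-in, not AVN's own metric."""
    outgoing = Counter()
    self_transitions = Counter()
    for current, nxt in zip(bout, bout[1:]):
        outgoing[current] += 1
        if current == nxt:
            self_transitions[current] += 1
    return {label: self_transitions[label] / outgoing[label] for label in outgoing}


# Hypothetical bout: three introductory notes, two motifs, and a repeated final syllable.
bout = list("iiiabcdabcdd")
rates = repetition_rate_by_type(bout)
print(rates)
```

Because repeated renditions receive the same label, a repeat appears as a self-transition rather than as a new syllable type.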

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy. 

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We will expand our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg tends to merge syllables when the gap between them is very short, which explains its lower recall compared to its precision on our dataset (supplementary figure 2). In rare cases, WhisperSeg also fails to detect syllables entirely, which again lowers its recall score. TweetyNet hardly ever misses syllables completely, but it occasionally merges syllables together or over-segments them. Whereas WhisperSeg merges very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as shown by the lower clustering v-measure scores we obtain with TweetyNet segmentations compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often segments background noises (like wing flaps or hopping) as syllables, whereas WhisperSeg hardly ever segments nonvocal sounds.

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types: one corresponding to ‘a’, one corresponding to ‘b’, and one corresponding to the merged ‘ab’. Because of how we align syllables across segmentation schemes for the v-measure calculation, syllable ‘b’ will appear to always have a consistent cluster label, but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to say conclusively whether the clustering represents an error or whether it captured a meaningful systematic difference between syllables that the annotator missed. Finally, rare syllable types with their own distinct labels in the manual annotation are sometimes merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not by occasional misclassification of one manual label type as another.
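      The effect of the ‘a’ / ‘ab’ split on the v-measure can be illustrated with a toy example. The sketch below implements the standard v-measure definition in plain Python. The v-measure is the harmonic mean of homogeneity and completeness, and it is also available as `sklearn.metrics.v_measure_score`. The label sequences are hypothetical. The sketch does not reproduce AVN's step for aligning syllables across segmentation schemes.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) for two aligned label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bk]) for (_, bk), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Return (homogeneity, completeness, v-measure)."""
    h_c = entropy(true_labels)
    h_k = entropy(cluster_labels)
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(true_labels, cluster_labels) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(cluster_labels, true_labels) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    return homogeneity, completeness, 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical aligned labels: 'b' always lands in cluster 1, while 'a' lands in
# cluster 0 when segmented alone and in cluster 2 when merged into 'ab'.
manual = ["a", "b", "a", "b", "a", "b", "a", "b"]
clusters = [0, 1, 0, 1, 2, 1, 2, 1]
h, c, v = v_measure(manual, clusters)
print(h, c, v)
```

Splitting one manual label across two clusters leaves homogeneity at 1 but lowers completeness to 2/3, giving a v-measure of 0.8. This is why such splits register as errors even when no syllable is assigned to another syllable's cluster.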

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and supplemental figure 4b&e. We will also expand our discussion of these different types of errors in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy. 

      We apologize for not making this distinction sufficiently clear in the manuscript and will add further explanation to the main text to make the reasoning more apparent. We chose UMAP for syllable labeling because it is a common embedding step before hierarchical clustering and has been shown to produce reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing Earth Mover’s Distance (EMD) scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. It cannot be achieved with UMAP, because distances between points cannot be compared across separate UMAP embeddings, so all birds whose scores are being compared would need to be embedded in the same UMAP space. In practice, every time a new tutor-pupil pair needed to be scored, their syllables would have to be added to a matrix with all previously compared birds’ syllables, a new UMAP would have to be computed, and new EMD scores between all bird pairs would have to be calculated from the new UMAP embeddings. This is very computationally expensive and quickly becomes infeasible without dedicated high-performance computing infrastructure. It also means that similarity scores could not be compared across papers without recomputing everything each time. In contrast, EMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed syllables in a common latent space.
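      The practical consequence of a fixed, shared embedding is that each tutor-pupil EMD can be computed once and reused. The sketch below illustrates this under strong simplifying assumptions. The hypothetical `embed` function is a fixed linear projection standing in for a pretrained triplet-loss model. Both birds contribute the same number of syllables with uniform weights. Under those conditions an optimal transport plan is a one-to-one matching, so a brute-force search over permutations gives the exact EMD. Real syllable sets are larger and unequal in size and would need a dedicated optimal-transport solver.

```python
import itertools
import math


def embed(features):
    """Hypothetical stand-in for a fixed, pretrained embedding model: a constant
    linear projection from 3 acoustic features to 2 dimensions."""
    weights = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.7]]
    return tuple(sum(w * f for w, f in zip(row, features)) for row in weights)


def emd_equal_weights(xs, ys):
    """Exact EMD between two equal-size, uniformly weighted point sets, found by
    brute-force search over one-to-one matchings (tiny sets only)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size point sets")
    best = math.inf
    for perm in itertools.permutations(range(len(ys))):
        cost = sum(math.dist(xs[i], ys[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best / len(xs)


# Hypothetical syllable feature vectors for three birds.
tutor = [embed(f) for f in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]]
pupil = [embed(f) for f in [(0.9, 0.1, 0.0), (0.0, 1.0, 0.2), (0.1, 0.0, 0.9)]]
score_tutor_pupil = emd_equal_weights(tutor, pupil)

# Scoring a newly recorded bird against the same tutor requires no re-embedding of
# the earlier birds, and the earlier score is unchanged.
new_bird = [embed(f) for f in [(0.5, 0.5, 0.0), (0.2, 0.2, 0.6), (1.0, 1.0, 1.0)]]
score_tutor_new = emd_equal_weights(tutor, new_bird)
print(score_tutor_pupil, score_tutor_new)
```

With a per-dataset embedding such as UMAP, the `embed` step itself would change whenever a bird is added. Every stored score would then have to be recomputed.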

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate. 

      There is indeed a stochastic element to UMAP embeddings, which will result in different embeddings, and therefore different syllable labels, across repeated runs with the same input. Anecdotally, we observed that v-measure scores were quite consistent within birds across repeated UMAP runs, and we will add a supplementary figure to the revised manuscript showing this.

    1. To speak with authority, student writers have not only to speak in another's voice but through another's "code"; and they not only have to do this, they have to speak in the voice and through the codes of those of us with power and wisdom;

      The writer argues that students have no choice at the beginning and need more room to write. Unfortunately, they have to follow those with power, which causes their own voices to be lost.

    1. Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 12 cell types, encompassing immune cells, endothelial cells, and fibroblasts. Then, they coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk ATAC-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.
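      The mixture model underlying this kind of reference-based deconvolution treats a bulk profile as a non-negative combination of cell-type reference profiles. The sketch below is a minimal illustration of that idea using non-negative least squares, solved by projected gradient descent in plain Python. It is not EPIC-ATAC's implementation. The constrained least-squares regression in EPIC may apply additional constraints or weighting that are omitted here. The two reference profiles, the four peaks, and the mixing fractions are hypothetical.

```python
def deconvolve(bulk, profiles, steps=2000):
    """Estimate non-negative fractions f so that bulk[i] ~ sum_j f[j] * profiles[j][i]."""
    n_types = len(profiles)
    n_features = len(bulk)
    # 1 / trace(P^T P) upper-bounds 1 / (largest eigenvalue), giving a safe step size.
    step = 1.0 / sum(v * v for prof in profiles for v in prof)
    f = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual of the current fit: (P f - b) for each peak.
        r = [sum(f[j] * profiles[j][i] for j in range(n_types)) - bulk[i]
             for i in range(n_features)]
        # Gradient step on 0.5 * ||P f - b||^2, then projection onto f >= 0.
        f = [max(0.0, f[j] - step * sum(profiles[j][i] * r[i] for i in range(n_features)))
             for j in range(n_types)]
    return f


# Hypothetical accessibility of 4 marker peaks in 2 purified cell types.
profile_a = [10.0, 0.0, 5.0, 1.0]
profile_b = [0.0, 8.0, 1.0, 6.0]
# Noise-free bulk sample made of 70% type A and 30% type B.
bulk = [0.7 * a + 0.3 * b for a, b in zip(profile_a, profile_b)]
fractions = deconvolve(bulk, [profile_a, profile_b])
print(fractions)
```

In this noise-free toy case, the recovered fractions converge to the true 0.7 and 0.3. Cell-type-specific marker peaks keep the reference profiles well separated, which keeps this regression well conditioned.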

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and result in robust estimates. The newly generated validation data and the code are publicly available and well documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      In the benchmarking analysis, EPIC-ATAC was compared also to deconvolution methods that were originally developed for transcriptomics and not for ATAC-seq data. However, the authors described in detail the specific settings used to analyze this different data modality as robustly as possible, and they discussed possible limitations and ideas for future improvement.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I praise the authors for their impressive work; all my major concerns have been addressed. I believe the revised article is much stronger and will surely raise the interest of a broad readership.

      I list in the following a few minor points that the authors might want to consider when finalizing the work:

      - It might be helpful for the reader to know if EPIC-ATAC can also be used on tissues different from tumors and PBMC/blood, and how (i.e. which reference should they use). 

      We thank the reviewer for this comment. In the discussion, we have clarified this point as follows:

      “Although not tested in this work, the TME marker peaks and profiles could be used on normal tissues where immune cells are expected to be present. In cases where specific cell types are expected in a sample but are not part of our list of reference profiles (e.g., neuronal cells in brain tumors or tissues other than human PBMCs or tumor samples), custom marker peaks and reference profiles can be provided to EPIC-ATAC to perform cell-type deconvolution. To this end, users should select markers that are cell-type specific, which could be identified using pairwise differential analysis performed on ATAC-Seq data from sorted cells from the populations of interest, following the approach developed in this work (Figure 1, see Code availability).”

      - In Fig 2 the numbers are hard to read as they are too close or overlapping.

      We have updated Figure 2 to avoid the overlap between the numbers.

      - In Fig 5 I see some squares around the sub-panels, but it might be due to the PDF compression.

      We do not see these squares in Figure 5, but we have seen such squares in Figure 1. We have checked that none of the PDF files uploaded to the eLife submission system contain these squares.

      - In the Introduction, some "deconvolution concepts" are introduced (e.g. Line 63-65), but not explained/illustrated. It might be helpful to refer to a "didactic" review. 

      We have added two references to these sentences in the introduction:

      “As described in more details elsewhere (Avila Cobos et al., 2018; Sturm et al., 2019), many of these tools model bulk data as a mixture of reference profiles either coming from purified cell populations or inferred from single-cell genomic data for each cell type.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      The spatial transcriptomics data presented in the figures could be considered a valuable resource if shared, with the potential for improved and clarified analyses. The primary conclusion of the paper, that C. violaceum infection in the liver cannot be contained without macrophages, would benefit from clarification.

      We thank the reviewer for their time and effort in evaluating our manuscript.

      While the spatial transcriptomic data generated in the figures are interesting and valuable, they could benefit from additional information. The manual selection of regions of granulomas for analysis could use additional context - was the rest of the liver not sequenced, or excluded for other reasons? Including a healthy liver in the analysis could serve as a control for any lasting effects at the final time point of 21 days.

      We revised the text in the methods section to include additional information about the manual selection of regions. The entire tissue section was sequenced, but using H&E as a guide, we manually selected each representative lesion and a surrounding layer of healthy hepatocytes at each timepoint. We agree that an uninfected control could be useful; however, we did not include an uninfected mouse in the experiment because we were most interested in the cells that make up the granuloma, not hepatocytes outside the lesion. Additionally, we find that at the 21 DPI timepoint the surrounding hepatocytes appear to have returned to a homeostatic transcriptional state; at 21 DPI the majority of mice have undetectable CFU burdens.

      Providing more context for the scalebars throughout the spatial analyses, such as whether the data are raw counts or normalized based on the number of reads per spatial spot, would be helpful for interpretation, as changes in expression could signal changes in the numbers of cells or changes in the gene expression of cells.

      The scalebars for the SpatialFeaturePlots display the normalized gene expression values. The data are normalized based on the number of reads per spatial spot, using the sctransform method published in (Hafemeister & Satija, 2019). We agree that the changes in expression could result from changes in cell numbers and/or changes in gene expression on a per cell basis. However, the sctransform method is designed to preserve biological variation while minimizing technical effects observed in transcriptomics platforms. Regardless of the heterogeneity of sequencing depth, it is clear from these plots that gene expression changes dynamically over time and space, which was the focus of our analysis. We have updated the figure legends to clarify scalebar units, and revised the methods section. 

      In Figure 4, qualitative measurements are valuable, but having an idea of the raw data for a few of the pursued chemokines/receptors would aid interpretation.

      All of the SpatialFeaturePlots utilized to generate Figure 4 have been included in the manuscript, either in the main figures or in the supplemental figures. For example, the SpatialFeaturePlots of Cxcl4, Cxcl9, and Cxcl10 are all in Figure 4 – figure supplement 1.

      In Figure 4 it would also be beneficial to clarify whether the reported values are across all clusters and consider focusing on clusters with the greatest change in expression.

      Figure 4 summarizes the expression of each gene at each timepoint for the entire selected area, independently of cluster identity. Different clusters do show variability in the relative change in expression. To better show these data, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster, many of which include chemokines (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.   

      Figures 5E and F would benefit from clarification regarding the x-axis units and whether the expression levels are summed across all clusters for each time point.

      Figures 5E and 5F display the normalized gene expression values for all spots (independent of cluster identity) at each timepoint. We have updated the figure legend to reflect this clarification.

      Additionally, information on the sequencing depth of the samples would be helpful, particularly as shallow sequencing of RNA can result in poor capture of low-expression transcripts.

      We agree with the reviewer that sequencing depth is an additional factor to take into consideration. We have included an additional supplemental figure (Figure 1 – figure supplement 1A-B) to display raw counts spatially at the various timepoints, and within each cluster.

      Regarding the conclusion that macrophages are essential for granuloma formation, it may be prudent to further investigate the role of macrophages versus CCR2. Experiments depleting macrophages directly, instead of deleting CCR2, could provide more definitive evidence that macrophage migration is necessary to contain infection.

      While CCR2 is expressed on a number of cell types besides monocytes, it is well documented that loss of CCR2 results in accumulation of monocytes in the bone marrow and a significant reduction in the blood monocyte population. As a result, monocytes are not recruited to the site of infection in numerous prior publications in the field; we confirm this by flow cytometry and IHC. Nonetheless, future studies will aim to rescue Ccr2–/– mice via adoptive transfer of monocytes to further show that monocyte-derived macrophages are essential for defense against infection. We also intend to perform clodronate depletion experiments at various timepoints; however, clodronate will also deplete Kupffer cells and has off-target effects on neutrophils. Overall, the established importance of CCR2 for monocyte egress from the bone marrow and our observation that the macrophage ring fails to form give us sufficient confidence to conclude that monocyte-derived macrophages are essential for this innate granuloma.

      Analyzing total cell counts in the liver after infection could provide insight into whether the decrease in the fraction of macrophages is due to decreased numbers or infiltration of other cell types...

      Our flow data suggest that the decrease in macrophages in Ccr2–/– mice is due to both a decrease in macrophage number and an increase in the infiltration of other cell types (namely neutrophils). To better illustrate this, we now include an additional quantification of the total cell counts in the liver and spleen (new Figure 6 – figure supplement 1), which supports our conclusion that Ccr2–/– mice have a defect in granuloma macrophage numbers. We have also repeated the experiment to reach sufficient numbers to perform statistical analysis (revised Figure 6F–K).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      We thank the reviewer for their thoughtful comments and suggestions.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result...

      We agree with the reviewer, and the limited solubility of reparixin and other chemokine-receptor inhibitors is a major caveat of this study and others in the field. In future studies, there are several other inhibitors that could be used to further assess the role of CXCR1/2.

      On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control. I would hesitate to agree with the authors' interpretation that their data proves macrophages are serving as a physical barrier from the uninvolved liver. It is possible and likely that they are contributing to bacterial control through direct immunological activity and not simply as a structural barrier.

      We agree that macrophages do not form a physical or structural barrier, a term that implies epithelial-like function, and that they most likely act immunologically instead. We revised the text to remove the term “barrier”.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per time point. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability.

      Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous, both between and within individuals. It will be difficult to assert how broadly generalizable the transcriptomic features are to other C. violaceum granulomas...

      We thank the reviewer for highlighting this key difference between granulomas in other infectious diseases and granulomas induced by C. violaceum. Based on many prior experiments, we observe that C. violaceum-induced granulomas are very reproducible between and within individuals (highlighted in our previous publication). As this is a major advantage of this model system, we chose specific timepoints based on key events that consistently occur in the majority of lesions assessed at each timepoint, allowing us to be confident in the selection of representative granulomas. However, granulomas within an individual mouse are seeded and resolved somewhat asynchronously. This did affect our spatial transcriptomic data: the 7 DPI sample was not histologically representative of a typical 7 DPI granuloma, so we excluded the 7 DPI timepoint from our analyses.

      Furthermore, this undermines any opportunity for statistical testing of features between time points, limiting the potential value of the temporal data.

      We agree with the reviewer that there is much more characterization and quantification that can be done. As demonstrated by the abundance of spatial and temporal data for the chemokine family alone, the spatial transcriptomics dataset is rich and will likely supply us with many years of analyses and investigations. Our current approach is to use the spatial transcriptomics dataset as a hypothesis-generating tool, followed by in vivo studies that seek to uncover physiological relevance for our observations. In the current paper, the strength of the spatial transcriptomic data for CCL2, CCL7 and their receptor CCR2 prompted us to study Ccr2–/– mice. These mice then prove the relevance of the spatial transcriptomic data. In regard to conclusions about temporal changes in chemokine expression, in this manuscript we do not make conclusions that CCL2 is important at one timepoint but not another. We are characterizing the broad temporal trends of expression in order to cast a broad net to inform future in vivo studies. There is much work for us to do to explore all the induced chemokines and their receptors.

      Another caveat to these data is the limited or incompletely informative data analysis. The authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, it would be beneficial to see more general analyses, considering that Visium captures the entire transcriptome. Some important questions left unanswered by this study are:

      What major genes defined each spatial cluster?...

      The initial characterization of each spatial cluster was performed in Harvest et al., 2023. In brief, we used a mixture of published single-cell sequencing data, histological-based parameters, and ImmGen to define each cluster. We have not re-stated those methods in the current manuscript, but instead reference our prior paper.

      What were the top differentially expressed genes across time points of infection?...

      Though the top differentially expressed genes for each cluster can be informative in some situations, we chose a more targeted approach because of the obvious importance of chemokines. Nonetheless, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.  

      Did the authors choose to focus on chemokines/receptors purely from a hypothesis perspective or did chemokines represent a major signature in the transcriptomic differences across time points?

      We chose to focus on chemokines because of their obvious importance for recruitment of immune cells. They were also among the highest induced genes in the spatial transcriptome (new Table 4).

      In addition to the absence of deep characterization of the spatial transcriptomic data, the study lacks sufficient quantitative analysis to back up the authors' qualitative assessments...

      See above comment regarding statistical comparisons.

      Furthermore, the authors are underutilizing the spatial information provided by Visium with no spatial analysis conducted to quantify the patterning of expression patterns or spatial correlation between factors.

      Several factors make quantification difficult. Lesions grow considerably in size in the first few days of infection and then shrink in the later days, which complicates comparisons between timepoints. Radial quantification is also hampered by the irregular shape of each granuloma (see comment below for further discussion). Most importantly, the key next experiments are to validate the importance of each chemokine and receptor in vivo. Once we know which ones matter most, that will justify investing more effort in quantitative spatial analysis of their expression patterns.

      Impact:

      The authors' analysis helps highlight the chemokine profiles of host-protective granulomas. As the authors note in their discussion, these findings have important similarities to and differences from other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate host-directed therapy strategies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The Visium analysis would be strengthened by

      (1) Showing several histology examples of granulomas at each timepoint to help aid the reader in seeing how 'representative' each Visium sample is...

      These histological analyses are performed in our previous manuscript, and indeed were a crucial aspect of the initial characterization of the spatial transcriptomics dataset, which was performed in Harvest et al., 2023. Full liver sections are shown in that paper at each timepoint, and readers can see that the architecture is highly reproducible.

      (2) Validating their results in other tissues, either with Visium or with more targeted assays for their study's key molecules, such as immunohistochemistry or in situ hybridization.

      We agree on the importance of validation studies and have plans to perform single-cell RNA sequencing experiments to further enhance resolution. With key genes in mind, we then plan to perform more in vivo studies to assess physiological relevance of upregulated genes in specific cell types.

      At the very least, it would be important to validate the expression of CXCL1 and CXCL2 in other tissues and at the protein level, given the importance of those findings.

      We think that the reviewer is asking us to validate that CXCL1 and CXCL2 are actually expressed given the negative reparixin data. However, if we do prove that they are expressed, this will not resolve whether they have critical roles in neutrophil recruitment. To prove this, we would need either a better CXCR2 inhibitor or Cxcr2 knockout mice. Therefore, we are saving further exploration for the future. Regarding validating other chemokines, we establish that CCR2 is critical, and we now show by immunofluorescence and ELISA (new Figure 7 – figure supplement 4) that CCL2 is highly expressed in WT mice, and Ccr2–/– mice actually have strongly elevated CCL2 expression at 3 DPI compared to WT mice.

      In Figure 1B, the UMAP here is largely uninformative. To display the clusters, the authors should instead show a heatmap or equivalent visualization of which genes defined each cluster. It would be helpful for the authors to also write out the full name of each cluster before using the abbreviations shown.

      Please see our previous comment about the initial characterization of clusters performed in Harvest et al., 2023, which details the characteristic genes for each cluster. We have written the full names of each cluster in the legend of Figure 1.

      In Figure 1C, the authors use a binary representation of whether a cluster is present or not at a particular time point. However, the spot size is arbitrary, and the colors of the dots are the same as the cluster color code. It is not clear what threshold the authors (or SpatialDimPlots) use to declare that a given cluster is present at a given time point. Therefore, this chart does not give any sense of the extent of each cluster's presence at each time. The authors should revisualize these data to display the abundance of each cluster at each timepoint. This could simply be done by adjusting the size of the circle or using a more traditional heatmap.

      We have now updated this graphic to display the extent of a cluster’s presence, with the size of each dot corresponding to the abundance of each cluster.

      In Figures 2 and 3 the authors describe the kinetics of each chemokine by cluster. While the dynamic expression is evident in the images, it is challenging to determine which clusters are driving expression in the absence of cluster annotation in those figures. The authors should support their visual findings with quantification of each factor in each cluster across time points.

      In Figure 5, violin plots are shown for Cxcl1 and Ccl2 that depict gene expression by each cluster. However, because each capture area is approximately 50 µm in diameter, the data do not achieve single-cell resolution and are not as informative as one would hope. Therefore, although we generated violin plots for each chemokine, we did not add them to the revision because we did not think readers would generally want to see several pages of violin plots in the supplement. As mentioned, we plan to do single-cell RNA sequencing to further assess chemokine expression by each cell type present within the granulomas at key timepoints.

      With respect to the lack of spatial analysis, the authors describe certain transcript signals (i.e., peripheral versus central regions of the granuloma) across each lesion. To back up these qualitative assertions, the authors could use line profiles from the center of each granuloma to the outside to plot the variation in expression of each transcript over radial space. This would provide a more direct way to determine the spatial coordination between various transcripts.

      We considered using line profiles to quantify spatial variation within each lesion at each timepoint. However, this was exceptionally challenging due to the asymmetrical nature of some lesions, and the size discrepancy at different timepoints as the granulomas grow (during infection) and shrink (during resolution). When attempting to decide where to draw the line profiles, we determined that this approach did not enhance our analyses beyond using the cluster overlay and H&E to identify and interrogate different clusters.

      The data visualization in Figure 4 seems unnecessarily confusing. The authors put the transcriptomic signal into categories of 'absent', 'low', 'medium', and 'high.' Why not simply use a continuous scale? The data would also benefit from hierarchical clustering of the heatmap rows to highlight chemokines and their receptors with similar expression patterns across time.

      We considered using a continuous scale as suggested by the reviewer. However, we chose not to create a continuous scale because quantitation is challenging due to the size changes in the lesions over time, such that larger lesions have greater inclusion of surrounding hepatocytes as well as necrotic cores, which would dilute the signal if averaged with the active immunologic granuloma zones. Figure 4 was intended to simplify the entirety of the SpatialFeaturePlots in an easy-to-digest manner, to aid in hypothesis generation as we consider the potential function of each chemokine and receptor in this model. We chose to organize each chemokine ligand based on family, maintaining a numerical order to allow Figure 4 to serve as a quick reference for anyone who is interested in a particular chemokine ligand or receptor.

      Do the authors feel confident in the transcriptomic signal coming from regions of necrosis? Given that many of their bright signals are coming from within clusters annotated as necrosis or necrosis-adjacent this raises an important technical consideration. Can the authors use the H&E image to estimate the cellular density (based on nuclear counts) in each region annotated by Visium? Are there any studies supporting the accurate performance of spatial transcriptomic methods in necrosis? Necrosis can be a source of non-specific binding during in situ hybridization assays.

      The reviewer raises a good point. A defining characteristic of the areas of necrosis is the lack of defined cell borders, with faded or absent nuclei. In these regions, it is impossible to estimate cellular density. Given these concerns, we have included an additional figure (new Figure 1 – figure supplement 1A-B) to display raw counts in each cluster across all timepoints. Though regions of necrosis do display lower read quantity compared to other areas, we are still confident in the positive transcriptomic signal coming from adjacent regions because there are plenty of negative examples in which expression is not detected. In other words, temporal and spatial upregulation of key genes is still observed in the tissues, and future experiments will aim to interrogate the physiological relevance of each gene, while validating the spatial transcriptomics data with other methodologies.

      The methods should include a much more detailed description of the tissue preparation and collection for the Visium experiment. The section on the computational analysis of the Visium data is also extremely limited. At a minimum, the authors should include details on how they performed clustering of the Visium regions.

      The detailed description of tissue preparation, computational analysis, and clustering is in our previous manuscript, from which this dataset originates. We can add a direct quote of the methodology if the reviewer requests.

      The cluster labels in Figure 5A-B are very difficult to see. Furthermore, it would help if the authors displayed the annotated cluster names (i.e., those shown in 5C) instead of their numerical coding for a more direct interpretation of the data.

      We agree and have updated this figure with annotated cluster names.

      The scale bars in Figure 7 are very difficult to see.

      The scale bars in histology images were kept small intentionally so as not to occlude data, and eLife is an online-only, digital media platform which allows readers to sufficiently zoom on high-resolution histology images. We have increased the DPI resolution for histology images to further aid in visualization.

      The information presented in Tables 2 and 3 is greatly appreciated and will really help guide the reader through the analyses.

      We assembled this information for our own learning about chemokines and hope that it is useful for the reader.

    1. View the LPC Academic Integrity. View the LPC Student Conduct Code Definitions. View the LPC Academic Honesty Statement.

      The big idea here is to do your own work and not use someone else's brain.

    1. One of the most interesting parts of being a language user is that while we all speak a dialect, we all have command of a number of linguistic styles.  As we mentioned before, these styles can range from formal to more informal, and the ability to change these styles, even in the middle of a conversation, is called style shifting or sometimes code switching.  Speaking, though, is no different from writing in this way; we all shift the way that we write to meet the needs and expectations of our audience, just as we do when we are speaking. You may not remember learning different styles of speaking but may be more aware of the different styles of writing that you use.  You most likely acquired your various speaking styles from observing your family, friends, and teachers as you were growing up. Perhaps even your parents corrected your style when speaking with older relatives, their friends, or strangers.  You may recall them saying not to say particular words, or to speak louder, or to speak more clearly — all of these requests were requests for you to change your speech style to meet the expectations of your audience.

      It explains that we all use different language styles and naturally shift between them depending on the situation; this is called style shifting. It also notes that we learn these speaking styles from our environment.

    1. Encryption thus limits governments in a way no legislation can. And as described at length in this piece, it’s not just about protection of private property. It’s about using encryption and crypto to protect freedom of speech, freedom of association, freedom of contract, prevention from discrimination and cancellation via pseudonymity, individual privacy, and truly equal protection under rule-of-code — even as the State’s paper-based guarantees of the same become ever more hollow.

      okay so this is my more fundamental issue.

      everything of value, or at least almost everything, has some sort of intersection with the real world. the more value the bigger the intersection.

      i don't understand how it could be otherwise unless you want to trade the physical world for living in some sort of VR reality.

      which lol would still require massive massive data centers and massive massive energy requirements.

    1. My mother was in the room. And it was perhaps the first time she had heard me give a lengthy speech, using the kind of English I have never used with her.

      Is she referring to code switching?

    1. No original software or code was created for this publication.

      Have you considered using OpenAI's API to automate the submission of these queries? You can try out a GUI version of the API using ChatGPT's playground. This interface has several advanced features that could improve your prompt responses. For example, you can set a system prompt to refine the behavior of ChatGPT, such as telling it to respond only with exact quotations. You can also adjust the model's temperature parameter, which controls how variable the responses are.

    1. Author Response

      We thank the reviewers for their positive comments and constructive feedback following their thorough reading of the manuscript. In this provisional reply we will briefly address the reviewers’ comments and suggestions point by point. In the forthcoming revised manuscript, we will more thoroughly address the reviewers’ comments and provide additional supporting data.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail, given that in its current form it risks suggesting that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we will revise the text to clarify that the random clustering is random with respect to any future, novel environment. The cause of clustering could be prior experiences (e.g. Bourjaily M & Miller P, Front. Comput. Neurosci. 5:37, 2011) or developmental programming (e.g. Perin R, Berger TK, & Markram H, Proc. Natl. Acad. Sci. USA 108:5419, 2011).

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (2.1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2.2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pairs A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, the authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Based on our finding that preplay occurs only in networks that sustain cluster activity over multiple decoding time bins (Figure 5d-e), our understanding of the model’s function is consistent with the reviewer’s first explanation. We will provide additional analysis in the forthcoming revised manuscript in order to directly test the first explanation and will also test the intriguing possibility that the reviewer’s second suggestion contributes to above-chance preplay.

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.).

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed that preplay is correlated with SWI, but the correlation may arise because both are correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate: a network similar to a ring lattice can still balance cluster isolation and cluster overlap, yet have a small SWI because of long paths between some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We will revise the text and add illustrative figures.

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We will test another type of small-world network if time permits.

      Our discussion of “cluster overlap” is specific to our type of small-world network, in which there is no predetermined spatial dimension (unlike the ring network of Watts and Strogatz). Because clusters map randomly to locations once a particular spatial context is imposed, the random overlap between clusters produces long-range connections in that context (and in any other context). One can therefore think of the amount of overlap between clusters as representing the number of long-range connections in a Watts-Strogatz model, except that, we wish to reiterate, such models involve a spatial topology within the network, which we do not include.
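      The reviewer suggested illustrating the small-world index concretely. The sketch below follows that suggestion by contrasting a ring lattice with the same lattice plus a few long-range shortcuts. It is an illustration only, not the manuscript's analysis. It assumes one common definition, SWI = (C/C_rand)/(L/L_rand), and approximates the random-graph baselines analytically as C_rand ≈ k/n and L_rand ≈ ln(n)/ln(k). The helper names (`ring_lattice`, `add_shortcuts`, `small_world_index`) are hypothetical.

```python
import math
from collections import deque


def ring_lattice(n, k):
    """Ring lattice: each node is linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def add_shortcuts(adj, pairs):
    """Return a copy of adj with extra long-range (shortcut) edges added."""
    new = {u: set(v) for u, v in adj.items()}
    for u, v in pairs:
        new[u].add(v)
        new[v].add(u)
    return new


def clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over ordered node pairs via BFS (assumes a connected graph)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_index(adj):
    """SWI = (C / C_rand) / (L / L_rand), with analytic random-graph baselines (assumption)."""
    n = len(adj)
    k = sum(len(v) for v in adj.values()) / n  # mean degree
    c_rand = k / n
    l_rand = math.log(n) / math.log(k)
    return (clustering(adj) / c_rand) / (mean_path_length(adj) / l_rand)


if __name__ == "__main__":
    lattice = ring_lattice(100, 4)
    shortcut = add_shortcuts(lattice, [(0, 50), (25, 75), (12, 62), (37, 87)])
    for name, g in [("ring lattice", lattice), ("lattice + 4 shortcuts", shortcut)]:
        print(f"{name}: C={clustering(g):.3f}  L={mean_path_length(g):.2f}  "
              f"SWI={small_world_index(g):.2f}")
```

      In this toy example, four shortcuts lower clustering only slightly but shorten the mean path length considerably, so the index rises. This is the reviewer's point that both local clustering and long-range connections contribute to SWI.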

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Yes, we are highly confident that the clusters in our network would correspond to the functional assemblies that have been studied through assembly analysis and will present the relevant data in a revision.

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally and will test this if time permits.

      Reviewer # 2

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We agree that this is an important question, and we plan to run further simulations where we test the effects of varying the simulated speed. We will present results in the resubmission.

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to different models of feedforward input is important and we plan to do this in our revised manuscript for the linear track and W-track.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 7b the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson RE et al, Hippocampus 6:281, 1996; Pavlides C, et al, Neurobiol Learn Mem 161:122, 2019). Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for a low level of correlation is because cluster membership is combinatorial, whereby cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated. In our revised manuscript we will address this point more carefully and cite the relevant experimental support.

      Reviewer # 3

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input, and so we will present such additional results in our revised version. However, a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked, place-field-like input. Therefore, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      First, we apologize for a lack of clarity if we have caused confusion about the type of inputs (linear and cluster-dependent as we had attempted to portray prominently in Figure 1, where it is described in the caption, l. 156-157, and Results, l. 189-190 & l. 497-499, as well as in the Methods, l. 671-683) and if we implied an absence of spatially-tuned information in the network. In the revision we will clarify that for reliable place fields to appear, the network must receive spatial information and that one point of our paper is that the information need not arrive as peaks of external input already resembling place cells or grid cells. We chose linearly ramping boundary inputs as the minimally place-field like stimulus (that still contains spatial information) but in our revision we will include alternatives. We should note that during sleep, when “preplay” occurs, there is no such spatial bias (which is why preplay can equally correlate with place field sequences in any context). In the revision, we will update Figure 1 to show more clearly the cluster-dependent linearly ramping input received by some specific cells with both similar and different place fields.

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells.

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we will revise the text to specifically address CA3 connectivity (Guzman et al., Science 353 (6304), 1117-1123 2016) and the small-world structure therein due to the presence of “assemblies”.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors show that a spiking network model with clustered neurons produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no way to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens, e.g., with traversals in which the animal travels at different speeds and in which the animal's speed is not constant across the entire track, and then determine whether the resulting tuning curves better represent position or duration.

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

    1. summary

      Speaking of summaries: studies show that AI is worse than humans at summarizing.

      A succinct explanation of why, by David Chisnall:

      LLMs are good at transforms that have the same shape as ones that appear in their training data. They're fairly good, for example, at generating comments from code because code follows common structures and naming conventions that are mirrored in the comments (with totally different shapes of text).

      In contrast, summarisation is tightly coupled to meaning. Summarisation is not just about making text shorter, it's about discarding things that don't contribute to the overall point and combining related things. This is a problem that requires understanding the material, because it's all about making value judgements.

    1. Realtalk is just one component of a culture, and downloading source code does not download values, norms, practices, and tacit knowledge. We intend the culture to spread in a manner similar to scientific practices, trades and crafts, martial arts, spoken language, and so on — in-person immersion in a community of practice, teachers teaching teachers. This will take time, and it may appear that Realtalk is “exclusive” during that time. But open-source software is also exclusive, to those who find meaning in source code. And those people already seem well-provided for.

      There need be no contradiction between in-person gatherings, which transmit and embody culture, and the digital media through which that culture also travels. Our Data Rodas are likewise inspired by a culture of the body, with both in-person and virtual gatherings, while also producing code and prose that circulate among those who are not present at the face-to-face meetings.

    2. Open source is not open to most people. Source code in a git repo is not open to everyone. It’s open to the select class of people who know what it means to clone a git repo.

      This is something one notices very quickly. In fact, it was one of the motivations behind my doctoral thesis, where I stated it along the lines of "you cannot understand what you cannot change". Most software tools are paradoxically inflexible (for that reason they seem less like the "soft" part than the "hard" one), in particular because of all the friction involved in understanding and changing them, including working with Git.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper conducted a GWAS meta-analysis for COVID-19 hospitalization among admixed American populations. The authors identified four genome-wide significant associations, including two novel loci (BAZ2B and DDIAS), and an additional risk locus near CREBBP using cross-ancestry meta-analysis. They utilized multiple strategies to prioritize risk variants and target genes. Finally, they constructed and assessed a polygenic risk score model with 49 variants associated with critical COVID-19 conditions.

      Strengths:

      Given that most of the previous studies were done in European ancestries, this study provides unique findings about the genetics of COVID-19 in admixed American populations. The GWAS data would be a valuable resource for the community. The authors conducted comprehensive analyses using multiple different strategies, including Bayesian fine mapping, colocalization, TWAS, etc., to prioritize risk variants and target genes. The polygenic risk score (PGS) result demonstrated the potential of the cross-population PGS model for COVID-19 risk stratification.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      (1) One of the major limitations of this study is that the GWAS sample size is relatively small, which limits its power.

      (2) The fine mapping section is unclear and there is a lack of information. The authors assumed one causal signal per locus, and only provided credible sets, but did not provide posterior inclusion probabilities (PIP) for the variants to be causal.

      (3) Colocalization and TWAS used eQTL data from GTEx data, which are mainly from European ancestries. It is unclear how much impact the ancestry mismatch would have on the result. The readers should be cautious when interpreting the results and designing follow-up studies.

      We agree that the sample size is relatively small. Despite that, it was sufficient to reveal novel risk loci, supporting the robustness of the main findings. We have indicated this limitation at the end of the Discussion section.

      Thank you for raising this point. As suggested, we have also used SuSiE, which allows for more than one causal signal per locus. However, in this case the results did not differ from those obtained with the original Bayesian colocalization performed with corrcoverage. Regarding the PIP, at the fine-mapping stage we are inclined to put more weight on the functional annotations of the variants in the credible set than on their statistical contributions to the signal. This is why we prefer not to rely on the PIPs of the variants and instead prioritize variants with enriched functional annotations.
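      The sketch below makes the reviewer's request concrete. It shows one common way to derive PIPs and a 95% credible set under the single-causal-variant assumption, using Wakefield approximate Bayes factors computed from marginal effect sizes and standard errors. It is a generic illustration, not the authors' corrcoverage or SuSiE pipeline. The prior standard deviation of 0.2 on the log-odds scale is an assumed, commonly used default, and the input numbers are toy values.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Wakefield log approximate Bayes factor (association vs. null).

    V is the sampling variance of beta; W is the prior variance of the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def pips_and_credible_set(betas, ses, coverage=0.95, prior_sd=0.2):
    """PIPs assuming one causal variant per locus, plus the smallest set reaching `coverage`."""
    labf = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    m = max(labf)  # subtract the maximum before exponentiating, for numerical stability
    abf = [math.exp(x - m) for x in labf]
    total = sum(abf)
    pips = [a / total for a in abf]
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    credible, cum = [], 0.0
    for i in order:
        credible.append(i)
        cum += pips[i]
        if cum >= coverage:
            break
    return pips, credible


# Toy values for four variants at one locus (not data from the study).
pips, cs = pips_and_credible_set([0.30, 0.28, 0.05, 0.01], [0.05] * 4)
```

      Here two strongly associated variants share most of the posterior mass, so both enter the 95% set. Plotting `pips` against position would give the kind of regional PIP plot the reviewer describes.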

      This is a good point regarding the lack of diversity in GTEx data. We have also used data from AMR populations (GALA II-SAGE models), although these were only available for blood tissue. Several studies have explored the impact of ancestry mismatch between datasets. Gay et al. (PMID: 32912333) studied local ancestry effects on eQTLs from the GTEx consortium. They concluded that adjusting eQTLs for local ancestry yields only a modest improvement over adjusting for global ancestry (as done in GTEx). Moreover, colocalization results obtained after adjusting for local ancestry and for global ancestry were not significantly different. In addition, Mogil et al. (PMID: 30096133) observed that genes with higher heritability share genetic architecture between populations. Nevertheless, both studies showed decreased power and poorer predictive performance for gene expression due to reduced diversity in eQTL analyses. Because of the ancestry mismatch, we now warn readers that it may compromise signal detection (Discussion, lines 531-533).

      Reviewer #2 (Public Review):

      This is a genome-wide association study of COVID-19 in individuals of admixed American ancestry (AMR) recruited from Brazil, Colombia, Ecuador, Mexico, Paraguay, and Spain. After quality control and admixture analysis, a total of 3,512 individuals were interrogated for 10,671,028 genetic variants (genotyped + imputed). The genetic association results for these cohorts were meta-analyzed with the results from The Host Genetics Initiative (HGI), involving 3,077 cases and 66,686 controls. The authors found two novel genetic loci associated with COVID-19 at 2q24.2 (rs13003835) and 11q14.1 (rs77599934), and two other independent signals at 3p21.31 (rs35731912) and 6p21.1 (rs2477820) already reported as associated with COVID-19 in previous GWASs. Additional meta-analysis with other HGI studies also suggested risk variants near CREBBP, ZBTB7A, and CASC20 genes.

      Strengths:

      These findings rely on state-of-the-art methods in the field of Statistical Genomics and help to address the issue of a low number of GWASs in non-European populations, ultimately contributing to reducing health inequalities across the globe.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      There is no replication cohort, as acknowledged by the authors (page 29, line 587), and no experimental validation to assess the biological effect of putative causal variants/genes. Thus, the study provides good evidence of association, rather than causation, between the genetic variants and COVID-19. Lastly, I consider it crucial to report the results for the SCOURGE Latin American GWAS, in addition to its meta-analysis with HGI results, since HGI data has a different phenotype scheme (Hospitalized COVID vs Population) compared to SCOURGE (Hospitalized COVID vs Non-hospitalized COVID).

      We essentially agree with the reviewer that one of the main limitations of the study is the lack of a replication stage, which results from using all available datasets in a one-stage analysis. To aid interpretation of the findings in the absence of a replication stage, we have now assessed the replicability of the novel loci using the Meta-Analysis Model-based Assessment of replicability (MAMBA) approach (PMID: 33785739). We have included the posterior probabilities of replication in Table 2. We also further explored the potential replicability of the signals in other populations. We agree that, given the lack of functional validation of the main findings, the results should be interpreted as associations, and we have slightly modified the Discussion accordingly.

      As suggested, the SCOURGE Latin American GWAS summary statistics are now accessible by direct request to the Consortium GitHub repository (https://github.com/CIBERER/Scourge-COVID19) (lines 797-799). We have also included the results from the SCOURGE GWAS analysis for the replication of the 40 lead variants in Supplementary Table 12. Results from the SCOURGE GWAS for the lead variants in the AMR meta-analysis with HGI were already included in Supplementary Table 2. Of note, we were not able to conduct the meta-analysis with the same hospitalization scheme as in the HGI study, since the population-specific results for those analyses were not publicly released. However, sensitivity analyses in the supplementary material of the COVID-19 Host Genetics Initiative (2021) showed no significant differences in effects (odds ratios) between analyses using population controls and those using only non-hospitalized COVID-19 patients.

      Reviewer #3 (Public Review):

      Summary:

      In the context of the SCOURGE consortium's research, the authors conduct a GWAS meta-analysis on 4,702 hospitalized individuals of admixed American descent suffering from COVID-19. This study identified four significant genetic associations, including two loci initially discovered in Latin American cohorts. Furthermore, a trans-ethnic meta-analysis highlighted an additional novel risk locus in the CREBBP gene, underscoring the critical role of genetic diversity in understanding the pathogenesis of COVID-19.

      Strengths:

      (1) The study identified two novel severe COVID-19 loci (BAZ2B and DDIAS) by the largest GWAS meta-analysis for COVID-19 hospitalization in admixed Americans.

      (2) With a trans-ethnic meta-analysis, an additional risk locus near CREBBP was identified.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      (1) The GWAS power is limited due to the relatively small number of cases.

      (2) There is no replication study for the novel severe COVID-19 loci, which may lead to false positive findings.

      We agree that the sample size is relatively small. Despite that, it was sufficient to reveal novel risk loci, supporting the robustness of the main findings. We have indicated this limitation at the end of the Discussion section.

      Regarding the lack of a replication study, we have now assessed the replicability of the novel loci using the Meta-Analysis Model-based Assessment of replicability (MAMBA) approach (PMID: 33785739). We have included the posterior probabilities of replication in Table 2.

      (3) Ages differ significantly between cases and controls, which could introduce confounding. I'm curious about how the authors treated age as a covariate. For instance, did they use ten-year intervals? This needs clarification for reproducibility.

      Thank you for raising this point. Age was included as a continuous variable. This is now indicated in line 667 (within Material and Methods).

      (4) "Those in the top PGS decile exhibited a 5.90-fold (95% CI=3.29-10.60, p=2.79x10^-9) greater risk compared to individuals in the lowest decile". I would recommend comparing with the middle (40-60%) PGS percentile range rather than the lowest decile, as the lowest PGS decile does not represent 'normal controls'.

      Thank you. In the revised version, the PGS categories were compared following this recommendation (lines 461-463).
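      The sketch below shows what the recommended comparison changes. It computes a crude, unadjusted odds ratio for the top PGS decile against the middle (40-60%) percentile range instead of the bottom decile. It is an illustration only: the manuscript's estimates may come from covariate-adjusted models, and the function names are hypothetical. Zero cells in the 2x2 table are not handled.

```python
import math


def decile_of(scores):
    """Rank-based decile (0-9) for each score; ties are broken by input order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    decile = [0] * n
    for rank, i in enumerate(order):
        decile[i] = rank * 10 // n
    return decile


def odds_ratio_vs_reference(scores, is_case, target=(9,), reference=(4, 5)):
    """Crude OR (with Woolf 95% CI) for being a case in `target` vs `reference` deciles.

    Deciles 4 and 5 together make up the 40-60% percentile range.
    """
    dec = decile_of(scores)
    pairs = list(zip(dec, is_case))
    a = sum(1 for g, case in pairs if g in target and case)          # target cases
    b = sum(1 for g, case in pairs if g in target and not case)      # target controls
    c = sum(1 for g, case in pairs if g in reference and case)       # reference cases
    d = sum(1 for g, case in pairs if g in reference and not case)   # reference controls
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo, hi = (math.exp(math.log(or_) + k * 1.96 * se) for k in (-1, 1))
    return or_, (lo, hi)
```

      Because the 40-60% group sits in the middle of the distribution, the resulting ratio reads as "top decile vs. typical risk" rather than "top vs. bottom".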

      (5) In the field of PGS, it's common to require an independent dataset for training and testing the PGS model. Here, there seems to be an overfitting issue due to using the same subjects for both training and testing the variants.

      We are sorry for the misunderstanding. In fact, we followed standard practice to avoid overfitting the PGS model and used different training and testing datasets. The training data (GWAS) came from the HGI-B2 ALL meta-analysis, which did not include our AMR GWAS. The PRS model was then tested in the SCOURGE AMR cohort. However, it is true that we also tested, in the SCOURGE cohort, a PRS that added the newly discovered variants. To avoid potential overfitting from adding the new loci, we have removed from the manuscript the results that included the newly discovered variants.
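      The training/testing separation described above means that the score weights are fixed before the target cohort is examined. The sketch below shows that scoring step. It assumes per-variant effect sizes (log odds ratios) taken from the external training GWAS and effect-allele dosages from the target cohort. The dictionary layout and the rs-style IDs are hypothetical. Real pipelines must also align effect alleles and strands between the two datasets.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) over the scored variants.

    `weights` maps variant ID -> effect size (log OR) from the external
    training GWAS; `dosages` maps variant ID -> dosage in one target
    individual. Variants missing from the target data are skipped here,
    which is only one of several possible choices.
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


# Hypothetical weights and one individual's dosages.
example_weights = {"rsA": 0.2, "rsB": -0.1, "rsC": 0.5}
example_dosages = {"rsA": 2, "rsB": 1}
score = polygenic_score(example_dosages, example_weights)
```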

      (6) The variants selected for the PGS appear arbitrary and may not leverage the GWAS findings without an independent training dataset.

      Again, we are sorry for the misunderstanding. The PGS model was built with 43 variants associated with hospitalization or severity in the HGI v7 results and 7 variants that the GenOMICC consortium discovered in its latest study and that were not in the latest HGI release. The variants are listed in Supplementary Table 14, where we have now annotated the discovery GWAS for each.

      (7) The TWAS models were predominantly trained on European samples, and there is no replication study for the findings as well.

      This is a good point regarding the lack of diversity in GTEx data. We have also used data from AMR populations (GALA II-SAGE models), although these were only available for blood tissue. Several studies have explored the impact of ancestry mismatch between datasets. Gay et al. (PMID: 32912333) studied local ancestry effects on eQTLs from the GTEx consortium. They concluded that adjusting eQTLs for local ancestry yields only a modest improvement over adjusting for global ancestry (as done in GTEx). Moreover, colocalization results obtained after adjusting for local ancestry and for global ancestry were not significantly different. In addition, Mogil et al. (PMID: 30096133) observed that genes with higher heritability share genetic architecture between populations. Nevertheless, both studies showed decreased power and poorer predictive performance for gene expression due to reduced diversity in eQTL analyses. Because of the ancestry mismatch, we now warn readers that it may compromise signal detection (Discussion, lines 531-533).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors mentioned the fine mapping method did not converge for the locus in chr 11. I would consider trying a different fine-mapping method (such as SuSiE or FINEMAP). It would be helpful to provide posterior inclusion probabilities (PIP) for the variants in fine mapping results and plot the PIP values in the regional association plots.

      As suggested, we have also used SuSiE, which allows for more than one causal signal per locus. However, in this case the results did not differ from those obtained with the original Bayesian colocalization performed with corrcoverage. SuSiE's fine-mapping for chromosome 11 prioritized a single variant, which is likely due to its rare frequency. We have therefore kept the fine-mapping as described in the previous version of the manuscript, but have now included the credible set in Supplementary Table 6.

      Regarding the PIP, at the fine-mapping stage we are inclined to put more weight on the functional annotations of the variants in the credible set than on their statistical contributions to the signal. This is why we prefer not to rely on the PIPs of the variants and instead prioritize variants with enriched functional annotations.

      (2) Please provide more detailed information about the VEP and V2G analysis and how to interpret those results. My understanding of V2G is that it includes different sources of information (such as molecular QTLs and chromatin interactions from different tissues/cell types, etc.). It is unclear what sources of information and weight settings were used in the V2G model.

      Thank you for raising this point. As suggested, we have clarified the basis for VEP and V2G and how to interpret them (lines 732-743).

      (3) The authors identified multiple genes with different strategies, e.g. FUMA, V2G, COLOC, TWAS, etc. How many genes were found/supported by evidence provided by multiple methods? It could be helpful to have a table summarizing the risk genes found by different strategies, and the evidence supporting the genes. e.g. which genes are found by which methods, and the biological functions of the genes, etc.

      Thank you for raising this point. As suggested, we have now added a new figure (Figure 5) summarizing the findings from the multiple methods used.

      (4) It would be helpful to make the code/scripts available for reproducibility.

      As suggested, the SCOURGE Latin American GWAS summary and the analysis scripts (https://github.com/CIBERER/Scourge-COVID19/tree/main/scripts/novel-risk-hosp-AMR-2024) are now accessible in the Consortium GitHub repository (https://github.com/CIBERER/Scourge-COVID19) (lines 806-807).

      (5) The fonts in some of the figures (e.g. Figure 2) are hard to read.

      Thank you. We have now included the figures as SVG files.

      Reviewer #2 (Recommendations For The Authors):

      - The abstract lacks a conclusion sentence.

      Thank you. As suggested, we have included two additional sentences with broad conclusions from the study. We preferred to avoid relying on conclusions related to known or new biological links of the prioritized genes given the lack of functional validation of main findings.

      - Regarding the association analysis (page 27, line 677), I wonder if some of the 10 principal components (PCs) are capturing information about the recruitment areas (countries). It may be relevant to test for multicollinearity among these variables.

      We acknowledge that some of the country categories might be correlated with certain PCs, although not all of them are. We therefore calculated generalized variance inflation factors (GVIF) for the main variables, so that the categorical variable could be assessed as a single entity. The scaled value, GVIF^(1/(2×Df)), for the categorical variable is 1.52. Squaring this value gives 2.31, to which the usual rule of thumb for VIF values can then be applied.
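      The sketch below shows how GVIF is computed for readers unfamiliar with it. GVIF is derived from the predictors' correlation matrix R as det(R11) · det(R22) / det(R). Here R11 is the block for the columns of the term of interest (for example, the dummy columns for country), and R22 is the block for all other predictors. The sketch also computes the scaled value GVIF^(1/(2×Df)). It is a pure-Python illustration with hypothetical function names, not the software used in this response. Columns must not be constant.

```python
from math import sqrt


def _corr_matrix(cols):
    """Pearson correlation matrix for equally long, non-constant numeric columns."""
    n = len(cols[0])
    centered = [[x - sum(c) / n for x in c] for c in cols]
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    k = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def _det(m):
    """Determinant by Gaussian elimination with partial pivoting (1.0 for an empty matrix)."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def _block(m, idx):
    return [[m[i][j] for j in idx] for i in idx]


def gvif(cols, group):
    """Return (GVIF, GVIF ** (1 / (2 * Df))) for the term whose columns are at `group`."""
    r = _corr_matrix(cols)
    rest = [i for i in range(len(cols)) if i not in group]
    value = _det(_block(r, group)) * _det(_block(r, rest)) / _det(r)
    df = len(group)
    return value, value ** (1 / (2 * df))
```

      Squaring the scaled value gives GVIF^(1/Df). This is why 1.52² ≈ 2.31 can be read against the usual VIF thresholds. For a one-column term, the GVIF reduces to the classical VIF.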

      - Still on the topic of association analysis, did the authors adjust the logistic model for comorbidities variables from Table 1? Given these comorbidities also have a genetic component and their distribution differs between non-hospitalized vs hospitalized, I am concerned that comorbidities might be confounding the association between genetic variants and COVID.

      We did not adjust for comorbidities because the HGI studies were not adjusted for them either, and we aimed to align our analysis with HGI as closely as possible. However, as suggested, we have now tested the association between each comorbidity in Table 1 and each variant in Table 2. These tests used the comorbidities as dependent variables and adjusted for the main covariates (age, sex, PCs and country of recruitment). None of the variants was significantly associated with the comorbidities (line 333).

      - If I understood correctly, the 49 genetic variants used to develop the polygenic risk score model (PRS) were based on the HGI total sample size (data release 7), which is predominantly of European ancestry. I am concerned about the prediction accuracy in the AMR population (PRS transferability issue).

      We searched the literature for other PRSs so that we could compare the OR observed in our cohort with ORs calculated in European populations. Horowitz et al. (2022) reported an OR of 1.38 for the top 10% with respect to hospitalization risk in European individuals using a GRS with 12 variants.

      We acknowledge that this might be an issue, and it is now discussed in the revised version (lines 561-568). However, as this is the first time a PRS for COVID-19 has been applied to a relatively large AMR cohort, we believe this analysis will be of value for further work on PRS transferability, providing a point of comparison for future studies.

      - On page 23, line 579, the authors acknowledge their "GWAS is underpowered". This sentence requires a sample/power calculation, otherwise, I suggest using "is likely underpowered".

      Thanks for the input. We have modified the sentence as suggested.

      Reviewer #3 (Recommendations For The Authors):

      I wonder whether the authors have an approximate date when the GWAS summary statistics will be available. In manuscripts I have reviewed in the past, the authors claimed they would deposit the data soon, but in fact this did not happen until two years later.

      The summary statistics are already available from the SCOURGE Consortium repository https://github.com/CIBERER/Scourge-COVID19 (lines 806-807).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors assess the accuracy of short variant calling (SNPs and indels) in bacterial genomes using Oxford Nanopore reads generated on R10.4 flow cells from a very similar genome (99.5% ANI). They examine the impact of variant caller choice (three traditional variant callers: bcftools, freebayes, and Longshot; and three deep learning-based variant callers: Clair3, DeepVariant, and NanoCaller), basecalling model (fast, hac and sup) and read depth (using both simplex and duplex reads).

      Strengths:

      Given the stated goal (analysis of variant calling for reads drawn from genomes very similar to the reference), the analysis is largely complete and results are compelling. The authors make the code and data used in their analysis available for re-use using current best practices (a computational workflow and data archived in INSDC databases or Zenodo as appropriate).

      Weaknesses:

      While the medaka variant caller is now deprecated for diploid calling, it is still widely used for haploid variant calling and should at least be mentioned (even if the mention is only to explain its exclusion from the analysis). 

      We have now added the Medaka haploid caller to the benchmark. It performs well overall (better than the traditional methods), but not as well as Clair3 or DeepVariant.

      Appraisal:

      The experiments the authors engaged in are well structured and the results are convincing. I expect that these results will be incorporated into "best practice" bacterial variant calling workflows in the future. 

      Thank you for the positive appraisal.

      Reviewer #2 (Public Review):

      Summary:

      Hall et al. describe the superiority of ONT sequencing combined with deep learning-based variant callers in delivering higher SNP and indel accuracy than the previous gold standard, Illumina short-read sequencing. Furthermore, they provide recommendations for read sequencing depth and computational requirements when performing variant calling.

      Strengths:

      The study describes compelling data showing ONT superiority when using deep learning-based variant callers, such as Clair3, compared to Illumina sequencing. This challenges the paradigm that Illumina sequencing is the gold standard for variant calling in bacterial genomes. The authors provide evidence that homopolymeric regions, a systematic and problematic issue with ONT data, are no longer a concern in ONT sequencing.

      Weaknesses:

      (1) The inclusion of a larger number of reference genomes would have strengthened the study to accommodate larger variability (a limitation mentioned by the authors). 

      Our strategic selection of 14 genomes—spanning a variety of bacterial genera and species, diverse GC content, and both gram-negative and gram-positive species (including M. tuberculosis, which is neither)—was designed to robustly address potential variability in our results. Moreover, all our genome assemblies underwent rigorous manual inspection as the quality of the true genome sequences is the foundation this research is built upon. Given this, the fundamental conclusions regarding the accuracy of variant calls would likely remain unchanged with the addition of more genomes.  However, we do acknowledge that a substantially larger sample size, which is beyond the scope of this study, would enable more fine-grained analysis of species differences in error rates.

      (2) In Figure 2, there are clearly one or two samples that perform worse than others in all combinations (are always below the box plots). No information about species-specific variant calls is provided by the authors but one would like to know if those are recurrently associated with one or two species. Species-specific recommendations could also help the scientific community to choose the best sequencing/variant calling approaches.

      Thank you for highlighting this observation. The precision, recall, and F1 scores for each sample and condition can be found in Supplementary Table S4.

      Upon investigating the outliers in Figure 2, we discovered three things. First, a parameter we were using in Longshot automatically capped coverage and led to a number of false negatives, which produced its outlier. This has now been rectified and the figure updated accordingly. Second, the outlier in the simplex sup SNP panel (top left) was the same E. coli sample for most variant callers (though Medaka had no issues). The reason for this was a variant-dense repetitive region. We have added an in-depth explanation of this, along with figures illustrating the issue, in Supplementary Section S2, with a brief statement in the main text. Third, the outlier in the duplex sup SNP panel (top right) is due to a sample with very low (duplex) depth. This has also been added briefly to the main text and in full in Section S2.

      We have now included a species-segregated version of Figure 2 (Suppl. Figures S5-7) for Clair3 with the sup model (best performer) for a clearer interpretation of how each species performs.
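      The precision, recall and F1 scores used throughout this benchmark can be summarized with the minimal sketch below, which compares call sets by exact (chrom, pos, ref, alt) matching. This is a simplification assumed for illustration: dedicated comparison tools also reconcile equivalent representations of the same indel. The function name is hypothetical.

```python
def variant_call_metrics(truth, calls):
    """Precision, recall and F1 for exact-match comparison of two variant sets.

    `truth` and `calls` are sets of (chrom, pos, ref, alt) tuples.
    """
    tp = len(truth & calls)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

      A caller that misses variants without making wrong calls loses recall but keeps its precision. This matches the pattern described for Illumina in the Limitations paragraph of this response, where the lower F1 was driven by missed calls.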

      (3) The authors support that a read depth of 10x is sufficient to achieve variant calls that match or exceed Illumina sequencing. However, the standard here should be the optimal discriminatory power for clinical and public health utility (namely outbreak analysis). In such scenarios, the highest discriminatory power is always desirable and as such an F1 score, Recall and Precision that is as close to 100% as possible should be maintained (which changes the minimum read sequencing depth to at least 25x, which is the inflection point).

      We agree that the highest discriminatory power is always desirable for clinical or public health applications. In such cases, 25x is probably a better minimum recommendation. However, we are also aware that there are resource-limited settings where parity with Illumina is sufficient. In these cases, 10x depth from ONT would provide enough data.

      The manuscript previously emphasised the latter scenario, but we have revised the text (Discussion) to clearly recommend 25x depth as a conservative aim in settings where resources are not a constraint, ensuring the highest possible discriminatory power.

      (4) The sequencing of the samples was not performed with the same Illumina and ONT method/equipment, which could have introduced specific equipment/preparation artefacts that were not considered in the study. See for example https://academic.oup.com/nargab/article/3/1/lqab019/6193612.

      To our knowledge, there is no evidence that sequencing on different ONT machines or barcoding kits leads to a difference in read characteristics or accuracy. To ensure consistency and minimise potential variability, we used the same ONT flowcells for all samples and performed basecalling on the same Nvidia A100 GPU. We have updated the methods to emphasise this.

      For Illumina and ONT, the exact machines and kits used for each sample have been added as Supplementary Table S9. We have also added a short paragraph about possible differences in Illumina error rates to the ‘Limitations’ section of the Discussion.

      The third limitation is that Illumina sequencing was performed on different models: three samples on the NextSeq 500 and the rest on the NextSeq 2000. While differences in error rates exist between Illumina instruments, no specific assessment has been made between these NextSeq models [42]. However, the absolute differences in error rates are minor and unlikely to impact our study significantly. This is particularly relevant since Illumina's lower F1 score compared to ONT was due to missed calls rather than erroneous ones.

      In summary, while there may be specific equipment or preparation artifacts to consider, we took steps to minimise these effects and maintain consistency across our sequencing methods.

      Reviewer #3 (Public Review):

      Hall et al. benchmarked different variant calling methods on Nanopore reads of bacterial samples and compared the performance of Nanopore to short reads produced with Illumina sequencing. To establish a common ground for comparison, the authors first generated a variant truth set for each sample and then projected this set to the reference sequence of the sample to obtain a mutated reference. Subsequently, Hall et al. called SNPs and small indels using commonly used deep learning and conventional variant callers and compared the precision and accuracy from reads produced with simplex and duplex Nanopore sequencing to Illumina data. The authors did not investigate large structural variation, which is a major limitation of the current manuscript. It will be very interesting to see a follow-up study covering this much more challenging type of variation. 

      We fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      In their comprehensive comparison of SNPs and small indels, the authors observed superior performance of deep learning over conventional variant callers when Nanopore reads were basecalled with the most accurate (but also computationally very expensive) model, even exceeding Illumina in some cases. Not surprisingly, Nanopore underperformed compared to Illumina when basecalled with the fastest (but computationally much less demanding) method with the lowest accuracy. The authors then investigated the surprisingly higher performance of Nanopore data in some cases and identified lower recall with Illumina short read data, particularly from repetitive regions and regions with high variant density, as the driver. Combining the most accurate Nanopore basecalling method with a deep learning variant caller resulted in low error rates in homopolymer regions, similar to Illumina data. This is remarkable, as homopolymer regions are (or, were) traditionally challenging for Nanopore sequencing.

      Lastly, Hall et al. provided useful information on the required Nanopore read depth, which is surprisingly low, and on the computational resources for variant calling with deep learning callers. With that, the authors established a new state of the art for Nanopore-only variant calling on bacterial sequencing data. Most likely, these findings will transfer to other organisms as well, or at least provide a proof of concept that can be built upon.

      As the authors mention multiple times throughout the manuscript, Nanopore can provide sequencing data in nearly real-time and in remote regions, therefore opening up a ton of new possibilities, for example for infectious disease surveillance.

      However, the high-performing variant calling method as established in this study requires the computationally very expensive sup and/or duplex Nanopore basecalling, whereas the least computationally demanding method underperforms. Here, the manuscript would greatly benefit from extending the last section on computational requirements, as the authors determine the resources for the variant calling but do not cover the entire picture. This could even be misleading for less experienced researchers who want to perform bacterial sequencing at high performance but with low resources. The authors mention it in the discussion but do not make clear enough that the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required. 

      We have provided runtime benchmarks for basecalling in Supplementary Figure S23 and detailed these times in Supplementary Table S7. In addition, we state in the Results section (P9 L239-241) “Though we do note that if the person performing the variant calling has received the raw (pod5) ONT data, basecalling also needs to be accounted for, as depending on how much sequencing was done, this step can also be resource-intensive.”

      Even with super-accuracy basecalling considered, our analysis shows that variant calling remains the most resource-intensive step for Clair3, DeepVariant, FreeBayes, Medaka, and NanoCaller. Therefore, the statement “the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required” is incorrect. However, we have made this more prominent in the Results and Discussion.

      In the results section we added the underlined section:

      “… FreeBayes had the largest runtime variation, with a maximum of 597s/Mbp, equating to 2.75 days for the same genome. In contrast, basecalling with a single GPU using the super-accuracy model required a median runtime of 0.77s/Mbp, or just over 5 minutes for a 4Mbp genome at 100x depth. …”

      In the discussion we have added the following statement:

      “Basecalling is generally faster than variant calling, assuming GPU access, which is likely considered when acquiring ONT-related equipment.”
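      The runtimes in the quoted Results text are per-megabase rates scaled to a whole dataset. Below is a minimal Python sketch of that arithmetic. It assumes, as the quoted text implies, a 4 Mbp genome sequenced to 100x depth, which is 400 Mbp of reads. The helper name `runtime_seconds` is hypothetical.

```python
GENOME_MBP = 4   # genome size used in the quoted text (4 Mbp)
DEPTH = 100      # sequencing depth used in the quoted text (100x)


def runtime_seconds(seconds_per_mbp: float,
                    genome_mbp: float = GENOME_MBP,
                    depth: float = DEPTH) -> float:
    """Scale a per-Mbp rate to genome_mbp * depth megabases of reads."""
    return seconds_per_mbp * genome_mbp * depth


# 597 s/Mbp: maximum FreeBayes rate; 0.77 s/Mbp: median sup basecalling rate.
freebayes_days = runtime_seconds(597) / 86_400
basecall_minutes = runtime_seconds(0.77) / 60
print(f"FreeBayes (max): {freebayes_days:.2f} days")
print(f"sup basecalling (median): {basecall_minutes:.1f} minutes")
```

      This gives roughly 2.76 days for FreeBayes and about 5.1 minutes for basecalling. Both figures are consistent with the quoted 2.75 days and "just over 5 minutes"; the small FreeBayes difference presumably reflects rounding of the published rate.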

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The colour choices in Figure 3 and Figure 4c made the illustrations somewhat difficult to read. More substantially, a deeper investigation of the causes of non-homopolymer-related mistaken indel calls would be useful.

      We have updated Figure 3 so that each line has a different style to aid in discriminating between colours. The colour scheme for Figure 4c has also been updated.

      For non-homopolymeric false positive (FP) indel calls, we investigated Clair3 and DeepVariant on the simplex sup data, because these are the two best-performing variant callers and handle homopolymers best. For Clair3, there were eight FPs across all samples. Five of these were in homopolymers. The remaining three occurred within one or two bases of another insertion that inserted a sequence similar to the FP. DeepVariant showed much the same pattern: 8/11 FP indels were in homopolymers, and the remaining three were within one or two bases of another insertion with a similar sequence. We have added a couple of sentences to the Results explaining this finding.

      Reviewer #2 (Recommendations For The Authors):

      The paper is well-written and provides evidence for the conclusions. Some issues should be addressed.

      Include a section in the Results describing species-specific observations, namely if some samples had recurrently lower SNP and INDEL F1 scores (as observed in Figure 2). 

      Please see our response to your second point in the ‘Weaknesses’ section of the public review.

      Please provide more details on how the samples were sequenced. Section "Sequencing" in the methods is confusing and not clear enough to be reproduced (provide a supplementary table/figure with the workflow for each sample). Add information about how many samples were multiplexed in each run and what was the output achieved in each.

      We have now added a Supplementary Table S9 which outlines which instruments, kits, and multiplexing strategies were used for each sample. In addition, the raw pod5 data that we make available has been segregated by sample, so knowledge of the multiplexing strategy is not necessary for someone attempting to reproduce our results.

      The authors acknowledge that structural variation was not evaluated in this manuscript. Since ONT sequencing is often used to reconstruct the sequence of plasmids for outbreak/epidemiology analysis, perhaps they could undertake this analysis on a plasmids dataset (which suffers from constant structural variation).

      As noted in our response to Reviewer 3’s public review, we fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is well organized. However, some sections are a bit long and would benefit from being more concise.

      Thank you for your valuable feedback and for acknowledging the organisation of our manuscript. We appreciate your suggestion regarding the length of certain sections. We have gone back through and made the manuscript more concise.

      Figure 1: Is the Qscore really the same as identity? Isn't the determination of identity only possible after alignment? 

      When we say Qscore, we are referring to the Phred-scaled version of the read identity, which is alignment-based, not to the Qscores of the individual bases in the FASTQ file. We have updated the text and figure legend to make this clearer: “The Qscore is the logarithmic transformation of the read identity, Q = -10 log10(1 - P), where P is the read identity.” We also now explicitly state that read identity is alignment-based.
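      The identity-to-Qscore relationship can be made concrete with a minimal Python sketch. It assumes the standard Phred form Q = -10 log10(1 - P), where P is the alignment-based read identity. The function names are hypothetical.

```python
import math


def identity_to_qscore(identity: float) -> float:
    """Phred-scale an alignment-based read identity P: Q = -10 * log10(1 - P)."""
    if not 0.0 <= identity < 1.0:
        # A perfect alignment (P = 1) has no finite Qscore.
        raise ValueError("identity must be in [0, 1)")
    return -10.0 * math.log10(1.0 - identity)


def qscore_to_identity(q: float) -> float:
    """Inverse transformation: P = 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


for p in (0.9, 0.99, 0.999):
    print(f"identity {p:.3f} -> Q{identity_to_qscore(p):.1f}")
```

      Each additional 10 Q points corresponds to a tenfold drop in the error fraction (1 - P), so Q20 is 99% identity and Q30 is 99.9%. The sketch rejects P = 1 rather than returning infinity.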

      Abbreviations/terms mentioned but not introduced:

      - kmers, P2L57
      - ANI, P3L93

      We have updated the text to better introduce these terms.

    1. code yourself.

      Hi, I understand how we can reach the upright position by thinking long term instead of using brute-force feedback cancellation. But I am wondering what happens if I want to go to a position of pi/2 and stay there. Can we still achieve that using optimal control if the available maximum torque is less than mgl? Thank you.

    1. So he follows a circle around the selenium pool, staying on the locus of all points of potential equilibrium. And unless we do something about it, he'll stay on that circle forever, giving us the good old runaround.

      This sounds like a general issue found in programming, where two pieces of code conflict, resulting in an infinite loop or a deadlock.

    2. "We have: One, a robot may not injure a human being, or, through inaction, allow a human being to come to harm." "Right!" "Two," continued Powell, "a robot must obey the orders given it by human beings except where such orders would conflict with the First Law." "Right!" "And three, a robot must protect its own existence as long as such protection does not conflict with the First or Second Laws."

      These three rules are set to ensure that the actions of all AI align with human interests, assuming that the AI fully obeys them. However, in real-world modern AI there is no real way to hard-code instructions into a neural network; you would have to supply these rules to it afterwards as a prompt, which might not always be followed.

    1. Reviewer #1 (Public Review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) measures in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analyzing associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, there are certain aspects that are not clearly described in the Materials & Methods section, such as a description of the statistical analyses used for hypothesis testing.

      With regard to metabolite quantification, it is unclear why the authors chose to analyze and report metabolite values as creatine ratios rather than quantifying them against a water reference, given that the MRS acquisition appears to support using a water reference. GABA is typically quantified using J-editing sequences at lower field strengths (~3T), and there is some evidence that the GABA signal can be reliably measured at 7T without editing; however, the authors should discuss potential limitations, such as the reliability of Glu and GABA measurements with short-TE semi-LASER at 7T. In addition, MRS measurements of GABA are known to be influenced by macromolecules, and GABA is often denoted as GABA+ to indicate that other compounds contribute to the measured signal, especially at a short TE and in the absence of symmetric spectral editing. A general discussion of the strengths and limitations of unedited Glu and GABA quantification at 7T is warranted, given the interest of this work to researchers who may not be experts in MRS.

      Further, the single MRS voxel location is a limitation of the study as neurochemistry can vary regionally within individuals, and the putative excitatory/inhibitory imbalance in dyslexia may appear in regions outside the left temporal cortex (e.g., network-wide or in frontal regions involved in top-down executive processes). While the functional localization of the MRS voxel is a novelty and a potential advantage, it is unclear whether voxel placement based on left-lateralized reading-related neural activity may bias the experiment to be more sensitive to small, activity-related fluctuations in neurotransmitters in the CON group vs. the DYS group who may have developed an altered, compensatory reading strategy.

      As the authors note in the discussion, sex could serve as a moderator of associations between neural noise and reading abilities and should be considered in future studies.

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was no support for the neural noise hypothesis of dyslexia in their study based on null significance and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplary foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine the evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility of a role for neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological, and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

    1. React documentation says the following about where to place the state: Sometimes, you want the state of two components to always change together. To do it, remove state from both of them, move it to their closest common parent, and then pass it down to them via props. This is known as lifting state up, and it’s one of the most common things you will do writing React code. If we think about the state of the forms, so for example the contents of a new note before it has been created, the App component does not need it for anything. We could just as well move the state of the forms to the corresponding components.

      Key point

      On where state should be defined among React components. Note the two cases: lifting state up, and moving the state into the child components.

    2. The new and interesting part of the code is props.children, which is used for referencing the child components of the component. The child components are the React elements that we define between the opening and closing tags of a component.

      Key point

      In React, whatever is defined between a component's opening and closing tags is that component's child components.

    1. A person may hold multiple identities such as a teacher, father, or friend. Each position has its own meanings and expectations that are internalized as identity. A major task of self-development during early adolescence is the differentiation of multiple selves as a function of social context (e.g., self with father, mother, close friends) with an awareness of the potential contradictions. I noticed this with my own 16-year-old daughter. While she was happy with her friends, she seemed to be depressed around me, or she would switch from being cheerful around her friends to being nasty with her mother. I wondered, and I believe she did as well, which one is the real her? However, as young people mature cognitively, they achieve a sense of coherence in their identity

      I believe I've heard of something similar to this; it's called code-switching. People change the way they act or talk depending on where they are and who they're with.


    1. The complication comes from the fact that the execution model does not have any means for the execution of "give up ownership of the lock" to have any influence over which execution of "gain ownership of the lock" in some other timeline (thread) follows. Very often, only certain handoffs give valid results. Thus, the programmer must think of all possible combinations of one thread giving up a lock and another thread getting it next, and make sure their code only allows valid combinations.
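      The passage argues that releasing a lock does not determine which thread acquires it next, so correctness has to come from the code itself. Below is a minimal Python sketch that assumes, as a hypothetical requirement chosen for illustration, that two threads must strictly take turns. A `threading.Condition` plus an explicit `turn` variable blocks every handoff except the valid one, whatever order the scheduler picks.

```python
import threading


def alternate(rounds: int) -> list[str]:
    """Run two threads that must strictly alternate: "A", "B", "A", ...

    A bare Lock cannot guarantee this ordering: after a release, either
    thread may acquire the lock next. The shared `turn` variable, checked
    under a Condition, makes any out-of-turn acquisition wait again.
    """
    cond = threading.Condition()
    turn = "A"               # whose turn it is; only changed while holding cond
    log: list[str] = []

    def worker(name: str, other: str) -> None:
        nonlocal turn
        for _ in range(rounds):
            with cond:
                # Holding the lock says nothing about whether this handoff
                # is valid, so re-check the predicate after every wakeup.
                cond.wait_for(lambda: turn == name)
                log.append(name)
                turn = other
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=("A", "B")),
               threading.Thread(target=worker, args=("B", "A"))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(alternate(3))  # ['A', 'B', 'A', 'B', 'A', 'B']
```

      The `wait_for` predicate encodes "only valid handoffs". A thread that wins the lock out of turn does not proceed. It waits again until the other thread hands over the turn.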
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been a focus of cognitive neuroscience research, and finding objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al. (2013) found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score, likely because IQ is associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with functional connectivity (FC) between MT+ and several frontal cortex regions.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      Major:

      In Melnick (2013), IQ scores were measured with the full WAIS-III, including all subtests. However, this study only used the visuo-spatial domain of gF. Why was only the visuo-spatial subtest used rather than the full WAIS-III? If other subtests were conducted, please include their results as well to allow a comprehensive comparison with Melnick (2013).

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that the hMT+ region is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence. For these reasons, we conducted only the visuo-spatial subtest.

      Minor:

      Comments:

      In the first revised version, we addressed the following recommendations in the 'Author response' file titled 'Recommendation for the authors.' It seems our response may not have reached you successfully. We would like to share and expand upon our response here:

      (1) Table 1 and Table supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table supplementary 2?

      (1.1) What are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that left hMT+ is involved in an efficient information-integration network during the BDT task. In addition, surround suppression in left hMT+ is related to several hMT+ to frontal connections. Furthermore, the regions that overlap between the two tasks point to the underlying mechanism.

      (1.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we report all values there. In contrast, in Table 2 and Table Supplementary 4-6, we highlight the values that support our main conclusion.

      (1.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out; it was a mistake. We have revised the table and deleted the significance symbols.

      (2) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised “the canonical theory” to “the prevailing opinion” (line 27).

      (3) Throughout the paper, the authors use "MT+"; I would suggest using "hMT+" to indicate the human MT complex and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised these throughout.

      (4) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the Results section (lines 110 and 128).

      (5) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have revised it to: “This finding is in line with prior results, which indicate that motion perception is associated with neural activity in the hMT+ area, but not in EVC (primarily in V1)” (lines 156-158).

      (6) There are no unit labels for any of the x- and y-axes in Figure 1. I only see the unit for Conc, which is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for x- and y-axes are not needed. We believe this confusion might pertain to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit, as it varies with the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of the different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (7) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised these figures to include the correlation line, 95% confidence interval, r values, and p values.

      (8) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. Please note that in the revised version, we have added a figure showing the EVC (primarily in V1) MRS scanning ROI as Supplementary Figure 1. Therefore, the figures the reviewer is concerned about are Supplementary Figures 2-5. The correlations in Supplementary Figure 2 indicate that GABA in EVC (primarily in V1) does not correlate with BDT or SI, illustrating that inhibition in EVC (primarily in V1) is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 3 indicate that the excitation mechanism, represented by glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or EVC (primarily in V1). Supplementary Figure 4 validates our MRS measurements. Supplementary Figure 5 addresses potential concerns regarding the impact of outliers on correlation significance: even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (9) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly (line 242).

      (10) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the Results section (lines 116-120).

      (11) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (12) Figure 5 is too small. The items in plots a and b are barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of the figure.

      Reviewer #3 (Public Review):

      (1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity? In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      (1.1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity?

      We thank the reviewer for pointing this out. We have added motivation and background literature to the Introduction: “Frontal cortex is usually recognized as the cognitive core region (Duncan et al., 2000; Gray et al., 2003). Strong connectivity between the cognitive regions suggests a mechanism for large-scale information exchange and integration in the brain (Barbey, 2018; Cole et al., 2012). Therefore, the potential conjunctive coding may overlap with the inhibition and/or excitation mechanism of hMT+. Taken together, we hypothesized that 3D visuo-spatial intelligence (as measured by BDT) might be predicted by the inhibitory and/or excitation mechanisms in hMT+ and the integrative functions connecting hMT+ with frontal cortex (Figure 1a).” (lines 67-74). Additionally, we have included a whole-brain analysis for validation. Functional connectivity reveals the information-exchange relationships across regions, enhancing our understanding of how hMT+ and the frontal cortex collaborate when solving visuo-spatial intelligence tasks.

      (1.2) In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      We thank the reviewer for this question. The localizer task was used solely for defining the hMT+ MRS scanning area. Functional connectivity was measured using resting-state fMRI. Research has shown that resting-state functional connectivity between the frontal cortex and other ROIs can further reveal the neural mechanisms underlying intelligence tasks (Song et al., 2008).

      (2) There is an obvious mismatch between the in-text description and the content of the figure: "In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 1a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 1b)."

      We thank the reviewer for pointing this out. We have revised it. The revised version is: “In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 2a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 2b).” (lines 151-156)

      (3) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate. Can the authors justify why they chose the left hemisphere for a visual intelligence task, which is typically believed to be right-lateralized?

      (3.1) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate.

      We thank the reviewer for pointing this out. We have revised our description of the MRS scanning ROIs to Early Visual Cortex (EVC). Since the majority of our EVC ROIs are in V1 (around 70%) and almost no V2 was included, we decided to mark the EVC ROIs with the explanation "primarily in V1" for better clarification. This terminology has been widely used to better emphasize the V1-based experimental design.

      (3.2) Can the authors justify why they chose the left hemisphere for a visual intelligence task, which is typically believed to be right-lateralized?

      We thank the reviewer for pointing this out. The use of the left MT/V5 as a target was motivated by studies demonstrating that left MT+/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). We therefore chose the left hMT+ as our MRS ROI and maintained consistency in ROIs across the different models. Additionally, our results support the notion that the visual intelligence task is right-lateralized in the frontal cortex. At the resting-state fMRI level, we found that the significant ROIs, whose functional connectivity is highly correlated with BDT scores, are in the right frontal cortex (Figure 5a, b).

      (4) "Small threshold" and "large threshold" are not standard descriptions, and it is unclear what "small threshold" refers to in the following figure caption. Additionally, the unit (ms) is confusing. Does it refer to timing? "(f) Pearson's correlation showing significant negative correlations between BDT and small threshold."

      Thank you for pointing this out; we agree with your suggestion. We have revised the terms “small threshold” and “large threshold” to “duration threshold of small grating” and “duration threshold of large grating”, respectively. The unit (ms) refers to timing. The details are described in the methods section: “The duration was adaptively adjusted in each trial, and duration thresholds were estimated using a staircase procedure. Thresholds for large and small gratings were obtained from a 160-trial block that contained four interleaved 3-down/1-up staircases. For each participant, we computed the correct rate for different stimulus durations separately for each stimulus size. These values were then fitted to a cumulative Gaussian function, and the duration threshold corresponding to the 75% correct point on the psychometric function was estimated for each stimulus size”.
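      The adaptive procedure quoted from the Methods can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' code: the start duration, step size and floor are assumptions, only one of the four interleaved staircases is modelled, and the cumulative-Gaussian fit itself is omitted (`threshold_75` only reads the 75% point off already-fitted parameters `mu_ms` and `sigma_ms`). A 3-down/1-up rule converges near the 79.4%-correct level (where p³ = 0.5), which is why a separate psychometric fit is used to read out the 75% point.

```python
"""Hypothetical sketch of a 3-down/1-up duration staircase and a
cumulative-Gaussian 75% threshold read-out. Not the authors' code."""
from dataclasses import dataclass, field
from statistics import NormalDist


@dataclass
class Staircase:
    """One 3-down/1-up staircase on stimulus duration (ms).

    Start value, step size and floor are illustrative assumptions.
    """
    duration_ms: float = 100.0   # hypothetical starting duration
    step_ms: float = 10.0        # hypothetical fixed step size
    min_ms: float = 10.0         # hypothetical shortest allowed duration
    correct_run: int = 0         # consecutive correct responses so far
    history: list = field(default_factory=list)  # (duration, correct) per trial

    def update(self, correct: bool) -> float:
        """Record one trial and return the duration for the next trial."""
        self.history.append((self.duration_ms, correct))
        if correct:
            self.correct_run += 1
            if self.correct_run == 3:  # 3 correct in a row -> shorter (harder)
                self.duration_ms = max(self.min_ms, self.duration_ms - self.step_ms)
                self.correct_run = 0
        else:                          # 1 error -> longer (easier)
            self.duration_ms += self.step_ms
            self.correct_run = 0
        return self.duration_ms


def threshold_75(mu_ms: float, sigma_ms: float) -> float:
    """Duration at which Phi((d - mu) / sigma) reaches 0.75.

    Assumes the fitted function spans 0 to 1; mu and sigma would come
    from fitting per-duration correct rates (fit not shown).
    """
    return NormalDist(mu_ms, sigma_ms).inv_cdf(0.75)


if __name__ == "__main__":
    s = Staircase()
    for response in [True, True, True, False, True, True, True]:
        s.update(response)
    print(s.duration_ms)
    print(round(threshold_75(60.0, 20.0), 1))
```

      If the fitted function instead includes a 50% guess rate (0.5 + 0.5·Φ, as in a two-alternative task), its 75% point is simply `mu_ms`.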

      (5) In the response letter, the authors mentioned incorporating the neural efficiency hypothesis in the Introduction, but the revised Introduction does not contain such information.

      We thank the reviewer for pointing this out. In our revised version, the second paragraph of the introduction addresses the neural efficiency hypothesis: “The ‘neuro-efficiency’ hypothesis is one explanation for individual differences in gF (Haier et al., 1988). This hypothesis proposes that the human brain’s ability to suppress irrelevant information leads to more efficient cognitive processing. Correspondingly, using a well-known visual motion paradigm (center-surround antagonism) (Liu et al., 2016; Tadin et al., 2003), Melnick et al. found a strong link between the suppression index (SI) of motion perception and scores on the block design test (BDT), a subtest of the Wechsler Adult Intelligence Scale (WAIS) that measures the visuo-spatial component (3D domain) of gF (Melnick et al., 2013). Motion surround suppression (SI), a specific function of the human extrastriate cortical region middle temporal complex (hMT+), aligns closely with this region's activity (Gautama & Van Hulle, 2001). Furthermore, hMT+ is a sensory cortex involved in visual perception processing (3D domain) (Cumming & DeAngelis, 2001). These findings suggest that hMT+ potentially plays a significant role in 3D visuo-spatial intelligence by facilitating the efficient processing of 3D visual information and suppressing irrelevant information. However, more evidence is needed to uncover how hMT+ functions as a core region for 3D visuo-spatial intelligence.” (lines 51-66)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      In the Code availability, it states that "this paper does not report original code". It seems weird because at least the code to reproduce the figures from the data should be provided.

      Thank you for pointing this out. Almost all figures were created using software such as DPABI, BrainNet, and GraphPad Prism 9.5, which are manually operated and do not require code adjustments. However, for the MRS fitting curve, we can provide our MATLAB code for redrawing the MRS fitting. The code has been uploaded to GitHub.

    1. No matter how the validity of tokens is checked and ensured, saving a token in local storage might pose a security risk if the application has a vulnerability that allows Cross Site Scripting (XSS) attacks. An XSS attack is possible if the application allows a user to inject arbitrary JavaScript code (e.g. using a form) that the app then executes. When React is used sensibly this should not be possible, since React escapes all text that it renders, meaning that the rendered content is not executed as JavaScript. If one wants to play it safe, the best option is to not store a token in local storage at all. This might be an option in situations where leaking a token might have tragic consequences. It has been suggested that the identity of a signed-in user should be saved in httpOnly cookies, so that JavaScript code cannot access the token. The drawback of this solution is that it makes implementing SPA applications a bit more complex: one would need at least to implement a separate page for logging in. However, it is good to notice that even the use of httpOnly cookies does not guarantee anything. It has even been suggested that httpOnly cookies are not any safer than local storage. So no matter which solution is used, the most important thing is to minimize the risk of XSS attacks altogether.
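      The httpOnly-cookie option can be made concrete with a minimal sketch, assuming a Python backend (the passage concerns a React SPA and names no server language) that builds the `Set-Cookie` header with the standard-library `http.cookies` module. The function `session_cookie_header` and the cookie name `session` are hypothetical, and the attribute choices are illustrative. `HttpOnly` hides the cookie from `document.cookie`, but, as the passage notes, injected script can still send requests to which the browser attaches the cookie, so this limits token theft rather than removing XSS risk.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str) -> str:
    """Build a Set-Cookie header that keeps `token` out of reach of page JavaScript.

    Hypothetical helper; cookie name and attributes are illustrative choices.
    """
    cookie = SimpleCookie()
    cookie["session"] = token           # hypothetical cookie name
    morsel = cookie["session"]
    morsel["httponly"] = True           # not readable via document.cookie
    morsel["secure"] = True             # only sent over HTTPS
    morsel["samesite"] = "Strict"       # not sent on cross-site requests (Python 3.8+)
    morsel["path"] = "/"                # sent for every path on this origin
    return cookie.output()              # full "Set-Cookie: ..." header line


if __name__ == "__main__":
    print(session_cookie_header("abc123"))
```

      Because the browser attaches the cookie automatically, the SPA no longer adds the token to requests itself; the login flow moves to the server, which is part of the extra complexity the passage mentions.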

      Further notes

      Cross Site Scripting (injection attacks); httpOnly cookies -> make SPAs harder to implement

  4. vector.geospatial.science
    1. Abstract: As single-cell sequencing data sets grow in size, visualizations of large cellular populations become difficult to parse and require extensive processing to identify subpopulations of cells. Managing many of these charts is laborious for technical users and unintuitive for non-technical users. To address this issue, we developed TooManyCellsInteractive (TMCI), a browser-based JavaScript application for visualizing hierarchical cellular populations as an interactive radial tree. TMCI allows users to explore, filter, and manipulate hierarchical data structures through an intuitive interface while also enabling batch export of high-quality custom graphics. Here we describe the software architecture and illustrate how TMCI has identified unique survival pathways among drug-tolerant persister cells in a pan-cancer analysis. TMCI will help guide increasingly large data visualizations and facilitate multi-resolution data exploration in a user-friendly way.

      A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giae056), where the paper and peer reviews are published openly under a CC-BY 4.0 license. These peer reviews were as follows:

      Reviewer 2: Mehmet Tekman

      PAPER: TOOMANYCELLSINTERACTIVE REVIEW


      Table of Contents


      1. Using the Application
         1.1 Positive Notes
             1.1.1 General UI and Execution
         1.2 Negative Notes
             1.2.1 Controls
             1.2.2 Documentation
             1.2.3 Feature Overlays
      2. Docker / PostgreSQL
      3. Ethos of the Introduction

      The manuscript reads very well, and the quality of the language is good.

      This review tests the application itself and makes some comments about ambiguous wording in the Introduction.

      1 Using the Application

      I tested the Interactive Display at https://tmci.schwartzlab.ca/

      1.1 Positive Notes

      1.1.1 General UI and Execution

      The general interactivity of the UI was very impressive and expressive. I liked that every aspect including the pies and the lines themselves could be coloured and scaled.

      I found the feature overlays and pruning history stack very intuitive, as well as rolling back the history on each state change.

      The choice of D3 was a good one, enabling very pleasing enter/exit/update state animations, as well as easy SVG export.

      The inclusion of a command-line `generate-svg.sh` script for rendering without a browser is very useful.

      1.2 Negative Notes

      1.2.1 Controls

      At first I wasn't able to find the controls, despite having the page open at 1330px wide, but then I realised I had to scroll down outside of the SVG container to find them.

      As mentioned in a recently opened PR, there's a CSS media rule `@media only screen and (min-width:1238px)` in effect that makes the layout look strange in my Firefox 122 on Linux. Better media rules for screens in the 700-900px range might be useful, as well as separate rules for smartphones.

      1.2.2 Documentation

      TypeScript is a good language to develop in and lends itself naturally to documentation, though I did notice a distinct lack of documentation above many functions in the code base.

      Perhaps write a bit more documentation to make the code base accessible to new collaborators?

      Otherwise, the quality of code looked good, and the license was GPLv3 which is always welcome.

      1.2.3 Feature Overlays:

      I found the feature overlays super useful, though limited by the number of colours. These appear to be limited to one colour for all genes.

      Very useful for showing multiple genes, but it would be nice to have the ability to colour the expression of different genes with different colours, at least for < 3 genes of interest (due to the difficult colour mixing constraints).

      2 Docker / PostgreSQL

      It is not clear to me what the Node server and PostgreSQL database running in the Docker container are actually doing, other than fetching cell metadata and marking user subsets from pruning actions.

      Could this not have been implemented in JavaScript (e.g. IndexedDB)? Why does the data need to be hosted if the user is loading it from their own machine anyway? Is the idea that the visualization should be shared by multiple users who will be accessing the same dataset?

      If this is a single-user analysis, then why not keep all the computation and retrieval on the client side?

      The reason I'm asking is that I believe that, by keeping the database operations within JavaScript, you could run the system within a single Conda environment, or even with a pure Node lockfile.

      I can understand needing Docker for development purposes, but requiring it just to run the software seems excessive. Is it not possible to package the client and server with Conda? That way, one could include the visualisation (as the end stage) in bioinformatic pipelines.

      3 Ethos of the Introduction

      This is a small wording complaint in the Introduction section.

      TooManyCellsInteractive (TMCI) presents itself as a solution to the conventional scRNA-seq workflows that prepare the data via the usual data → PCA → UMAP → kNN → clustering stages.

      TMCI hints that it is an alternative solution to this workflow, but from what I can see in the documentation, it appears to require a `cluster_tree.json` file, one that is produced only by the TooManyCells (TMC) pipeline.

      Unless I've misunderstood, it's not accurate to say that TMCI is an alternative to these conventional workflows, but that TMC is.

      TMCI simply consumes the files output by TMC and renders them. If what I'm saying is true, then the introduction should reflect that.

    1. a first grid with the images whose names start with "paysage

      I started the code with .paysage and .portraits as in the exercise instructions, but it didn't work at all, even though everything else was fine. It only worked when I changed them to .grid-paysage and .grid-portrait. Can someone explain this to me, please?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weakness 1. Enhancing Reproducibility and Robustness: To enhance the reproducibility and robustness of the findings, it would be valuable for the authors to provide specific numbers of animals used in each experiment. Explicitly stating the penetrance of the rod-like neurocranial shape in dact1/2-/- animals would provide a clearer understanding of the consistency of this phenotype. 

      In Fig. 3 and Fig. 4, animal numbers were added to the figure and figure legend (line 1111). In Fig. 5, animal numbers were added to the figure. We now state that dact1/2-/- animals exhibit a completely penetrant rod-like neurocranial shape (Line 260). 

      Weakness 2. Strengthening Single-Cell Data Interpretation: To further validate the single-cell data and strengthen the interpretation of the gene expression patterns, I recommend the following: 

      -Provide a more thorough explanation of the rationale for comparing dact1/2 double mutants with gpc4 mutants.

      -Employ genotyping techniques after embryo collection to ensure the accuracy of animal selection based on phenotype and address the potential for contamination of wild-type "delayed" animals.

      -Supplement the single-cell data with secondary validation using RNA in situ or immunohistochemistry techniques. 

      An explanation of our rationale was added to the Results section (Lines 391-403), and a summary schematic was added to Figure 6 (panel A).

      Genotyping of the embryos was not possible, but a quality control analysis of the top 2,000 most variable genes across the dataset showed good clustering by genotype, indicating reproducibility among individuals within each group (see Supplemental Fig. 4).

      The gene expression profiles obtained in our single-cell data analysis for gpc4, dact1, and dact2 correlate closely with our in situ hybridization analyses. Further, our data is consistent with published zebrafish single-cell data. We validated our finding of increased capn8 expression in dact1/2 mutants by in situ hybridization. Therefore we are confident in the robustness of our single-cell data.  

      Weakness 3. Directly Investigating Non-Cell-Autonomous Effects: To directly assess the proposed non-cell-autonomous role of dact1/2, I suggest conducting transplantation experiments to examine the ability of ectodermal/neural crest cells from dact1/2 double mutants to form wild-type-like neurocranium.  

      The reviewer’s suggestion is an excellent experiment and something to consider for future work. Cell transplant experiments between animals of specific genotypes are challenging and require large numbers of embryos. It is not possible to determine the genotype of the donor and recipient embryos at the early time point (the 1,000-cell stage) at which the transplants would have to be done in zebrafish. Each transplant would therefore have to be carried out blind to genotype, using embryos from a dact1+/-; dact2+/- or dact1-/-; dact2+/- intercross; both animals would then have to be genotyped at a later time point, and the phenotype of the transplant recipient analyzed. While possible, this is a monumental undertaking and beyond the scope of the current study.

      Weakness 4. Further Elucidating Calpain 8's Role: To strengthen the evidence supporting the critical role of Calpain 8, I recommend conducting overexpression experiments using a sensitized background to enhance the statistical significance of the findings. 

      We thank the reviewer for their suggestion and have now performed capn8 overexpression experiments in embryos generated from dact1/2 double heterozygous breeding. We found a statistically significant effect of capn8 overexpression in the dact1+/-,dact2+/- fish (Lines 462-464 and Fig. 8C,D). 

      Minor Comments:  

      Comment: Creating the manuscript without numbered pages, lines, or figures makes orientation and referencing harder.  

      Revised

      Comment: Authors are inconsistent in the use of font and adverbs, which requires extra effort from the reader. ("wntIIf2 vs wnt11f2 vs wnt11f2l"; "dact1/2-/- vs dact1/dact2 -/-"; "whole-mount vs wholemount vs whole mount").  

      Revised throughout.

      Comment: Multiple sentences in the "Results" belong to the "Materials and Methods" or the "Discussion" section. 

      We have worked to ensure that sentences are within the appropriate sections of the manuscript.

      Comment: Abstract:

      "wnt11f2l" should be "wnt11f2"  

      Revised (Line 24).

      Comment: Main text:

      Page 5 - citation Waxman, Hocking et al. 2004 is used 3x without interruption by any other citation. 

      Revised (Line 112).

      Page 9 - "dsh" mutant is mentioned once in the whole manuscript - is this a mistake?

      Revised, Rewritten (Line 196).

      Page 10 - Fig 2B does not show ISH.

      Revised (Line 229).

      Page 11 - "kyn" mutant is mentioned here for the first time but defined on page 15.

      Revised (Line 245). Now first described on page 4.

      Page 14 - "cranial CNN" should be CNCC.

      Revised. (Line 334)

      Page 16 - dact1/dact2/gpc4: Fig. 5C is used but it should be Fig 5E.

      Revised. (Line 381)

      Page 18 - dact1/2-/- or dact1-/-, dact2-/-. 

      Revised. (Line 428)

      Comment: Methods:

      Page 24 - ZIRC () "dot" is missing. ChopChop ")" is missing. "located near the 5' end of the gene" - In the Supplementary Figure 1 looks like in the middle of the gene.

      Revised. (Lines 600, 609, 611, respectively).

      Page 25 - WISH -not used in the main text.

      Revised. (Line 346).

      Page 26 - 4% (v/v) formaldehyde; at 4C - 4 °C; 50% (v/v) ethanol; 3% (w/v) methylcellulose.

      Revised. (Lines 659, 660, 662).

      Page 27 - 0.1% (w/v) BSA. 

      Revised. (Line 668).

      Comment: Discussion:

      The overall discussion requires more references and additional hypotheses. On page 20, when mentioning 'as single mutants develop normally,' does this refer to the entire animals or solely the craniofacial domain? Are these mutants viable? If they are, it's crucial to discuss this phenomenon in relation to prior morpholino studies and genetic compensation.

      Observing how the authors interpret previously documented changes in nodal and shh signaling would be beneficial. While Smad1 is discussed, what about other downstream genes? Is shh signaling altered in the dact1/2 double mutants? 

      We have revised the Discussion to include more references (Lines 473, 476, 483, 488, 491, 499, 501, 502, 510, 515, 529, 557, 558) and additional hypotheses (Lines 503-505, 511-519, 522-525). We have added more specific information regarding the single mutants (Lines 270-275, 480-493, Fig. S3). We have added discussion of other downstream genes, including smad1 (Lines 561-572) and shh (Lines 572-580).

      Comment: Figures:

      Appreciating differences between specimens when eyes were or were not removed is quite hard.

      Yes, this was an unfortunate oversight; however, the key phenotype is the EP shown in the dissections.

      Fig 1. - wntIIf2 vs wnt11f2? C - Thisse 2001 - correct is Thisse et al. 2001.

      Revised the typo in Fig. 1 (and Line 1083).

      Fig 1E: These plots are hard to understand without previous and detailed knowledge. Authors should include at least some demarcations for the cephalic mesoderm, neural ectoderm, mesenchyme, and muscle. Missing color code.

      We have moved this data to supplementary figure S1 and have added labels of the relevant cell types and have added the color code.

      Comment: Fig 2 - In the legend for C - "wildtype and dact2-/- mutant" and "dact1/2 mutant"; in the picture it is dact1-/-, dact2-/-.

      Revised (Line 1105).

      Fig 2 - B - there is a mistake in the 6th condition for dact1: +/+ appears twice and the heterozygote (+/-) is missing.

      Revised Figure 2B.

      Fig 4. - Typo in the legend: dact1/"t"2-/- .

      Revised. (Line 1127).

      Fig 8C - In my view, when the condition gfp mRNA says "0/197, " none of the animals show this phenotype. I assume the authors wanted to say that all the animals show this phenotype; therefore, "197/197" should be used.

      We have removed this data from the figure as there were concerns by the reviewers regarding reproducibility. 

      Fig S1 - Missing legend for the 28 + 250, 380 + 387 peaks? RT-qPCR - is not mentioned in the Materials and Methods. In D - ratio of 25% (legend), but 35% (graph).

      Revised. (Line 1203, Line 625, Line 1213, respectively).

      Fig S2 - The word "identified" - 2x in one sentence. 

      Revised. (Line 1230).

      Reviewer #2 (Public Review):

      Weakness (1) While the qualitative data show altered morphologies in each mutant, quantifications of these phenotypes are lacking in several instances, making it difficult to gauge reproducibility and penetrance, as well as to assess the novel ANC forms described in certain mutants.  

      In Fig. 3 and Fig. 4, animal numbers were added to the figure legends. In Fig. 5, animal numbers were added to the figure to demonstrate reproducibility. We now state that dact1/2-/- animals exhibit a completely penetrant rod-like neurocranial shape (Line 260). As the altered morphologies that we report are qualitatively distinct from wildtype, we did not find it necessary to make quantitative measurements. For experiments in which it was necessary to in-cross triple heterozygotes (Fig. 3, Fig. 5), we dissected and visually analyzed the ANC of at least 3 compound mutant individuals. At least one individual was dissected for each of the previously published or described genotypes/phenotypes (i.e. wt, wnt11f2-/-, dact1/2-/-, gpc4-/-, wls-/-). We realize quantitative measurements may identify subtle differences between genotypes. However, the sheer number of embryos needed to generate these relatively rare combinatorial genotypes and the amount of genotyping required prevented quantitative analyses. 

      Weakness (2) Germline mutations limit the authors' ability to study a gene's spatiotemporal functional requirement. They therefore cannot concretely attribute nor separate early-stage phenotypes (during gastrulation) to/from late-stage phenotypes (ANC morphological changes). 

      We agree that we cannot concretely attribute or separate early- and late-stage phenotypes. Conditional mutants that would allow temporal or cell-specific analysis are beyond the scope of this work. Here we speculate based on evidence obtained by comparing and contrasting embryos with grossly similar early phenotypes and divergent late-stage phenotypes. We believe our findings contribute to the existing body of literature on zebrafish mutants with both early convergent extension defects and craniofacial abnormalities.   

      Weakness (3) Given that dact1/2 can regulate both canonical and non-canonical wnt signaling, this study did not specifically test which of these pathways is altered in the dact1/2 mutants, and it is currently unclear whether disrupted canonical wnt signaling contributes to the craniofacial phenotypes, even though these phenotypes are typical non-canonical wnt phenotypes. 

      Previous literature has attributed canonical wnt, non-canonical wnt, and non-wnt functions to dact, and each of these likely contributes to the dact mutant phenotype (Lines 87-89). We performed cursory analyses of tcf/lef:gfp expression in the dact mutants and did not find evidence to support further analysis of canonical wnt signaling in these fish. Single-cell RNAseq did not identify differential expression of any canonical or non-canonical wnt genes in the dact1/2 mutants.

      Further research is needed to parse out the intracellular roles of dact1 and dact2 in response to wnt and tgf-beta signaling. Here we find that dact may also have a role in calcium signaling, and further experiments are needed to elaborate this role.      

      Weakness (4) The use of single-cell RNA sequencing unveiled genes and processes that are uniquely altered in the dact1/2 mutants, but not in the gpc4 mutants during gastrulation. However, how these changes lead to the manifested ANC phenotype later during craniofacial development remains unclear. The authors showed that calpain 8 is significantly upregulated in the mutant, but the fact that only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants indicates the complexity of the system. 

      To further test whether capn8 overexpression may contribute to the ANC phenotype, we performed overexpression experiments in embryos from a dact1/dact2 double heterozygous incross. We found that the addition of capn8 caused a small but statistically significant occurrence of the mutant phenotype in dact1/2 double heterozygotes (Fig. 8D). We agree with the reviewer that our results indicate a complex system of dysregulation that leads to the mutant phenotype. We hypothesize that dysregulation of a combination of genes may be required to recapitulate the mutant ANC phenotype. Further, as capn8 activity is regulated by calcium levels, overexpression of the mRNA alone likely has a small effect on the manifestation of the phenotype. 

      Weakness (5) Craniofacial phenotypes observed in this study are attributed to convergent extension defects but convergent extension cell movement itself was not directly examined, leaving open if changes in other cellular processes, such as cell differentiation, proliferation, or oriented division, could cause distinct phenotypes between different mutants. 

      Although convergent extension cell movements were not directly examined, our phenotypic analyses of the dact1/2 mutant are consistent with previous literature in which axis extension anomalies were attributed to defects in convergent extension (Waxman 2004, Xing 2018, Topczewski 2001). We do not attribute the axis defect to differences in differentiation, as in situ analyses of established cell type markers show that these cells are present, only displaced relative to wildtype (Figure 1). We agree that we cannot rule out a role for differences in apoptosis or proliferation; however, we did not detect transcriptional differences in dact1/2 mutants that would indicate this in the single-cell RNAseq dataset. Defects in directed division are possible, but alone would not explain the dact1/2 mutant phenotype, particularly the widened dorsal axis (Figure 1).

      Major comments:  

      Comment (1) The author examined and showed convergent extension phenotype (CE) during body axis elongation in dact1/dact2-/- homozygous mutants. Given that dact2-/- single mutants also displayed shortened axis, the authors should either explain why they didn't analyze CE in dact2-/- (perhaps because that has been looked at in previously published dact2 morphants?) or additionally show whether CE phenotypes are present in dact1 and dact2 single mutants.  

      The authors should quantify the CE phenotype in both dact2-/- single mutants and dact1/dact2-/- double mutants, and examine whether the CE phenotypes are exacerbated in the double mutants, which may lend support to the authors' idea that dact1 can contribute to CE. The authors stated in the discussion that they "posit that dact1 expression in the mesoderm is required for dorsal CE during gastrulation through its role in noncanonical Wnt/PCP signaling". However, no evidence was presented in the paper to show that dact1 influences CE during body axis elongation.  

      Because any axis shortening in dact2-/- single mutants was overcome during the course of development and there was no noticeable phenotype at 5 dpf, we did not analyze the single mutants further.  

      We have added data showing the resulting phenotype of each combinatorial genotype to provide a clearer and more detailed description of the single and compound mutants (Fig. S3). 

      Our hypothesis that dact1 may contribute to convergent extension is based on its apparent ability to compensate (either directly or indirectly) for dact2 loss in the dact2-/- single mutant. 

      Comment (2) Except in Fig. 2, I could not find n numbers given in other experiments. It is therefore unclear if these mutant phenotypes were fully or partially penetrant. In general, there is also a lack of quantifications to help support the qualitative results. For example, in Fig. 4, n numbers should be given and cell movements and/or contributions to the ANC should be quantified to statistically demonstrate that the second stream of CNCC failed to contribute to the ANC.  

      Similarly, while the fan-shaped and the rod-shaped ANCs are very distinct, the various rod-shaped ANCs need to be quantified (e.g. morphometry or measurements of morphological features) in order for the authors to claim that these are "novel ANC forms", such as in the dact1/2-/-, gpc4/dact1/2-/-, and wls/dact1/2-/- mutants (Fig. 5).  

      We have added n numbers for each experiment and stated that the rod-like phenotype of the dact1/2-/- mutant was fully penetrant. 

      Regarding CNCC experiments, we repeated the analysis on 3 individual controls and mutants and did not find evidence that CNCC migration was directly affected in the dact1/2 mutant. Rather, differences in ANC development are likely secondary to defects in floor plate and eye field morphometry. Therefore we did not do any further analyses of the CNCCs.

      Regarding figure 5, we have added n numbers. We dissected and analyzed a minimum of three triple mutants (dact1/2-/-,gpc4-/- and dact1/2-/-,wls-/-) and numerous dact1/s double mutants and found that the triple mutant ANC phenotype was consistent and recognizably different enough from the dact1/2-/-, or gpc4 or wls single mutant that morphometry measurements were not needed. Further, the triple mutant phenotype (narrow and shortened) appears to be a simple combination of dact1/2 (narrow) and gpc4/wls (shortened) phenotypes. As we did not find evidence of genetic epistasis, we did not analyze the novel ANC forms further.

      Comment (3): The authors have attributed the ANC phenotypes in dact1/2-/- to CE defects and altered noncanonical wnt signaling. However, no evidence was presented to support either. The authors can perhaps utilize diI labelling, photoconversion-mediated lineage tracing, or live imaging to study cell movement in the ANC and compare that with the cell movement change in the gpc4-/- and gpc4/dact1/2-/- mutants in order to first establish that dact1/2 affect CE and then examine how dact1/2 mutations can modulate the CE phenotypes in gpc4-/- mutants.  

      Concurrently, given that dact1 and dact2 can affect (perhaps differentially) both canonical and non-canonical wnt signaling, the authors are encouraged to also test whether canonical wnt signaling is affected in the ANC or surrounding tissues, or at minimum, discuss the potential role/contribution of canonical wnt signaling in this context.  

      Given the substantial body of research on the role of noncanonical wnt signaling and the planar cell polarity pathway in convergent extension during axis formation (reviewed by Yang and Mlodzik 2015, Roszko et al., 2009) and the resulting phenotypes of various zebrafish mutants (e.g. Xing 2018, Topczewski 2001), including previous research on dact1 and dact2 morphants (Waxman 2004), we did not find it necessary to analyze CE cell movements directly.  

      Our finding that CNCC migration was not defective in the dact1/2 mutants, together with the knowledge that various zebrafish mutants with anterior patterning defects (slb, smo, cyc) have a similar craniofacial abnormality, led us to conclude that the rod-like ANC in the dact1/2 mutant is secondary to an early patterning defect (abnormal eye field morphology). Therefore, testing the role of dact1/2 in convergent extension or wnt signaling within the ANC itself was not an aim of this paper.  

      Comment (4) The authors also have not ruled out other possibilities that could cause the dact1/2-/- ANC phenotype. For example, increased cell death or reduced proliferation in the ANC may result in the phenotype, and changes in cell fate specification or differentiation in the second CNCC stream may also result in their inability to contribute to the ANC. 

      We agree that we cannot rule out differences in cell death or proliferation in the dact1/2 mutant ANC. However, because we do not find the second CNCC stream within the ANC, its absence is the most likely explanation for the abnormal ANC shape. Because the first CNCC stream is able to populate the ANC and differentiate normally, the inability of the second stream to populate the ANC is most likely due to steric hindrance imposed by the abnormal cranial/eye field morphology. These hypotheses would ideally be tested with an inducible dact1/2 mutant; however, this is beyond the scope of this paper.     

      Comment (5) The last paragraph of the section "Genetic interaction of dact1/2 with Wnt regulators..." misuses terms and conflates phenotypes observed. For instance, the authors wrote "dact2 haploinsufficiency in the context of dact1-/-; gpc4-/- double mutant produced ANC in the opposite phenotypic spectrum of ANC morphology, appearing similar to the gpc4-/- mutant phenotype". However, if heterozygous dact2 is not modulating phenotypes in this genetic background, its function is not "haploinsufficient". The authors then said, "These results show that dact1 and dact2 do not have redundant function during craniofacial morphogenesis, and that dact2 function is more indispensable than dact1". However, this statement should be confined to the context of modulating gpc4 phenotypes, which is not clearly stated. 

      Revised (Lines 380, 382).   

      Comment (6) For the scRNA-seq analysis, the authors should show the population distribution in the UMAP for the 3 genotypes, even if there are no obvious changes. The authors are encouraged, although not required, to perform pseudotime or RNA velocity analysis to determine if differentiation trajectories are changed in the NC populations, in light of what they found in Fig. 4. The authors can also check the expression of reporter genes downstream of certain pathways, e.g. axin2 in canonical wnt signaling, to query if these signaling activities are changed (also related to point #3 above). 

      We have added population distribution data for the 3 genotypes to Supplemental Figure 4. Although RNA velocity would be an interesting additional analysis, we hypothesize that the NC population is not driving the differences in phenotype. Rather, the differences likely arise from changes in the anterior neural plate and mesoderm. 

      Comment (7) While the phenotypic difference between gpc4-/- and dact1/2-/- is in the ANC at a later stage, scRNA-seq was performed using younger embryos. The authors should better explain the rationale and discuss how transcriptomic differences in these younger embryos can explain later phenotypes. Importantly, dact1, dact2, and capn8 expression were not shown in and around the ANC during its development, and this information is crucial for interpreting some of the results shown in this paper. For example, if dact1 and dact2 are expressed during ANC development, they may have specific functions during that stage. Alternatively, if dact1 and dact2 are not expressed when the second stream CNCCs are found to be outside the ANC, then the ANC phenotype may be due to dact1/2's functions at an earlier time point. The authors' statement in the discussion that "embryonic fields determined during gastrulation effect the CNCC ability to contribute to the craniofacial skeleton" is currently speculative. 

      We have reworded our rationale and hypothesis to increase clarity (Lines 391-405). We believe that the ANC phenotype of the dact1/2 mutants is secondary to defective CE and anterior axis lengthening, as has been reported for the slb mutant (Heisenberg 1997, 2000). We utilized the gpc4 mutant as a foil to the dact1/2 mutant, as the gpc4 mutant has defective CE and axis extension without the same craniofacial phenotype.

      We have added dact1 and dact2 WISH at 24 and 48 hpf (Fig. 1D,E) to show expression during ANC development. 

      Comment (8) The functional testing of capn8 did not yield a result that would suggest a strong effect, as only 1 in 142 animals phenocopied dact1/2. Therefore, while the result is interesting, the authors should tone down its importance. Alternatively, the authors can try knocking down capn8 in the dact1/2 mutants to test how that affects the CE phenotype during axis elongation, as well as ANC morphogenesis. 

      As overexpression of capn8 in wildtype animals did not result in a significant phenotype, we tested capn8 overexpression in compound dact1/2 mutants, as these provide a sensitized background. We found a small but statistically significant effect of exogenous capn8 in dact1+/-,dact2+/- animals. Although the frequency is lower than one would expect from Mendelian ratios, the rod-like ANC is an extreme craniofacial dysmorphology that is not observed in wildtype or mRNA-injected embryos, and its occurrence is therefore significant. The experiment is limited by the available technology of overexpressing mRNA broadly, without temporal or cell-specific control. It is possible that if capn8 overexpression were restricted to specific cells (floor plate, notochord or mesoderm) and to the optimal time window during gastrulation/segmentation, the aberrant ANC phenotype would be more robust. We agree with the reviewer that although the finding of a new role for capn8 during development is interesting, its importance in the context of dact should be toned down, and we have altered the manuscript accordingly (Lines 455-467).  

      Comment (9) The difference between the two images in Fig. 8B is hard to distinguish. Consider showing flat-mount images. 

      We have added flat-mount images to Fig. 8B.

      Minor comments:

      Comment (1) wnt11f2 is spelled incorrectly in a couple of places, e.g. "wnt11f2l" in the abstract and "wntllf2" in the discussion. 

      Revised throughout.

      Comment (2) For Fig. 1D, the white dact1 and yellow dact2 are hard to distinguish in the merged image. Consider changing one of their colors to a different one and only merge dact1 and dact2 without irf6 to better show their complementarity.  

      We agree with the reviewer that the expression patterns of dact1 and dact2 are difficult to distinguish in the merged image. We have added outlines of the cartilage elements to the images to facilitate comparisons of dact1 and dact2 expression (Fig 1F). 

      Comment (3) For Fig. 1E, please label the clusters mentioned in the text so readers can better compare expressions in these cell populations.  

      We have moved this data to supplementary figure S1 and have added labels.

      Comment (4) The citing and labelling of certain figures can be more specific. For example, Fig. S1A, B, and Fig. S1C should be used instead of just Fig. S1 (under the section titled "dact1 and dact2 contribute to axis extension..."). Similarly, Fig. 4 can be better labeled with letters and cited at the relevant places in the text.  

      We have modified the labeling of the figures according to the reviewer’s suggestion (Fig S2 (previously S1), Fig4) and have added reference to these labels in the text (Lines 202, 204, 212, 328, 334, 336). 

      Comment (5) For Fig. 2B, the (+/+,-/-) on x-axis should be (+/-,-/-).  

      Revised in Figure 2B.

      Comment (6) Several figures are incorrectly cited. Fig. 2C is not cited, and the "Fig. 2C" and "Fig. 2D" cited in the text should be "Fig. 2D" and "Fig. 2E" respectively. Similarly, Fig. 5C and D are not cited in the text and the cited Fig. 5C should be 5E. The VC images in Fig. 5 are not talked about in the text. Finally, Fig. 7C was also not mentioned in the text.  

      We have corrected the labeling and have added descriptions of each panel in the Results (Fig. 2: Lines 231, 237, 242; Fig. 5: Lines 373, 381; Fig. 7: Line 431). 

      Comment (7) In the main text, it is indicated that zebrafish at 3ss were used for scRNA-seq, but in the figure legend, it says 4ss. 

      Revised (Line 682)

      Comment (8) No error bars in Fig. S1B and the difference between the black and grey shades in Fig. S1D is not explained.  

      Error bars are not included in the graphs of qPCR results (now Fig S2C) because these results come from a single experiment on a pool of 8 embryos. We have added a legend to explain the gray vs. black bars (now Fig S2E). 

      Reviewer #3 (Public Review):  

      Weaknesses: The hypotheses are very poorly defined and misinterpret key previous findings surrounding the roles of wnt11 and gpc4, which results in a very confusing manuscript. Many of the results are not novel and focus on secondary defects. The most novel result of overexpressing calpain8 in dact1/2 mutants is preliminary and not convincing.  

      We apologize for not presenting the question more clearly. The Introduction was revised with particular attention to distinguish this work using genetic germline mutants from prior morpholino studies. Please refer to pages 4-5, lines 106-121.

      Weakness 1) One major problem throughout the paper is that the authors misrepresent the fact that wnt11f2 and gpc4 act in different cell populations at different times. Gastrulation defects in these mutants are not similar: wnt11 is required for anterior mesoderm CE during gastrulation but not during subsequent craniofacial development while gpc4 is required for posterior mesoderm CE and later craniofacial cartilage morphogenesis (LeClair et al., 2009). Overall, the non-overlapping functions of wnt11 and gpc4, both temporally and spatially, suggest that they are not part of the same pathway.  

      We have reworded the text to add clarity. While the loss of wnt11 versus the loss of gpc4 may affect different cell populations, the overall effect is a shortened body axis. We stress that it is this similar impaired axis elongation phenotype, combined with discrepant ANC morphologies at opposite ends of the ANC morphologic spectrum, that is very interesting and led us to investigate dact1/2 in the genetic contexts of wnt11f2 and gpc4. Please refer to page 4, lines 73-84. Further, the reviewer's comment that wnt11 and gpc4 are spatially and temporally distinct is untested. We think the reviewer's claim of gpc4 acting in the posterior mesoderm refers to its requirement in the tailbud (Marlow 2004). However, this does not exclude gpc4 from acting elsewhere as well; further experiments would be necessary. Both wnt11f2 and gpc4 regulate non-canonical wnt signaling and are coexpressed at some points during gastrulation and CF development (Gupta et al., 2013; Sisson 2015). These data support the possibility of overlapping roles. 

      Weakness 2) There are also serious problems surrounding attempts to relate single-cell data with the other data in the manuscript and many claims that lack validation. For example, in Fig 1 it is entirely unclear how the Daniocell scRNA-seq data have been used to compare dact1/2 with wnt11f2 or gpc4. With no labeling in panel 1E of this figure these comparisons are impossible to follow. Similarly, the comparisons between dact1/2 and gpc4 in scRNA-seq data in Fig. 6 as well as the choices of DEGs in dact1/2 or gpc4 mutants in Fig. 7 seem arbitrary and do not make a convincing case for any specific developmental hypothesis. Are dact1 and gpc4 or dact2 and wnt11 coexpressed in individual cells? Eyeballing similarity is not acceptable.  

      We have moved the previously published Daniocell data to Figure S1 and have added labeling. These data are meant to complement and support the WISH results and to demonstrate the utility of publicly available Daniocell data. We would welcome specific recommendations on how these data could be presented or integrated more effectively. 

      Regarding our own scRNA-seq data, we have added rationale (Lines 391-403) and details of the results to increase clarity (Lines 419-436). We have added a panel to Figure 6 (panel A) to help illustrate our rationale for comparing dact1/2 and gpc4 mutants to wt. The DEGs displayed in Fig. 7A are the top 50 most differentially expressed genes between dact1/2 mutants and WT (Figure 7 legend, Lines 422-424).   

      We have looked at our scRNA-seq gene expression results for our clusters of interest (lateral plate mesoderm, paraxial mesoderm, and ectoderm). We find dact1, dact2, and gpc4 co-expression within these clusters. Knowing whether these genes are coexpressed within the same individual cell would require going back and analyzing the raw expression data. We do not find this to be necessary to support our conclusions. The expression pattern of wnt11f2 is irrelevant here.   

      Weakness 3) Many of the results in the paper are not novel and either confirm previous findings, particularly Waxman et al (2004), or even contradict them without good evidence. The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription or vice versa. Testing genetic interactions, including investigating the expression of wnt11f2 in dact1/2 mutants, dact1/2 expression in wnt11f2 mutants, or the ability of dact1/2 to rescue wnt11f2 loss of function would give this work a more novel, mechanistic angle.

      We clarify here that the prior work carried out by Waxman using morpholinos, while acceptable at the time in 2004, does not meet the rigor of developmental studies today, which is to generate germline mutants. The reviewer's acceptance of the prior work at face value does not take its limitations into account. Further, the prior paper from Waxman et al. did not analyze craniofacial morphology beyond eyeballing the shape of the head and eyes. If the Waxman paper and this work are compared figure for figure, the additional detail of this study should be clear. Again, this is by no means a criticism of prior work, as the prior study was subject to the technological limitations of 2004, just as this study is the best we can do with the tools we have today. Any discrepancies in results are likely due to differences between morpholino knockdown and genetic disruption, and most reviewers would favor phenotype analysis in the germline genetic context. We have addressed these concerns as objectively as we can in the text (Lines 482-493). The fact that dact1/2 double mutants display a craniofacial phenotype while the single mutants do not suggests compensation (Lines 503-505), but not necessarily at the level of mRNA expression (Fig. S2C). 

      This paper tests genetic interaction through phenotyping the wnt11/dact1/dact2 mutant.

      Our results support the previous literature that dact1/2 act downstream of wnt11 signaling. There is no evidence of cross-regulation of gene expression. We do not expect that changes in wnt11 or dact would result in expression changes in the others.

      RNA-seq of the dact1/2 mutants did not show changes in wnt11 gene expression. Unless dact1 and/or dact2 mRNA are underexpressed in the wnt11 mutant, we would not expect a rescue experiment to be informative. As wnt11 is not a focus of this paper, we have not performed the experiment.  

      Weakness 4) The identification of calpain 8 overexpression in Dact1/2 mutants is interesting, but getting 1/142 phenotypes from mRNA injections does not meet reproducibility standards.

      As the occurrence of the mutant phenotype in wildtype animals with exogenous capn8 expression was below what would meet reproducibility standards, we performed an additional experiment in which capn8 was overexpressed in embryos from a dact1/dact2 double heterozygote incross (Fig. 8). We reasoned that an effect of capn8 overexpression may be more robust on a sensitized background. We found a statistically significant effect of capn8 in dact1/2 double heterozygotes, though the occurrence was still relatively rare (6/80). These data suggest that dysregulation of capn8 contributes to the mutant ANC phenotype, though other factors are likely involved. 

      Comment: The manuscript title is not representative of the findings of this study.  

      We revised the title to strictly describe that we generated and carried out genetic analysis of loss-of-function compound mutants (genetic requirement) and that capn8 modifies this requirement.

      Introduction: p.4:

      Comment: Anterior neurocranium (ANC) - it has to be stated that this refers to the combined ethmoid plate and trabecular cartilages. 

      Thank you. We agree that the ANC and ethmoid plate terminology has been confusing in the literature, and we should more clearly describe that the phenotypes in question are all in the ethmoid plate and that the trabeculae are not affected. ANC has been replaced with ethmoid plate (EP) throughout the manuscript and figures. We also describe that all the observed phenotypes affect the ethmoid plate and not the trabeculae (page 13, Lines 265-267).

      Comment: Transverse dimension is incorrect terminology - replace with medio-lateral.

      Revised (Lines 69, 74).

      Comment: Improper way of explaining the relationship between mutant and gene..."Another mutant knypek, later identified as gpc4..." A better way to explain this would be that the knypek mutation was found to be a nonsense mutation in the gpc4 gene.  

      Revised (Line 71)

      Comment: "...the gpc4 mutant formed an ANC that is wider in the transverse dimension than the wildtype, in the opposite end of the ANC phenotypic spectrum compared to wnt11f2...These observations beg the question how defects in early patterning and convergent extension of the embryo may be associated with later craniofacial morphogenesis."

      This statement is broadly representative of the general failure to distinguish primary from secondary defects in this manuscript. Focusing on secondary defects may be useful to understand the etiology of a human disease, but it is misleading to focus on secondary defects when studying gene function. The rod-like ethmoid of the slb mutant results from a CE defect of anterior mesoderm during gastrulation (Heisenberg et al. 1997, 2000), while the wide ethmoid plate of kny mutants results from CE defects of cartilage precursors (Rochard et al., 2016). Based on this evidence, wnt11f2 and gpc4 act in different cell populations at different times.  

      It is true that the slb mutant craniofacial phenotype has been stated as secondary to the CE defect during gastrulation and the kny phenotype as primary to chondrocyte CE defects in the ethmoid, however the direct experimental evidence to conclude only primary or only secondary effects does not yet exist. There is no experiment to our knowledge where wnt11f2 was found to not affect ethmoid chondrocytes directly. Likewise, there is no experiment having demonstrated that dysregulated CE in gpc4 mutants does not contribute to a secondary abnormality in the ethmoid. 

      Here, we are analyzing the CE and craniofacial phenotypes of the dact1/2 mutants without any assumptions about primary or secondary effects and without drawing any conclusions about wnt11f2 or gpc4 cellular mechanisms.     

      Comment: "The observation that wnt11f2 and gpc4 mutants share similar gastrulation and axis extension phenotypes but contrasting ANC morphologies supports a hypothesis that convergent extension mechanisms regulated by these Wnt pathway genes are specific to the temporal and spatial context during embryogenesis."

      This sentence is quite vague and potentially misleading. The gastrulation defects of these 2 mutants are not similar - wnt11 is required for anterior mesoderm CE during gastrulation and has not been shown to be active during subsequent craniofacial development, while gpc4 is required for posterior mesoderm CE and craniofacial cartilage morphogenesis (LeClair et al., 2009). Here again, the spatially non-overlapping functions of wnt11 and gpc4 suggest that they are not part of the same pathway.  

      Though the cells displaying defective CE in wnt11f2 and gpc4 mutants are different, the effects on the body axis are similar. The dact1/2 mutants showed a grossly similar axis extension defect. Our aim with the scRNA-seq experiment was to determine which cells and gene programs are disrupted in dact1/2 mutants. We found that some cell types and programs were disrupted similarly in dact1/2 and gpc4 mutants, while others were specific to dact1/2 versus gpc4 mutants. We speculate that the changes specific to dact1/2 mutants may be attributable to CE in the anterior mesoderm, as is the case for wnt11. 

      p.5

      Comment: "We examined the connection between convergent extension governing gastrulation, body axis segmentation, and craniofacial morphogenesis." A statement focused on the mechanistic findings of this paper would be welcome here, instead of a claim for a "connection" that is vague and hard to find in the manuscript.  

      We have rewritten this statement (Line 125).

      p.7 Results:

      Comment: It is unclear why Farrel et al., 2018 and Lange et al., 2023 are appropriate references for WISH. Please justify or edit.  

      This was a mistake and has been edited (Page 9).

      Comment: " Further, dact gene expression was distinct from wnt11f2." This statement is inaccurate in light of the data shown in Fig1A and the following statements - please edit to reflect the partially overlapping expression patterns.  

      We have edited to clarify (Lines 142-143).

      p.8

      Comment: "...we examined dact1 and 2 expression in the developing orofacial tissues. We found that at 72hpf..." - expression at 72hpf is not relevant to craniofacial morphogenesis, which takes place between 48h-60hpf (Kimmel et al., 1998; Rochard et al., 2016; Le Pabic et al., 2014).  

      We have included images and discussion of dact1 and dact2 expression at earlier time points that are important to craniofacial development (Lines 160-171; Fig. 1D,E). 

      Comment: "This is in line with our prior finding of decreased dact2 expression in irf6 null embryos". - This statement is too vague. How are the two observations "in line"?  

      We have removed this statement from the manuscript.

      Comment: Incomplete sentence (no verb) - "The differences in expression pattern between dact1 and dact2...".  

      Revised (Line 172).

      Comment: "During embryogenesis..." - Please label the named structures in Fig.1E.

      Please be more precise with the described expression time. Also, it would be useful to integrate the scRNAseq data with the WISH data to create an overall picture instead of treating each dataset separately.  

      We have moved the previously published Daniocell data to supplementary figure S1 and have labeled the key cell types. 

      p.9

      Comment: "The specificity of the gene disruption was demonstrated by phenotypic rescue with the injection of dact1 or dact2 mRNA (Fig. S1)." - please describe what is considered a phenotypic rescue.

      -The body axis reduction of dact mutants needs to be documented in a figure. Head pictures are not sufficient. Is the head alone affected, or both the head and trunk/tail? Fig.2E suggests that both head and trunk/tail are affected - please include a live embryos picture at a later stage.  

      We have added a description of how phenotypic rescue was determined (Line 208). We have added a figure with representative images of the whole body of dact1/2 mutants. Body length measurements showed a shortening in dact1/2 double mutants versus wildtype; however, the difference was not statistically significant by ANOVA (Fig. 3C, Fig. S3, Lines 270-275).

      p. 11

      Comment: "These dact1-/-;dact2-/- CE phenotypes were similar to findings in other Wnt mutants, such as slb and kny (Heisenberg, Tada et al., 2000; Topczewski, Sepich et al., 2001)." The similarity between slb and kny phenotypes should be mentioned with caution as CE defects affect different regions in these 2 mutants. It is misleading to combine them into one phenotype category as wnt11 and gpc4 are most likely not acting in the same pathway based on these spatially distinct phenotypes.  

      Here we are referring to the grossly similar axis extension defects in slb and kny mutants. We refer to these mutants to illustrate that dact1 and/or dact2 deficiency could affect axis extension through diverse mechanisms. We have added text for clarity (Lines 249-252).  

      Comment: "No craniofacial phenotype was observed in dact1 or dact2 single mutants. However, in-crossing to generate [...] compound homozygotes resulted in dramatic craniofacial deformity."

      This result is intriguing in light of (1) the similar craniofacial phenotype previously reported by Waxman et al (2004) using morpholino-based knock-down of dact2, and (2) the phenomenon of genetic compensation demonstrated by Jakutis and Stainier 2021 (https://doi.org/10.1146/annurev-genet-071719-020342). The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription, as such compensation could lead to inaccurate conclusions if ignored.  

      We agree with the reviewer that genetic compensation of dact2 by dact1 likely explains the different result found in the dact2 morphant versus CRISPR mutant. We found increased dact1 mRNA expression in the dact2-/- mutant (Fig S2X) however a more thorough examination is required to draw a conclusion. Interestingly, we found that in wildtype embryos dact1 and dact2 expression patterns are distinct though with some overlap. It would be informative to investigate whether the dact1 expression pattern changes in dact2-/- mutants to account for dact2 loss.   

      Comment: "Lineage tracing of NCC movements in dact1/2 mutants reveals ANC composition" - the title is misleading - ANC composition was previously investigated by lineage tracing (Eberhardt et al., 2006; Wada et al., 2005).  

      This has been reworded (Line 292)

      p.13

      Comment: There is no frontonasal prominence in zebrafish.  

      This is true; the text has been changed to "frontal prominence" (Lines 293, 299, 320).

      Comment: The rationale for investigating NC migration in mutants where there is a gastrula-stage failure of head mesoderm convergent extension is unclear. The whole head is deformed even before neural crest cells migrate as the eye field does not get split in two (Heisenberg et al., 1997; 2000), suggesting that the rod-like ethmoid plate is a secondary defect of this gastrula-stage defect. In addition, neural crest migration and cartilage morphogenesis are different processes, with clear temporal and spatial distinctions.  

      We carried out the lineage tracing experiment to determine which NC streams contributed to the aberrantly shaped EP: the anterior-most NC stream (frontal prominence), the second NC stream (maxillary prominence), or both. We found that the anterior-most NCC did contribute to the rod-like EP, which differs from what is observed when hedgehog signaling is disrupted. So, while it is possible that the gastrula-stage head mesoderm CE defect caused a secondary effect on NC migration, the different effects on the anterior and second NC streams in dact1/2 mutants versus shh pathway disruption are interesting. We added discussion of this observation to the manuscript (page 23, Lines 514-520). 

      p. 14-16

      Comment: Based on the heavy suspicion that the rod-like ethmoid plate of the dact1/2 mutant results from a gastrulation defect, not a primary defect in later craniofacial morphogenesis, the prospect of crossing dact1/2 mutants with other wnt-pathway mutants for which craniofacial defects result from craniofacial morphogenetic defects is at the very least unlikely to generate any useful mechanistic information, and at most very likely to generate lots of confusion. Both predictions seem to take form here.  

      However, the ethmoid plate phenotype observed in the gpc4-/-; dact1+/-; dact2-/- mutants (Fig. 5E) does suggest that gpc4 may interact with dact1/2 during gastrulation, but that is the case only if dact1+/-; dact2-/- mutants do not have an ethmoid cartilage defect, which I could not find in the manuscript. Please clarify.  

      The perspective that the rod-like EP of the dact1/2 mutant is due to a gastrulation defect is being examined here. Why would other mutants such as wnt11f2 and gpc4 that have gastrulation CE defects have very different EP morphology, whether through a primary or secondary NCC effect? Further, dact1 and dact2 were reported as modifiers of Wnt signaling, so it is logical to genetically test the relationship between dact1, dact2, wnt11f2, gpc4 and wls. The experiment had to be done to investigate how these genetic combinations impact EP morphology. The finding that combined loss of dact1, dact2, and wls or gpc4 yielded new EP morphologies, different from those previously observed in dact1/2, wls, gpc4, or any other mutant, is important, suggesting that each of these genes has a distinct role in facial morphology that is not explained by the CE defect alone.   

      Comment: I encourage the authors to explore ways to test whether the rod-like ethmoid of dact1/2 mutants is more than a secondary effect of the CE failure of the head mesoderm during gastrulation. Without this evidence, the phenotypes of dact1/2 -gpc4 or - wls are not going to convince us that these factors actually interact.  

      Actually, we find that our results support the hypothesis that the ethmoid phenotype of the dact1/2 mutants is a secondary effect of defective gastrulation and anterior extension of the body axis. However, our findings suggest (by contrast with another mutant with impaired CE during gastrulation) that this CE defect alone cannot explain the dysmorphic ethmoid plate. Our single-cell RNA-seq results and the discovery of dysregulated capn8 expression and proteolytic processes present new wnt-regulated mechanisms for axis extension.    

      p. 20 Discussion

      Comment: "Here we show that dact1 and dact2 are required for axis extension during gastrulation and show a new example of CE defects during gastrulation associated with craniofacial defects."

      Waxman et al. (2004) previously showed that dact2 is involved in CE during gastrulation.

      Heisenberg et al. (1997, 2000) previously showed, with the slb mutant, how a CE defect during gastrulation causes a craniofacial defect.  

      The Waxman paper, which used a morpholino to disrupt dact2, produced a limited analysis of CE and no analysis of craniofacial morphogenesis. We generated genetic mutants here to validate the earlier morpholino results and to analyze the craniofacial phenotype in detail. We have removed the word “new” to make the statement clearer (Line 475).

      Comment: "Our data supports the hypothesis that CE gastrulation defects are not causal to the craniofacial defect of medially displaced eyes and midfacial hypoplasia and that an additional morphological process is disrupted."

      It is unclear to me how the authors reached this conclusion. I find the view that medially displaced eyes and midfacial hypoplasia are secondary to the CE gastrulation defects unchallenged by the data presented. 

      This statement was removed and the discussion was reworded.

      Comment: The discussion should include a detailed comparison of this study's findings with those of zebrafish morpholino studies.  

      We have added more discussion to compare ours to the previous morpholino findings (Lines 476-484).

      Comment: The discussion should try to reconcile the different expression patterns of dact1 and dact2, and the functional redundancy suggested by the absence of phenotype of single mutants. Genetic compensation should be considered (and perhaps tested).  

      The different expression patterns of dact1 and dact2, along with our finding that dact1 and dact2 genetic deficiency affect the gpc4 mutant phenotype differently, suggest that dact1 and dact2 are not functionally redundant during normal development. This is in line with previously published data showing different phenotypes after dact1 or dact2 knockdown. However, our finding that genetic ablation of both dact1 and dact2 is required to produce a mutant phenotype suggests that each gene can compensate for loss of the other. This would in turn predict that the expression pattern of dact1 is changed in the dact2 mutant and vice versa. We find that this line of investigation would be interesting in future studies. We have addressed this in the Discussion (Lines 485-498).

      Comment: "Based on the data...Conversely, we propose...ascribed to wnt11f2 "

      Functional data always prevail over overexpression data for inferring functional requirements.  

      This is true.

      p.21

      Comment: "Our results underscore the crucial roles of dact1 and dact2 in embryonic development, specifically in the connection between CE during gastrulation and ultimate craniofacial development."

      How is this novel in light of previous studies, especially by Waxman et al. (2004) and Heisenberg et al. (1997, 2000)? In this study, the authors fail to present compelling evidence that craniofacial defects are not secondary to the early gastrulation defects resulting from dact1/2 mutations.

      p. 22

      We have not claimed that the craniofacial defects are not secondary to the gastrulation defects. In fact, we state that there is a “connection”. Further, we do not claim that this is the first or only such finding. We believe our findings have validated the previous dact morpholino experiments and have contributed to the body of literature concerning wnt signaling during embryogenesis. 

      Comment: The section on Smad1 discusses a result not reported in the results section. Any data discussed in the discussion section needs to be reported first in the results section.  

      We have added a comment on the differential expression of smad1 to the results section (Lines 446-448).

    1. However, we want to be careful when using the uniqueness index. If there are already documents in the database that violate the uniqueness condition, no index will be created. So when adding a uniqueness index, make sure that the database is in a healthy state! The test above added the user with username root to the database twice, and these must be removed for the index to be formed and the code to work.

      Further reading

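      When using the unique option on a Schema field to guarantee that a field's values are unique, make sure the database is in a good state: if existing data already violates uniqueness, the index will not be created.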
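      In Mongoose the constraint is declared on the schema field, e.g. `username: { type: String, unique: true }`, and, as the quoted passage says, the index is not built while duplicates exist. A minimal, database-free sketch of the check one might run first; `findDuplicateValues` is a hypothetical helper and the records are made up to mirror the "root added twice" case.

```javascript
// Hypothetical helper: returns every value of `field` that occurs more than once,
// i.e. the values that would prevent a unique index on that field from being built.
function findDuplicateValues(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  // Map entries are [value, count] pairs; keep the values seen more than once.
  return [...counts].filter(([, count]) => count > 1).map(([value]) => value);
}

// Illustrative data: the "root" user was saved twice, as in the quoted test.
const users = [
  { username: 'root', name: 'Superuser' },
  { username: 'alice', name: 'Alice' },
  { username: 'root', name: 'Superuser' },
];

console.log(findDuplicateValues(users, 'username')); // [ 'root' ]
```

      Any values reported here would need to be removed (or the test database reset) before the unique index can be created.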

    1. The then-chain is alright, but we can do better. The generator functions introduced in ES6 provided a clever way of writing asynchronous code in a way that "looks synchronous". The syntax is a bit clunky and not widely used.

      Further reading

      Next, ES6 generator functions improve on this a bit further (I'm not familiar with them).
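      A minimal sketch of how generators can make asynchronous code "look synchronous", assuming a hand-written `run` helper (hypothetical, in the spirit of libraries such as co): each yielded promise is awaited and its result is sent back into the generator. async/await later made this pattern built in.

```javascript
// Hypothetical generator runner: drives the generator, resuming it with the
// resolved value of each yielded promise (or throwing the rejection into it).
function run(generatorFunction) {
  return new Promise((resolve, reject) => {
    const iterator = generatorFunction();

    function step(method, arg) {
      let result;
      try {
        result = iterator[method](arg); // resume with a value, or throw into it
      } catch (err) {
        return reject(err); // the generator did not handle the error
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err)
      );
    }

    step('next', undefined);
  });
}

// Made-up async operation used only for the demo.
const delay = (ms, value) => new Promise((res) => setTimeout(() => res(value), ms));

run(function* () {
  const a = yield delay(10, 1); // reads like a synchronous assignment
  const b = yield delay(10, 2);
  return a + b;
}).then((sum) => console.log(sum)); // 3
```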

    2. The Promise.all method can be used for transforming an array of promises into a single promise, that will be fulfilled once every promise in the array passed to it as an argument is resolved. The last line of code await Promise.all(promiseArray) waits until every promise for saving a note is finished, meaning that the database has been initialized.

      How Promise.all() works.
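      A simplified re-implementation to show the idea; `promiseAll` is a hypothetical name, and real code should use the built-in `Promise.all`. Results keep the order of the input array rather than the order in which the promises settle, and the first rejection rejects the combined promise.

```javascript
// Simplified Promise.all for illustration only.
function promiseAll(values) {
  return new Promise((resolve, reject) => {
    const items = Array.from(values);
    const results = new Array(items.length);
    let remaining = items.length;
    if (remaining === 0) {
      resolve(results); // an empty input resolves immediately to []
      return;
    }
    items.forEach((item, index) => {
      // Promise.resolve also accepts plain (non-promise) values.
      Promise.resolve(item).then((value) => {
        results[index] = value; // keep input order, not completion order
        remaining -= 1;
        if (remaining === 0) resolve(results);
      }, reject); // first rejection rejects the whole thing
    });
  });
}

const slow = new Promise((resolve) => setTimeout(() => resolve('slow'), 20));
promiseAll([slow, 'fast', Promise.resolve(3)]).then((results) => {
  console.log(results); // [ 'slow', 'fast', 3 ]
});
```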

    3. One starts to wonder if it would be possible to refactor the code to eliminate the catch from the methods? The express-async-errors library has a solution for this.

      The express-async-errors library lets us drop the try/catch blocks we write around async code.
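      The library itself is simply required once so that it can patch Express. The sketch below instead shows the underlying idea by hand: a hypothetical `asyncHandler` wrapper (not part of express-async-errors) that forwards a rejected promise from an async route handler to `next(err)`, where Express error-handling middleware picks it up. Mock `req`/`res`/`next` objects stand in for a running app.

```javascript
// Hypothetical wrapper: forwards rejections from an async handler to next(err).
// Assumes `handler` is an async function (so errors surface as rejections).
const asyncHandler = (handler) => (req, res, next) =>
  Promise.resolve(handler(req, res, next)).catch(next);

// A route handler written without try/catch.
const getNote = asyncHandler(async (req, res) => {
  if (req.params.id === 'missing') {
    throw new Error('note not found');
  }
  res.json({ id: req.params.id });
});

// Calling the handler with mock objects instead of a real Express request.
getNote(
  { params: { id: 'missing' } },
  { json: (body) => console.log('json:', body) },
  (err) => console.log('next received:', err.message) // next received: note not found
);
```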

    4. The await keyword can't be used just anywhere in JavaScript code. Using await is possible only inside of an async function.

      The await keyword can only be used inside an async function.

    5. All of the code we want to execute once the operation finishes is written in the callback function. If we wanted to make several asynchronous function calls in sequence, the situation would soon become painful. The asynchronous calls would have to be made in the callback. This would likely lead to complicated code and could potentially give birth to a so-called callback hell.

      When it comes to writing asynchronous code, the first stage is callback hell.
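      To see why nested callbacks become painful, here is the same three-step sequence written both ways. `fetchValue` is a made-up callback-style function; Node's `util.promisify` turns it into a promise-returning one so the sequence can use `await`, which here appears only inside the async function `main`.

```javascript
const { promisify } = require('node:util');

// Made-up callback-style API: calls back with (error, x * 2) after a short delay.
function fetchValue(x, callback) {
  setTimeout(() => callback(null, x * 2), 5);
}

// Three sequential calls with callbacks: each step nests inside the previous one.
fetchValue(1, (err, a) => {
  if (err) return console.error(err);
  fetchValue(a, (err, b) => {
    if (err) return console.error(err);
    fetchValue(b, (err, c) => {
      if (err) return console.error(err);
      console.log('callbacks:', c); // callbacks: 8
    });
  });
});

// The same sequence with promises and async/await reads top to bottom.
const fetchValueAsync = promisify(fetchValue);

async function main() {
  const a = await fetchValueAsync(1);
  const b = await fetchValueAsync(a);
  const c = await fetchValueAsync(b);
  return c;
}

main().then((c) => console.log('async/await:', c)); // async/await: 8
```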

    6. There are a few different ways of accomplishing this, one of which is the only method. With this method we can define in the code what tests should be executed:

      Use the test.only method together with the npm test -- --test-only command to run only the specified tests.

To introduce data science, it makes sense that we ought to talk about data first. The word data is the plural of the Latin word datum. One quick word before we continue: Because the word data is the plural of datum, I (and many people) prefer data as a plural noun—hence “What are Data?” for the section title. (In fact, I think it’s funny to define data science as “the science of datums,” but that’s a terrible joke and I promise I won’t do it again in this book). However, it’s quite common in American English to treat data as a singular word—so common, in fact, that you might notice me trip up and write “What is Data?” at some point. My opinion here is strong enough that I won’t mind if you point out when I’m inconsistent but not so strong that I’m going to get picky about how you treat the word—go with whatever comes more naturally to you. Even though we rarely use the singular datum, it’s worth briefly exploring its etymology. The word means “a given”—that is, something taken for granted. That’s important: The word data was introduced in the mid-seventeenth century to supplement existing terms such as evidence and fact. Identifying information as data, rather than as either of those other two terms, served a rhetorical purpose (Poovey, 1998; Posner & Klein, 2017; Rosenberg, 2013). It converted otherwise debatable information into the solid basis for subsequent claims. Modern usage of the word data started in the 1940s and 1950s as practical electronic computers began to input, process, and output data. When computers work with data, all of that data has to be broken down to individual bits as the “atoms” that make up data. A bit is a binary unit of data, meaning that it is only capable of representing one of two values: 0 and 1. That doesn’t carry a lot of information by itself (at best, “yes” vs. “no” or TRUE vs. FALSE). However, by combining bits, we can increase the amount of information that we transmit. 
For example, even a combination of just two bits can express four different values: 00, 01, 10 and 11. Every time you add a new bit you double the number of possible messages you can send. So three bits would give eight options and four bits would give 16 options. When we get up to eight bits—which provides 256 different combinations—we finally have something of a reasonably useful size to work with. Eight bits is commonly referred to as a byte—this term probably started out as a play on words with the word bit (and four bits is sometimes referred to as a nibble or a nybble, because nerds like jokes). A byte offers enough different combinations to encode all of the letters of the (English) alphabet, including capital and small letters. There is an old rulebook called ASCII—the American Standard Code for Information Interchange—which matches up patterns of eight bits with the letters of the alphabet, punctuation, and a few other odds and ends. For example, the bit pattern 0100 0001 represents the capital letter A and the next higher pattern 0100 0010 represents capital B. This is more background than anything else—most of the time (but not all of the time!) you don’t need to know the details of what’s going on here to carry out data science. However, it is important to have a foundational understanding that when we’re working with data in this class, the computer is ultimately dealing with everything as bits and translating combinations of bits into words, pictures, numbers, and other formats that make sense for humans. This background is also helpful for pointing out that just like the word data has connotations related to trustworthiness, it also has connotations of things that are digital and quantitative. While all of these connotations are reasonable, it’s important that we understand their limits. 
For example, while many people think of data as numbers alone, data can also consist of words or stories, colors or sounds, or any type of information that is systematically collected, organized, and analyzed. Some folks might resist that broad definition of data because “words or stories” told by a person don’t feel as trustworthy or objective as numbers stored in a computer. However, one of the recurring themes of this course is to emphasize that data and data systems are not objective—even when they’re digital and quantitative. When I was introducing ASCII a few paragraphs ago, there were two details in there that might have passed you by but that actually have pretty important consequences. First, I noted that ASCII can “encode all the letters of the (English) alphabet”; second, I mentioned that the “A” in ASCII stood for “American.” Early computer systems in the United States were built around American English assumptions for what counts as a letter. This makes sense… but it has had consequences! While most modern computer systems have moved on to more advanced character encoding systems (ones that include Latin letters, Chinese characters, Arabic script, and emoji, for example), there are still some really important computer systems that use limited encoding schemes like ASCII. In 2015, Tovin Lapin wrote a newspaper article about this, noting that: Every year in California thousands of parents choose names such as José, André, and Sofía for their children, often honoring the memory of a deceased grandmother, aunt or sibling. On the state-issued birth certificates, though, those names will be spelled incorrectly. California, like several other states, prohibits the use of diacritical marks or accents on official documents. That means no tilde (~), no accent grave (`), no umlaut (¨) and certainly no cedilla (¸). 
Although more than a third of the state population is Hispanic, and accents are used in the names of state parks and landmarks, the state bars their use on birth records. There were attempts in 2014 to change this, but when lawmakers realized it would cost $10 million to update computer systems, things stalled. Moral of the story: even though ASCII is a straightforward technical system built on digital data with no real wiggle room for what means what, it’s still subjective and biased. How we organize data and data systems matters! So, even digital and quantitative data (systems) can be biased, which means that we ought to push lightly back against the rhetorical connotations of data as trustworthy. I’m not suggesting we throw data, science, and data science out the window and go with our gut and our opinions, but we shouldn’t take for granted that a given dataset doesn’t have its own subjectivity. Likewise, we ought to ask ourselves what information needs to become data before it can be trusted—or, more precisely, whose information needs to become data before it can be considered as fact and acted upon (Lanius, 2015; Porter, 1996).

      To me, this section lays out the framework for what data is, which will launch us into doing cool things this semester. It is definitely something to revisit for a better understanding if we don't read it all the way through the first time.
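      The bit arithmetic and the ASCII discussion in the passage can be checked in a few lines; this is a JavaScript sketch (the course itself uses R). ASCII defines 7-bit codes, which is why the leading bit of each byte in the passage's examples is 0. The accent-stripping function only illustrates the kind of information loss the article describes; it is an assumption for illustration, not how California's systems actually work.

```javascript
// Each extra bit doubles the number of distinct patterns: n bits give 2 ** n.
for (const bits of [1, 2, 3, 4, 8]) {
  console.log(`${bits} bit(s): ${2 ** bits} patterns`);
}

// The 8-bit pattern for a character's code point, padded with leading zeros.
const toBits = (ch) => ch.codePointAt(0).toString(2).padStart(8, '0');
console.log('A', toBits('A')); // A 01000001
console.log('B', toBits('B')); // B 01000010

// True only if every character fits in 7-bit ASCII (code points 0-127).
const isAscii = (s) => [...s].every((ch) => ch.codePointAt(0) <= 127);

// Roughly what an ASCII-only system forces on a name: decompose accented letters
// (Unicode NFD) and drop the combining marks (U+0300 to U+036F).
const stripDiacritics = (s) => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

for (const name of ['José', 'André', 'Sofía']) {
  console.log(name, isAscii(name), stripDiacritics(name));
}
// José false Jose
// André false Andre
// Sofía false Sofia
```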

  5. Aug 2024
    1. Both tests store the response of the request to the response variable, and unlike the previous test that used the methods provided by supertest for verifying the status code and headers, this time we are inspecting the response data stored in response.body property. Our tests verify the format and content of the response data with the method strictEqual of the assert-library.

      Previously, supertest's methods were used to verify the HTTP status code and headers; here, Node's assert library is used to verify the content.

    1. VS Code has a handy feature that allows you to see where your modules have been exported. This can be very helpful for refactoring. For example, if you decide to split a function into two separate functions, your code could break if you don't modify all the usages. This is difficult if you don't know where they are. However, you need to define your exports in a particular way for this to work. If you right-click on a variable in the location it is exported from and select "Find All References", it will show you everywhere the variable is imported. However, if you assign an object directly to module.exports, it will not work. A workaround is to assign the object you want to export to a named variable and then export the named variable. It also will not work if you destructure where you are importing; you have to import the named variable and then destructure, or just use dot notation to use the functions contained in the named variable. The nature of VS Code bleeding into how you write your code is probably not ideal, so you need to decide for yourself if the trade-off is worthwhile.

      How do you find a variable's references in Neovim?

    1. When done properly, this is not plagiarism—in fact, it is good practice in data science.

      It feels really good when you can solve a situation with your own code and your solution is elegant and well crafted. However, in the real world, you often don't have the luxury of enough time to solve certain questions presented, and there is rarely a situation presented that hasn't been solved by someone else. It's a balancing act, to be sure, but programmers build communities to make solutions stronger and more efficient over time; there is no shame in using that to your advantage!

    2. Along these lines, I strongly discourage you from using any generative AI tool to write code or text for you.

      While I will use AI in my private life to help with content creation, I think using it for code would be incredibly difficult.

    3. I can apply that understanding—in conjunction with R programming—to completing practical projects.

      I have never used any coding software or written code of any kind so I am excited to gain some knowledge in that area!

    1. Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards.

      I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then compute the R² (coefficient of determination) between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.
      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.
      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?
      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.
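      To make the reviewer's description concrete, a hypothetical sketch of the single-sample event-marker format. This is not the authors' HM-LSTM regressors, whose form is exactly what the reviewer is asking about; the onset times and the 100 Hz sampling rate are made up. The final check illustrates the reviewer's point that, in this format, word onsets are a subset of syllable onsets and syllable onsets a subset of phoneme onsets.

```javascript
// Hypothetical event-marker regressor: zeros everywhere except one sample at each
// onset. The sample value is 1, or a per-event value (e.g. surprisal) if given.
function impulseRegressor(onsetsSec, durationSec, sampleRate, values = null) {
  const regressor = new Float64Array(Math.round(durationSec * sampleRate));
  onsetsSec.forEach((t, i) => {
    const index = Math.round(t * sampleRate);
    if (index >= 0 && index < regressor.length) {
      regressor[index] = values ? values[i] : 1;
    }
  });
  return regressor;
}

// Made-up onset times (seconds) for one short utterance.
const sampleRate = 100;
const phonemes = impulseRegressor([0.0, 0.08, 0.15, 0.3, 0.41], 0.5, sampleRate);
const syllables = impulseRegressor([0.0, 0.15, 0.3], 0.5, sampleRate);
const words = impulseRegressor([0.0, 0.3], 0.5, sampleRate);

// a is a "subset" of b if every non-zero sample of a is also non-zero in b.
const isSubset = (a, b) => a.every((v, i) => v === 0 || b[i] !== 0);
console.log(isSubset(syllables, phonemes), isSubset(words, syllables)); // true true
```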

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?
      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.
      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.
      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.
      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?
      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.
      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.
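      A minimal sketch of the two fit measures the reviewer contrasts: one Pearson correlation over the whole recording versus a running correlation in a sliding window. The signals and the window length are made up, and this is not the authors' analysis. A window can only be evaluated where both the predicted and the recorded signal have samples, which is the crux of the question about values outside the sentence.

```javascript
// Pearson correlation between two equal-length arrays (NaN if either is constant).
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// One correlation value per window position, sliding by `step` samples.
function runningCorrelation(predicted, actual, windowSize, step = 1) {
  const out = [];
  for (let start = 0; start + windowSize <= predicted.length; start += step) {
    const end = start + windowSize;
    out.push(pearson(predicted.slice(start, end), actual.slice(start, end)));
  }
  return out;
}

// Made-up signals: the overall fit is a single number, the running fit a series.
const predicted = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3];
const actual = [0, 1, 2, 4, 3, 3, 2, 2, 1, 0];
console.log(pearson(predicted, actual).toFixed(3));
console.log(runningCorrelation(predicted, actual, 4).map((r) => r.toFixed(2)));
```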

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?
      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.
      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R² between predicted and actual EEG, which is the coefficient of determination, correct? If this is R², how is it that you have some negative numbers in some of the plots? R² should be only positive, between 0 and 1.
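      A generic sketch of where the regularization parameter enters ridge regression, w = (XᵀX + λI)⁻¹Xᵀy, with λ = 0 reducing to ordinary least squares. In TRF work X would be a time-lagged design matrix built from the regressors and y an EEG channel; that step is omitted here, and dedicated toolboxes (for example the mTRF-Toolbox) handle it in practice. The function names are hypothetical and the data are made up.

```javascript
// Solve A w = b for a small dense system (Gaussian elimination, partial pivoting).
function solve(A, b) {
  const n = A.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented copy [A | b]
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  const w = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let c = r + 1; c < n; c++) s -= M[r][c] * w[c];
    w[r] = s / M[r][r];
  }
  return w;
}

// Ridge regression: X is samples x features (array of rows), y the targets.
function ridge(X, y, lambda) {
  const p = X[0].length;
  // X^T X with lambda added on the diagonal (the regularization term).
  const XtX = Array.from({ length: p }, (_, i) =>
    Array.from({ length: p }, (_, j) =>
      X.reduce((s, row) => s + row[i] * row[j], 0) + (i === j ? lambda : 0)));
  const Xty = Array.from({ length: p }, (_, i) =>
    X.reduce((s, row, k) => s + row[i] * y[k], 0));
  return solve(XtX, Xty);
}

const X = [[1, 0], [0, 1], [1, 1]];
const y = [1, 2, 3];
console.log(ridge(X, y, 0)); // [ 1, 2 ]  (ordinary least squares)
console.log(ridge(X, y, 1)); // weights shrunk toward zero
```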

      The theovisual spectacle of the Hindu pantheon was, however, "hard to see" for most European observers prior to the twentieth century, and they dismissed it either as "demonic" or as a distorted simulacrum of the "realist" aesthetic of Greco-Roman civilization (Mitter 1977), the latter assessment prefiguring one common Western response to the visual code of Indian popular films. When Hindu images are crafted, their painted or inlaid eyes are customarily added last and then ritually "opened," establishing the deity within the icon and making him or her available for the primary act of worship, which is "seeing/looking" (darsana; Hindi darsan). In Indian English, people go to temples "to take darsan"; Hindi favors "to do darsan" (darsan karna): both idioms imply a willful and tangible act. "Darsanic" contact invites the exchange of substance through the eyes, which are not simply "windows of the soul," but portals to a self that is conceived as relatively less autonomous and bounded and more psychically permeable than in Western understandings (F. Smith 2006). Darsan may also refer to the auspicious sight of powerful places and persons; holy people and kings (and politicians and filmstars) "give darsan" to those who approach t

      Here it emerges that the Western approach to seeing and watching film differs from the Indian one. From the Indian cultural perspective, seeing goes beyond a simple act of viewing to include an in-depth, likely religious experience that opens into the soul.

    1. What shapes digital culture is often in a “black box”: It is the proprietary information of very large corporations, and the public may or may not have access to the code. Even if we did have it, it would be difficult to explain exactly how algorithms work.

      It's very interesting and terrifying to consider that many of the global communication platforms are privately owned and have invisible forces that are locked within a corporate vault of secrets. We've already seen how these algorithms have worked for the benefit of their designers, but we do not have access to the technical aspects of how any of this works.

    1. eLife assessment

      In this important study, the authors manually assessed randomly selected images published in eLife between 2012 and 2022 to determine whether they were accessible for readers with deuteranopia, the most common form of color vision deficiency. They then developed an automated tool designed to classify figures and images as either "friendly" or "unfriendly" for people with deuteranopia. Such a tool could be used by journals or researchers to monitor the accessibility of figures and images, and the evidence for its utility was solid: it performed well for eLife articles, but performance was weaker for a broader dataset of PubMed articles, which were not included in the training data. The authors also provide code that readers can download and run to test their own images, and this may be of most use for testing the tool, as there are already several free, user-friendly recoloring programs that allow users to see how images would look to a person with different forms of color vision deficiency. Automated classifications are of most use for assessing many images, when the user does not have the time or resources to assess each image individually.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important study, the authors manually assessed randomly selected images published in eLife between 2012 and 2020 to determine whether they were accessible for readers with deuteranopia, the most common form of color vision deficiency. They then developed an automated tool designed to classify figures and images as either "friendly" or "unfriendly" for people with deuteranopia. While such a tool could be used by publishers, editors or researchers to monitor accessibility in the research literature, the evidence supporting the tool's utility was incomplete. The tool would benefit from training on an expanded dataset that includes different image and figure types from many journals, and using more rigorous approaches when training the tool and assessing performance. The authors also provide code that readers can download and run to test their own images. This may be of most use for testing the tool, as there are already several free, user-friendly recoloring programs that allow users to see how images would look to a person with different forms of color vision deficiency. Automated classifications are of most use for assessing many images, when the user does not have the time or resources to assess each image individually.

      Thank you for this assessment. We have responded to the comments and suggestions in detail below. One minor correction to the above statement: the randomly selected images published in eLife were from articles published between 2012 and 2022 (not 2020).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study developed a software application, which aims to identify images as either "friendly" or "unfriendly" for readers with deuteranopia, the most common color-vision deficiency. Using previously published algorithms that recolor images to approximate how they would appear to a deuteranope (someone with deuteranopia), authors first manually assessed a set of images from biology-oriented research articles published in eLife between 2012 and 2022. The researchers identified 636 out of 4964 images as difficult to interpret ("unfriendly") for deuteranopes. They claim that there was a decrease in "unfriendly" images over time and that articles from cell-oriented research fields were most likely to contain "unfriendly" images. The researchers used the manually classified images to develop, train, and validate an automated screening tool. They also created a user-friendly web application of the tool, where users can upload images and be informed about the status of each image as "friendly" or "unfriendly" for deuteranopes.

      Strengths:

      The authors have identified an important accessibility issue in the scientific literature: the use of color combinations that make figures difficult to interpret for people with color-vision deficiency. The metrics proposed and evaluated in the study are a valuable theoretical contribution. The automated screening tool they provide is well-documented, open source, and relatively easy to install and use. It has the potential to provide a useful service to the scientists who want to make their figures more accessible. The data are open and freely accessible, well documented, and a valuable resource for further research. The manuscript is well written, logically structured, and easy to follow.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The authors themselves acknowledge the limitations that arise from the way they defined what constitutes an "unfriendly" image. There is a missed chance here to have engaged deuteranopes as stakeholders earlier in the experimental design. This would have allowed [them] to determine to what extent spatial separation and labelling of problematic color combinations responds to their needs and whether setting the bar at a simulated severity of 80% is inclusive enough. A slightly lowered barrier is still a barrier to accessibility.

      We agree with this point in principle. However, different people experience deuteranopia in different ways, so it would require a large effort to characterize these differences and provide empirical evidence about many individuals' interpretations of problematic images in the "real world." In this study, we aimed to establish a starting point that would emphasize the need for greater accessibility, and we have provided tools to begin accomplishing that. We erred on the side of simulating relatively high severity (but not complete deuteranopia). Thus, our findings and tools should be relevant to some (but not all) people with deuteranopia. Furthermore, as noted in the paper, an advantage of our approach is that "by using simulations, the reviewers were capable of seeing two versions of each image: the original and a simulated version." We believe this step is important in assessing the extent to which deuteranopia could confound image interpretations. Conceivably, this could be done with deuteranopes after recoloration, but it is difficult to know whether deuteranopes would see the recolored images in the same way that non-deuteranopes see the original images. It is also true that images simulating deuteranopia may not perfectly reflect how deuteranopes see those images. It is a tradeoff either way. We have added comments along these lines to the paper.

      (2) The use of images from a single journal strongly limits the generalizability of the empirical findings as well as of the automated screening tool itself. Machine-learning algorithms are highly configurable but also notorious for their lack of transparency and for being easily biased by the training data set. A quick and unsystematic test of the web application shows that the classifier works well for electron microscopy images but fails at recognizing red-green scatter plots and even the classical diagnostic images for color-vision deficiency (Ishihara test images) as "unfriendly". A future iteration of the tool should be trained on a wider variety of images from different journals.

      Thank you for these comments. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central. We used our original model to make predictions for those images. The corresponding results are now included in the paper.

      We agree that many of the images identified as being "unfriendly" are microscope images, which often use red and green dyes. However, many other image types were identified as unfriendly, including heat maps, line charts, maps, three-dimensional structural representations of proteins, photographs, network diagrams, etc. We have uploaded these figures to our Open Science Framework repository so that readers can review these examples more easily. We have added a comment along these lines to the paper.

      The reviewer reported uploading red/green scatter plots and Ishihara test images to our Web application and finding that it classified them as friendly. Firstly, it depends on the scatter plot: even though some such plots include green and red, the scientific meaning of the image may still be clear. Secondly, although the Ishihara images were created as informal tests for humans, these images (and ones similar to them) do not appear in eLife journal articles (to our knowledge) and thus are not included in our training set. Thus, it is unsurprising that our machine-learning models would not correctly classify such images as unfriendly.

      (3) Focusing the statistical analyses on individual images rather than articles (e.g. in figures 1 and 2) leads to pseudoreplication. Multiple images from the same article should not be treated as statistically independent measures, because they are produced by the same authors. A simple alternative is to instead use articles as the unit of analysis and score an article as "unfriendly" when it contains at least one "unfriendly" image. In addition, collapsing the counts of "unfriendly" images to proportions loses important information about the sample size. For example, the current analysis presented in Fig. 1 gives undue weight to the three images from 2012, two of which came from the same article. If we perform a logistic regression on articles coded as "friendly" and "unfriendly" (rather than the reported linear regression on the proportion of "unfriendly" images), there is still evidence for a decrease in the frequency of "unfriendly" eLife articles over time.

      Thank you for taking the time to provide these careful insights. We have adjusted these statistical analyses to focus on articles rather than individual images. For Figure 1, we treat an article as "Definitely problematic" if any image in the article was categorized as "Definitely problematic." Additionally, we no longer collapse the counts to proportions, and we use logistic regression to summarize the trend over time. The overall conclusions remain the same.
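      The article-level analysis described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the `(article_id, year, is_problematic)` record layout and the function names are hypothetical, and the fit is a plain Newton-Raphson estimate of a single logistic slope on centered publication year, without the standard errors a statistics package would report.

```python
import math


def aggregate_to_articles(image_records):
    """Collapse image-level labels to one label per article.

    image_records: iterable of (article_id, year, is_problematic) tuples.
    This record layout is a hypothetical example, not the authors' schema.
    An article counts as problematic if ANY of its images is problematic.
    Returns {article_id: (year, is_problematic)}.
    """
    articles = {}
    for article_id, year, is_problematic in image_records:
        prev_year, prev_flag = articles.get(article_id, (year, False))
        articles[article_id] = (prev_year, prev_flag or is_problematic)
    return articles


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Uses one 0/1 outcome per article, so counts are not collapsed into
    yearly proportions. Centre x (e.g. year - 2017) for numerical stability.
    The estimates diverge if the two classes are perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the Fisher information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

      A negative `b1` indicates that the odds of an article containing a problematic image decline with publication year; each article contributes exactly one observation, regardless of how many images it contains.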

      Another issue concerns the large number of articles (>40%) that are classified as belonging to two subdisciplines, which further compounds the image pseudoreplication. Two alternatives are to either group articles with two subdisciplines into a "multidisciplinary" group or recode them to include both disciplines in the category name.

      Thank you for this insight. We have modified Figure 2 so that it puts all articles that have been assigned two subdisciplines into a "Multidisciplinary" category. The overall conclusions remain the same.
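      Both recoding options the reviewer proposed can be expressed as a small mapping so that each article is counted exactly once. The subdiscipline names and function names below are hypothetical placeholders, not taken from the authors' pipeline.

```python
from collections import Counter


def discipline_label(subdisciplines, combine_names=False):
    """Collapse an article's subdisciplines into a single category.

    combine_names=False: articles with two or more subdisciplines go to
    "Multidisciplinary" (the option adopted in the revised Figure 2).
    combine_names=True: the names are joined, e.g. "Genetics / Neuroscience"
    (the reviewer's alternative).
    """
    if not subdisciplines:
        raise ValueError("article has no subdiscipline")
    if len(subdisciplines) == 1:
        return subdisciplines[0]
    if combine_names:
        return " / ".join(sorted(subdisciplines))
    return "Multidisciplinary"


def count_by_discipline(articles, combine_names=False):
    """Count articles per category; each article contributes once."""
    return Counter(discipline_label(s, combine_names) for s in articles)
```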

      (4) The low frequency of "unfriendly" images in the data (under 15%) calls for a different performance measure than the AUROC used by the authors. In such imbalanced classification cases the recommended performance measure is precision-recall area under the curve (PR AUC: https://doi.org/10.1371%2Fjournal.pone.0118432) that gives more weight to the classification of the rare class ("unfriendly" images).

      We now calculate the area under the precision-recall curve and provide these numbers (and figures) alongside the AUROC values (and figures). We agree that these numbers are informative; both metrics lead to the same overall conclusions.
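      To illustrate why the two metrics can diverge when the positive class is rare, here is a pure-Python sketch of ROC AUC (computed as the probability that a randomly chosen positive outscores a randomly chosen negative) and of average precision, one common step-wise estimator of the area under the precision-recall curve. It assumes binary labels with 1 meaning "unfriendly", ignores tied scores in the average-precision calculation, and is not necessarily the estimator used in the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision at the rank of each positive example.
    Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank   # precision at this positive's rank
    return total / n_pos
```

      With a single unfriendly image among ten that the classifier ranks third, ROC AUC is 7/9 (about 0.78) while average precision is 1/3, because the two higher-ranked false positives weigh heavily on precision when positives are scarce.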

      Reviewer #2 (Public Review):

      Summary:

      An analysis of images in the biology literature that are problematic for people with a color-vision deficiency (CVD) is presented, along with a machine-learning model to identify such images and a web application that uses the model to flag problematic images. The analysis reveals that about 13% of the images could be problematic for people with CVD and that the frequency of such images decreased over time. The model achieves an AUC score of 0.89. The authors propose that their approach could help make the biology literature accessible to diverse audiences.

      Strengths:

      The manuscript focuses on an important yet mostly overlooked problem and makes contributions both to understanding the extent of the problem and to developing solutions that mitigate it. The paper is generally well written and clearly organized. The CVD simulation combines five different metrics. The dataset was assessed by two researchers and is likely to be of high quality. The machine-learning algorithm used (a convolutional neural network, CNN) is an appropriate choice for the problem. The evaluation of various hyperparameters for the CNN model is extensive.

      We thank the reviewer for these comments.

      Weaknesses:

      The focus seems to be on one type of CVD (deuteranopia) and it is unclear whether this would generalize to other types.

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies.

      The dataset consists of images from eLife articles. While this is a reasonable starting point, whether this can generalize to other biology/biomedical articles is not assessed.

      This is an important point. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      "Probably problematic" and "probably okay" classes are excluded from the analysis and classification, and the effect of this exclusion is not discussed.

      We now address this in the Discussion section.

      Machine learning aspects can be explained better, in a more standard way.

      Thank you. We address this comment in our responses to your comments below.

      The evaluation metrics used for validating the machine learning models seem lacking (e.g., precision, recall, F1 are not reported).

      We now provide these metrics (in a supplementary file).

      The web application is not discussed in any depth.

      The paper includes a paragraph about how the Web application works and which technologies we used to create it. We are unsure which additional aspects should be addressed.

      Reviewer #3 (Public Review):

      Summary:

      This work focuses on the accessibility of scientific images for individuals with color vision deficiencies, particularly deuteranopia. The research involved an analysis of images published in eLife from 2012 to 2022. The authors manually reviewed nearly 5,000 images, comparing them with simulated versions representing the perspective of individuals with deuteranopia, and also evaluated several methods for automatically detecting such images, including a machine-learning algorithm, which performed best. The authors found that nearly 13% of the images could be challenging for people with deuteranopia to interpret. There was a trend toward a decrease in problematic images over time, which is encouraging.

      Strengths:

      The manuscript is well organized and well written. It addresses inclusivity and accessibility in scientific communication, and reinforces both that there is a problem and that technological solutions have the potential to partly address it.

      The number of manually assessed images for evaluation and training an algorithm is, to my knowledge, much larger than any existing survey. This is a valuable open source dataset beyond the work herein.

      The sequential steps used to classify articles follow best practices for evaluation and training sets.

      We thank the reviewer for these comments.

      Weaknesses:

      I do not see any major issues with the methods. The authors were transparent with the limitations (the need to rely on simulations instead of what deuteranopes see), only capturing a subset of issues related to color vision deficiency, and the focus on one journal that may not be representative of images in other journals and disciplines.

      We thank the reviewer for these comments. Regarding the last point, we have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Thank you.

      Reviewer #2 (Recommendations For The Authors):

      - The web application link can be provided in the Abstract for more visibility.

      We have added the URL to the Abstract.

      - They focus on deuteranopia in this paper. It seems that protanopia is not considered. Why? What are the challenges in considering this type of CVD?

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies. Deuteranopia is the most common color-vision deficiency, so we focused on the needs of these individuals as a starting point.

      - The dataset is limited to eLife articles. More discussion of this limitation is needed. Couldn't one also include some papers from PMC open access dataset for comparison?

      We have reviewed an additional 2,000 images, which we randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      - An analysis of the effect of selecting a severity value of 0.8 can be included.

      We agree that this would be interesting, but we leave it for future work.

      - "Probably problematic" and "probably okay" classes are excluded from analysis, which may oversimplify the findings and bias the models. It would have been interesting to study these classes as well.

      We agree that this would be interesting, but we leave it for future work. However, we have added a comment to the Discussion on this point.

      - Some machine-learning aspects are discussed in a non-standard way. Class weighting or transfer learning would not typically be considered hyperparameters. A "corpus" is not a model. The description of how fine-tuning was performed could be clearer.

      We have updated this wording to use more appropriate terminology for these different "configurations." Additionally, we expanded and clarified our description of fine-tuning.

      - Reporting performance on the training set is not very meaningful. Although I understand this is cross-validated, it is unclear what is gained by reporting two results. Maybe there should be more discussion of the difference.

      We used cross validation to compare different machine-learning models and configurations. Providing performance metrics helps to illustrate how we arrived at the final configurations that we used. We have updated the manuscript to clarify this point.
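      Cross-validation of this kind can be sketched as follows. The fold count, the absence of shuffling, and the `evaluate` callback are illustrative assumptions; the actual fold assignment used in the manuscript is not described in this response.

```python
def k_fold_indices(n_items, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds.

    No shuffling is done here; in practice items would usually be shuffled
    (or stratified by label) before being split.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    # The first n_items % k folds receive one extra item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, val
        start += size


def cross_validate(n_items, k, evaluate):
    """Average a validation score over k folds for one configuration.

    evaluate(train_idx, val_idx) is a hypothetical callback that trains a
    model configuration on train_idx and returns its score on val_idx.
    """
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)
```

      Comparing configurations by their mean cross-validated score, and only then evaluating the chosen configuration once on held-out data, is what makes the two sets of reported numbers serve different purposes.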

      - True positives, false positives, etc. are described as evaluation metrics. Typically, one would think of these as numbers that are used to calculate evaluation metrics, like precision (PPV), recall (sensitivity), etc. Furthermore, they say they measure precision, recall, precision-recall curves, but I don't see these reported in the manuscript. They should be (especially precision, recall, F1).

      We have clarified this wording in the manuscript.
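      As the reviewer notes, counts such as true and false positives are inputs to evaluation metrics rather than metrics themselves. A minimal sketch of that calculation, assuming binary labels and predictions with 1 meaning "unfriendly"; the function names are hypothetical.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision (PPV), recall (sensitivity), and their harmonic mean, F1.

    Each metric is reported as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

      For labels `[1, 1, 0, 0, 1, 0]` and predictions `[1, 0, 1, 0, 1, 0]`, the counts are (2, 1, 1, 2) and precision, recall, and F1 all equal 2/3.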

      - There are many figures in the supplementary material, but not much interpretation/insights provided. What should we learn from these figures?

      We have revised the captions and now provide more explanations about these figures in the manuscript.

      - CVD simulations are mentioned (line 312). It is unclear whether these methods could be used for this work and if so, why they were not used. How do the simulations in this work compare to other simulations?

      This part of the manuscript refers to recolorization techniques, which attempt to make images more friendly to people with color vision deficiencies. For our paper, we used a form of recolorization that simulates how a deuteranope would see a figure in its original form. Therefore, unless we misunderstand the reviewer's question, these two types of simulation have distinct purposes and thus are not comparable.

      - relu -> ReLU

      We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      The title can be more specific to denote that the survey was done in eLife papers in the years 2012-2022. Similarly, this should be clear in the abstract instead of only "images published in biology-oriented research articles".

      Thank you for this suggestion. Because we have expanded this work to include images from PubMed Central papers, we believe the title is acceptable as it stands. We updated the abstract to say "images published in biology- and medicine-oriented research articles."

      Two relevant pieces of existing work that I did not see mentioned are Jambor and colleagues' assessment of color accessibility in several fields (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/) and the 'JetFighter' tool (https://elifesciences.org/labs/c2292989/jetfighter-towards-figure-accuracy-and-accessibility); it would help to clarify whether this work overlaps with JetFighter.

      Thank you for bringing these to our attention. We have added a citation to Jambor et al., and we also mention JetFighter and describe its uses.

      Similarly, on Line 301: "Significant prior work has been done to address and improve accessibility for individuals with CVD. This work can be generally categorized into three types of studies: simulation methods, recolorization methods, and estimating the frequency of accessible images."

      - One might mention education as prior work as well, which might in part be contributing to a decrease in problematic images (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/)

      We now suggest that there are four categories and include education as one of these.

      Line 361, when discussing resources to make figures suitable, the authors may consider citing this paper about an R package for single-cell data: https://elifesciences.org/articles/82128

      Thank you. We now cite this paper.

      The web application is a good demonstration of how this can be applied, and all code is open so others can apply the CNN in their own use cases. Still, uploading individual image files one at a time to screen them is tedious. Future work could integrate this into a workflow more typical for researchers, but I understand that this will take additional resources beyond the scope of this project. The demonstration that these algorithms can be run with minimal resources in the browser with TensorFlow.js is novel.

      Thank you.

      General:

      It is encouraging that 'definitely problematic' images have been decreasing over time in eLife. Might this have to do with eLife policies? I could not quickly find if eLife has checks in place for this, but given that JetFighter was developed in association with eLife, I wonder if there is an enhanced awareness of this issue here vs. other journals.

      This is possible. We are not aware of a way to test this formally.