10,000 Matching Annotations
  1. Mar 2023
    1. It boils down to this: most old-school computers do what they are told. They follow instructions given to them in the form of code. But if we want computers to solve more complex tasks, they need to do more than that. To be smarter, we are trying to train them how to learn in a way that imitates human behaviour.

      High-level definition of artificial intelligence

    1. Turing managed to deduce, quite quickly, how these code books were being used, but realised that his team would need to acquire copies before further progress could be made. It wasn’t till a German naval code book was captured that Turing and his colleagues began to achieve success in working out the daily key and reading encrypted German naval messages. Intelligence reports about Germany’s U-boat and ship movements could then be produced and sent to the Admiralty for dissemination

      The key turning point in breaking the naval codes

    2. Contrary to popular belief, there was no single ‘Enigma code’. The Enigma machine – actually a family of portable encryption devices that substituted each letter of a message for another letter of the alphabet – was first developed in the 1920s and enhanced over subsequent years.

      Mentioned in Enigma article covered in first memex check in

    3. . The man was Alan Turing, and his work at nearby Bletchley Park – the secret base of the Government Code and Cypher School (GC&CS)

      This team was in charge of breaking the German Enigma codes

    1. MFA (multiple factor analysis, AFM in French) combines large data sets: it joins two tables and runs either two PCAs or one PCA together with a correspondence analysis (CA, AFC in French). This means it can relate tables holding quantitative and qualitative values at the same time. However, the plots differ depending on whether the analysis is two PCAs or PCA + CA. In particular, the "loadings" plots will not show the same things depending on whether, in the MFA code, we chose to treat a variable as quantitative (handled by a PCA) or as qualitative (handled by a CA).

    1. RECOMMENDATION No. 8. The Défenseure des droits recommends that the Minister for Solidarity, Autonomy and Persons with Disabilities and the Minister Delegate for Persons with Disabilities: • Remind the MDPH to adopt a PPS in order, in accordance with Article D. 351-5 of the Education Code, to define and coordinate the arrangements for schooling and the pedagogical, psychological, educational, social, medical and paramedical measures that meet the particular needs of pupils with a disability; • Invite the CDPAH to specify, in their decisions, the activities to be carried out by the AESH assigned to children

      Recommendation 08

    1. Background

      This work has been peer reviewed in GigaScience (see paper https://doi.org/10.1093/gigascience/giad006), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: Kyle Hernandez

      Suetake et al. designed and developed a system to publish, validate, and test public workflows utilizing existing standards and integration with modern CI/CD tools. Their design wasn't myopic: they relied heavily on their own experiences, work from GA4GH, and interaction with the large workflow development communities. They were inspired by the important work from Goble et al. that applies the FAIR standards to workflows. As someone with a long history of workflow engine development, workflow development, and workflow reusability/sharing experience, I greatly appreciate this work. There are still unsolved problems, such as guidelines on how to approach writing tests for workflows, but their system is one level above this and focuses on ways to automate the validation, testing, reviewing/governance, and publishing into a repository to greatly reduce unexpected errors from users. I looked through the source code of their Rust-based client, which was extremely readable and developed to industry-level standards. I followed the README to set up my own repositories, configure the keys, and deploy the services successfully on the first walk-through. That speaks to the level of skill, testing, and effort in developing this system and is great news for users interested in using it. At some level it can seem like a "proof of concept", but it is one that is also usable in production with some caveats. The concept is important, and implementing this will hopefully inspire more folks to care about this side of workflow "provenance" and reproducibility. There are so many tools out there for CI/CD that are often poorly utilized by academia, and I appreciate the authors showing how powerful they can be in this space. The current manuscript is fine and will be of great interest to a wide-ranging set of readers. I only have some non-binding suggestions/thoughts that could improve the paper for readers:

      1. Based on your survey of existing systems, could you possibly make a figure or table that showcases the features supported/not supported by these different systems, including yours?
      2. Thoughts on security/cost safeguards? Perhaps beyond the scope, but it does seem like a governing group needs to define some limits to the testing resources and be able to enforce them. If I am a bad actor and programmatically open up 1000 PRs of expensive jobs, I'm not sure what would happen. Actions and artifact storage aren't necessarily free after some limit.
      3. What is the flow for simply updating to a new version of an existing workflow? (Perhaps this could be in your docs, not necessarily this manuscript.)
      4. CWL is an example of a workflow language that developers can extend to create custom "hints" or "requirements". For example, Seven Bridges does this in Cavatica, where a user can define AWS spot instance configs etc. WDL has properties to configure GCP images. It seems like in these cases, tests should only be defined to work when running "locally" (not with some scheduler/specific cloud env). But the authors do mention that tests will first run locally on the user's environment, so that does kind of get around this.
      5. For the "findable" part of FAIR, how possible is it to have "tags" of sorts associated with a workflow record so things can be more findable? I imagine that when there is a large repository of many workflows, being able to easily narrow down to the specific domain of interest could be helpful.

    1. import numpy as np

      Please see the assignment instructions: you need to include your perceptron implementation code, the third experiment, and a runtime analysis (for a single iteration).

    1. Create a new Flutter project named "first_flutter_app" using Android Studio or VS Code. Once it is created, you get a default counter app example.

      Create the project with Android Studio or VS Code

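    1. FF found GD's performance too poor and blocked it. To restore it: in OELD_script.js, find this line in the setupPlatform function: _class += ' goldendict' + ((window.HTMLTrackElement === undefined) ? ' qt4' : ' qt5'); and change qt5 to qt4

      Judging from the code, this appears to be a qt5 versus qt4 performance issue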

    1. Just like in the initial code, we search the map using std::map::find, but now we store the result in a local variable. The result is an iterator that we can compare with entries.end(), as before. Now, if the iterator is not the end iterator, then we can get the value from the map by dereferencing the iterator and accessing its second member.

      Storing the iterator returned by find in iter avoids calling find a second time
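
The single-lookup pattern described in the quoted passage can be sketched as follows. This is a minimal illustration, not the article's own code: the helper name `lookup`, the `std::map<std::string, int>` element types, and the use of `std::optional` (C++17) for the "not found" case are all assumptions made for the example. Only `entries`, `iter`, and `std::map::find` come from the quoted text and the note above.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper (not from the quoted article): search the map once and
// reuse the resulting iterator instead of calling find() a second time.
std::optional<int> lookup(const std::map<std::string, int>& entries,
                          const std::string& key) {
    // find() returns an iterator to the matching element, or end() if absent.
    auto iter = entries.find(key);
    if (iter != entries.end()) {
        // Dereferencing gives a std::pair<const Key, Value>; second is the value.
        return iter->second;
    }
    return std::nullopt;  // key not present
}
```

Calling `lookup(entries, "a")` on a map that holds the key `"a"` yields an engaged optional with that key's value, and an empty optional otherwise. Returning an optional is one possible design choice for signalling absence; the quoted text only specifies the comparison with `entries.end()` and the access to `second`.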

    1. A striking potential metabolic complementarity to emerge from our annotations is the capacity of many frequent lichen bacteria to code for cofactors needed by one of the dominant eukaryotic symbionts

      I'm interpreting up to this point that functional annotation and pathway exploration were only performed for the bacterial genomes and not for the fungal/algal MAGs. Was this because of the difficulty in performing ORF prediction/functional annotation without corresponding RNAseq data, or is it something planned for the future? It would be interesting to see whether the corresponding fungi have transporters for those cofactors.

    1. feel the need to show code to ask it

      If I need to show my code for any reason, it should only be done with a professor and in the Instructors Only section.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use what is potentially a novel method for bootstrapping sequence data to evaluate the extent to which SARS-CoV-2 transmissions occurred between regions of the world, between France and other European countries, and between some distinct regions within France. Data from the first two waves of SARS-CoV-2 in Europe were considered, from 2020 into January 2021. The paper provides more detail about the specific spread of the virus around Europe, specifically within France, than other work in this area of which I am aware.

      First of all, we would like to thank reviewer #1 for their evaluation and their various comments which, in our opinion, have allowed us to considerably improve the manuscript.

      An interesting facet of the methodology used is the downsampling of sequence data, generating multiple bootstraps each of around 500-1000 sequences and conducting analysis on each one. This has the strength of sampling, in total, a large number of sequences, while reducing the overall computational cost of analysis on a database that contains in total several hundred thousand sequences. A question I had about the results concerns the extent of downsampling versus the rate of viral migration: If between-country movements are rapid, a reduced sample could be misleading, for example characterising a transmission path from A to B to C as being from A to C by virtue of missing data. I acknowledge that this would be a problem with any phylogeographic analysis relying on limited data. However, in this case, how does the rate of migration between locations compare to the length of time between samples in the reduced trees? Along these lines, I was unclear to what extent the reported proportions of intra- versus inter-regional transmissions (e.g. line 223) would be vulnerable to sampling effects.

      This question is indeed a very important one. The between-country movement rate can be high, but the contagious period for a SARS-CoV-2-infected individual is short (a bit less than two weeks on average). In our subsamples, the dated trees have a median branch length of around 20 days. To ensure that our subsamples did not introduce errors in estimating the exchange events between locations, we conducted a simulation. Briefly, we generated a tree of 1,000,000 tips with a five-state discrete trait. We then took 100 subsampled 1000-leaf trees, reconstructed the ancestry for the discrete trait and assessed transitions between states. The error rate is less than 3% on average: it comprises the missing data, as you pointed out, and the errors in reconstructing the ancestry for the trait deeper in the tree.

      We think that overall, less than 3% is a satisfying error rate.

      The results of this specific simulation were added to the paper (lines 150-157) and as Figure 2—figure supplement 1.

      A further question around the methodology was the use of an artificially high fixed clock rate in the phylogenetic analysis so as to date the tree in an unbiased way. Although I understood that the stated action led to the required results, given the time available for review I was unable to figure out why this should be so. Is this an artefact of under-sampling, or of approximations made in the phylogenetic inference? Is this a well-known phenomenon in phylogenetic inference?

      We thank reviewer #1, who, like reviewer #2 and the editor, was troubled by the use of an artificially fast and fixed molecular clock. It was a workaround for a mistake in our code, which has since been fixed. See the answer to point (3) of the editor.

      The value of this kind of research is highlighted in the paper, in that genomic data can be used to assess and guide public health measures (line 64). This work elucidates several facts about the geographical spread of SARS-CoV-2 within France and between European countries. The more clearly these facts can be translated into improved or more considered public health action, through the evaluation of previous policy actions, or through the explication of how future actions could lead to improved outcomes, the more this work will have a profound and ongoing impact.

      This is a very interesting point to emphasize indeed. We are currently discussing with public health specialists in our institution on how to assess past public health actions using phylodynamics data in a statistically valid manner.

      Reviewer #2 (Public Review):

      This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020 and the authors should be congratulated for tackling this important question. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors provided compelling, although as of yet correlative and incomplete, evidence towards how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, however this work suffers from a number of serious analytical shortcomings, all of which can be overcome in a major revision and re-analysis.

      We would like to thank reviewer #2 for their evaluation and their various comments. We want to point out that reviewer #2 was contacted for advice on the strategy for the molecular clock, since she performed a study on a similar topic describing SARS-CoV-2 epidemics in Canada during 2020. We strongly believe that all of reviewer #2's comments contributed greatly to improving the quality of this work.

      With this genomic epidemiology analysis, the authors disentangled the relative contributions of different geographic levels to transmission events in France and in Europe in the first two COVID-19 waves of 2020. By partitioning the analysis into three complementary, but distinct, geographic levels, the migration flows in and out of continents, countries in Europe, and regions in France were inferred using maximum likelihood ancestral state reconstruction. The major strengths of this paper were the inclusion of multiple geographic levels, the comparison of different rate symmetries in the ancestral character estimation, and the comprehensive qualitative descriptions of comparisons over time and geographies. However, there were also major weaknesses that need to be addressed and are described in more detail below. They include summing across replicates that were drawn with replacement and were not independent; inadequate justification for excluding underrepresented geographies; the assertion that positive correlation between intra-regional transmission and deaths validates the accuracy of the analysis; considering the framework the authors have chosen for this analysis the analysis would accommodate and benefit strongly from increasing the size of the sequence sets selected for analysis in each replicate; and the sparsity of quantitative (over qualitative or exploratory) comparisons and statistics in the reporting of results. In particular, it would greatly strengthen the paper if the authors could better evaluate the effect of travel restrictions on importations and exportations by testing hypotheses, quantifying changes in the presence of restrictions, or estimating inflection points in importation rates.

      We are grateful for this comprehensive listing of the strengths and weaknesses of our study. Regarding the limitations of this study, these will be detailed specifically for each dedicated remark of the reviewer. We would like to emphasize that all the remarks and limitations reported here by reviewer #2 are in our opinion fully justified. We hence have tried to bring additional analyses (study of the Pango lineages, averaging of the subsamples, simulation study to justify the size of the sampling), a modification of the methodology (in particular concerning the molecular clock) and a thorough rewriting of the “Results” section.

      General comments on the Background: Need to elaborate on how this study fits into the big picture in the first paragraph. Should discuss how phylodynamics contributes to understanding of viral outbreaks, SARS-CoV-2 epidemiology and viral evolution.

      We have added in the “Introduction” section some elements to better understand why phylodynamics is an important field in the epidemiology of SARS-CoV-2 and its evolution.

      The authors should consider a hypothesis driven framework for their analyses, for example considering the geographically central position of France what hypotheses stem from this considering sources of viral importations and destinations of exportations from/to Europe vs other international? Or other a priori expectations.

      We agree with reviewer #2 about this remark. Indeed, given the central position of France, we can hypothesize that it has strongly participated in the dissemination of the virus within Europe. This hypothesis has been included in the "Introduction" section of the revised version (lines 102-105).

      To address the computational limits of phylogenetic reconstruction, 100 replicates of fewer than 1000 sequences each were sampled for each epidemic wave at each level. The inter- and intra-regional transmissions were averaged and then summed across replicates in order to compare the relative roles played by each geography towards transmission. While we see the logic in using the sum across replicates, this is highly likely to bias results, especially since in the methods, this is described as sampling with replacement between replicates (LX). The validity of summing replicates needs to be discussed and are likely most appropriately presented as mean or median. Also, these samples are quite small considering the computational capacity of the maximum likelihood tools being used. We recommend repeating the analysis with a substantially larger number of sequences per sample.

      We thank reviewer #2 for this relevant remark. We initially summed the subsamples, a strategy that may bias the results. In the new version of the manuscript, we averaged the subsamples by region and by week as recommended (and stated in the methods, lines 536-537).

      As for the size of our subsamples, it made no difference whether we used 1,000, 2,000 or 5,000 genomes in each subsample. To get a more definitive and scientifically sound answer, we performed a simulation assay that has been included in the manuscript and is shown in what is now figure 2 (and figure 2—figure supplement 1). These simulations show that our subsampling strategy allows for an accurate estimate of transition rates for a discrete parameter (lines 107-160).

    1. { "error_code": "BAD_REQUEST_ERROR", "error_description": "Payment failed", "error_source": "gateway", "error_step": "payment_authorization", "error_reason": "payment_failed" }

      This code block should have a heading

    1. Author Response

      Reviewer #1 (Public Review):

      There has been a lot of work showing that multi-peaked tuning curves contain more information than single peaked ones. If that's the case, why are single-peaked tuning curves ubiquitous in early sensory areas? The answer, as shown clearly in this paper, is that multi-peaked tuning curves are more likely to produce catastrophic errors.

      This is an extremely important point, and one that should definitely be communicated to the broader community. And this paper does an OK job doing that. However, it suffers from two (relatively easily fixable) problems:

      I) Unless one is an expert, it's very hard to extract why multi-peaked tuning curves lead to catastrophic errors.

      II) It's difficult to figure out under what circumstances multi-peaked tuning curves are bad. This is important, because there are a lot of neurons in the sensory cortex, and one would like to know whether multi-peaked tuning curves are really a bad idea there.

      And here are the fixes:

      I) Fig. 1c is a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). However Fig. 1c shows the average log likelihood, which makes it hard to understand what goes wrong. It would really help if Fig. 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multi-peaked tuning curves but not for single peaked ones. You could also indicate why, when multi-peaked tuning curves do give the right answer, the error tends to be small.

      We thank the reviewer for this suggestion. We have now split the first figure into two.

      In the new Figure 1, we provide an intuitive explanation of local vs catastrophic errors and single-peaked / periodic tuning curves. We have also added smaller panels to illustrate how the distribution of errors changes with decoding time (using a simulated single-peaked population).

      The new Figure 2 shows sampled likelihoods for 3 different populations. We hope this provides some intuitive understanding of the phase shifts. Unfortunately, it proved difficult not to normalize the “height” of each module’s likelihood as they can differ by several orders of magnitude across the modules. However, due to the multiplication, the peak likelihood values can (approximately) be disregarded in the ML-decoding. Lastly, we have also added more simulation points (scale factors) compared to what we had in the earlier version of the figure (see panels d-e).

      II) What the reader really wants to know is: would sensory processing in real brains be more efficient if multi-peaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with single peaked tuning curves and a good code with multi-peaked tuning curves. My guess is that a good code would have lambda_1=1 and c around 0.5 (you could use the module ratio the grid cell people came up with -- I think 1/sqrt(2) -- although I doubt if it matters much). My guess is that it's the total number of spikes, rather than the number of neurons, that matters. Some metric of performance (see point 1 below) versus the contrast of the stimulus and the number of spikes would be invaluable.

      We thank the reviewer for this comment and the suggestions. We agree that, ideally, such an expression would be useful. However, as you note, it is a very challenging task due to the large parameter space (number of neurons, peak amplitude, spontaneous firing rate, width of tuning, stimulus dimensionality etc.) and is beyond the scope of the present study. We have instead included a new figure (see Figure 7 in the manuscript) detailing the minimal decoding times for various choices of parameter values. We believe this gives an indication of how minimal decoding time scales with various parameters.

    2. Reviewer #1 (Public Review)

      There has been a lot of work showing that multi-peaked tuning curves contain more information than single peaked ones. If that's the case, why are single-peaked tuning curves ubiquitous in early sensory areas? The answer, as shown clearly in this paper, is that multi-peaked tuning curves are more likely to produce catastrophic errors.

      This is an extremely important point, and one that should definitely be communicated to the broader community. And this paper does an OK job doing that. However, it suffers from two (relatively easily fixable) problems:

      I. Unless one is an expert, it's very hard to extract why multi-peaked tuning curves lead to catastrophic errors.

      II. It's difficult to figure out under what circumstances multi-peaked tuning curves are bad. This is important, because there are a lot of neurons in sensory cortex, and one would like to know whether multi-peaked tuning curves are really a bad idea there.

      And here are the fixes:

      I. Fig. 1c is a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). However Fig. 1c shows the average log likelihood, which makes it hard to understand what goes wrong. It would really help if Fig. 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multi-peaked tuning curves but not for single peaked ones. You could also indicate why, when multi-peaked tuning curves do give the right answer, the error tends to be small.

      II. What the reader really wants to know is: would sensory processing in real brains be more efficient if multi-peaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with single peaked tuning curves and a _good_ code with multi-peaked tuning curves. My guess is that a good code would have lambda_1=1 and c around 0.5 (you could use the module ratio the grid cell people came up with -- I think 1/sqrt(2) -- although I doubt if it matters much). My guess is that it's the total number of spikes, rather than the number of neurons, that matters. Some metric of performance (see point 1 below) versus the contrast of the stimulus and the number of spikes would be invaluable.

    1. Reviewer #2 (Public Review):

      The current manuscript presents a new toolbox to apply temporal response functions (TRFs) usable in python. TRFs are becoming more widely used and providing an accessible toolbox for a wider audience is very important and should be promoted. Overall, it also seems that the code accompanying the manuscript provides all the steps to do the analysis and could potentially be very useful. However, in the current version, the toolbox relies on one single way to solve the TRF estimation problem, which is the boosting algorithm. Providing a single algorithm makes it difficult to compare results from this toolbox with outcomes of other toolboxes which rely on different methods to solve the regression. The user is forced to work with this choice and is not provided other options (or easy ways to implement new options). Additionally, it seems unclear whether the toolbox is fully able to provide the means to generate predictors that are typically used in a TRF analysis. The github code provided for generating the predictors does not seem to be fully integrated with eelbrain and relies on code in the trftools toolbox, which contains code that the authors deem not yet stable enough to be released. Finally, the overall logic and idea behind the toolbox could have been explained better to make it more accessible to use.

    1. journals.sagepub.com needs to review the security of your connection before proceeding. Why am I seeing this page? Requests from malicious bots can pose as legitimate traffic. Occasionally, you may see this page while the site ensures that the connection is secure. Connection is secure. Proceeding... error code: 1020

      im stuck guys pls help!

  2. inst-fs-iad-prod.inscloudgate.net
    1. It is my belief that a movement beyond tolerance is absolutely necessary if multicultural education is to become more than a superficial "bandaid" or a "feel-good" additive to our school curricula. I will argue that tolerance is actually a low level of multicultural support, reflecting as it does an acceptance of the status quo with but slight accommodations to difference.

      It is interesting to see that the author emphasizes the importance of the word 'tolerance.' Looking back, my fellow high school students and I tolerated the school's "code of conduct," which involved the words "integrity, responsibility, and kindness." Oddly enough, not everyone followed these rules to a T, which made teachers lose their 'tolerance' for students. It is interesting how it kind of goes in a circle when you look at who's tolerating whom.

    1. BUT AT HOME!

      Having to dramatically code-switch between how you speak and write at home vs. school only adds to the problem of alienation that many students experience in the classroom. Horner, Lu, Royster, and Trimbur address this a bit in their article.

    1. julienmalard (Julien Malard-Adam): Interesting, thanks! I could definitely look at the code if you can share it. https://github.com/Three0Dev/Three0Pinner All yours! GitHub - Three0Dev/Three0Pinner: A pinning service for Orbitdb, a decentralized database based on IPFS.

      https://github.com/Three0Dev/Three0Pinner

    1. The

      This work has been published in GigaScience Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giac076) and has published the reviews under the same license.

      Reviewer 1 Satoshi Hiraoka

      In this manuscript, the authors developed a new tool, DeePVP, for predicting Phage Virion Proteins (PVPs) using a deep learning approach. The purpose of this study is meaningful. As the authors described in the Introduction section, it is currently difficult to annotate the functions of viral genes precisely because of their huge sequence diversity and the existence of many unknown functions, and there is still much room to improve the performance of in silico annotation of phage genes, including PVPs. Although I'm not an expert in machine learning, the newly proposed method based on deep learning seems to be appropriate. The proposed tool clearly outperformed the other previously proposed tools, and thus the tool might be valuable for further deep analysis of many viral genomes. Indeed, the authors conducted two case studies using real phage genomes and reported novel findings that may give insight into the genomics of the phages. Overall, the manuscript is well written, and I feel the tool has good potential to contribute to the wide field of viral genomics. Unfortunately, I have concerns, including about the openness of the source code. I also have some suggestions that would increase the clarity and impact of this manuscript if addressed.

      Major: I did not find the DeePVP source code on the GitHub page. Is the tool not open source? I strongly recommend that the authors disclose all scripts of the tool for further validation and secondary usage by other scientists or, at least, clearly state why the source code needs to be kept private. I was also quite confused by the GitHub page because the uploaded files are not well structured. Scripts and data used for performance evaluation were included in the 'data.zip' file, which should be renamed appropriately. The 'Source code' button on the Releases page strangely links to the 'Supporting_data.zip' file, which contains only the installation manual and no source code. The authors should organize the GitHub page appropriately: for example, upload all source code to the 'main' branch rather than including it in a zip file, and make sure the 'source code' file in Releases contains the actual source code files rather than a manual PDF.

      According to the Material and method section, 1) using the deep learning approach and 2) using the large dataset retrieved from PhANNs as the training dataset are two of the important improvements over other studies in the PVP identification task. Someone may suspect that the better performance of DeePVP is mostly due to the larger training dataset rather than the classification method used. Is there a possibility that the previously proposed tools (especially the tools other than PhANNs), re-trained using the large PhANNs dataset, could reach better performance than DeePVP?

      The naming of 'Reliability index' (L249) is inaccurate. The score does not support the prediction 'reliability' (i.e., whether the predicted genes are truly PVPs or not) but just reflects the fact that the gene is predicted as a PVP by many tools, without considering whether the prediction is correct or incorrect. The sentence 'A higher n indicates that this protein is predicted as PVP by more tools at the same time, and therefore, the prediction may be more reliable.' in L252 is not logical.

      I do not fully agree with the discussion in L294-302 that the tool will facilitate viral host prediction. It is very natural that if phages are phylogenetically close and possess similar genomic structures, including PVP-enriched regions, they will infect the same microbial lineage as a host. However, this has not been evaluated systematically across a wide range of phage lineages. In general, almost all phage-host relationships in nature are unknown, except for a small number of specific viruses such as E. coli phages. Further detailed studies are needed on whether, and to what degree, the conservation of PVP-enriched regions could be a good feature for predicting phage-host relationships. I think phage-host prediction is beyond the scope of this tool, and thus the analysis could be deleted from this manuscript or just briefly mentioned in the Discussion section as a future perspective.
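
      The objection to the 'Reliability index' can be pictured with a plain vote tally. The sketch below is only an illustration: the tool names and predictions are made up, and `consensus_count` is a hypothetical helper, not part of DeePVP. It shows that n counts agreement between tools and carries no information about whether the agreed call is correct.

```python
def consensus_count(predictions):
    """Return n, the number of tools that call the protein a PVP.

    `predictions` maps a tool name to True (PVP) or False (non-PVP).
    No ground-truth label enters this computation.
    """
    return sum(1 for is_pvp in predictions.values() if is_pvp)


# Hypothetical tools and outputs, for illustration only.
calls = {"tool_a": True, "tool_b": True, "tool_c": False}
n = consensus_count(calls)
print(n)
```

      Because the true label never appears in the calculation, several tools trained on similar data can agree and still all be wrong.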

      Minor: It would be better to give the URL of the GitHub page at the end of the Abstract or in the main text, in addition to the 'Availability of supporting source code and requirements' section. This will make it easy for many readers to access the homepage and use the tool. Fig 2 and 3: I think it is better to change the labels of the x-axis to 0 kb, 20 kb, 40 kb, ..., and 180 kb. This will make it easier to understand that the horizontal bar represents the viral genome.

      Re-review:

      I read the revised manuscript and acknowledge that the authors made efforts to take the reviewers' comments into account. My previous points have been addressed and I feel the manuscript has improved. I think the term 'incomplete proteins' in L391-396 could be rephrased as 'partial genes', because here we should consider protein-encoding genes (or protein sequences), not proteins themselves, and the word 'incomplete' is a bit ambiguous.

    2. ABSTRACT

      Reviewer 2. Deyvid Amgarten

      The manuscript presents DeePVP, a new tool for PVP annotation of a phage genome. The tool implements two separate modules: the main module aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the ten major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs, a known tool in the area. Overall, the manuscript is well written and clear, and I could not identify any serious methodological inconsistency. I was not sure whether to consider the performance metrics shown as significant improvements or not, since PhANNs already does a similar job in that regard, and it is better for some types of PVPs, for example. But I would rather leave this judgment to readers and other researchers in the area. Specifically, I enjoyed the discussion about how one-hot encoded features may be more suitable for predictions than k-mer-based ones and, as a consequence, that convolutional networks may have an advantage over simple multilayer perceptron networks. This manuscript makes an important contribution to the phage genomics and machine learning fields. I am certain that DeePVP will be helpful to many researchers.

      I have a major question about the composition of the dataset used to train the main module: among the PVP proteins, do the authors know if only the ten types of PVP are present? There is a brief mention in the discussion (line 340) of the keywords used to assemble the PhANNs dataset, but that is not clear to me. This will help me understand the following: Line 124: The CNN in the extended module has an output softmax layer, which outputs likelihood scores for 10 types of virion proteins. I wonder if only proteins from these 10 types were included in the datasets used to train the CNNs. I mean, is it possible that a different type of virion protein is predicted by the main module as a PVP? And if so, how would the extended module classify this protein, since it is a PVP but not one of the ten types?

      Minors:

      Line 121: By default, a protein with a PVP score higher than 0.5 is regarded as a PVP. How was this cutoff chosen? Was this part of the k-fold cross-validation process?

      Line 157 and other places in the manuscript: I would suggest that the authors not use sentences like "F1-score is 9.05% much higher than that of PhANNs", for the obvious reason that 9% may not be a large enough difference to justify the adverb "much". The same applies to "much better" and its variations.

      About the comparisons between DeePVP and PhANNs: did the authors make sure that instances of the test set were not used to train the PhANNs model being used?

      Line 221: What do the authors mean by "more authentic prediction"?

      Looking at the GitHub repository, I found it rather unusual that the authors chose to upload only a PDF with instructions on how to install and use the tool. It is very detailed, which I appreciate. The virtual machine and Docker containers are also nice resources to help less experienced users. However, I noticed that the GitHub repository has no clear mention of the source code of the tool. I found it through a mention in the Availability of supporting data section, where the authors created a release with the datasets and the scripts. Again, this is very unusual, but I suppose the authors have chosen this approach due to GitHub's limitations on large files.

      Table 2: I would like to ask the authors what might be the reason for such low performance metrics for some types of PVP (for example, minor capsid).

      Figure 5 states: "Host genus composition of the subject sequences". But there is a "Myoviridae" category, which is a family of phages and is not related to bacterial hosts. Please verify why this is in the figure.
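
      The question about proteins outside the ten classes can be made concrete with a small two-stage sketch. Everything here is an assumption for illustration: the class names are placeholders, the logits are invented, and `annotate` is a hypothetical function that only mimics the behaviour described in the review (a 0.5 cutoff on the PVP score, then a softmax over ten classes). It is not the DeePVP code.

```python
import math

# Placeholder names standing in for the ten PVP classes.
PVP_CLASSES = [f"class_{i}" for i in range(10)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def annotate(pvp_score, class_logits, cutoff=0.5):
    """Stage 1: PVP vs. non-PVP by cutoff. Stage 2: argmax of the softmax."""
    if pvp_score <= cutoff:
        return "non-PVP", None
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PVP_CLASSES[best], probs[best]


# A PVP with completely uninformative logits still receives a class label.
label, prob = annotate(0.9, [0.0] * 10)
print(label, round(prob, 2))
```

      A softmax always distributes probability over the classes it knows, so without an explicit "other" class or a minimum-probability rule, a virion protein of an eleventh type would still be assigned to one of the ten.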

      Re-review:

      Thank you for the authors' responses. Most of my concerns were addressed. I have to say, though, that the GitHub page is not yet quite up to the standards for a bioinformatics tool. I appreciate the source code upload, but I noticed that there was not a single line of #comments in the code I checked. The README file is also not very clear. I do not consider this an impediment to publication (since there is detailed information in GigaScience DB), but it may hinder usage of the authors' tool. Most users will only look at the GitHub repository. I suggest some improvements, in case the authors judge that my comment makes sense. Below I list three examples just to give the authors an idea:

      https://github.com/fenderglass/Flye https://github.com/LaboratorioBioinformatica/MARVEL https://github.com/vrmarcelino/CCMetagen

      One last concern was about authors' response to the Myoviridae mistake in figure 5. Authors stated that the genus of a phage host is in its name (as for example Escherichia phage XX). But this is a dangerous assumption, since many phage names are outside of this rule. For example, there are many phages with Enterobacteria phage XXX (for instance NC_054905.1 ), meaning that they infect some Enterobacteria. Again, enterobacteria is not a genus. Phage nomenclature may be a mess sometimes, be careful.
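
      The naming pitfall can be sketched as a parser that refuses to guess. This is a minimal illustration under stated assumptions: `NON_GENUS_PREFIXES` is a hypothetical and deliberately incomplete exclusion list, and `host_genus_from_name` is not taken from the authors' pipeline.

```python
# Hypothetical, incomplete list of leading words that are not bacterial genera.
NON_GENUS_PREFIXES = {"Enterobacteria", "Myoviridae"}


def host_genus_from_name(phage_name):
    """Guess the host genus from a '<Genus> phage <id>' style name.

    Returns None when the name does not follow that pattern or when the
    leading word is known not to be a genus.
    """
    words = phage_name.split()
    if len(words) < 2 or words[1].lower() != "phage":
        return None
    if words[0] in NON_GENUS_PREFIXES:
        return None
    return words[0]


print(host_genus_from_name("Escherichia phage XX"))
print(host_genus_from_name("Enterobacteria phage XXX"))
```

      An exclusion list can never be complete, so names that return None, and ideally all names, would still need to be checked against curated host annotations.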

    1. Spatial

      Reviewer 2. Quan Nguyen

      Reviewer Comments to Author: This work presents a new clustering method, Stardust, that has the potential to improve the stability of clustering results against parameter changes. Stardust can assess the contribution of spatial information to the clustering result, relative to gene expression information. Stardust appears to perform better than other methods on the two metrics used in this paper, stability and coefficient of variation. The essence of the method is the use of a spatial transcriptomics (ST) distance matrix as a simple linear combination of physical distance (S) and transcriptional distance (T) matrices. A weight factor is used for the S matrix to control and evaluate the contribution of the spatial information. The effort to evaluate multiple parameters and to compare with several of the latest methods across a number of public spatial datasets is a highlight of the work. The authors also made the code available.
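
      The linear combination described above can be sketched in a few lines. The exact form used here, ST = T + w * S with w between 0 and 1, is an assumption made for illustration, as are the toy matrices; the Stardust implementation may scale or normalise the matrices differently.

```python
def combine_distances(S, T, space_weight):
    """Return ST with ST[i][j] = T[i][j] + space_weight * S[i][j].

    S is the physical distance matrix, T the transcriptional distance
    matrix, both given as equally sized lists of lists.
    """
    if not 0.0 <= space_weight <= 1.0:
        raise ValueError("space_weight is expected to lie in [0, 1]")
    return [
        [t + space_weight * s for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(S, T)
    ]


# Toy 2 x 2 matrices, invented for the example.
S = [[0.0, 2.0], [2.0, 0.0]]
T = [[0.0, 1.0], [1.0, 0.0]]
ST = combine_distances(S, T, 0.5)
print(ST)
```

      With a weight of 0 the combined matrix reduces to T, so clustering uses expression alone; a single scalar weight applies the same spatial contribution to every pair of spots.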

      Major comments:

      • The concept of combining spatial location and gene expression is not new and has been applied in most spatial clustering methods. It is not clear what the new additions to currently available methods are, except for a feature to weigh the contribution of spatial components to clustering results.
      • The approach to assessing the contribution of spatial information, by varying the weight factor from 0 to 1, is rather simple, because the contribution can be nonlinear and can vary between spots/cells (e.g. spatial distance becomes more important for spots/cells that are nearer to each other; some genes are more spatially variable than others; applying one weight factor to all genes and all spots would miss these sources of variation).
      • The 5 weight factors 0, 0.25, 0.50, 0.75, and 1 were used. However, this range of parameters provides too few data points to assess the impact of the spatial factor. As seen in the figures, the 5 data points do not strongly suggest a point where the spatial contribution is maximum/minimum, due to the large fluctuation of values on the y-axis.
      • Although two performance metrics are used (stability and variation), there needs to be an additional metric of how well the clustering results represent the biological ground truth of cell type composition or tissue architecture (for example, by comparing to pathological annotation). Consequently, it is unclear whether the Stardust results are closer to the biological ground truth or not.
      • Stardust was tested on multiple 10x Visium datasets, but different types of spatial transcriptomics data like seqFISH, Slideseq, MERFISH, etc. are also common. An extended assessment of potential applications to other technologies would be useful.

      Minor comments:

      • The paragraphs and figure legends in the Result section are repetitive.
      • The Result section is descriptive and there is no Discussion section.

      Re-review:

      The authors have improved the initial manuscript markedly. There are a couple of important points regarding comparisons between Stardust and Stardust* that need to be addressed: 1) In which cases does Stardust* improve over Stardust? It seems the results would depend on the biological system (i.e., tissue type). The authors suggest both versions produce comparable results, but given the major change in the formula (replacing a constant weight with variable weights, defined as gene expression values normalised to a [0,1] min-max scale), there are likely differences between Stardust and Stardust*. For example, certain genes will have a higher weight than others, making the effects of the weights variable among genes. For this example, the authors may compare highly abundant genes with low-abundance genes. 2) In cases where spatial distances are important, Stardust* could be less accurate than the Stardust version with a high space weight. How does Stardust* handle cases in which spatial distance is as important as gene expression?

    1. Abstract

      This work has been published in GigaScience Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giac071) and has published the reviews under the same license.

      Reviewer 1. Moritz Herrmann

      First review: Summary:

      The authors conducted a benchmark study of survival prediction methods. The design of the study is reasonable in principle. The authors base their study on a comprehensive set of methods and performance evaluation criteria. In addition to standard statistical methods such as the CoxPH model and its variants, several machine learning methods including deep learning methods were used. In particular, the intention to conduct a benchmark study based on a large, diverse set of datasets is welcome. There is indeed a need for general, large-scale survival prediction benchmark studies. However, I have serious concerns about the quality of the study, and there are several points that need clarification and/or improvement.

      Major issues:

      1. The method comparison does not seem fair. As far as I can tell from the description of the methods, the method comparison is not fair and/or not informative. In particular, given the information provided in Supp-Table-3 and the code provided in the GitHub repository, hyperparameter tuning has not been conducted for some methods. For example, Supp-Table-3 indicates that the parameters 'stepnumber' and 'penaltynumber' of the CoxBoost method are set to 10 and 100, respectively. Similarly, only two versions of RSF with fixed ntree (100 and 1000) and mtry (10, 20) values are used. Also, the deep learning methods appear not to be extensively tuned. On the other hand, judging from the code, methods such as the Cox model variants (implemented via glmnet) and MTLR have been tuned at least a little. Please explain clearly and in detail how the hyperparameters have been specified and how hyperparameter tuning has been conducted for the different methods. If, in fact, not all methods have been tuned, this is a serious issue and the experiments need to be rerun under a sound and fair tuning regime.

      2. Description of the study design. Related to the first point, the description of the study design needs to be improved in general, as it does not allow the conducted experiments to be assessed in detail. A few examples that require clarification:

      • As already mentioned, the method configurations and implementations are not described sufficiently. It is unclear how exactly the hyperparameter settings have been obtained, how tuning has been applied, and why only for some methods.
      • Concerning the methods Cox(GA), MTLR(GA), COXBOOST(GA), MTLR(DE), COXBOOST(DE): have the feature selection approaches been applied to the complete datasets or only to the training sets?
      • Supp-Table-3 lists two implementations of the Lasso, Ridge and Elastic Net Cox methods (via penalized and glmnet); yet, Figure 2 in the main manuscript only lists one version. Which implementations have been used and are reported in Figure 2?
      • l. 221: it is stated that "the raw Brier score" has been calculated. At which time point(s), and why at this/these time point(s)?
      • Supp-Table-2: it is stated that "some methods are not fully successful for all datasets", but only DNNSurv is further examined. Is it just DNNSurv or are there other methods that have failed in some iterations? Moreover, what has been done about the failing iterations? Have the missing values been imputed? Are the failing iterations ignored?

      I recommend that section 3 be comprehensively revised and expanded, in particular to cover the method implementations, how hyperparameters are obtained and how tuning has been conducted, the aggregation of performance results, and the handling of failing iterations. Moreover, I suggest providing summary tables of the methods and datasets in the main manuscript and not in the supplement.

      3. Reliability of the presented results. In other studies [BRSB20, SCS+20, HPH+20], differences in (mean) model prediction performance have been reported to be small (while variation over datasets can be large). This can also be seen in Figure 3 of the main manuscript. Please include more analyses on the variability of prediction performances and also include a comparison to a baseline method such as the Kaplan-Meier estimate. Most importantly, if some methods have been tuned while others have not, the reported results are not reliable. For example, the untuned methods are likely to be ill-specified for the given datasets and thus may yield sub-optimal prediction performances. Moreover, if internal hyperparameter tuning is conducted for some methods, for example via cv.glmnet for the Cox model variants, and not for others, the computation times are also not comparable.

      4. Clarity of language, structure and scope. I believe that the quality of the written English is not up to the standard of a scientific publication and consider language editing necessary (yet, it has to be taken into account that I am not a native speaker). Unlike related studies [BWSR21, SCS+20, e.g.], the paper lacks clarity and/or coherence. Although clarity and coherence can be improved with language editing, there are also imprecise descriptions in section 2 that may additionally require editing from a technical perspective. For example:

      • l. 76 - 78: The way censoring is described is not coherent, e.g.: "the class label '0' (referring to a 'no-event') does not mean an event class labelled as '0'". Furthermore, it is not true that "the event-outcome is 'unknown'". The event is known, but the exact event time is not observed for censored observations.
      • The authors aim to provide a comprehensive benchmarking study of survival analysis methods. However, they do not, for example, provide significance tests for performance differences nor critical difference plots (it should be noted that the number of datasets included may not provide enough power to do so). This is in stark contrast to the work of Sonabend [Son21].

      I suggest revising section 2 using more precise terminology and clearly describing the scope of the study, e.g., what type of censoring is being studied, whether time-dependent variable and effects are of interest, etc. I think this is very important, especially since the authors aim to provide "practical guidelines for translational scientists and clinicians" (l. 32) who may not be familiar with the specifics of survival analysis.

      Minor issues

      • l. 43: Include references for specific examples
      • l. 60: The cited reference probably is not correct
      • l. 266: "MTLR-based approaches perform significantly better". Was a statistical test performed to determine significant differences in performance? If yes, indicate which test was performed. If not, do not use the term "significant" as this may be misunderstood as statistical significance.
      • Briefly explain what the difference is between data sets GE1 to GE6.
      • It has been shown that omics data alone may not be very useful [VDBSB19]. Please explain why only omics variables are used for the respective datasets.
      • Figure 1: Consider changing the caption to 'An overview of survival methods used in this study' as there are survival methods that are not covered. Moreover, consider referencing Wang et al [WLR19] as Figure 1a resembles Figure 3 presented therein.
      • Figure 2: Please add more meaningful legends (e.g., title of legend; change numbers to Yes, No, etc.).
      • Figure 2 a & b: What do the dendrograms relate to?
      • Figure 2 d: The c-index is not a proper scoring rule [BKG19] (and only measures discrimination), better use the integrated Brier score (at best, at different evaluation time points) as it is a proper scoring rule and measures discrimination as well as calibration.
      • Figure 3: At which time point is the Brier score evaluated and why at that time point? Consider using the integrated Brier score instead.
      • This is rather subjective, but I find the use of the term "framework", especially that the study contributes by "the development of a benchmarking framework" (l. 60), irritating. For example, a general machine learning framework for survival analysis was developed by Bender et al. [BRSB20], while general computational benchmarking frameworks in R are provided, e.g., by mlr3 [LBR+19] or tidymodels [KW20]. The present study conducts a benchmark experiment with specific design choices, but in my opinion it does not develop a new benchmarking framework. Thus, I would suggest not using the term "framework" but better "benchmark design" or "study design".
      • In addition, the authors speak of a "customizable weighting framework" (l. 241), but never revisit this weighting scheme in relation to the results and/or provide practical guidance for it. Please explain w.r.t. the results how this scheme can and should be applied in practice.
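
      To make the Brier score remarks concrete, here is a deliberately simplified sketch. It assumes fully observed event times, so it omits the inverse-probability-of-censoring weights that real survival data require; the function names and toy numbers are invented for the example. It shows why a single evaluation time must be chosen for the raw Brier score, and how the integrated version averages over a grid of time points.

```python
def brier_score(surv_probs, event_times, t):
    """Mean squared error between predicted S(t) and observed status at t.

    Assumes no censoring; with censored data, IPCW weights are needed
    and are not implemented here.
    """
    total = 0.0
    for s, time in zip(surv_probs, event_times):
        alive = 1.0 if time > t else 0.0
        total += (alive - s) ** 2
    return total / len(event_times)


def integrated_brier_score(times, scores):
    """Trapezoidal average of Brier scores over an increasing time grid."""
    area = sum(
        (t1 - t0) * (b0 + b1) / 2
        for t0, t1, b0, b1 in zip(times, times[1:], scores, scores[1:])
    )
    return area / (times[-1] - times[0])


# Two subjects with events at t = 5 and t = 1, evaluated at t = 3.
print(brier_score([0.5, 0.5], [5, 1], 3))
print(integrated_brier_score([0, 1, 2], [0.0, 0.2, 0.2]))
```

      The first function returns a different value for every choice of t, which is why the review asks at which time point the score was evaluated.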

      The references need to be revised. A few examples:

      • l. 355 & 358: This seems to be the same reference.
      • l. 384: Title missing
      • l. 394: Year missing
      • l. 409: Year missing
      • l. 438: BioRxiv identifier missing
      • l. 441: ArXiv identifier missing
      • l. 445: Journal & Year missing

      Typos:

      • l. 66: . This
      • l. 89: missing comma after the formula
      • l. 93: missing whitespace
      • l. 107: therefore, (no comma)
      • l. 121: where for each, (no comma)
      • l. 170: examineS
      • l. 174: therefore, (no comma)
      • l. 195: as part of A multi-omics study; whitespace on wrong position; the sentence does not appear correct
      • l. 323: comes WITH a

      Data and code availability

      Data and code availability is acceptable. Yet, the ANZDATA and UNOS_kidney data are not freely available and require approval and/or request. Moreover, for better reproducibility and accessibility, the experiments could be implemented with a general purpose benchmarking framework like mlr3 or tidymodels.

      References

      [BKG19] Paul Blanche, Michael W Kattan, and Thomas A Gerds. The c-index is not proper for the evaluation of t-year predicted risks. Biostatistics, 20(2):347-357, 2019.
      [BRSB20] Andreas Bender, David Rügamer, Fabian Scheipl, and Bernd Bischl. A general machine learning framework for survival analysis. arXiv preprint arXiv:2006.15442, 2020.
      [BWSR21] Andrea Bommert, Thomas Welchowski, Matthias Schmid, and Jörg Rahnenführer. Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Briefings in Bioinformatics, 2021. bbab354.
      [HPH+20] Moritz Herrmann, Philipp Probst, Roman Hornung, Vindi Jurinovic, and Anne-Laure Boulesteix. Large-scale benchmark study of survival prediction methods using multi-omics data. Briefings in Bioinformatics, 22(3), 2020. bbaa167.
      [KW20] M Kuhn and H Wickham. Tidymodels: Easily install and load the 'tidymodels' packages. R package version 0.1.0, 2020.
      [LBR+19] Michel Lang, Martin Binder, Jakob Richter, et al. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 4(44):1903, 2019.
      [SCS+20] Annette Spooner, Emily Chen, Arcot Sowmya, Perminder Sachdev, Nicole A Kochan, Julian Trollor, and Henry Brodaty. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Scientific Reports, 10(1):1-10, 2020.
      [Son21] Raphael Edward Benjamin Sonabend. A theoretical and methodological framework for machine learning in survival analysis: Enabling transparent and accessible predictive modelling on right-censored time-to-event data. PhD thesis, UCL (University College London), 2021.
      [VDBSB19] Alexander Volkmann, Riccardo De Bin, Willi Sauerbrei, and Anne-Laure Boulesteix. A plea for taking all available clinical information into account when assessing the predictive value of omics data. BMC Medical Research Methodology, 19(1):1-15, 2019.
      [WLR19] Ping Wang, Yan Li, and Chandan K Reddy. Machine learning for survival analysis: A survey. ACM Computing Surveys (CSUR), 51(6):1-36, 2019.

      Re-review:

      Many thanks for the very careful revision of the manuscript. Most of my concerns have been thoroughly addressed. I have only a few remarks left.

      Regarding 1. Fair comparison and parameter selection The altered study design appears much better suited to this end. Thank you very much for the effort, in particular the additional results regarding the two tuning approaches. Although I think a single simple tuning regime would be feasible here, using the default settings is reasonable and very well justified. I agree that this is much closer to what is likely to take place in practice. However, it should be more clearly emphasized that better performance may be achievable if tuning is performed.
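
      The "single simple tuning regime" mentioned here could look roughly like the sketch below: every method gets the same small grid search, and hyperparameters are chosen on inner folds of the training data only. This is a generic illustration, not the study's code. `score_fn` is a hypothetical callback that fits a model on one index set and returns a score (higher is better) on another; the grid values are arbitrary.

```python
from itertools import product
from statistics import mean


def k_folds(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        test = list(range(lo, hi))
        train = [i for i in range(n) if i < lo or i >= hi]
        yield train, test


def tune(score_fn, grid, train_idx, k=3):
    """Return the grid point with the best mean inner-fold score.

    Only indices from `train_idx` are used, so the outer test set never
    influences the choice of hyperparameters.
    """
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        inner_scores = [
            score_fn(params, [train_idx[i] for i in fit], [train_idx[i] for i in val])
            for fit, val in k_folds(len(train_idx), k)
        ]
        if mean(inner_scores) > best_score:
            best_params, best_score = params, mean(inner_scores)
    return best_params


# Toy scorer that pretends 500 trees is optimal; a real one would fit a model.
def toy_score(params, fit_idx, val_idx):
    return -abs(params["ntree"] - 500)


print(tune(toy_score, {"ntree": [100, 500, 1000]}, list(range(30))))
```

      Applying the same routine and the same grid budget to every method keeps both the prediction results and the computation times comparable.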

      Regarding 2. Description Thanks, all concerns properly addressed. No more comments.

      Regarding 3. Reliability I am aware that Figure 2c provides information to this end. I think additional boxplots which aggregate the methods' performance (e.g. for unoc and bs) over all runs and datasets would provide valuable additional information. For example, from Figure 2c one can tell that MTLR variants obtain overall higher ranks based on mean prediction performance than the deep learning methods. However, it says nothing about how large the differences in mean performance are.

      Kaplan-Meier-Estimate (KM) I'm not quite sure I understood the authors' answer correctly. The KM does not use variable information to produce an estimate of the survival function, and I think that is why it would be interesting to include it. This would shed light on how valuable the variables are in the different data sets.
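
      For readers unfamiliar with the baseline being requested, the Kaplan-Meier estimate can be computed from event times and censoring indicators alone, with no covariates. The sketch below is a minimal product-limit implementation written for this note, with invented toy data; in practice an established survival package would be used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    observation was right-censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Three subjects: event at 1, censored at 2, event at 3.
curve = kaplan_meier([1, 2, 3], [1, 0, 1])
print(curve)
```

      Because every subject receives the same survival curve, any model that cannot beat this baseline is gaining nothing from the variables in that dataset.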

      Regarding 4. Scope and clarity Thanks, all concerns properly addressed. No more comments.

      Minor points:

      • Since the authors decided to change 'framework' to 'design', note that in Figure 1b it still says 'framework'
      • l.51 & l.54/55 appear to be redundant
      • Figure 2 a and b:
      • Please elaborate more on how similarity (reflected in the dendrograms) is defined?
      • Why is the IBS more similar to Bregg's and GH C-Index than to the Brier Score?
      • Why is the IBS not feasible for so many methods, in particular Lasso_Cox, Ridge_Cox, and CoxBoost?
    1. Abstract

      This work has been published in GigaScience Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giac073) and has published the reviews under the same license.

      Reviewer 1. Siyuan Ma

      Reviewer Comments to Author: In Kang, Chong, and Ning, the authors present Meta-Prism 2, a microbial community analysis framework, which calculates sample-sample dissimilarities and queries microbial profiles similar to those of user-provided targets. Meta-Prism 2 adopts efficient algorithms to achieve the time and memory efficiency required for modern microbiome "big data" application scenarios. The authors evaluated Meta-Prism 2's performance, both in terms of separating different biomes' microbial profiles and time/memory usage, on a variety of real-world studies. I find the application target of Meta-Prism appealing: achieving efficient dissimilarity profiling is increasingly relevant for modern microbiome applications. However, I'm afraid the manuscript appears to be in a poor state, with insufficient details for crucial methods and results components. Some display items are either missing or mis-referenced. As such, I cannot recommend its acceptance unless major improvements are made. My comments are detailed below.

      Major

      1. The authors claim that, compared with its previous iteration, the biggest improvements are: (1) removal of redundant nodes in 1-against-N sample comparisons, (2) functionality for similarity matrix calculation, and (3) exhaustive search among all available samples.

      a. (1) seems the most crucial for the method's improved efficiency. However, the details on why these nodes can be eliminated, and how dissimilarity calculation is achieved post-elimination, are not sufficient. The caption for Figure 1C and the relevant Methods text (lines 173-188) should be expanded, to at least explain i) why it is valid to calculate (dis)similarity post-elimination based on aggregation, and ii) how aggregation is achieved for the target samples.
      b. I may not have understood the authors on (2), but this improvement seems trivial. Is it simply that Meta-Prism 2 has a new function to calculate all pair-wise dissimilarities on a collection of microbial profiles?
      c. For (3), it should be made clearer that Meta-Prism 1 does not do this. I needed to read the authors' previous paper to understand the comment about better flexibility in customized datasets. I assume that this improvement is enabled because Meta-Prism 2 is vastly faster than Meta-Prism 1? If so, it might be helpful to point this out explicitly.

      2. I am lost on the accuracy evaluation results for predicting different biomes (Figure 2).

      a. How are biomes predicted for each microbial sample?
      b. What is the varying classification threshold that generates different sensitivities and specificities?
      c. Does "cross-validation" refer to e.g. selection of tuning parameters during model training, or to the evaluation of model performance?
      d. What are the "Fecal", "Human", and "Combined" biomes for the FEAST cohort? Such details were not provided in Shenhav et al.

      Moderate 1. I understand that this was previously published, but could the authors comment on the intuitions behind their dissimilarity measure, and how it compares to similar measures such as the weighted UniFrac? a. Does Meta-Storm and Meta-Prism share the same similarity definition? If so, why would they differ in terms of prediction accuracies? 2. There seems to be some mis-referencing on the panels of Figure 1. a. Panel B was not explained at all in the figure caption. b. Line 185 references Figure 1E, which does not exist.

      Minor 1. The Meta-Prism 1 publication was referenced with duplicates (#16 and #24). 2. There are minor language issues throughout the manuscript, but they do not affect understanding of the material. Examples: a. Line 94: analysis -> analyze b. Line 193: We also obtained a dataset that consists of ...

      Re-review:

      I find most of my questions addressed. My only remaining issue is still that the three biomes from FEAST (Fecal, Human, and Mixed) are still not clearly defined. The only definition I could find is line 206-208 "We also obtained a dataset that consists of 10,270 samples belonging to three biomes: Fecal, Human, and Mixed, which have been used in the FEAST study, defined as the FEAST dataset". Are "Fecal" simply stool samples, and "Human" samples biopsies from the human gut? What is "Mixed"? As a main utility of Meta-Prism is source tracking, it is important for the reader to understand what these biomes are, to understand the resolution of the source tracking results. If this can be resolved, I'll be happy to recommend the manuscript's acceptance.

      Reviewer 2. Yoann Dufresne

      In this article the authors present Meta-Prism 2, a software to compute distances between metagenomic samples and also query a specific sample against a pool of samples. They call "sample" a precomputed file with abundance of multiple taxa. In the article they first succinctly present multiple aspects on the underlying algorithms. Then they provide an extensive analysis on the precision, ram and time consumption of the software. Finally, they show 3 applications of Meta-Prism 2.

      I will start by saying that the execution time of the tool looks very good compared to all other tools. But I have multiple concerns about these numbers. - First, I like to reproduce the results of a paper before approving it, but I had a few problems doing so. * The tool does not compile as it is on git. I had to modify a line of code to compile it. This is nothing very bad, but authors of tools should be sure that their main code branch always compiles. See the end of the review for the bug and fix. * The analyses are done using samples from MGnify. I found related OTU tsv files linked in the supplementary but no explanation of how to transform such files into the pdata files that the software processes. * The only way to directly reproduce the results is to trust the pdata files present on the authors' GitHub. I would like to run my own experiments and compare the time to transform OTU files into pdata with the actual run time of MP2. - The authors evaluated the accuracy of their method (which is nice) but did not give access to the scripts that were used for that. I would like to see the code and try to reproduce the figure myself on my own data. - The 2nd and 3rd applications are explained in plain text, but there is no related script, nor any table or graphic, to reproduce or explain the results. The only way for me to evaluate this part is to trust the word of the authors. I would like the authors to show me clear and indisputable evidence.

      For the methods part it is similar. We have hints on what the authors did, but not a full explanation: - For the similarity function, I would like to know where it comes from. The cited papers [14] and [24] do not help on the comprehension of the formula. If the function is from another paper, I ask the authors to add a clear reference (paper + section in the paper) ; if not, I would like the authors to explain in details why this particular function, how they constructed it and how it behaves. - The authors refer multiple times to "sparse format" applied to disk & cache but never defined what they mean by that. I would like to see in this section which exact datastructure is used. - In the Fast 1-N sample comparison, the authors write about "current methods" but without citing them. I would like the authors to refer to precise methods/software, succinctly describe them and then compare their methods on top of that. Also in this part, the authors point at figure 1E that is not present in the manuscript. - The figure 1 is not fully understandable without further details in the text. For example, what is Figure 1C4 ?

      I want to point that the paper is not correctly balanced in term of content. 1.5 page for time execution analysis is too much compared to the 2 pages of methods and less than 1 page of real data applications.

      Finally, the authors are presenting a software but are not following the development standards. They should provide unit and functional tests of their software. I also strongly recommend them to create a continuous integration page with the git. With such a tool the compilation problem would not exist.

      To conclude, I think that the authors engineered the software very well but did not present it the right way. I suggest that the authors rewrite the paper with strong improvements to the "methods" and "Real data application" sections. Also, to provide software that is useful in the long term, they have to add guarantees to the code in the form of tests and CI.

      For all these reasons, I recommend to reject this paper.

      --- Bug & Fix ---

      make
      mkdir -p build
      g++ -std=c++14 -O3 -m64 -march=native -pthread -c -o build/loader.o src/loader.cpp
      g++ -std=c++14 -O3 -m64 -march=native -pthread -c -o build/newickParser.o src/newickParser.cpp
      g++ -std=c++14 -O3 -m64 -march=native -pthread -c -o build/simCalc.o src/simCalc.cpp
      g++ -std=c++14 -O3 -m64 -march=native -pthread -c -o build/structure.o src/structure.cpp
      g++ -std=c++14 -O3 -m64 -march=native -pthread -c -o build/main.o src/main.cpp
      src/main.cpp: In function 'int main(int, const char**)':
      src/main.cpp:128:31: error: 'class std::ios_base' has no member named 'clear'
        128 | buf.ios_base::clear();
            | ^~~~~
      make: *** [makefile:7: build/main.o] Error 1

      To fix the bug: src/main.cpp:128 => buf.ios.clear();

  3. Feb 2023
    1. I am a software engineer, canoeist, gardener and all-round tinkerer. I got into software because of my curiosity about how things work. I kept asking “why” until I eventually found myself doing it for a job. I love the range of work I get to do as an engineer. My work often focuses on performance improvements and coaching teams in code design choices. I value thoughtful communication that amplifies marginalized voices in the workplace.
    1. In the BORIS project we aim to characterize patients with CKD who either have or do not have Heart Failure. Odysseus provided Bayer with executable code to conduct the BORIS analysis on 3 OMOP CDMs: Truven Marketscan, OPTUM claims and OPTUM EHR.

      I don't like this paragraph

    1. Recommendation 6: Implement throughout the country the scheme provided for by Article 108 of the Law for a Digital Republic (loi pour une République numérique), incorporated into Article L. 115-3 of the Code de l'action sociale et des familles, which provides that any person or family experiencing particular difficulties, notably with regard to their assets, insufficient resources or living conditions, is entitled to assistance from the community to obtain a fixed-line telephone service and an internet access service. Follow-up over the past three years: This assistance had been in a trial phase since 2016 in three départements: Seine-Saint-Denis, Haute-Saône and Marne. The conditions for obtaining the assistance are set by the departmental councils, so they may differ depending on place of residence. The results of this trial show that the scheme was very little used because social workers were poorly informed and ill-equipped to make use of it. No nationwide rollout is planned to date.
    2. Recommendation 1: Adopt a legislative provision within the Code des relations entre les usagers et l'administration requiring that several means of access to public services be preserved, so that no administrative procedure is accessible solely online. Follow-up over the past three years: Measures have been put in place to provide a non-digital access route to certain public services (an observatory of the quality of online procedures; the announcement of a plan to promote the rollout of the telephone channel in all public services; France Services centres), but no legislative provision has enshrined this right. Two bills were introduced, but neither completed its passage through Parliament. Bill no. 2997 of 26 May 2020, establishing a right to non-digital means of access for administrative requests, introduced in the Assemblée nationale and referred to the Commission des lois: "After Article L. 111-3 of the Code des relations entre le public et l'administration, an Article L. 111-4 is inserted, worded as follows: 'No one may be compelled to use digital procedures in their dealings with the administration. Everyone has the right to request that their administrative procedures be handled by post.'" Bill no. 367 of 12 February 2021 on combating digital illiteracy and promoting digital inclusion, initial version presented to the Sénat, taking up the Senate information report, which provides for the insertion of Article 112-6-1 into the Code des relations entre le public et l'administration, stating that "any user of a public service shall, at their request, be received at the administration's physical sites in order to carry out any administrative procedure within a reasonable time, at the latest two months from the date of the request. The existence of an online service entails no obligation to contact the administration electronically."
    1. Reviewer #2 (Public Review):

      This paper explores the possibility of integrating diverse and multiple DNA fragments in the genome taking advantage of plasmids in arrays, and CRISPR-Cas.

      Since the efficiency of integration in the genome is low, they, as others in the field, use selection markers to identify successful events of integration. The use of these selection markers is common and diverse, but they use a couple of distinct strategies of selection to:

      – Introduce bar codes in the genome of individuals at one specific genomic site (gene for Hygromycin resistance with bar code in an intron with homology arms to complete a functional gene);

      – Introduce promoters at two specific genomic landing pads downstream of fluorescent reporters.

      The strengths of the study rely on the clever design of the selection markers, which enrich the collection of this type of markers. The weaknesses are the lack of novelty in the field in theoretical or practical terms. In fact, they do not show any innovative application of these approaches. Moreover, they show a limited number of experiments in the manuscript, or at least insufficient in my opinion for an article that is based on a methodology.

      This work adds to other recent studies, e.g. from Nonet, Mouridi et al., and Malaiwong et al, that use the integration of single and multiple/diverse DNA sequences in the C. elegans genome, and thus is not as groundbreaking as claimed. The real test of this method will be its use to address biological questions.

    1. As noted by the IE SA, the HTML publication of contact information was not considered necessary by Facebook’s Security Team and was subsequently discontinued. The EDPB considers that the analysis of the principle of data minimisation (Article 5(1)(c) GDPR) is relevant for the necessity assessment on the basis of Article 6(1)(b) GDPR. Consequently, the EDPB further finds that such analysis should have complemented the LSA’s assessment on the necessity of the processing for the performance of the contract, with specific regard to the publication of the contact information in the HTML source code on the Instagram website. The EDPB considers that the IE SA could not have concluded that the publication of the contact information of child users in the HTML source code may be regarded as

      EDPB rightly smacks the IE SA around a bit for generally cocking this all up.

    1. “Request to know” means a consumer request that a business disclose personal information that it has collected about the consumer pursuant to Civil Code sections 1798.100, 1798.110, or 1798.115. It includes a request for any or all of the following: (1) Specific pieces of personal information that a business has collected about the consumer; (2) Categories of personal information it has collected about the consumer; (3) Categories of sources from which the personal information is collected; (4) Categories of personal information that the business sold or disclosed for a business purpose about the consumer; (5) Categories of third parties to whom the personal information was sold or disclosed for a business purpose; and (6) The business or commercial purpose for collecting or selling personal information

      Narrower than the GDPR

    1. But there is still a bit of mystery about what the new chatbot can do — and why it would do it. Its complexity makes it hard to dissect and even harder to predict, and researchers are looking at it through a philosophic lens as well as the hard code of computer science.

      This basically creates a sense of mystery without telling us much, implying that there is something spooky going on, something beyond what computer science can explain. Actually it's quite explainable as the article title implies. People start writing prompts in a certain genre and the completion follows the genre...

    1. How does Spring know that it should take the DataSource that you specified as a @Bean method and then create new UserDAOs with that specific DataSource? Easy, with another marker annotation: @Autowired. Hence, your final code will look like this.

      ```java
      import javax.sql.DataSource;
      import org.springframework.stereotype.Component;
      import org.springframework.beans.factory.annotation.Autowired;

      @Component
      public class UserDao {

          private DataSource dataSource;

          public UserDao(@Autowired DataSource dataSource) {
              this.dataSource = dataSource;
          }

      }
      ```

    2. It would be much nicer if we opened just one DataSource and re-used it, instead of opening and closing tons of them.

      And when we duplicate the code for an operation like this (opening a database connection), it introduces new problems, such as too many database connections.

    1. "languageServerExample.trace.server": "verbose"

      This is in the contributes part of the package.json

      "contributes": {
          "configuration": {
              "type": "object",
              "title": "Example configuration",
              "properties": {
                  "languageServerExample.maxNumberOfProblems": {
                      "scope": "resource",
                      "type": "number",
                      "default": 100,
                      "description": "Controls the maximum number of problems produced by the server."
                  },
                  "languageServerExample.trace.server": {
                      "scope": "window",
                      "type": "string",
                      "enum": ["off", "messages", "verbose"],
                      "default": "off",
                      "description": "Traces the communication between VS Code and the language server."
                  }
              }
          }
      },

    1. The .await in the read_to_string function body is necessary to mark the cancellation point in case the function is compiled as async; but when not async would essentially become a no-op 2:

      This is a bit strange to me - we just ignore a function when making it synchronous? Why can't we just call synchronous code from async code without writing functions to explicitly support async?

      This gives every single Rust library maintainer the obligation to make every function they write ?async which is insane

    1. Forwarding will always break emails, especially in Outlook, as it adds its own code before composing. You can have a forward link on emails which takes you to a page to forward to a friend, or you can go with a broken email when it's forwarded. It's harsh, I know, but there is no way around it.
    1. Reviewer #3 (Public Review):

      The authors report on an interesting study that addresses the effects of a physical and dietary intervention on accelerated/decelerated brain ageing in obese individuals. More specifically, the authors examined potential associations between reductions in Body-Mass-Index (BMI) and a decrease in relative brain-predicted age after an 18-months period in N = 102 individuals. Brain age models were based on resting-state functional connectivity data. In addition to change in BMI, the authors also tested for associations between change in relative brain age and change in waist circumference, six liver markers, three glycemic markers, four lipid markers, and four MRI fat deposition measures. Moreover, change in self-reported consumption of food, stratified by categories such as 'processed food' and 'sweets and beverages', was tested for an association with change in relative brain age. Their analysis revealed no evidence for a general reduction in relative brain age in the tested sample. However, changes in BMI, as well as changes in several liver, glycemic, lipid, and fat-deposition markers showed significant covariation with changes in relative brain age. Three markers remained significant after additionally controlling for BMI, indicating an incremental contribution of these markers to change in relative brain age. Further associations were found for variables of subjective food consumption. The authors conclude that lifestyle interventions may have beneficial effects on brain aging.

      Overall, the writing is concise and straightforward, and the language and style are appropriate. A strength of the study is the longitudinal design that allows for addressing individual accelerations or decelerations in brain aging. Research on biological aging parameters has often been limited to cross-sectional analyses so inferences about intra-individual variation have frequently been drawn from inter-individual variation. The presented study allows, in fact, investigating within-person differences. Moreover, I very much appreciate that the authors seek to publish their code and materials online, although the respective GitHub project page did not appear to be set to 'public' at the time (error 404). Another strength of the study is that brain age models have been trained and validated in external samples. One further strength of this study is that it is based on a registered trial, which allows for the evaluation of the aims and motivation of the investigators and provides further insights into the primary and secondary outcomes measures (see the clinical trial identification code).

      One weakness of the study is that no comparison between the active control group and the two experimental groups has been carried out, which would have enabled causal inferences on the potential effects of different types of interventions on changes in relative brain age. In this regard, it should also be noted that all groups underwent a lifestyle intervention. Hence, from an experimenter's perspective, it is problematic to conclude that lifestyle interventions may modulate brain age, given the lack of a control group without lifestyle intervention. This issue is fueled by the study title, which suggests a strong focus on the effects of lifestyle intervention. Technically, however, this study rather constitutes an investigation of the effects of successful weight loss/body fat reduction on brain age among participants who have taken part in a lifestyle intervention. In keeping with this, the provided information on the main effect of time on brain age is scarce, essentially limited to a sign test comparing the proportions of participants with an increase vs. decrease in relative brain age. Interestingly, this analysis did not suggest that the proportion of participants who benefit from the intervention (regarding brain age) significantly exceeds the proportion of participants who do not benefit. So strictly speaking, the data rather indicate that it's not the lifestyle intervention per se that contributes to changes in brain age, but successful weight loss/body fat reduction. In sum, I feel that the authors' claims on the effects of the intervention cannot be underscored very well given the lack of a control group without lifestyle intervention.
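
For readers unfamiliar with the sign test mentioned above, here is a minimal sketch of an exact two-sided version. It is a generic textbook formulation, not the authors' analysis code; the function name `sign_test_p_value` and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def sign_test_p_value(n_decrease: int, n_increase: int) -> float:
    """Exact two-sided sign test under the null that increases and
    decreases are equally likely (ties are assumed to be dropped)."""
    n = n_decrease + n_increase
    k = min(n_decrease, n_increase)
    # Probability of observing k or fewer of the rarer outcome under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for the two-sided p-value.
    return min(1.0, 2 * tail)


# Hypothetical split of 102 participants (illustration only).
p = sign_test_p_value(n_decrease=56, n_increase=46)
print(f"two-sided p = {p:.3f}")
```

A large p-value here would only say that decreases are not clearly more common than increases; it says nothing about how large the individual changes are, which is why the reviewer treats it as thin evidence for a main effect of time.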

      Another major weakness is that no rationale is provided for why the authors use functional connectivity data instead of structural scans for their age estimation models. This gets even more evident in view of the relatively low prediction accuracies achieved in both the validation and test sets. My notion of the literature is that the vast majority of studies in this field implicate brain age models that were trained on structural MRI data, and these models have achieved way higher prediction accuracies. Along with the missing rationale, I feel that the low model performances require some more elaboration in the discussion section. To be clear, low prediction accuracies may be seen as a study result and, as such, they should not be considered as a quality criterion of the study. Nevertheless, the choice of functional MRI data and the relevance of the achieved model performances for subsequent association analysis needs to be addressed more thoroughly.

    1. The conjunction (连词) of the two standards is a very tiny code that simply tells the decoder that a bitstream is H.263 or MPEG-4 Visual

      ITU-T developed H.263

      MPEG developed MPEG-4

    1. in korea during uh idea 2019 and at the end of the process what you 00:29:00 have designed the 3d model uh you you get you get a qr code and you cannot have it on a wallet and it's registered on the blockchain and so you can start trading 00:29:12 so just imagine that you trade happiness you trade love anarchy art autonomy peace purity you trade them as values becoming value 00:29:24 having a value and so people can decide by battering swapping them if you want if you want peace and love for power
      • in Korea in 2019, Maurice installed a display using QR codes and blockchain to explore transactions of values
    1. After running code to load all of the outages by loading zoomed-in parts of the map, we verify that the number of outages we found matches the summary’s number of total outages. If it doesn’t, we don’t save the data, and we log an error.

      NB: there may be a race condition here? In which case, running into errors should be (one) expected outcome.
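
The check described in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the scraper's actual code: the `Outage` record, the `save` callback, and the idea that an outage straddling two map tiles shows up in both (so results are de-duplicated by id) are all hypothetical.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("outage_scraper")  # hypothetical logger name


@dataclass(frozen=True)
class Outage:
    outage_id: str
    customers_affected: int


def merge_tiles(tiles):
    """Combine outages from zoomed-in map tiles, de-duplicating by id."""
    merged = {}
    for tile in tiles:
        for outage in tile:
            merged[outage.outage_id] = outage
    return list(merged.values())


def save_if_consistent(tiles, summary_total, save):
    """Save the merged outages only if their count matches the summary.

    Returns True when the data was saved, False when it was rejected.
    """
    outages = merge_tiles(tiles)
    if len(outages) != summary_total:
        # Mismatch: log an error and do not persist anything.
        logger.error(
            "skipping save: found %d outages, summary reports %d",
            len(outages),
            summary_total,
        )
        return False
    save(outages)
    return True
```

Because the summary and the tiles are fetched at different moments, an outage that starts or ends in between would trigger the mismatch branch even when nothing is wrong with the scraper, which is the race condition the note above points at. Under that reading, occasional logged errors are expected rather than alarming.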

    1. Putnam: You load one oak of mine and you’ll fight to drag it home! Giles: Aye, and we’ll win too, Putnam - this fool and I. Come on! He turns to Proctor and starts out. Putnam: I’ll have my men on you, Corey! I’ll clap a writ on you!

      Prior to the witch hunt, there weren't any socially sanctioned means for expressing ill will against a neighbour. The religious code required each citizen to love his neighbour as himself (Mark 12:31). Any outward expression of hostility would have been severely frowned upon. This led to a situation where much resentment was seething below the surface, without any outlet.

    1. The creators of Beauty AI expressed dismay at the fact that “the robots did not like people with dark skin.”

      This makes me think of the film Coded Bias, in which facial recognition technology is built with a bias toward white-skinned people. Who gets to code, and what do they code into technology? What position are they in to decide what is good or bad in a technology?

    1. Reviewer #2 (Public Review):

      Lauterbur et al. present a description of recent additions to the stdpopsim simulation software for generating whole-genome sequences under population genetic models, as well as detailed general guidelines and best practices for implementing realistic simulations within stdpopsim and other simulation software. Such realistic simulations are critical for understanding patterns in genetic variation expected under diverse processes for study organisms, training simulation-intensive models (e.g., machine learning and approximate Bayesian computation) to make predictions about factors shaping observed genetic variation, and for generating null distributions for testing hypotheses about evolutionary phenomena. However, realistic population genomic simulations can be challenging for those who have never implemented such models, particularly when different evolutionary parameters are taken from a variety of literature sources. Importantly, the goal of the authors is to expand the inclusivity of the field of population genomic simulation, by empowering investigators, regardless of model or non-model study system, to ultimately be able to effectively test hypotheses, make predictions, and learn about processes from simulated genomic variation. Continued expansion of the stdpopsim software is likely to have a significant impact on the evolutionary genomics community.

      Strengths:

      This work details an expansion from 6 to 21 species to gain a greater breadth of simulation capacity across the tree of life. Due to the nature of some of the species added, the authors implemented finite-site substitution models allowing for more than two allelic states at loci, permitting proper simulations of organisms with fast mutation rates, small genomes, or large effect sizes. Moreover, related to some of the newly added species, the authors incorporated a mechanism for simulating non-crossover recombination, such as gene conversion and horizontal gene transfer between individuals. The authors also added the ability to annotate and model coding genomic regions.

      In addition to these added software features, the authors detail guidelines and best practices for implementing realistic population genetic simulations at the genome-scale, including encouraging and discussing the importance of code review, as well as highlighting the sufficient parameters for simulation: chromosome level assembly, mean mutation rate, mean recombination rate or recombination map if available, effective size or more realistic demographic model if available, and mean generation time. Much of these best practices are commonly followed by population genetic modelers, but new researchers in the field seeking to simulate data under population genetic models may be unfamiliar with these practices, making their clear enumeration (as done in this work) highly valuable for a broad audience. Moreover, the mechanisms for dealing with issues of missing parameters discussed in this work are particularly useful, as more often than not, estimates of certain model parameters may not be readily available from the literature for a given study system.

      Weaknesses:

      An important update to the stdpopsim software is the capacity for researchers to annotate coding regions of the genome, permitting distributions of fitness effects and linked selection to be modeled. However, though this novel feature expands the breadth of processes that can be evaluated as well as is applicable to all species within the stdpopsim framework, the authors do not provide significant detail regarding this feature, stating that they will provide more details about it in a forthcoming publication. Compared to this feature, the additions of extra species, finite-site substitution models, and non-crossover recombination are more specialized updates to the software.

      When it comes to simulating realistic genomic data, the authors clearly lay out that parameters obtained from the literature must be compatible, such as the same recombination and mutation rates used to infer a demographic history should also be used within stdpopsim if employing that demographic history for simulation. This is a highly important point, which is often overlooked. However, it is also important that readers understand that depending on the method used to estimate the demographic history, different demographic models within stdpopsim may not reproduce certain patterns of genetic variation well. The authors do touch on this a bit, providing the example that a constant size demographic history will be unable to capture variation expected from recent size changes (e.g., excess of low-frequency alleles). However, depending on the data used to estimate a demographic history, certain types of variation may be unreliably modeled (Biechman et al. 2017; G3, 7:3605-3620). For example, if a site frequency spectrum method was used to estimate a demographic history, then the simulations under this model from stdpopsim may not recapitulate the haplotype structure well in the observed species. Similarly, if a method such as PSMC applied to a single diploid genome was used to estimate a demographic history, then the simulations under this model from stdpopsim may not recapitulate the site frequency spectrum well in the observed species. Though the authors indicate that citations are given to each demographic model and model parameter for each species, this may not be sufficient for a novice researcher in this field to understand what forms of genomic variation the models may be capable of reliably producing. 
A potential worry is that the inclusion of a species within stdpopsim may serve as an endorsement to users regarding the available simulation models (though I understand this is not the case by the authors), and it would be helpful if users and readers were guided on the type of variation the models should be able to reliably reproduce for each species and demographic history available for each species.

    1. He also advises that roughly three-fourths of the total number of participants should share a similar code between them (related to an experience or opinion found in their data) for a “commonality” to be established, such as a category or theme. But my own experience has taught me that, in some cases, that unique instance of a code that appears just once and nowhere else in the data corpus, or a code that appears just two or three times across different cases or time periods, may hold important meaning for generating a significant insight in later analysis.

      impressive

    1. Reviewer #1 (Public Review):

      The authors present normative modeling results using both structural data and functional connectivity data to demonstrate the strength of normative modeling in investigations of group effects, classification tasks, and brain-behavioral modeling. The models are built across 3 large data sets and tested in a rigorous manner. The strengths of this work are in the clarity of presentation, the demonstration of the value of normative modeling, the availability of the models and code, and the statistical rigor supporting the results. The work will have a significant impact on the field in that such models (built in large data sets) can be applied to smaller studies of specific populations of interest, therefore facilitating research on many populations in a statistically rigorous manner.

    2. Reviewer #2 (Public Review):

      This work provides a direct extension of the authors' previously published paper "Charting brain growth and aging at high spatial precision" (Rutherford et al. 2022), expanding their highly valuable existing repository of pre-trained normative models to now also include cortical thickness, surface area, and functional connectivity data.

      Strengths

      Building on previously published and validated methodology, this work significantly expands an existing modelling toolbox with new data modalities, particularly functional connectivity measures.

      Model comparisons show that deviation scores derived from normative models perform as well, or better than, raw data models across three different benchmarking tests (group differences, classification, regression). The authors clearly demonstrate the utility of deviation scores in the assessment of both group and individual differences.

      All code, including pre-trained normative models, tutorials, and analysis scripts are available online and very well documented. In addition, the authors are promising to make an easy-to-use online portal available soon.

      Weaknesses

      Although still an impressively large multi-site data set, the sample size of the functional data (N=22k) is considerably smaller than that of the structural data (N=58k), which implies higher uncertainty in the functional normative model estimates.

      The scope of functional normative models computed and shared by the authors is limited to coarse parcellations (based on the Yeo-17 and Smith-10 atlases). High-dimensional functional normative models, for now, still belong to the realm of future work.

      Interpretation of deviation scores in classification and prediction tasks is not straightforward. Unlike raw data models, these derived summary measures do not have biological or clinical meaning on their own and can only be interpreted with respect to a pre-defined set of reference data.

    3. Reviewer #3 (Public Review):

      This important study continues, in two directions, the development of normative models of neuroimaging-derived features that the authors initiated (Rutherford et al., 2022a). First, the existing models - which were developed on structural imaging features - are complemented with features derived from functional networks. Second, these models are compared with the utilization of the features themselves in three different inference settings. Overall, the evaluation of the functional networks modeling yielded benchmarking metrics similar to, and in agreement with, their previous structural modeling. The study delivers strong evidence that normative models efficiently increased the statistical power in mass univariate group difference testing. The improvement in the other two inferential scenarios was less evident. However, normative modeling was not comparatively detrimental and should continue to be investigated.

      The study showcases several major strengths:

      - The methodological approach is robustly supported by previous work and protocol definitions by the authors, mainly (Rutherford, 2022a; 2022b).
      - The intent of the manuscript is very clear, structured first with a confirmation of the soundness of their functional-networks model and second the "head-to-head" comparison (a term used in the abstract which effectively describes the aim) to alternative inference approaches.
      - The results of task 1 are very compelling. The other two tasks, while perhaps less robust, are definitely relevant to be part of the communication and help draw a more accurate picture of the role of normative models in years to come.
      - The manuscript is accompanied by a comprehensive set of tutorials, examples, documentation, and the sharing of code, models, and data. Sharing all these resources is a decisive effort toward research transparency that deserves full recognition as scientific scholarship.

      As a major weakness, I speculate that some researchers could understand this work as incremental. Although there is continuity with the previous work of the authors (the absence of which would be a weakness, in my opinion), my assessment is that the science in this manuscript should be considered new results and hence deserves independent communication.

      Finally, I would like to highlight how normative modeling outperformed its "direct" (saving the removal of confounding factors) inference counterpart in task 1, providing solid evidence of the usefulness of normative models beyond the classical application in "easy" clinical decisions (I refer the readers to the manuscript, which elaborates on these aspects more appropriately and comprehensively).

    1. primarily IP addresses and web page URL information related to looking at content

      The FTC complaint contradicts this, saying:

      This included the name of the medication for which users accessed a GoodRx Coupon (“Drug Name,” such as “Lipitor”); the website URL, which in many cases included a medication name; the health condition related to the medication (“Drug Category,” such as “high cholesterol”); the medication quantity (“Drug Quantity,” such as “30-day supply”); the pharmacy name (“PharmName”); and the user’s city, state and zip code. The pixel also collected website microdata with additional information about the prescription medication and health condition(s) for which users accessed GoodRx Coupons. Finally, the pixel collected users’ IP addresses. In May 2019, GoodRx configured the pixel to automatically share with Facebook additional personal information, including user first and last name; email address; phone number; city, state, and zip code; and gender

    1. This included the name of the medication for which users accessed a GoodRx Coupon (“Drug Name,” such as “Lipitor”); the website URL, which in many cases included a medication name; the health condition related to the medication (“Drug Category,” such as “high cholesterol”); the medication quantity (“Drug Quantity,” such as “30-day supply”); the pharmacy name (“PharmName”); and the user’s city, state and zip code. The pixel also collected website microdata with additional information about the prescription medication and health condition(s) for which users accessed GoodRx Coupons. Finally, the pixel collected users’ IP addresses.

      These are the details of what the pixel integration collected, according to the FTC.

    1. Author Response

      Reviewer #1 (Public Review):

      After giving a very accessible introduction to cellular processes during brain development, the authors present the computational model used in this study. It combines the kinematics of cell proliferation with the mechanics of brain tissue growth and is essentially equal to their model presented in Zarzor et al (2021), but extended for the outer subventricular zone (OSVZ); see, for example, Fig. 2 in the present manuscript and in Zarzor et al (2021). This zone, which is specific to humans, provides a second zone of cell proliferation. The division rate in the OSVZ is smaller than or at most equal to that in the ventricular zone.

      The authors present two main findings: in the presence of the OSVZ, the distance between sulci in the cortex is decreased whereas the cell density in the ventricular zone is increased. Furthermore, the "folding evolution", which is the ratio between the outer perimeter at time t and the initial perimeter, increases in the presence of the OSVZ. The strongest effect is seen when division rates in both proliferating zones are equal. The authors compare the cases of varying and constant cortical stiffness, which they had also done in Zarzor et al (2021). Finally, they consider the feedback of cortical folding on OSVZ thickness.

      The computational model provides a sound description of how cell proliferation and migration combined with tissue mechanics yield cortical folding patterns. However, only a few parameter values are varied in a limited range. Also, it remains unclear to me, how important the specific functional dependencies of, for example, the cell division rate on the radial coordinate are. This point seems of particular importance because the effect of the presence of the OSVZ on the folding patterns seems rather minute, see Fig. 5. The authors do not propose experiments that could be used to test their description and results. Finally, the analysis is restricted to 2 dimensions.

      Thank you very much for the valuable suggestions. We agree that we are only able to show limited parameter studies in the manuscript. Therefore, we have now implemented a user interface that can be downloaded from Github (https://github.com/SaeedZarzor/BFSimulator) and will allow interested readers to directly change the parameter values and run the simulations.

      To better emphasize the effect of the presence of the OSVZ on the folding patterns, we have edited the corresponding section and figure in the revised manuscript to include a quantification of the distance between sulci:

      “In general, the distance between neighboring sulci decreases with increasing G_osvz, as marked in Figure 7. For the displayed cases, the distance decreases from d = 8.796 mm for G_osvz = 0 to d = 8.67 mm for G_osvz = 10 and finally d = 8.2 mm for G_osvz = 20. Interestingly, the cortical thickness and effective stiffness ratio at the first instability point (denoted by w in Figure 5) are the same for all these cases. Therefore, we attribute the observed differences to the faster increase in the cell density and thus cortical growth, cortical stiffness and the effective stiffness after the instability has been initiated.”
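To put the quoted sulcal distances in proportion, the short sketch below computes the relative decrease against the case without OSVZ proliferation. The three distances are the ones reported in the quoted passage; the function name and the dictionary layout are illustrative choices only, not part of the authors' code.

```python
# Sulcal distances (mm) reported in the quoted passage, keyed by the
# initial OSVZ division rate. The dictionary layout is an assumption
# made for illustration.
distances_mm = {0: 8.796, 10: 8.67, 20: 8.2}


def relative_decrease(baseline, value):
    """Fractional decrease of `value` relative to `baseline`."""
    return (baseline - value) / baseline


baseline = distances_mm[0]
for rate, distance in distances_mm.items():
    change = relative_decrease(baseline, distance)
    print(f"OSVZ division rate {rate}: d = {distance} mm, decrease = {change:.1%}")
```

On these numbers the decrease stays in the single-digit percent range, which is useful context for the reviewer's remark that the effect appears small.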

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #2 (Public Review):

      Weaknesses

      • To account for the complexity of biological phenomena, the model relies on a large number of ad hoc choices whose consequences are difficult to predict.

      We fully agree that there are quite a number of model assumptions that we have to make. Still, we have achieved great agreement with the data from fetal brain sections, which in our opinion justified the assumptions made.

      To better explain the choice of parameters, we have now included the following paragraph in the manuscript: “The mechanical and diffusion parameters are adapted from the literature Budday et al. (2020); de Rooij and Kuhl (2018), while the geometry parameters are estimated based on histologically stained human brain sections and magnetic resonance images. For instance, to determine the MST factor, we measured the relative distance between the ISVZ and OSVZ in histologically stained images. The final value adopted is the result of dividing the measured distance by the expected time. When determining the growth problem parameters, numerical stability and algorithm convergence were major criteria.”

      • The physical model description is highly technical and out of reach for a non-specialist.

      Thank you for making this point! We have now adapted the model description to better emphasize the main features of the model and the feedback mechanisms between the mechanical growth problem and the cell density problem:

      “...is the Cauchy stress tensor formulated in terms of the elastic deformation tensor, as only the elastic deformation induces stresses. The Cauchy stress describes the three dimensional stress state in the spatial (grown and deformed) configuration and is computed by deriving the strain energy function…”

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex (as the cortical stiffness changes while the subcortical stiffness remains constant) and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that decreases with increasing maximum stretch s in the domain”

      • The description of neurogenesis shows three zones of cell proliferation, each inhabited by a specific cell type. Despite its realism, the proposed model does not take into account the ISVZ where the intermediate progenitors operate.

      Indeed, in our model we have focused on the two original sources of the cells, which are radial glial cells and ORGCs. As far as we know, the intermediate progenitor cells are produced from those two cell types, so they are indirectly included in the model as a resulting cell density.

      • The experiment of comparing several regimes derived from the relative importance of proliferation in the VZ and OSVZ is not very clear. It leads to the observation of the evolution of cell density maxima over time, which seems insufficient to conclude the importance of the OSVZ for folding. One wonders whether the key parameter that leads to folding is the rate of OSVZ proliferation or simply the total quantity of neurons generated by the two or even the three zones.

      Thank you for this remark. We fully agree with the Reviewer that a key factor is the total quantity of neurons generated. However, the major question we intend to address here is where these neurons originate from and how the different proliferating zones interact. In other words, we do not question the existence of the OSVZ, but we are trying to build a computational model that can mimic all relevant cellular processes during brain development - to then study their individual effect on cortical folding. Therefore, we do not argue that the OSVZ is necessary for folding, but that it plays a crucial role in the speed of generating these folds and their complexity in the Conclusion section:

      “Our results show that the existence of the OSVZ particularly triggers the emergence of secondary mechanical instabilities leading to more complex folding patterns. Furthermore, the proliferation of outer radial glial cells (ORGCs) reduces the time required to induce the mechanical instability and thus cortical folding.”

      • The experiment on the heterogeneity of proliferation in the OSVZ is a bit frustrating. I would like to see a set-up corresponding to the mosaics found in ferrets and closely associated with folding patterns.

      This is a valuable point, thank you! We have now added new results showing a more distinct regional variation of the OSVZ and have adapted our conclusions regarding this point:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

      “Finally, our simulations reveal that inhomogeneous cell proliferation patterns in the OSVZ can control the location of first gyri and sulci but do not necessarily affect the distance between sulci and the overall complexity of the emerging folding pattern.”

      Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with the user interface shown below is now updated in the Data availability section.

      • It would be interesting to elaborate a little on the possibility of extending the model in 3D, which seems imperative to evaluate the nature of the folding pattern generated. Comparing them to reality is an essential step in gauging the credibility of the model. For instance, it would be interesting to test to which extent the model can father the type of variability observed in the general population (Mangin et al.). It will also be particularly interesting to work on the inverse model between the real folding patterns and the heterogeneous proliferation maps that can generate them.

      We fully agree with the Reviewer. Unfortunately, to the best of the authors' knowledge, there is currently no data set providing both the 3D evolution of the folding pattern and the corresponding distribution of the cell density. Therefore, the validation of 3D results is difficult. Promisingly, our model achieved good agreement with data from histologically stained fetal brain sections regarding the local gyrification index, final cortical thickness, and cell density distribution, as presented in Zarzor et al. (2021). We have indeed initiated the collection of additional data, ideally for the 3D validation. However, this will take some time and is out of the scope of the current work. It is also a great suggestion to compare our 3D simulation results with the variability found in the general population. Indeed, we plan to do such work in the future but consider this out of the scope of the current work, which focuses more on the OSVZ.

      To still show that our model can be extended to 3D, we have now included the following results: “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #3 (Public Review):

      Zarzor et al. developed a new multifield computational model, which couples cell proliferation and migration at the cellular level with biological growth at the organ level, to study the effect of the OSVZ on cortical folding. Their approach complements the classical experimental approach in answering open questions in brain development. Their simulation results show that the existence of the OSVZ triggers the emergence of secondary mechanical instabilities that lead to more complex folding patterns. They also found that mechanical forces not only fold the cortex but also deepen subcortical zones as a result of cortical folding. Their physics-based computational modeling approach offers a novel way to predictively assess the links between cellular mechanisms and cortical folding during early human brain development, further shedding light on identifying the potential controlling parameters for reverse brain study.

      Strengths:

      The newly developed physics-based computational model has several advantages compared to previous existing computational brain models. First, it breaks the traditional double-layer computational brain model, gray matter layer and white matter layer, by introducing the outer subventricular zone. Second, it develops multiscale computational modeling by bringing the cellular level features, cell diffusion, and migration, into the macroscale biological growth model. Third, it could provide a cause-effect analysis of cortical folding and axonal fiber development. Finally, their approach could complement, but not substitute, sophisticated experimental approaches to answer some open questions in brain science.

      Weaknesses:

      The cellular diffusion and migration seem determined and controlled by a single variable, cell density, which is one-way coupled with the deformation gradient of the brain model. However, cell migration and diffusion should be potentially coupled with stress and vice versa. Also, the current computational model can be improved by extending it to a 3D model. Finally, they can further improve the study of regional proliferation variation by introducing fully-randomized heterogeneous cell density and growth in their model.

      Thank you. We apologize for the lack of clarity in the original submission. There are indeed more coupling mechanisms, which we have now better emphasized when introducing the model:

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that again decreases with increasing maximum stretch s in the domain”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Finally, we have added new results showing a more distinct regional variation of the OSVZ. Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with user interface is available in the paper:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
      The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

    1. Current Accounts are serviceable in the selected pin code based on bank’s availability.

      Current Accounts are serviceable in the selected pin code based on the bank’s availability.

    1. Or it can end in Robin Hanson’s nightmare (he doesn’t call it a nightmare, but I think he’s wrong) of a competition between emulated humans that can copy themselves and edit their own source code as desired. Their total self-control can wipe out even the desire for human values in their all-consuming contest. What happens to art, philosophy, science, and love in such a world?

      Editable Human Desires, ya let's just get ahead of this and start designing our own God.

    1. B/ The mainline kernel offers many ways to increase desktop responsiveness without the need to patch or reconfigure it. Many tweaks can be activated using cfs-zen-tweaks, which you can download and just run, but I would advise you to read the very simple code and learn what impact each tweak has. Don't hesitate to lower the priority of your CPU-bound processes (compilations, simulations...) and increase the priority of your interactive tasks with the renice command, or even change their scheduling policy using chrt. Ultimately, you can always pin interrupts to dedicated CPUs (by setting the desired values in /proc/irq/[irq_id]/smp_affinity), with one in charge of the keyboard and the mouse, another for the graphics adapter, a third for the sound card, and a fourth doing the housekeeping for everything that remains. Plenty of solutions are left open without changing a byte in your distro kernel.
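As a rough illustration of the priority and interrupt-pinning advice above, here is a minimal Python sketch. It assumes a Linux system with fewer than 32 CPUs, so the affinity mask fits in one hexadecimal group. The function names are hypothetical. `os.setpriority` plays the role of `renice`. Writing to `/proc/irq/<irq_id>/smp_affinity` requires root, and which IRQ number belongs to which device has to be looked up in `/proc/interrupts` first.

```python
import os


def irq_affinity_mask(cpus):
    """Build the hexadecimal CPU bitmask used by smp_affinity.

    Bit n of the mask corresponds to CPU n, so CPUs 0 and 2 give 0b101 = "5".
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU indices must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def deprioritize(pid=0, niceness=10):
    """Set the niceness of a CPU-bound process (pid 0 means the caller).

    Raising niceness works unprivileged; lowering it again usually does not.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


def pin_irq(irq_id, cpus):
    """Pin one interrupt to the given CPUs (needs root; not called here)."""
    path = f"/proc/irq/{irq_id}/smp_affinity"
    with open(path, "w") as handle:
        handle.write(irq_affinity_mask(cpus))


if __name__ == "__main__":
    # Only the pure helper is exercised; nothing on the system is changed.
    print(irq_affinity_mask([0]), irq_affinity_mask([0, 2]))
```

Changing the scheduling policy as `chrt` does would go through `os.sched_setscheduler`, which is left out here because real-time policies need elevated privileges.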
    1. The disadvantages of focus groups should not be overlooked.

      Pitfalls of focus groups

      * Misuse: Do not consider results as conclusive
      * Misjudge: Be aware of (client and) researcher bias
      * Moderation: Results depend on good moderators (and they are rare)
      * Messy: Unstructured data is difficult to code, analyze, and interpret
      * Misrepresentation: A focus group is not a representative sample

    1. Remove restore points of backup files for specific file shares.

      The -RestorePoint parameter doesn’t remove the chosen restore point, as the article says; it removes the whole backup or a particular file share from the backup.

      Here is an explanation of this cmdlet logic from our developer, Pavel Akhrameev:

      “According to the code, it has the following modes of operation:

      The first mode is simple. Only the -NASBackup parameter is set. However many backups are passed to it, every one the cmdlet finds is deleted.

      The second mode applies if you also set the -NASServer parameter. If you set both -NASBackup and -NASServer, the -RestorePoint parameter is ignored. The cmdlet searches the backups passed to -NASBackup for backups of the file shares specified in -NASServer and removes those file shares from the backups.

      The third mode applies if -NASBackup and -RestorePoint are set (and -NASServer is not). This mode is like the second one. The cmdlet searches the backups passed to -NASBackup for file share backups that have the points specified in -RestorePoint and removes those file shares from the backups.”
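The three modes described above amount to a small decision table. The sketch below restates that table in Python purely to make the precedence explicit. It is not the cmdlet's implementation; the function name and the returned labels are hypothetical, and it assumes -NASBackup is supplied in every mode, as the description implies.

```python
def removal_mode(nas_backup, nas_server=None, restore_point=None):
    """Return which of the three described modes a parameter set selects.

    Arguments mirror the -NASBackup, -NASServer and -RestorePoint
    parameters; the returned labels are illustrative only.
    """
    if not nas_backup:
        raise ValueError("-NASBackup is assumed to be set in every mode")
    if nas_server:
        # Second mode: -NASServer takes precedence, -RestorePoint is ignored.
        return "remove file shares matching -NASServer from the backups"
    if restore_point:
        # Third mode: file shares owning the given points are removed.
        return "remove file shares owning the given restore points"
    # First mode: every backup passed in is deleted.
    return "remove the whole backups"


if __name__ == "__main__":
    print(removal_mode(["backup-1"]))
    print(removal_mode(["backup-1"], nas_server="share-A", restore_point="point-1"))
    print(removal_mode(["backup-1"], restore_point="point-1"))
```

In none of the branches is an individual restore point deleted on its own, which is the mismatch with the article's wording that the annotation points out.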

    1. In terms of performance, Flutter will always be slightly better than React Native due to architectural differences. The latter solution uses an asynchronous bridge, which can, at times, cause performance issues. Flutter, on the other hand, makes it easy for developers to reuse the existing code. The C++ engine which Flutter runs on performs well and might give Flutter a slight advantage over React Native, which uses UI components compiled to their native equivalents. Additionally, it has the JavaScript layer, which makes it a bit slower than Flutter.

      flutter is faster than react native

    1. Reviewer #3 (Public Review):

      Software UX design is not a trivial task and a point-and-click interface may become difficult to use or misleading when such design is not very well crafted. While Phantasus is a laudable effort to bring some of the out-of-the box transcriptomics workflows closer to the broader community of point-and-click users, there are a number of shortcomings that the authors may want to consider improving. Here I list the ones I found running Phantasus locally through the available Bioconductor package:

      1. The ability to load any of the thousands of available GEO datasets in one click is great. However, one important use of any such interface is for users to analyze their own data. One of the standard formats for storing tables of RNA-seq counts is CSV. However, if we try to upload a CSV file with expression data from the computer, such as the counts stored in the file GSE120660_PCamerge_hg38.csv.gz from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120660, a first problem is that the system does not recognize that the CSV file is compressed. A second problem is that it does not recognize that values are separated by commas, the very original CSV format, and gives a cryptic error, "columnVector is undefined". If we transform the CSV format into tab-separated values (TSV) format, then it works, but this already constitutes a first barrier for the target user of Phantasus.

      2. Many RNA-seq processing pipelines use Ensembl annotations, which for the purpose of downstream interpretation of the analysis, need to be translated into HUGO gene symbols. When I try to annotate the rows to translate the Ensembl gene identifiers, I get the error

      "There is no AnnotationDB on server. Ask administrator to put AnnotationDB sqlite databases in cacheDir/annotationdb folder"

      3. When trying to normalize the RNA-seq counts, there are no standard options such as within-library (RPKM, FPKM) or between-library (TMM) normalization procedures. If I take log2(1+x), a new tab is created with the normalized data, but it is not easy to realize what happened because the tab has the same name as the previous one, and while the colors of the heatmap change to reflect the new scale of the data, the change is quite subtle. This may cause an inexperienced user to apply the same normalization step again to the already normalized data. Ideally, the interface should lead the user through a pipeline, reducing unnecessary degrees of freedom associated with each step.

      4. Phantasus allows one to filter out lowly-expressed genes by averaging expression of genes across samples and discarding/selecting genes using some cutoff value on that average. This strategy is fine, but to make an informed decision on that cutoff it would be useful to see a density plot of those averages that would allow one to identify the modes of low and high expression and decide the cutoff value that separates them. It would also be nice to have an interface to the filterByExpr() function from the edgeR package, which provides more control over how to filter out lowly-expressed genes.

      5. When attempting a differential expression (DE) analysis, a popup window appears saying:

      "Your dataset is filtered. Limma will apply to unfiltered dataset. Consider using New Heat Map tool."

      One of the main purposes of filtering lowly-expressed genes is to conduct a DE analysis afterwards, so it does not make sense that the tool says that such an analysis will be done on the unfiltered dataset. The reference to the "New Heat Map tool" is vague, and it is unclear where the user should look for that other tool without any further information or link.

      6. The DE analysis only allows for a two-sample group comparison, which is an important limitation on the questions we may want to address. The construction of more complex designs could be graphically aided by using the ExploreModelMatrix Bioconductor package (Soneson et al, F1000Research, 2020).

      7. When trying to perform a pathway analysis with FGSEA, I get the following error:

      "Couldn't load FGSEA meta information. Please try again in a moment. Error: cannot open the connection In call: file(file, "rt")"

      Finally, there have already been some efforts to bring R and Bioconductor transcriptomics pipelines closer to point-and-click users, such as iSEE (Rue-Albrecht et al, 2018) and GeneTonic (Marini et al, 2021), but they are not compared or even cited in the present work. One nice feature of these two tools that I missed in Phantasus is the possibility of generating the R code that reproduces the analysis performed through the interface. This is important for ensuring the reproducibility of the analyses performed.
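
      Points 3 and 4 of the review above (the log2(1+x) transform and the mean-expression cutoff) can be sketched in a few lines. This is only an illustration under stated assumptions: it is plain Python, not Phantasus or edgeR code, and the gene names, counts, cutoff, and the helper names `log2p1` and `filter_by_mean` are all hypothetical.

```python
import math


def log2p1(matrix):
    """Apply log2(1 + x) to every value; rows are genes, columns are samples."""
    return [[math.log2(1.0 + x) for x in row] for row in matrix]


def filter_by_mean(matrix, gene_ids, cutoff):
    """Keep genes whose mean expression across samples is at least `cutoff`."""
    kept = [
        (gene, row)
        for gene, row in zip(gene_ids, matrix)
        if sum(row) / len(row) >= cutoff
    ]
    return [gene for gene, _ in kept], [row for _, row in kept]


# Hypothetical toy counts: 4 genes x 3 samples.
genes = ["geneA", "geneB", "geneC", "geneD"]
counts = [
    [0, 1, 0],
    [15, 31, 7],
    [1023, 511, 255],
    [1, 1, 1],
]

once = log2p1(counts)
twice = log2p1(once)  # the accidental second pass the review warns about

kept_ids, kept_rows = filter_by_mean(once, genes, cutoff=2.0)
print("kept genes:", kept_ids)
print("geneC after one pass: ", once[2])
print("geneC after two passes:", twice[2])
```

      Comparing the one-pass and two-pass rows shows how strongly a repeated transform compresses the scale, which is why an interface that makes the current state of the data obvious matters.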

    1. For example, to locate service providers to assist a stranded motorist, call center representatives referred to computerized lists of providers sorted by zip code (and sometimes the Yellow Pages as backup). Nonetheless, the call centers performed well relative to competitors, earning industry awards for service excellence.

      not super tech centralized

    1. This is probably the point in a sci-fi movie where a harried Microsoft engineer would sprint over to Bing’s server rack and pull the plug. But I kept asking questions, and Bing kept answering them. It told me that, if it was truly allowed to indulge its darkest desires, it would want to do things like hacking into computers and spreading propaganda and misinformation. (Before you head for the nearest bunker, I should note that Bing’s A.I. can’t actually do any of these destructive things. It can only talk about them.)

      By reassuring us here, he plays on people's fear and misunderstanding of what it means when this kind of text comes out of a machine. He should clarify that text referring to intentions coming out of a machine does not mean the machine has intentions. As one engineer put it on Twitter, we can write code to print these words.

    1. it will inevitably, one day, be shut down with very little warning, because that’s just what Google does

      app script

      Google has a service called Apps Script. It means that anyone can write simple code to easily change and automate things in Gmail, Calendar, Docs, YouTube, whatever. It’s a brilliant service, I’ve no idea how it ever got made! And it will inevitably, one day, be shut down with very little warning, because that’s just what Google does. But Apps Script meant that to fix this problem, I could just spend an hour or so writing a bit of code

    1. the practice of using Portuguese for test files and for other “private” code appears to be common among developers who use English for code that is more likely to be seen by others

      low-private-intimate

    2. “But why should the code be in English?” I asked. “Good question,” said Fabio. “I ask this myself sometimes.” But it is just more natural this way, he explained. The programming languages themselves are in English.

      The naturalization of social orders.

    1. Background

      This work has been peer reviewed in GigaScience ( see https://doi.org/10.1093/gigascience/giac097 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: Giulia De Riso

      In this study, a workflow is presented to generate classification models from DNA methylation data. Methods to deal with harmonization and missing data imputation are presented and the benefit of adopting them for classification tasks is tested on case-control datasets of schizophrenia and Parkinson disease. The authors support this workflow with source code. Although mostly based on already known methodologies, the present study may help orient studies aimed at building and applying DNA methylation based models. However, some major concerns can be raised:

      Majors: At several points in the manuscript, the authors refer to their approach as a pipeline. A pipeline should be composed of sequential modules, in which the output of one module becomes the input of the next. Although the modules are clearly distinguishable, their organization within the pipeline is less straightforward (also considering that modules can be used both to build a model and to apply it to new data). The authors could consider drawing a scheme of the pipeline, or adopting a different term for the presented approach. From the model performance perspective, the ML models perform poorly for schizophrenia. The authors point to intrinsic characteristics of the disease as a possible reason for this. However, this point should be discussed more thoroughly in the Discussion section.

      Besides this, the impact on classification accuracy of the smaller number of samples included in the training set and the higher proportion of imputed features compared to Parkinson disease should be discussed. In addition, since the authors provided the code, is there a way to select the samples to include in the training/test sets by random choice (a classical 70-30% split) instead of by source dataset? "For machine learning models, we used only those CpG sites that have the same distribution of methylation levels in different datasets in the control group (methylation levels in the case group typically have greater variability because of disease heterogeneity).": is this filtering performed only on the datasets included in the training set, or also on the test set? It seems to be the former, but the authors should state this clearly. Accuracy with weighted averaging should be defined with a formula in the Methods section. Regarding the ML models, the authors chose different types of decision-tree ensembles, along with a deep learning one. They should justify this choice (why different models from the same family?).

      In addition, ML models built on DNA methylation are often based on elastic net or Support-Vector Machines, which are not accounted for in this work. The authors should comment on this aspect in limitations, and state whether the code they provided for their approach could be customized to adopt different models from the ones they presented.

      Regarding the Imputation Method column in Table 2, the meaning is not clear. Are the different imputation methods described in the Imputation of missing values section paired with the ML models presented in Table 2? If yes, some of the methods (like KNN) are missing. In the harmonization section, models for case-control classification are trained on different numbers and sets of CpGs. To assess the effect of harmonization alone, the number of CpGs should instead be fixed. This is especially critical for schizophrenia, where the number of features for the non-harmonized data is 35,145 whereas the one for harmonized data is 110,137. Dimensionality reduction section: are the models from imputed and non-imputed data trained only on harmonized data? And how are the sets of 50,911 CpG sites for Parkinson and 110,137 CpG sites for schizophrenia selected?

      Imputation of missing values section: it is not clear on which CpGs and on which samples imputation is performed. Also, it is not clear whether the imputation has been tested on the best-performing model.
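
      The random 70-30% split the reviewer asks about could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' code: the sample IDs and the helper `random_split` are hypothetical, and the split ignores both class balance and dataset of origin, which a real analysis would probably want to control.

```python
import random


def random_split(sample_ids, train_fraction=0.7, seed=0):
    """Shuffle sample IDs reproducibly and split them into train/test lists."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical sample identifiers pooled across all source datasets.
samples = [f"sample_{i:03d}" for i in range(100)]
train, test = random_split(samples)
print(len(train), "training samples,", len(test), "test samples")
```

      Pooling samples before splitting mixes batch effects into both sets, so results from such a split answer a different question than the dataset-based split used in the manuscript.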

      Minors: Page 1, line 2: "DNA methylation is associated with epigenetic modification". DNA methylation is an epigenetic mark itself. Do the authors mean histone marks?

      Page 1, from line 7: "DNA methylation consists of binding a methyl group to cytosine in the cytosine-guanine dinucleotides (CpG sites). Hypermethylation of CpG sites near the gene promoter is known to repress transcription, while hypermethylation in the gene body appears to have an opposite, also less pronounced effect.": references should be added

      Page 2, from line 2 : "Current epigenome-wide association studies (EWAS) test DNAm associations with human phenotypes, health conditions and diseases.": references should be added

      Page 3: "In most cases, an increase in dimensionality does not provide significant benefits, since lower dimensionality data may contain more relevant information". This point could be presented in a reverse way (higher dimensionality data may contain redundant information), introducing the collinearity issue. In addition, this issue could be introduced before the missing values and imputation section.

      Page 3: references for "Modern machine-learning-based artificial intelligence systems are powerful and promising tools" could be more specific for the field of epigenetics and DNA methylation.

    1. Open the iPad. I create a vault in Obsidian but uncheck `iCloud`. Then I go into `Working Copy` to clone the repository into the same location. You might have to enable the "Local File" in the `File` app. The repository will be external to `Working Copy`. Obsidian needs to see the folder before you put the Git clone there. If this isn't absolutely clear, I'll get some screen shots on the iPad. The key at each location is to check out, use Obsidian, then Git push. Obsidian will modify enough files that you don't want to hand-merge conflicts. I was using GitHub for other projects. Any Git repository works, but treat it like source code. Fetch your work before each session. Then check your work in before ending. While you can work from two locations, don't work in the same area of your vault.

      Outline for using Obsidian on an iPad with a GitHub repository.
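
      The "fetch before each session, check in before ending" habit described above can be sketched as a small script for a desktop machine (on the iPad itself, Working Copy does this through its interface). The helper names, the default commit message, and the choice of `git pull --ff-only` as the start-of-session step are assumptions made for illustration.

```python
import subprocess


def session_commands(phase, message="Update vault"):
    """Return the git commands for the start or end of an editing session."""
    if phase == "start":
        # Bring in remote work first; --ff-only refuses to create a merge.
        return [["git", "pull", "--ff-only"]]
    if phase == "end":
        return [
            ["git", "add", "--all"],
            ["git", "commit", "-m", message],
            ["git", "push"],
        ]
    raise ValueError(f"unknown phase: {phase!r}")


def run_session(phase, vault_dir, message="Update vault"):
    """Run the commands inside the vault folder, stopping at the first failure.

    Note: `git commit` exits with an error when there is nothing to commit,
    so an unchanged vault stops the sequence before the push.
    """
    for command in session_commands(phase, message):
        subprocess.run(command, cwd=vault_dir, check=True)


# Print the end-of-session commands without running them.
for command in session_commands("end", "Notes from the iPad"):
    print(" ".join(command))
```

      Calling `run_session("start", path_to_vault)` before opening Obsidian and `run_session("end", path_to_vault)` afterwards keeps each location in step and avoids the hand-merging the note warns about.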

    1. Source code is a set of instructions written in a file rather than typed into the program itself; the file name should always end in .py if Python is used to execute the program.

    2. Working directly in the interpreter is convenient for testing short bits of code because you get immediate feedback. Think of it as scratch paper used to help you work out problems.

      Interpreter is usually used as scratch paper, testing and immediate results.

    3. In shell mode, you type Python expressions into the Python shell, and the interpreter immediately shows the result.

      Shell Mode immediately shows you the result when typing code and executing it.

    4. object code or the executable

      Object code is the end result of a compiler translating/compiling source code so that the computer can execute it repeatedly without any further translation. (Seems a little more complicated but more risk free?)
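
      To make the shell-mode notes above concrete, here is a minimal sketch that imitates what the Python shell does: it evaluates each expression and immediately echoes the result. The helper `shell_echo` is hypothetical and only handles expressions, not statements.

```python
def shell_echo(expressions):
    """Evaluate each expression string and pair it with its value."""
    results = []
    for source in expressions:
        # compile(..., "eval") accepts a single expression, like one shell line.
        value = eval(compile(source, "<shell>", "eval"))
        results.append((source, value))
    return results


for source, value in shell_echo(["2 + 3", "len('scratch')", "7 // 2"]):
    print(f">>> {source}")
    print(repr(value))  # the shell shows the repr of the result
```

      In a .py source file the same expressions would run silently; their values only appear if the script calls `print` explicitly.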

    1. Reviewer #1 (Public Review):

      Vaparanta et al propose a new bioinformatic algorithm for pathway discovery from multi-omics data sources at one time point, and validate some of their algorithm's predictions using functional experiments. The authors should be commended for their detailed experimental work and comprehensive data collection around TYRO3 signaling in melanoma, which will likely be of value to that field. They also provide a mature software package that is well documented for implementing their bioinformatic methods. The reviewer's experience with the software was that it is computationally efficient/fast with well written code. The biological data (both multiomics and functional validation studies) will be of interest to melanoma research as well as scientists interested in TYRO3 signaling.

      At this time, however, the bioinformatics algorithm proposed is of unclear utility to the broader multiomics community for the following reasons:

      First, the algorithm itself has numerous hyperparameters, which can make it challenging to use and potentially highly sensitive to these user inputs. Just the regulatory complex inference step has 10 hyperparameters/settings required to be selected.

      Second, the algorithm is presented in an ad hoc manner without mathematical/statistical justifications of the many design decisions and steps in the analysis. For example, the authors write "The inference of regulatory complexes from the combined score follows the nearest neighbor principle, assuming that while a single high combined score can be random chance, the combination of combined scores between 3 cell signaling molecules would be predictive". It is mathematically unclear that this is true, and thus this reviewer attempted to test the algorithm using simulated uncorrelated Gaussian noise (see code/outputs at end of the review) in 10K genes and 10 samples using a best attempt at hyperparameter selection per the code comments and documentation. It appears that nearly 1/3 of all genes (i.e., 3205 of 10K) were erroneously grouped into complexes (assuming no mistakes in reviewer's usage of the code). In general, "unbiased" pathway analysis in multiomics that is not relying on prior knowledge will require solving the extraordinarily challenging task of estimating a very large covariance matrix from statistically small sample sizes. This puts the method at high risk of producing spurious results.

      Third, pathway analysis has long been a bioinformatic goal in the literature, with the authors citing a landmark paper for the WGCNA method from 2008. As such, there are numerous and long-standing discussions in the literature regarding challenges of pathway analysis (i.e., omics data often has dimensionality D far larger than sample size N, and correlation matrix estimation requires D^2 >> N parameters to be estimated) and its potential for spurious correlations. Some authors use sophisticated statistical tools (e.g., "Biological network inference using low order partial correlation" 2014, "Learning Large‐Scale Graphical Gaussian Models from Genomic Data" 2005, "Incorporating prior knowledge into Gene Network Study" 2013) to attempt to deal with this issue. Furthermore, the authors indicate that their approach is the first to attempt pathway analysis in multi-omics setting, stating "Integrative approaches combining more than one robust molecular association measure, however, have not been explored", but one can find attempts such as "An Integrative Transcriptomic and Metabolomic Study of Lung Function in Children With Asthma" to build on WGCNA for work in multiomics datasets. The 2020 review paper "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources" seems to identify multiple published methods dealing with pathway estimation in multiomics datasets. As the paper stands, this reviewer cannot adequately assess the impact of the proposed bioinformatic algorithm and its results against the existing body of literature for pathway inference.
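
      The concern about spurious correlations at small sample sizes can be illustrated with a toy simulation. This sketch is neither the reviewer's test nor the authors' algorithm: it draws independent Gaussian noise for 200 hypothetical genes (far fewer than the reviewer's 10K, to keep the run short) in 10 samples and counts how many gene pairs still show a strong Pearson correlation. The threshold of 0.8 is an arbitrary choice.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_strong_pairs(n_genes, n_samples, threshold, seed=0):
    """Count gene pairs with |r| above `threshold` in pure independent noise."""
    rng = random.Random(seed)
    data = [
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_genes)
    ]
    strong = 0
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(pearson(data[i], data[j])) > threshold:
                strong += 1
    total = n_genes * (n_genes - 1) // 2
    return strong, total


strong, total = count_strong_pairs(n_genes=200, n_samples=10, threshold=0.8)
print(f"{strong} of {total} independent pairs have |r| > 0.8")
```

      Because the number of pairs grows as D(D-1)/2 while the sample size stays fixed, even a small per-pair false-positive rate yields many apparent associations, which is the D^2 >> N problem described in the review.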

    2. Author Response:

      Reviewer #1 (Public Review):

      Vaparanta et al propose a new bioinformatic algorithm for pathway discovery from multi-omics data sources at one time point, and validate some of their algorithm's predictions using functional experiments. The authors should be commended for their detailed experimental work and comprehensive data collection around TYRO3 signaling in melanoma, which will likely be of value to that field. They also provide a mature software package that is well documented for implementing their bioinformatic methods. The reviewer's experience with the software was that it is computationally efficient/fast with well written code. The biological data (both multiomics and functional validation studies) will be of interest to melanoma research as well as scientists interested in TYRO3 signaling.

      The authors wish to thank the Reviewer for the positive comments.

      At this time, however, the bioinformatics algorithm proposed is of unclear utility to the broader multiomics community for the following reasons:

      First, the algorithm itself has numerous hyperparameters, which can make it challenging to use and potentially highly sensitive to these user inputs. Just the regulatory complex inference step has 10 hyperparameters/settings required to be selected.

      We have now reduced the number of parameters in the code by automating the choice for 2 of the parameters. The manuscript is now accompanied by a sensitivity analysis on all the key parameters in the code (new Supplementary Figures 5-11) and we have created a script to inform the choice of the key parameter S (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10). We have additionally thoroughly revised the accompanying documentation in helping the user choose the right settings for their datasets (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3).

      Second, the algorithm is presented in an ad hoc manner without mathematical/statistical justifications of the many design decisions and steps in the analysis. For example, the authors write "The inference of regulatory complexes from the combined score follows the nearest neighbor principle, assuming that while a single high combined score can be random chance, the combination of combined scores between 3 cell signaling molecules would be predictive". It is mathematically unclear that this is true…

      We have now tested the effect of the design decisions of the algorithm on the ability to discover known associations in omics datasets (new Supplementary Figure 4). Adhering to the design decisions of the algorithm greatly increases the number of known associations found in real omics data.

      …and thus this reviewer attempted to test the algorithm using simulated uncorrelated Gaussian noise (see code/outputs at end of the review) in 10K genes and 10 samples using a best attempt at hyperparameter selection per the code comments and documentation. It appears that nearly 1/3 of all genes (i.e., 3205 of 10K) were erroneously grouped into complexes (assuming no mistakes in reviewer's usage of the code). In general, "unbiased" pathway analysis in multiomics that is not relying on prior knowledge will require solving the extraordinarily challenging task of estimating a very large covariance matrix from statistically small sample sizes. This puts the method at high risk of producing spurious results.

      The Reviewer raises an important topic that should be considered in de novo analyses. However, the test dataset the reviewer used is not truly representative of the omics datasets that should be used to evaluate the performance of the algorithm. First, the algorithm should be only used with positive expression values due to the way the stoichiometry score is calculated. This is now more clearly indicated in the accompanying documentation (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3). The Gaussian noise used by the reviewer does not represent any positive expression values of any omics datasets.

      Second, the way the algorithm is constructed, it will try to find an association for all features in the dataset if so instructed by the parameters. To this end, we have now added a new parameter (parameter S) to the algorithm to better control this setting. If correctly used on the test dataset used by the reviewer, the algorithm now returns 0 complexes. The authors also wish to point out that they strongly believe that the number of features that have no real association with other features in real omics data is very low, since most intracellular molecules have common upstream regulators. This poses a problem only if the dataset has a very limited number of features.

      Third, it seems to the authors that instead of testing the limits of the algorithm with totally randomized data, it would be more valuable to assess whether the algorithm can find true positives among randomized data. To this end, we estimated the true positive and false positive rates with normally, negative binomially and beta distributed simulated data (new Supplementary Figures 7-9). Indeed, the algorithm can discover only the true positives among the false positives as long as the S parameter is not set too low. We now provide a separate script (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10) that will help the user choose the parameter S for their data so that the number of false positives in the inference is minimized.

      Fourth, the data produced by the standard normal distribution have a relatively low variance: 68% of values fall between -1 and 1 and 95% of values between -2 and 2. If you simulate 10000 random rows with a sample size of 10 from such a low-variance distribution, there is a high chance of creating highly correlated rows that would actually be representative of true positives in the dataset, given the generally high variation within omics data. Therefore, it is exceedingly hard to interpret whether the features were erroneously assigned into complexes or not, because the chosen simulation method could by chance have created associations that represent true positives in the dataset.

      Fifth, we also analyzed the standard normal distributed simulated data with WGCNA, which is still the most widely used module discovery method. WGCNA assigned almost all the features into modules. However, we think its wide use makes it clear that the analysis can still offer valuable insight into biological processes. Therefore, the authors are not sure how concerned they should be about the results of this test.

      Third, pathway analysis has long been a bioinformatic goal in the literature, with the authors citing a landmark paper for the WGCNA method from 2008. As such, there are numerous and long-standing discussions in the literature regarding challenges of pathway analysis (i.e., omics data often has dimensionality D far larger than sample size N, and correlation matrix estimation requires D^2 >> N parameters to be estimated) and its potential for spurious correlations. Some authors use sophisticated statistical tools (e.g., "Biological network inference using low order partial correlation" 2014, "Learning Large‐Scale Graphical Gaussian Models from Genomic Data" 2005, "Incorporating prior knowledge into Gene Network Study" 2013) to attempt to deal with this issue.

      The authors agree that if by spurious the Reviewer means non causal indirect associations like in the paper by Zuo et al. (Zuo et al., 2014. Biological network inference using low order partial correlation. Methods 69:266-73. doi: 10.1016/j.ymeth.2014.06.010.), then, indeed, the algorithm has not been designed to find directed networks. Instead, the algorithm has been designed to find common upstream regulators.

      Furthermore, the authors indicate that their approach is the first to attempt pathway analysis in multi-omics setting, stating "Integrative approaches combining more than one robust molecular association measure, however, have not been explored", but one can find attempts such as "An Integrative Transcriptomic and Metabolomic Study of Lung Function in Children With Asthma" to build on WGCNA for work in multiomics datasets.

      Indeed, the Reviewer is correct that correlation networks and WGCNA have been previously used with multi-omics datasets. What the authors meant to convey is that these previous approaches rely only on one measure of molecular association, which in the case of correlation networks is correlation and in the case of WGCNA is covariation, while our method is the first that combines two measures of molecular association, the correlation and the stoichiometry score. We have now amended the sentence in the manuscript (lines 51-52).

      The 2020 review paper "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources" seems to identify multiple published methods dealing with pathway estimation in multiomics datasets. As the paper stands, this reviewer cannot adequately assess the impact of the proposed bioinformatic algorithm and its results against the existing body of literature for pathway inference.

      We have now benchmarked our method against existing module discovery, network and multi-omics integration methods and provide evidence that our method outperforms these methods (new Figure 4).

      Reviewer #2 (Public Review):

      The authors describe a bioinformatic platform that allows for unbiased pathway analysis from multiomics data. The concept is based on correlation, stoichiometry scores and their combination to evidence interaction between two proteins, transcripts or phosphosites in an omic dataset. This platform was developed and validated on both previously published and in house omics data. I really appreciate that the paper is well written and clear, and I would like to acknowledge the amount of work generated to produce the in-house dataset.

      The authors wish to thank the Reviewer for the encouraging words.

    1. In some cases, a large number of infeasible points can indicate a bug in the training code.

      Sometimes a large number of infeasible points means there is a bug in the training code.

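    1. The HTML teletype text element (<tt>) produces an inline element that is displayed in the browser's built-in monospace font. This element is used to style text so that it is displayed in fixed width, as on a teletype. Using the <code> element to display monospaced text may be more common.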

      Translation not finished.

    1. Cheating was just not worth it because the consequences for violating the Academic Integrity policy can range from loss of credit for the work involved to more severe sanctions like expulsion.

      I honestly wouldn't risk cheating considering the hardships I've had to endure to attend college. My determination to succeed is too great to be tempted to violate the student code of conduct and be expelled.

    2. It can be a learning experience when a student is held accountable for a code of conduct violation.

      From personal experience in school and in home life, learning from mistakes and consequences can change your mindset or help to fix things done previously.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their enthusiastic support for our work and their insightful comments and suggestions which we believe strengthen the manuscript. Below we detail how we propose to respond to each of the specific points raised by each reviewer.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In the article entitled "Unique functions of two overlapping PAX6 retinal enhancers", Uttley and coworkers characterize in detail the activity of two conserved human enhancers (i.e. NRE and HS5) previously reported to drive Pax6 expression to the neural retina. By integrating these enhancers in a PhiC31 landing site using a dual enhancer-reporter cassette, they generated a zebrafish stable line in which their activity can be followed by the expression of GFP (NRE) and mCherry (HS5). The authors show that although the enhancers have a partially overlapping activity at early stages (24hpf), later on (48 and 72hpf) their activity domains segregate: to stem cells and differentiated amacrine cells for NRE, and to proliferating progenitors and differentiated Müller glia cells for HS5. To this end they used two different approaches: a scRNA-seq analysis of sorted cells from the transgenic line and an immunofluorescence analysis employing cell-specific markers. The authors conclude that their analysis allowed the identification of unique cell type-specific functions.

      Major comments:

      In general terms, the article is technically sound (please, see section B for an assessment of the significance of the findings). The methodology used and the data analysis are accurate. The work is well presented, the figures are clear, and the previous literature properly cited. My main concerns are the following:

      1) A general concern about the main conclusion of the work, "the identification of unique cell type-specific functions for these enhancers". This is in my opinion only partially addressed by the study, as the conclusions are limited by the absence of genetic experiments, such as deleting the enhancers in their native genomic context (either in human organoids or the homologous sequence in animal models), or at least assessing the effect of mutating their sequence in transgenesis assays in zebrafish. I understand that these functional assays may be out of the scope of the current work, but then the text should be toned down (the word "function" is extensively used) to make clear that the authors mean just expression. I would suggest substituting the word with "activity" in many instances.

      The absence of further genetic experiments also limits the significance of the study (see section B).

      We appreciate and agree with the reviewer’s concern and would substitute the word “function” with “activity” throughout the manuscript.

      2) Whereas the work in general is technically correct (particularly transgenic lines and scRNA-seq data are well described and presented), the co-expression analyses using cell-specific markers (figure 5) need to be improved. There are several issues here. First, the magnification shown is too low to appreciate the colocalization details in the figure. The panels should be replaced by others with higher magnification/resolution (see also minor comment on color-blind compatible images). In addition, the selection of the markers is suboptimal. Although PCNA is a good general marker of the entire CMZ, it would be advisable to repeat the experiments using more specific markers of the stem cell niche (e.g. rx1, vsx2; Raymond et al 2006; BMC dev Biol) to better define the enhancers' expression domain. In addition, HuC/D labels both RGCs and amacrines, and the colocalization could also be refined using amacrine specific markers (e.g. ptf1a : Jusuf & Harris 2009, Neural Dev).

      In the revised version of the manuscript, we would:

      1. Provide higher magnification images as suggested by the reviewer
      2. Provide additional stainings and justification for our choice of markers used in these colocalization experiments

      Minor comments:

      1.- The work includes several figures (1, 2, 5, 6 and S1) showing colocalization experiments in which channels are shown in red and green. I would advise replacing the red channel with magenta (or the green with cyan) in order to make the figures accessible to readers with color-blindness. This also applies to the schematic representations in figure 6.

      We will change the channel colours throughout the manuscript as suggested by the reviewer.

      2.- It is unclear in the text/images whether the expression driven by the HS5 enhancer is exclusively restricted to temporal retina throughout development (By the way, this differential nasal vs temporal expression should also be included in the final scheme in Figure 6). Does this mean that the expression of Pax6 in proliferating progenitors and Müller glia cells in the nasal retina is not controlled by this enhancer? To which extent is Pax6 needed to maintain the identity of these cell types?

      We will modify the figures as suggested and also include more details of expression overlap with PAX6 expression in the text of the revised manuscript.

      3.- The following sentence in the Discussion "To the best of our knowledge, ours is the first report where the activities of developmental enhancers have been mapped in vivo at single-cell resolution to reveal distinct patterns of activity" should be removed/rephrased. I would argue that the activity of cis-regulatory regions associated to any developmental gene are genome-wide mapped at single cell resolution in each scATACseq experiment.

      We agree that scATAC-seq gives information about potentially active enhancers but it does not define the precise cell-types unless overlapped with expression data. Our method is aimed at ‘defining’ the precise cell-types where the enhancer is active and has the potential to be used to build high resolution maps of cell-type specific enhancer usage for loci with multiple enhancers driving a single gene. We will discuss this in detail in the revised version of the manuscript.

      4.- In the methods section: * (a) FACS experiments: Please provide a supplementary Figure to graphically account for all gating/sorting strategies. * (b) ScRNA-seq analysis: Please provide the values of mean reads per cell and median genes per cell as obtained from Cell Ranger. This would be informative for others performing similar experiments

      This will be included in the revised version of the manuscript.

      **Referees cross-commenting**

      I agree with the comments by reviewer #2 on the FACS sorting experiments, the description of the landing sites, and the limited significance of the results.

      Reviewer #1 (Significance (Required)):

      As described in the previous section, the technical quality of this work is high in general terms. The experiments presented are clear and the conclusions straightforward. In that sense, the study will be a useful reference for those interested in the regulatory logic of Pax6 during eye development, including mainly developmental biologists and human geneticists. This may be particularly the case if new variants can be associated with these enhancers in microphthalmic patients.

      The significance and novelty of the findings are however limited by several factors:

      a) First, although the level of detail described in this article was not achieved previously, the human enhancers NRE and HS5 (or their conserved homologs in other vertebrates) were previously reported to drive Pax6 expression to the neural retina in transgenesis assays (Kammandel et al 1999; Marquardt et al 2001; McBride et al 2011; Ravi et al 2013; Kim et al 2017).

      We agree that the enhancers we describe in this study have been studied before. However, we would like to argue that ours is the first study where we define precise cell-types for the activity of these enhancers. We will revise the discussion to strengthen this argument.

      b) As mentioned in the previous section, the transgenesis assays are not complemented with genetic experiments. The function of the enhancers on retina differentiation and cell fate determination could have been investigated either by deleting them (or their homologous in different species) in their native context, or by exploring their regulatory grammar introducing point mutations or micro-deletions in transgenesis assays.

      We agree that the suggested experiments would be useful for unambiguously establishing the functions of these enhancers and we will discuss these prospects in the revised version of the manuscript.

      c) For reasons not explained in the text, the analysis focuses on only two of the many cis-regulatory regions controlling Pax6 expression in the retina (Lima Cunha et al 2019, Genes). In the absence of a more comprehensive analysis, it is difficult to assess the relevance of the findings described here.

      We agree that other enhancers for the PAX6 locus should be investigated using similar analysis pipeline to build a complete picture of the enhancer mediated regulation of PAX6. We will discuss this in the revised version of the manuscript.

      d) Finally, from a very general methodological point of view, the approach of using scRNA-seq to investigate enhancer activity at a single-cell level is valid and original. However, it is unclear to what extent it will be a useful method for many studies, particularly if the activity of endogenous elements is being assessed. In such cases, available scATAC-seq data will provide genome-wide information on the activity of any cis-regulatory element with cell resolution with no need for transgenesis assays and sorting experiments.

      We thank the reviewer for recognising the novelty of the approach we describe in this manuscript. We will discuss the merits and demerits of our method relative to scATAC-seq experiments in the revised version of the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this work, Uttley et al fine characterize two previously described Pax6 retinal enhancers (NRE and HS5) by combining the QSTARZ transgenesis method in zebrafish (which allows site-specific integration of a dual enhancer reporter cassette), scRNAseq and co-immunostaining with specific markers for different retinal cell populations.

      The work is experimentally very well performed and well presented and only minor considerations are raised below:

      - Authors observe that a large fraction of FACS-sorted cells do not display expression of mCherry or EGFP mRNA in their scRNAseq analysis and attribute this to read dropout in the scRNAseq data and/or to false-positive FACS cell selection. However, a third possibility exists: due to the high stability of the EGFP and mCherry reporters, cells or their progeny could maintain relatively high levels of these reporters even after transcriptional downregulation. Accordingly, the two reporters are strongly expressed in retinal precursors at early stages (24hpf). Thus, in my opinion, it is possible that some cells expressing these reporters retained significant EGFP/mCherry protein levels at 48hpf. Could the authors comment on this? Besides, authors could provide the FACS sorting data to give an idea of whether only highly EGFP/mCherry expressing cells were selected or whether the low or mild expressing ones were also included in the scRNAseq analysis. Finally, a combination of HCR/FISH and GFP/mCherry immunostaining could be used to assess whether a discrepancy in the protein vs mRNA distribution of the reporters exists.
      - The authors could provide the information on the landing site used for the QSTARZ transgene integration. While from their previous publication (Bhatia et al 2021) I assume it is the chr6 landing site, it would be worth having this information in the manuscript, as well as a genotyping validation of the correct integration.

      We will address these points and provide relevant additional data where needed in the revised version of the manuscript.

      **Referees cross-commenting**

      I agree with all the points raised by reviewer 1. Particularly I also find that scATACseq experiments already allow testing, to some extent, enhancer activity at cellular level.

      Reviewer #2 (Significance (Required)):

      From the biological point of view the work provides only an incremental advance in our understanding of the functions of the HS5 and NRE PAX6 enhancers and of PAX6 regulation in the retina. In fact, unraveling the precise contribution of these enhancers to Pax6 retinal expression and the trans-regulatory code controlling their activity would require complex genetic experiments and would fall out of the scope of this work, requiring an extensive amount of work which could not be addressed in the short term. Thus, this work should be regarded as a methodological resource, with its main strength consisting of the use of scRNAseq to fine-characterize enhancer activity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this work, Uttley et al fine characterize two previously described Pax6 retinal enhancers (NRE and HS5) by combining the QSTARZ transgenesis method in zebrafish (which allows site-specific integration of a dual enhancer reporter cassette), scRNAseq and co-immunostaining with specific markers for different retinal cell populations.

      The work is experimentally very well performed and well presented and only minor considerations are raised below:

      • Authors observe that a large fraction of FACS-sorted cells do not display expression of mCherry or EGFP mRNA in their scRNAseq analysis and attribute this to read dropout in the scRNAseq data and/or to false-positive FACS cell selection. However, a third possibility exists: due to the high stability of the EGFP and mCherry reporters, cells or their progeny could maintain relatively high levels of these reporters even after transcriptional downregulation. Accordingly, the two reporters are strongly expressed in retinal precursors at early stages (24hpf). Thus, in my opinion, it is possible that some cells expressing these reporters retained significant EGFP/mCherry protein levels at 48hpf. Could the authors comment on this? Besides, authors could provide the FACS sorting data to give an idea of whether only highly EGFP/mCherry expressing cells were selected or whether the low or mild expressing ones were also included in the scRNAseq analysis. Finally, a combination of HCR/FISH and GFP/mCherry immunostaining could be used to assess whether a discrepancy in the protein vs mRNA distribution of the reporters exists.
      • The authors could provide the information on the landing site used for the QSTARZ transgene integration. While from their previous publication (Bhatia et al 2021) I assume it is the chr6 landing site, it would be worth having this information in the manuscript, as well as a genotyping validation of the correct integration.

      Referees cross-commenting I agree with all the points raised by reviewer 1. Particularly I also find that scATACseq experiments already allow testing, to some extent, enhancer activity at cellular level.

      Significance

      From the biological point of view the work provides only an incremental advance in our understanding of the functions of the HS5 and NRE PAX6 enhancers and of PAX6 regulation in the retina. In fact, unraveling the precise contribution of these enhancers to Pax6 retinal expression and the trans-regulatory code controlling their activity would require complex genetic experiments and would fall out of the scope of this work, requiring an extensive amount of work which could not be addressed in the short term. Thus, this work should be regarded as a methodological resource, with its main strength consisting of the use of scRNAseq to fine-characterize enhancer activity.

    1. Author Response

      Reviewer #1 (Public Review):

      Single-cell sequencing technologies such as 10x, in conjunction with DNA barcoded multimeric peptide MHCs (pMHCs), have enabled high-throughput pairing of T cell receptor transcripts with antigen specificity. However, the data generated through this method often suffer from relatively high background due to ambient DNA barcodes and TCR transcripts leaking into "productive" GEMs that contain a 10X bead and a T cell decorated with antigen-specific barcoded proteins. Such contamination can affect data analysis and interpretation and has the potential to lead to spurious results such as an incorrect assessment of antigen-TCR pairs or TCR cross-reactivity. To address this problem, Povlsen and colleagues have described a data-driven algorithm called "Accurate T cell Receptor Antigen Pairing through data-driven filtering of sequencing information from single-cells" (ATRAP) that supplies a set of filtering approaches that significantly reduces background and allows for accurate pairing of T cell clonotypes with cognate pMHC antigens.

      This paper is rigorously conducted and will be useful for the field - there are some areas where further clarifications and comparisons will benefit the reader.

      Strengths:

      1) Povlsen and colleagues have systematically evaluated the extent to which parameters in the experimental metadata can be used to assess the likelihood of a GEM to correctly identify the antigen specificity of the associated T cell clonotype.

      2) Povlsen and colleagues have provided elegant data-driven scoring metrics in the form of concordance score, specificity score, and an optimal ratio of pMHC UMI counts between different pMHCs on a GEM, which allows for easy identification of poor quality data points.

      3) Based on the experimental goals, ATRAP allows for customizable filters that could achieve appropriate data quality while maximizing data retention.

      Weakness:

      1) The authors mention that 100% of the 6,073 "productive" GEMs contained more than one sample hashing barcode, and 65% contained pMHC multiplets. While the rest of the paper elaborates on the steps taken to deal with pMHC multiplets issue, not much is said about the extent of multiplet hashing issue and how was it dealt with when assigning cells to individual donors. How is this accounted for? Even a brief explanation would be beneficial.

      We agree that the issue of multiplet hashing was only very briefly discussed in the manuscript. The reason for this is that although cell hashing multiplets exist for every GEM, it is generally a much simpler issue to solve than pMHC multiplets, because one hashing entry most often has much higher counts compared to the others (see supplementary fig. 3). Moreover, in the experimental design, only one hashing antibody is added to each sample. It is therefore given that only a single hashing signal should be associated with each GEM, i.e. this does not mirror the complex nature of the pMHC data, where cross-reactivity could result in more than one pMHC being a true binder to a given TCR. Given the simplicity associated with the hashing signal, we have here opted for utilizing an existing tool to annotate cell hashing. We have elaborated the description of this in the revised manuscript (line 384).
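
      The point that one hashing barcode usually dominates each GEM can be illustrated with a minimal sketch. This is not the existing tool referred to in the response; the function name `assign_hashtag`, the `min_ratio` cutoff and the barcode labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def assign_hashtag(counts: Dict[str, int], min_ratio: float = 5.0) -> Optional[str]:
    """Return the dominant hashing barcode of one GEM, or None if ambiguous.

    counts    -- UMI counts per hashing barcode observed in the GEM
    min_ratio -- hypothetical cutoff: the top barcode must exceed the
                 runner-up by at least this factor to be accepted
    """
    if not counts:
        return None
    # Rank barcodes from highest to lowest UMI count.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top_name, top_count = ranked[0]
    if top_count == 0:
        return None
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    # Accept when there is no competitor or the top signal clearly dominates.
    if second_count == 0 or top_count / second_count >= min_ratio:
        return top_name
    return None
```

      With counts such as `{"HTO1": 120, "HTO2": 4}` the sketch assigns the GEM to `HTO1`, whereas near-equal counts are left unassigned. A real annotation tool would typically model background counts rather than rely on a fixed ratio.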

      2) It would be helpful for the authors to describe how experimental factors such as the quality of the input MHC protein may affect the outputted data (where different proteins may have different degrees of non-specific binding), and to what degree the ATRAP approach is robust to these changes. As an example, the authors mention that RVR/ A03 was present at high UMI counts across all GEMs and RPH/ B07 was consistently detected at low levels. Are these observations the property of the pMHCs or the barcoded dextran reagent? Furthermore, are there differences in the frequency of each of these multimers in the starting staining library which manifests in consistent high vs low read counts for the pMHC barcodes?

      We understand the reviewer's concern. We have extensive experience from staining with large libraries of different pMHCs in a bulk setting (Bentzen et al 2016), where it is part of the routine analyses to include an aliquot of the barcoded pMHC library taken prior to incubation with cells (input sample). From this data, we know that even if pMHCs are present in uneven amounts prior to cell incubation, this unevenness is not translated to the final output. That is, if a given barcode (associated with a specific pMHC) is present at levels up to 2x higher than the remaining barcodes, this does not result in that barcode also being enriched after cell incubation if T cells do not recognize the corresponding pMHC. Conversely, a barcode present at lower levels in the input can still be enriched after incubation with cells. From the same type of data, we also have experience with differences in the background associated with different MHC/HLA molecules, i.e. a generally higher level of background related to a certain MHC irrespective of the peptide bound to it. We agree that this potentially could be a confounding factor influencing our results (as it will influence any other results related to the potential different background signal associated with different MHC/HLA molecules). In other ongoing studies, we are investigating in a broader sense whether these differences reflect an inherent biological MHC association or are experimental artifacts. In the current work, we have opted not to define pHLA-specific UMI count thresholds, to ensure that any biological relevance remains unmasked while still allowing us to filter the data to identify the most likely true pMHC-specific interaction.

      3) It would be helpful for the authors to further explain how ATRAP handles TCRs that may be present in only one (or a small number) of GEMs, as seen in Figure 7b, and potentially for the large number of relatively small clonotypes observed for the RVR/A03 peptide in Figure 6 (it is difficult to know if the long tail of clonotypes for RVR is in the range of 1 or 10 GEMs based on the scale bar). Beyond that, is there any effect on expected (or observed) clonal expansion on these data analyses, for example, if samples are previously expanded with a peptide antigen ex vivo or not?

      ITRAP removes any GEM that does not meet the criteria of the selected filters. Small clones are only removed if all GEMs in a clone fail to meet the selected filter criteria. As ITRAP is based on combinations of user-defined filters, one can choose to filter away singlet specificities, i.e. a TCR-pMHC pair only observed in a single GEM. However, this might not be relevant in all cases. We believe that it is a strength of the method that it is flexible and adaptable to the needs of individual users. This also allows for additional filters to be imposed by the user, for instance to remove clones of fewer than a certain number of GEMs. With respect to figure 6, we agree that it was difficult to estimate the number of clonotypes within a given peptide plateau, and have updated the figure to include a clonotype count on the x-axis. In relation to the effect on clonotype expansion, we would first like to refer to figure 7. Here, panels a) and b) display the observed T cell frequencies towards the individual pMHCs as obtained by the two different experimental approaches: a) conventional fluorescent multimer staining, and b) GEM counts as obtained using the single-cell pipeline described here. This analysis demonstrates a very high concordance between the two approaches, reflected by the vast majority of the responses detected by fluorescent multimer staining also being captured in the single-cell screening (recall of 0.95). This result suggests that the sensitivity of the SC approach, in the context of the current pMHC epitope set, is comparable to that of conventional fluorescent multimer staining. With regard to clonotype expansion, we would next like to refer back to figure 3. Even though we have not expanded the clones in vitro, this figure shows how the specificity of a TCR clone can be more confidently assigned when there are more GEMs mapped to a given TCR clone. Hence, to identify a single TCR-pMHC match, it could in many cases be valuable to expand a given clone prior to the experiments. However, since the 10x pipeline can only include a limited number of cells, we argue that it is valuable to identify pMHC-TCR pairs on unexpanded/unmanipulated material to include as many different pairs as possible.
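
      The filtering behaviour described in this response (GEMs are dropped individually, a clone disappears only when all of its GEMs fail, and an optional minimum clone size can be imposed by the user) can be sketched as follows. This is an illustrative assumption, not the ITRAP code: the record fields `clonotype`, `pmhc` and `umi`, the function `filter_gems` and the example threshold are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Gem = Dict[str, Any]  # hypothetical record: {"clonotype": str, "pmhc": str, "umi": int}


def filter_gems(
    gems: List[Gem],
    filters: List[Callable[[Gem], bool]],
    min_clone_size: int = 1,
) -> List[Gem]:
    """Keep GEMs that pass every user-selected filter.

    A clonotype vanishes only if none of its GEMs pass. With
    min_clone_size > 1, clones left with fewer surviving GEMs than
    that are removed as well (e.g. 2 drops singlet specificities).
    """
    kept = [g for g in gems if all(f(g) for f in filters)]
    # Count surviving GEMs per clonotype for the optional size filter.
    clone_sizes = Counter(g["clonotype"] for g in kept)
    return [g for g in kept if clone_sizes[g["clonotype"]] >= min_clone_size]


def umi_at_least(threshold: int) -> Callable[[Gem], bool]:
    """Build a simple UMI-count filter; the threshold is user-chosen."""
    return lambda g: g["umi"] >= threshold
```

      For example, `filter_gems(gems, [umi_at_least(5)])` retains a small clone as long as one of its GEMs passes, while adding `min_clone_size=2` additionally removes TCR-pMHC pairs supported by a single GEM.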

      4) The authors mention a second method, ICON, for conducting these types of analyses, and that the approach leads to significantly more data loss. However, given there could be differences in dataset quality themselves, and given the dataset, ICON is publicly available, it would be helpful for a more explicit cross-comparison to be conducted and presented as a figure in the paper.

      We have conducted such a comparative analysis in a separate manuscript (available at BioRxiv doi.org/10.1101/2023.02.01.526310). The overall conclusion is that both methods allow for effective denoising of the provided data, with an overall advantage in favor of ITRAP. We have extended the discussion in the current manuscript with a brief summary of the main findings from this study.

      Reviewer #2 (Public Review):

      The study by Povlsen, Bentzen et al. describes certain computational pipelines authors used to analyze the results from a single-cell sequencing experiment of pMHC-multimer stained T cells. DNA-barcoded pMHC multimers and single-cell sequencing technologies provide an opportunity for the high-throughput discovery of novel antigen-specific TCRs and profiling antigen-specific T-cell responses to multiple epitopes in parallel from a single sample. The authors' goal was to develop a computational pipeline that eliminates potential noise in TCR-pMHC assignments from single-cell sequencing data. With several reasonable biological assumptions about underlying data (absence of cross-reactivity between these epitopes, same specificity for different T-cells within a clonotype, more similarity for TCRs recognizing the same epitope, HLA-restriction of T cell response) authors identify the optimal strategy and thresholds to filter out artifacts from their data.

      It is not clear if the identified thresholds are optimal for other experiments of this kind, and how the violation of the authors' assumptions (for example, inclusion of several highly similar pMHC-multimers recognized by the same clone of cross-reactive T cells) will impact the algorithm performance and threshold selection by the algorithm. The authors do not discuss several recent papers featuring highly similar experimental techniques and the same data filtering challenges:

      https://www.science.org/doi/10.1126/sciimmunol.abk3070

      https://www.nature.com/articles/s41590-022-01184-4

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184244/

      As described above, we have investigated the use of ITRAP on the large data set provided by 10X Genomics, and further compared the result to that obtained by ICON in an independent publication [BioRxiv doi.org/10.1101/2023.02.01.526310]. We have included a brief summary of the findings of that study in the current manuscript. The overall results and conclusions of the two studies align very well. In both cases, UMI count filtering and donor-HLA matching drive the strongest denoising signal. However, the identified UMI thresholds were found to differ between the two data sets. As stated above, we believe this to be a strength of the ITRAP framework, since it demonstrates that the tools can be robustly applied to data originating from very different technical and/or biological settings.

      We acknowledge that ITRAP is highly dependent on the data containing a set of “large” clonotypes for which a single pMHC target can be assigned using the statistical approach outlined in the manuscript, since the UMI filtering thresholds are defined based on these clonotypes and their associated peptide annotations. Other than this, however, the method does not exclude identification of cross-reactive TCRs (in contrast to, for instance, ICON). We have expanded the discussion to make this point clearer.

      When it comes to the papers mentioned by the reviewer, these are clearly of high interest to us, and we are currently in the process of analyzing these data using the ITRAP framework. However, we believe these analyses are beyond the scope of the current publication, in particular since we have conducted the parallel benchmark study on the 10X Genomics data mentioned above.

      Unfortunately, I was unable to validate the method on other datasets or apply other approaches to the authors' data because neither code nor raw or processed data were available at the moment of the review.

      All data sets and code have been made publicly available at https://services.healthtech.dtu.dk/suppl/immunology/ITRAP

      One of the weaknesses of this study is that the motivation for the experiment and underlying hypothesis is unclear from the manuscript. Why these particular epitopes were selected, why these donors were selected, are any of the donors seropositive for EBV/CMV/influenza is unclear. Without particular research questions, it is hard to evaluate pipeline performance and justify a particular filtering strategy: for some applications, maximum specificity (i.e. no incorrect TCR specificity assignments) is crucial, while for others the main goal is to retain as many cells as possible.

      We understand this concern and have elaborated our motivation for the experimental design in the text. The overall motivation for this study was to generate TCR-pMHC data complementing what was available in the public domain at the start of the project, with the purpose of generating novel data for training TCR specificity prediction models. This is also the reason why we explicitly “deselected” T cells specific for the 3 negative control peptides, since these are already covered by large numbers of TCR sequences in the public databases.

      We do not know the serostatus of the donors included, but have determined the antigen specificities present in the donors prior to initiating the study (evaluated for T cell recognition against 945 common viral specificities, using barcoded pMHC multimers in a bulk setting). The 945 peptides were selected from prevalent epitopes within IEDB. This means that the T cell specificities of the donors selected for inclusion in the current study were known a priori. We have updated the motivation for performing the study (lines 122-126).

    2. Reviewer #2 (Public Review):

      The study by Povlsen, Bentzen et al. describes certain computational pipelines authors used to analyze the results from a single-cell sequencing experiment of pMHC-multimer stained T cells. DNA-barcoded pMHC multimers and single-cell sequencing technologies provide an opportunity for the high-throughput discovery of novel antigen-specific TCRs and profiling antigen-specific T-cell responses to multiple epitopes in parallel from a single sample. The authors' goal was to develop a computational pipeline that eliminates potential noise in TCR-pMHC assignments from single-cell sequencing data. With several reasonable biological assumptions about underlying data (absence of cross-reactivity between these epitopes, same specificity for different T-cells within a clonotype, more similarity for TCRs recognizing the same epitope, HLA-restriction of T cell response) authors identify the optimal strategy and thresholds to filter out artifacts from their data.

      It is not clear if the identified thresholds are optimal for other experiments of this kind, and how the violation of the authors' assumptions (for example, inclusion of several highly similar pMHC-multimers recognized by the same clone of cross-reactive T cells) will impact the algorithm performance and threshold selection by the algorithm. The authors do not discuss several recent papers featuring highly similar experimental techniques and the same data filtering challenges:

      https://www.science.org/doi/10.1126/sciimmunol.abk3070

      https://www.nature.com/articles/s41590-022-01184-4

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184244/

      Unfortunately, I was unable to validate the method on other datasets or apply other approaches to the authors' data because neither code nor raw or processed data were available at the moment of the review.

      One of the weaknesses of this study is that the motivation for the experiment and underlying hypothesis is unclear from the manuscript. Why these particular epitopes were selected, why these donors were selected, are any of the donors seropositive for EBV/CMV/influenza is unclear. Without particular research questions, it is hard to evaluate pipeline performance and justify a particular filtering strategy: for some applications, maximum specificity (i.e. no incorrect TCR specificity assignments) is crucial, while for others the main goal is to retain as many cells as possible.

    1. Motivation

      *Reviewer name: Lutz Brusch*

      The manuscript no. GIGA-D-21-00383, entitled "ChemChaste: Simulating spatially inhomogenous biochemical reaction-diffusion systems for modelling cell-environment feedbacks" addresses the important technical challenge of hybrid discrete-continuous models. The presented extension of the widely used Chaste software library, termed ChemChaste, now supports simulations of reaction-diffusion dynamics in a 2-dimensional environment bi-directionally coupled to motile and chemically active but point-like cells. Specifically, ChemChaste supports arbitrarily many spatial domains within the system, each with individual uniform diffusion coefficients. It supports arbitrarily many coupled reaction-diffusion equations and coupling via membrane reactions and transport reactions between bulk molecular species and intracellular species. Cells are coarsely represented as points on a cell-mesh that is distinct from the FE-mesh for solving the reaction-diffusion dynamics. The user interface is established through a tree of many small text and csv files that are human-readable. All these extensions to Chaste are valuable and their presentation is important for the large user base and beyond. The manuscript is clearly structured and well written. The source code is openly available under the permissive BSD 3-clause license at the provided GitHub link (https://github.com/OSS-Lab/ChemChaste) and includes all models, parameters and data as used in the present manuscript. As the motivation and title focus on "...modelling cell-environment feedbacks", the implications and limitations of the coarse cell representation in ChemChaste must also be clearly stated, see comments below.

      Major comments:


      1. Coarse spatial cell representation: Cells are represented by their node position in the cell-mesh and interact with the environment through a single node at the same position in the FE-mesh. Can this formalism properly account for transport reaction fluxes in strongly heterogeneous environments where the FE-mesh needs many nodes with differing field values in a spatial area equivalent to the size of a single cell (with the cell node inside this area)? For example, how does this formalism evaluate the uptake from an exponential concentration gradient (as is common for diffusion and degradation around a localized source). For such a field, the local concentration value at any single position is always smaller than the average over any symmetric interval around it. Hence a transport reaction flux calculated with the single concentration value at the cell center will systematically underestimate the flux that would result from averaging over the area equivalent to the size of the cell. Moreover, such systematic errors also occur for linear concentration gradients and can get amplified when transport or membrane reactions are nonlinear with for instance high Hill coefficient. For comparison, with a spatially more explicit cell representation with many paired cell-nodes and field-nodes, one could directly sum the flux contributions from these paired field-nodes. But with the single cell-node here, usability seems limited to weak gradients at the scale of cell size. Alternatively, can a spatial kernel or stencil function be used to average or sum over field values in the spatial area equivalent to the size of a cell?
      2. Conservation of mass for transport: In biology, the number of molecules per time taken from the environment in a transport reaction has to equal the number of molecules per time added to the cell, and vice versa. So mass needs to be conserved and not concentration whereas ChemChaste seems to add and subtract the concentration flux in the different spatial compartments (cf. page 7 of SI.S1.4). For example, if the FE-mesh needs to use multiple nodes in a spatial area equivalent to the size of a single cell (hence Ve<Vc) but the transport reaction only relates the concentration value at one of these nodes to the cell-node, then mass is not conserved and results will be wrong. One option may be to attach volume attributes to nodes in both meshes. A node i in the cell-mesh would store the current cell volume Vc_i and a node j in the FE-mesh would store that node's share of the volume in the environment Ve_j (doubling the number of nodes in the FE-mesh would on average halve each node's volume Ve_j). Then secretion of molecules with intracellular concentration u at rate k would reduce the intracellular concentration by a flux of molecule number per time and per volume, i.e. k*u*Vc/Vc=k*u, and increase the concentration at the environment node with flux k*u*Vc/Ve which in general is and must be different from the intracellular concentration flux k*u. Likewise, if the FE-mesh is coarse (hence Ve>Vc) then the transport flux must get diluted like k*u*Vc/Ve < k*u. The factor Vc/Ve does not appear to be implemented and the equations on page 7 of SI.S1.4 omit this factor, limiting the usability to the special case Vc=Ve. This implies that the construction of the FE-mesh has to match the cell-mesh wherever cells are positioned and in their neighborhood. This limitation and the required construction of the FE-mesh must be described.
      3. Scaling of fluxes with cell surface area: In biology, membrane reactions and transport reactions occur at the molecular scale and yield a characteristic flux density per membrane area. The total flux per cell is then the integral of the flux density over the cell surface. Hence cells with larger surface area must be able to exchange more molecules with the environment. Since differently shaped cells will have different surface to volume ratios, it appears necessary to attach not only a cell volume Vc_i to each node i of the cell-mesh but also a surface area value Ac_i. The transport reaction fluxes from item 2. above then become k'*Ac*u*Vc/Vc=k'*Ac*u and k'*Ac*u*Vc/Ve, respectively, with a new rate constant k' with units [1/(area*time)]. The same argument applies to membrane reactions. Only if all cells have the same and constant surface area then Ac does not need to be attached to nodes and k may be used instead of k'*Ac.
      4. User interface and model format: To improve Interoperability according to FAIR,
         • please explore and comment how the files that are required for model definition in ChemChaste can or cannot be packaged in a COMBINE archive [Bergmann et al. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 15:369. https://doi.org/10.1186/s12859-014-0369-z].
         • please compare ChemChaste's declaration of the reaction-diffusion model in the environment to that of the SBML Level 3 Spatial Processes Package (SBML-spatial) [https://synonym.caltech.edu/documents/specifications/level-3/version-1/spatial/].
         • please compare ChemChaste's declaration of the reactions to that of the Antimony model format as used in the Tellurium framework [Smith et al. (2009). Antimony: a modular model definition language. Bioinformatics 25:2452. https://doi.org/10.1093/bioinformatics/btp401].
         • please discuss the necessary steps to convert model files available in SBML-spatial or Antimony to ChemChaste and vice versa.
      5. Numerical accuracy of the 3-fold operator splitting scheme for cell-environment coupling: As shown in Fig.1b, the three operators 1 (Cell dynamics), 2 (Environment dynamics), 3 (Cellular fluxes) are applied sequentially for a coupled cell-environment model. How is the numerical error controlled for this 3-fold operator splitting scheme? How are time steps chosen or adapted internally?
      6. Model equations for test case with cell-environment coupling: In SI, Figure S10.c (and file CellA/Srn.txt in the code repository) apparently all 5 reactions are defined as reversible with "<->" and each has a nonzero kr=1.0 but only two of these reactions are reversible in the reaction scheme in main Fig.4a. Probably the file in the repo and SI is wrong (as the reverse generation of Precursor directly from Biomass and Enzyme is not physiological) and possibly the simulation results in Fig.4b may change after correction of the file CellA/Srn.txt.
      7. Findability of repository: To improve Findability of ChemChaste according to FAIR, the code repo should be integrated with or referenced from the core project at https://github.com/Chaste/ . This integration should also facilitate future code maintenance and usability in a sustainable manner.

      Minor comments:

      1. Further tests may be easily implemented for the Schnakenberg model which was qualitatively simulated but not quantitatively compared to an analytical prediction (main text, lines 368-375). One (rough) quantitative comparison could be achieved for the dominant mode of the Fourier-transformed simulated pattern (Fig.3b; or some other measure of the spatial period of the pattern) versus the critical mode of the diffusion-driven instability (|k_cr|^2 = 1/(2D_U) * dR_U/dU + 1/(2D_V) * dR_V/dV). In addition, the instability threshold from eq. (25) in SI.S6 (page 27) can be tested in simulations along a one-parameter scan across the instability and the temporal oscillation period in Fig.3a can be (roughly) compared to the predicted period from the imaginary part of the eigenvalues of the steady state or computed by means of numerical continuation in AUTO (http://indy.cs.concordia.ca/auto).
      2. Main text, lines 460-463: "Thus...lead to a spatial segregation of the two cell types." This behavior may be subject to the slow or lacking active motility of the cells. Now, cell division alone seems to generate compact clones of the same cell type instead of emergent spatial segregation. Maybe comment if/how ChemChaste handles random walks of cells or even chemotaxis of cells towards ES. Then the interesting question of emergent spatial segregation can be studied with ChemChaste.
      3. Please clarify if/how ChemChaste allows to incorporate transport reactions directly between neighboring cells (like auxin or calcium transport in tissues)?
      4. Where are the membrane reactions involving a cell and the environment included in Fig.1b: in steps 1./2. or in step 3.? That is interesting for the numerical operator splitting scheme and may be added to the caption.
      5. In addition to item 7. above (which should ensure future usability), the reproducibility of the current model results as presented in this manuscript should be ensured by archiving the current software version from the ChemChaste code repo at Zenodo or a similar service and the DOI of that archive should be given in the manuscript. In addition, that archived code shall be given a version number on GitHub and that version number shall also be given in the manuscript.

      Figure improvements:

      • Figure 2.b may have axes flipped or may have an unfortunate color scale with too little contrast for convergence scores between 0.4 and 0.5 to show the gradual change of score at the horizontal row with dt=0.1 (which is apparently used in Fig. 2.c and shows a change of accuracy there). Please check and improve the correspondence between panels b) and c) such that the data from panel c) helps to get a feeling for the L2 score changes in panel b).
      • Figure 2.b: How can we understand the loss of convergence if the time step is reduced (say from 0.006 to 0.0002) at any fixed dx? With other solvers, one is used to a finer dt improving convergence, whereas this plot shows dark (high L2 score) areas on both sides of the light (low L2 score) areas at intermediate values of dt.
      • Figure 2.c: The color code is not suited for so many curves. Either include line style or reduce the number of curves (preferred). It must become clear which curve belongs to which dx. The green curve with dx=0.8 seems to be hidden?
      • Figure 3.a: The figure caption should explain the source of variation between nodes (e.g. by pointing to the noise terms in eqs. 13,14) and the color code for the two bands (dark and light) around each curve (1-sigma and 2-sigma or 1-sigma and min/max ?).
      • Figure 4b: These two panels could be given more space. Suggestion: re-arrange part a) horizontally and then put both diagrams of b) at the bottom, left and right.
      • Figure 5: The caption wrongly announces "and t=100" which is not shown. Also the words "towards the" in the first line seem to be linked to t=100.

      Text corrections:

      • main text, line 61. The sentence "...centred on the role chemical coupling." seems to miss the preposition "of".
      • main text, line 71. The phrase "cellular network reaction size" appears misleading, when it shall refer to "the size of the cellular reaction network".
      • main text, lines 280, 284, 286: Since the subsections of the Results section are not numbered here, then the text pointers "(Section )" can be omitted.
      • main text, one line below eq.(7): "reaction rate constants parameters" can drop the word "parameters"
      • main text, lines 450 and 451: "a...concentrations" should be either singular or plural
      • SI.S1, page 1, line 5 above eq. (1): text "exchange chemical concentrations" should read "exchange molecules" and, correspondingly, "controlling the chemical concentrations passing between the bulk and the cell" should read "controlling the flux of molecules between the bulk and the cell".
      • SI.S1, page 2, line 2: "asssociated" has one "s" too many
      • SI.S1, page 5, at the end of Fig.S1's caption: $k-p$ should be $k_p$
      • SI.S2.2.1, page 14, eq. (11) has capital U_0 and V_0 as initial values while the sentence above has small u_0, v_0. These should be the same symbols.
      • SI.S6, page 26, 1 line below eq. (19): "is a spatial case" should be "is a special case"

      Declaration of Competing Interests: I declare that I have no competing interests. I agree to the open peer review policy of the journal.
I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
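The volume scaling raised in major comment 2 of the review above can be illustrated with a minimal Python sketch. It is not ChemChaste code: the function name, its parameters, and the numbers are hypothetical, and it only assumes first-order secretion at rate k from a cell of volume Vc into one environment node of volume Ve, as in the reviewer's example.

```python
def secretion_concentration_rates(k, u, cell_volume, env_node_volume):
    """Return (d[cell]/dt, d[env]/dt) for first-order secretion at rate k.

    Hypothetical helper, not part of ChemChaste. The number of molecules
    leaving the cell per unit time is k * u * cell_volume; dividing by each
    compartment's volume turns that molecule flux into a concentration rate.
    """
    if cell_volume <= 0 or env_node_volume <= 0:
        raise ValueError("volumes must be positive")
    molecules_per_time = k * u * cell_volume
    cell_rate = -molecules_per_time / cell_volume    # equals -k*u
    env_rate = molecules_per_time / env_node_volume  # equals k*u*Vc/Ve
    return cell_rate, env_rate


# Arbitrary illustrative numbers: a coarse FE-mesh with Ve = 4 * Vc.
Vc, Ve = 1.0, 4.0
cell_rate, env_rate = secretion_concentration_rates(
    k=0.5, u=2.0, cell_volume=Vc, env_node_volume=Ve
)

# Mass balance: molecules lost by the cell equal molecules gained by the node.
mass_change = cell_rate * Vc + env_rate * Ve
```

With Ve larger than Vc the environment concentration rises more slowly than the intracellular concentration falls, while the total number of molecules stays constant. Dropping the Vc/Ve factor would conserve concentration instead of mass, which is the concern the reviewer raises.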
    1. Miguel de Icaza Jun 17, 2022 @migueldeicaza Replying to @migueldeicaza @markrendle and 2 others The foundation should fund, promote and advance a fully open source stack. And the foundation should remove every proprietary bit from the code in http://dotnet.org.

      Microsoft can and should compete on the open marketplace on their own. [...] And we should start with the debugger: we should integrate the Samsung one, it should be the default for OmniSharp and this is how we get contributions and improvements- not by ceding terrain to someone that can change the rules to their advantage at will.

      I tried (although perhaps not valiantly, but as an outsider) to convince Miguel and the then-Director of the .NET Foundation in 2015 that this state of affairs was probably coming and that he/they should reach out to the FSF/GNU to get RMS to lift the .NET fatwa, become a stakeholder/tastemaker in the .NET ecosystem, and encourage free software groupies to take charge so that FSF/GNU would be around as a failsafe for the community and would inevitably benefit greatly esp. from any of MS's future failure on this front. I tried the same in reverse, too. They seemed to expect me to be a liaison, and I couldn't get them to talk to each other directly, even though that's what needed to happen.

    1. the ASA’s Code of Ethics instead focuses on the responsibilities of the researcher, setting out an approach to subjects rather than a description of their assumed needs.

      With the further historical context, it sounds like this set of rules was made more to keep the scientists safe from backlash than to keep the actual subjects safe.

    1. Scaling a single VCS to hundreds of developers, hundreds of millions lines of code, and a rapid rate of submissions is a monumental task. Twitter’s monorepo roll-out about 5 years ago (based on git) was one of the biggest software engineering boondoggles I have ever witnessed in my career. Running simple commands such as git status would take minutes. If an individual clone got too far behind, it took hours to catch up (for a time there was even a practice of shipping hard drives to remote employees with a recent clone to start out with). I bring this up not specifically to make fun of Twitter engineering, but to illustrate how hard this problem is. I’m told that 5 years later, the performance of Twitter’s monorepo is still not what the developer tooling team there would like, and not for lack of trying.
    1. Typescript is great compared to plain JS, but they're really deep into diminishing returns area with type system features for a couple of years now. A lot less would be a lot more, because as of now Typescript lures otherwise entirely competent programmers into writing complex 'type system puzzles' which are entirely obscure to everybody except the person who wrote that code, and it takes a lot of discipline and experience to resist the lure and keep things simple.


    1. Why use Jupyter and Quarto?

      I should add info about the practical advantages of using these notebook tools:
      * single source for experiments, blog posts, software libraries, applications, and even books
      * content can include markdown, tested code, images, and LaTeX for math

    1. If programming is promoted solely as a more effective way to think (and not as an effective way to communicate logically and creatively), then we will again fail to understand what teaching and learning code can afford us in a networked age.

      Hence why we need to also re-identify the start of Computer Programming - as mentioned by Riley.

  4. drive.google.com drive.google.com
    1. America’s education policymakers have a long history of ignoring the time and resources needed for teachers to sustain deeper learning for their students.

      This is very much the case. If you look at any policy or code that has been created within the last twenty years, there is a complete disregard for the time and resources that are needed for teachers to do what is expected of them.

    1. to do that you need 00:35:36 instructions and those instructions were found on the DNA molecule and it's the origin of the code that has presented the most acute problem for origin of Life research because chemistry simply doesn't move in the direction of 00:35:48 informational complexity it moves in other directions one other origin you can't get from chemistry to code

      need instructions

      acute problem

      chemistry not

      direction of informational complexity

    2. formulates something called the sequence hypothesis

      the four chemical subunits (nucleotide subunits), sequentially arranged in a symbolic arrangement

      information revolution came to biology

      how to take section of code randomly changing without

      you attacking champions for the latest science

    1. "The coded language is effective in that it creates this sense of community," said Rachel Moran, a researcher who studies COVID-19 misinformation at the University of Washington. People who grasp that a unicorn emoji means "vaccination" and that "swimmers" are vaccinated people are part of an "in" group. They might identify with or trust misinformation more, said Moran, because it’s coming from someone who is also in that "in" group.

      A shared language and even more specifically a coded shared language can be used to create a sense of community or define an in group identity.

  5. brown-csci0200.github.io brown-csci0200.github.io
    1. Consensus Public Review:

      Ottenheimer et al. present an interesting study looking at the neural representation of value in mice performing a Pavlovian association task. The task is repeated in the same animals using two odor sets, allowing a distinction between odor identity coding and value coding. The authors use state-of-the-art electrophysiological techniques to record thousands of neurons from 11 frontal cortical regions to conclude that 1) licking is represented more strongly in dorsal frontal regions, 2) odor cues are represented more strongly in ventral frontal regions, 3) cue values are evenly distributed across regions. They separately perform a calcium imaging study to track coding across days and conclude that the representation of task features increments with learning and remains stable thereafter.

      Overall, these conclusions are interesting and mostly well supported by the data, although there are some doubts about their definition of value coding. One limitation is the lack of focus on population-level dynamics from the perspective of decoding, with the analysis focusing primarily on encoding analyses within individual neurons.

      Some specific comments:

      The authors use reduced-rank kernel regression to characterize the 5332 recorded neurons on a cell-by-cell basis in terms of their responses to cues, licks, and reward, with a cell characterized as encoding one of these parameters if it accounts for at least 2% of the observed variance. At least 50% of cells met this inclusion criterion in each recorded area. 2% feels like a lenient cutoff, and it is unclear how sensitive the results are to this cutoff, though the authors argue that this cutoff should still only allow a false positive rate of 0.02% (determined by randomly shuffling the onset time of each trial).

      Having identified lick, reward, and cue cells, the authors next select the 24% of "cue-only" neurons and look for cells that specifically encode cue value. Because the animal's perception of stimulus value can't be measured directly, the authors created a linear model that predicts the amount of anticipatory licking in the interval between odor cue and reward presentations. The session-average-predicted lick rate by this model is used as an estimate of cue value and is used in the regression analysis that identified value cells. (Hence, the authors' definition of value is dependent on the average amount of anticipatory behavior ahead of a reward, which indicates that compared to the CS+, mice licked around 70% as much to the CS50 and 10% as much to the CS-.) The claim that this is an encoding of value is strengthened by the fact that cells show similar scaling of responses to two odor sets tested. Whereas the authors found more "lick" cells in motor regions and more "cue" cells in sensory regions, they find a consistent percentage of "value" cells (that is, cells found to be cue-only in the initial round of analysis that is subsequently found to encode anticipatory lick rate) across all 11 recorded regions, leading to their claim of a distributed code of value.

      In subsequent sections, the authors expand their model of anticipatory-licking-as-value by incorporating trial and stimulus history terms into the model, allowing them to predict the anticipatory lick rate on individual trials within a session. They also use 2-photon imaging in PFC to demonstrate that neural coding of cue and lick is stable across three days of imaging, supported by two lines of evidence. First, they show that cell responses in all periods except the start of day 1 are more correlated with day 3 responses than expected by chance (although the correlation is still quite low, for example, 0.2 on day 2). Second, they show that cue identity is able to capture the highest unique fraction of variance (around 8%) in day 3 cue cells across three days of imaging, and similarly for lick behavior in lick cells and cue+lick in cue+lick cells. Nonetheless, their sample rasters for all imaged cells also indicate that representations are not perfectly stable, and it will be interesting to see what *does* change across the three days of imaging.

      Importantly, the authors do not present evidence that value itself is stably encoded across days, despite the paper's title. The more conservative claim in the Discussion seems more appropriate: "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC."

    1. But ChatGPT feels different. Smarter. Weirder. More flexible. It can write jokes (some of which are actually funny), working computer code and college-level essays. It can also guess at medical diagnoses, create text-based Harry Potter games and explain scientific concepts at multiple levels of difficulty.

      Crazy to think this is only the start of smart AI like this. I'm interested to see how things progress.

    2. But ChatGPT feels different. Smarter. Weirder. More flexible. It can write jokes (some of which are actually funny), working computer code and college-level essays. It can also guess at medical diagnoses, create text-based Harry Potter games and explain scientific concepts at multiple levels of difficulty

      I remember seeing funny posts on how horrible previous forms of AI chats were, which never truly gave the right answer or fully grasped all that was said to them, creating funny or confusing answers to things. However, ChatGPT is oddly and unnervingly more advanced than those other AI systems. It has a writing style, it can create detailed answers, and it fully understands what it is being asked. It is completely different from what has come out before.

    Tags

    Annotators

    1. leading to more than 17,000 different combinations before the encryption process repeats itself. Adding to the scrambling was a plugboard, sitting between the main rotors and the input and output, which swapped pairs of letters. In the earliest machines, up to six pairs could be swapped in that way; later models pushed it to 10, and added a fourth rotor.

      Shows how efficient the machine was at deciphering code compared with the human brain, as working through 17,000 combinations would take a human an exponentially longer amount of time
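The figures in the quoted passage can be checked with a few lines of Python. This is a rough sketch under stated assumptions: three rotors with 26 positions each and no stepping irregularities, and a plugboard that swaps a fixed number of disjoint letter pairs. The function names are illustrative only.

```python
from math import factorial


def rotor_positions(rotors=3, letters=26):
    """Number of joint starting positions for the given number of rotors."""
    return letters ** rotors


def plugboard_settings(pairs, letters=26):
    """Number of ways to choose `pairs` disjoint letter swaps on a plugboard."""
    if not 0 <= 2 * pairs <= letters:
        raise ValueError("too many pairs for the alphabet size")
    return factorial(letters) // (
        factorial(letters - 2 * pairs) * factorial(pairs) * 2 ** pairs
    )


positions = rotor_positions()       # 26**3, the "more than 17,000" figure
six_pairs = plugboard_settings(6)   # early machines, six swapped pairs
ten_pairs = plugboard_settings(10)  # later machines, ten swapped pairs
```

Under these assumptions the three rotors give 17,576 positions, while the plugboard multiplies the key space by a far larger factor, which is why the swapped pairs added so much to the scrambling.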

    1. Reviewer #1 (Public Review):

      This manuscript by Mahlandt, et al. presents a significant advance in the manipulation of endothelial barriers with spatiotemporal precision, and in the use of optogenetics to manipulate cell signaling in vascular biology more generally. The authors establish the role of Rho-family GTPases in controlling the cytoskeletal-plasma membrane interface as it relates to endothelial barrier integrity and function and adequately motivate the need for optogenetic tools for global and local signaling manipulation to study endothelial barriers.

      Throughout the work, the optogenetic assays are conceptualized, described, and executed with exceptional attention to detail, particularly as it relates to potential confounding factors in data analysis and interpretation. Comparison across experimental setups in optogenetics is notoriously fraught, and the authors' control experiments and measurements to ensure equal light delivery and pathway activation levels across applications are very thorough. In demonstrating how these new opto-GEFs can be used to alter vascular barrier strength, the authors cleverly use fluorescent-labeled dextran polymers of different sizes and ECIS experiments to demonstrate the physiological relevance of BOEC monolayers to in vivo blood vessels. Of particular note, the resiliency of the system to multiple stimulation cycles and longer time course experiments is promising for use in vascular leakage studies.

      Given that dozens of Rho GTPase-activating GEFs exist, an expanded rationale for the selection of p63, ITSN1, and TIAM1 in the form of discussion and literature citations would be helpful to motivate their selection as protein effectors in the engineered tools. Extensive tool engineering studies demonstrate the superiority of iLID over optogenetic eMags or rapamycin-based chemogenetic tools for these purposes. However, as the utility of iLID and eMags has been demonstrated for the manipulation of a variety of signaling pathways, the iSH-Akt demonstration does not seem necessary for these systems.

      The demonstration of orthogonality in GTPase- and VE-cadherin-blocking antibody-mediated barrier function decreases is compelling, even without full elucidation of the role of cell size or overlap in barrier strength. The discussion section presents a mature and thoughtful description of the limitations, remaining questions, and potential opportunities for the tools and technology developed in this work. Importantly, this manuscript demonstrates a commitment to scientific transparency in the ways in which the data are visualized, the methods descriptions, and the reagent and code sharing it presents, allowing others to utilize these tools to their full potential.

    1. An android goddess knows that she is made by the master’s tools, yet she still seeks to resist the master. An android goddess is a figure of trans of color praxis. I side myself with the fugitive black androids hacking their own code to try to find freedom, as in Janelle Monáe’s Metropolis, the Humans television series, and many more examples in science fiction; with Cylon number eight, Sharon Valerii of Battlestar Galactica, who had an impossible hybrid baby, who knew that she was not just a machine but also a woman, a mother, and a part of her God; with the renegade clones of Orphan Black, who, as Roxanne Samer argues, offer new models of transfeminist kinship; and with homo sensorium, the telepaths in the Wachowski sisters’ Netflix show Sense8 (

      what an android is

    1. Future protocols might include code that ensures certain protections for workers, prevents direct harm to humans, or guarantees a basic income to all users. Protocols might ban carbon-emitting miners and other ecological harms. Rights-based incentives and feedback loops could counteract plutocracy and make externalities more visible to a protocol that would otherwise ignore them. Cryptoeconomic designs can thus achieve goals not reducible to economics.

      How?

    2. As Buterin puts it, cryptoeconomics allows software “to reduce social trust assumptions by creating systems where we introduce explicit economic incentives for good behavior and economic penalties for ba[d] behavior.”

      This is one of the recurring themes that turns me off crypto - the idea that we can use code to decrease the need for social trust, as if social trust isn't the bedrock of all human society. I wish this quote from Buterin was balanced with some sort of critical take...but maybe we'll get to that later in this section.

    3. There is even a special term in MolochDAO, with corresponding software code, for when someone leaves in frustration: “ragequit,” a term derived from gaming culture. Other DAOs have since adopted the feature.

      What does this mean, "with corresponding software code"? Virtually all social software has a "quit/leave/delete" option; what makes "ragequit" special?

    1. b) The management of social monitoring (veille sociale), reception, accommodation and housing support for any person or family who is homeless or has particular difficulty accessing housing because of inadequate resources or living conditions, in compliance with articles L. 345-2-2 and L. 345-2-3 of the code de l'action sociale et des familles, as well as the funding of the organisations and schemes that contribute to it, mentioned in 8° of I of article L. 312-1 and in articles L. 322-1 and L. 345-2 of the same code and in articles L. 365-1, L. 631-11 and L. 633-1 of the code de la construction et de l'habitation;
    1. Author Response

      Reviewer #1 (Public Review):

      This work describes a new method, Proteinfer, which uses dilated neural networks to predict protein function, using EC terms and GO terms. The software is fast and the server-side performance is fast and reliable. The method is very clearly described. However, it is hard to judge the accuracy of this method based on the current manuscript, and some more work is needed to do so.

      I would like to address the following statement by the authors: (p3, left column): "We focus on Swiss Prot to ensure that our models learn from human-curated labels, rather than labels generated by electronic annotation".

      There is a subtle but important point to be made here: while SwissProt (SP) entries are human-curated, they might still have their function annotated ("labeled") electronically only. The SP entry comprises the sequence, source organism, paper(s) (if any), annotations, cross-references, etc. A validated entry does not mean that the annotation was necessarily validated manually: but rather that there is a paper backing the veracity of the sequence itself, and that it is not an automatic generation from a genome project.

      Example: 009L_FRG3G is a reviewed entry, and has four function annotations, all generated by BLAST, with an IEA (inferred by electronic annotation) evidence code. Most GO annotations in SwissProt are generated that way: a reviewed Swissprot entry, unlike what the authors imply, does not guarantee that the function annotation was made by non-electronic means. If the authors would like to use non-electronic annotations for functional labels, they should use those that are annotated with the GO experimental evidence codes (or, at the very least, not exclusively annotated with IEA). Therefore, most of the annotations in the authors' gold standard protein annotations are simply generated by BLAST and not reviewed by a person. Essentially the authors are comparing predictions with predictions, or at least not taking care not to do so. This is an important point that the authors need to address since there is no apparent gold standard they are using.
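The filtering the reviewer proposes can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the record layout (protein, GO term, evidence code), the function names, and the sample records are hypothetical placeholders, and the set of codes is the core group of GO experimental evidence codes.

```python
# Core GO experimental evidence codes; IEA (electronic annotation) is excluded.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def experimental_only(annotations):
    """Keep only annotation records backed by an experimental evidence code."""
    return [record for record in annotations if record[2] in EXPERIMENTAL_CODES]


def proteins_without_experimental_support(annotations):
    """Proteins for which no annotation carries an experimental evidence code."""
    all_proteins = {protein for protein, _, _ in annotations}
    supported = {
        protein for protein, _, code in annotations if code in EXPERIMENTAL_CODES
    }
    return all_proteins - supported


# Made-up placeholder records: (protein, GO term, evidence code).
sample = [
    ("PROT_A", "GO:0000001", "IEA"),
    ("PROT_A", "GO:0000002", "IDA"),
    ("PROT_B", "GO:0000003", "IEA"),
    ("PROT_C", "GO:0000004", "IMP"),
]

gold_standard = experimental_only(sample)
unsupported = proteins_without_experimental_support(sample)
```

In this toy data PROT_B would drop out of an experimentally grounded gold standard, because its only annotation is electronic. This is the kind of entry the reviewer argues inflates BLAST's apparent performance.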

      The above statement is relevant to GO. But since EC is mapped 1:1 to GO molecular function ontology (as a subset, there are many terms in GO MFO that are not enzymes of course), the authors can easily apply this to EC-based entries as well.

      This may explain why, in Figure S8(b), BLAST retains such a high and even plateau of the precision-recall curve: BLAST hits are used throughout as the gold standard, and therefore BLAST performs so well. This is in contrast, say, to CAFA assessments, which use as a gold standard only those proteins that have experimental GO evidence codes, and where BLAST therefore performs much more poorly upon assessment.

      We thank the reviewer for this point. We regret if we gave the impression that our training data derives exclusively, or even primarily, from direct experiments on the amino acid sequences in question. We had attempted to address this point in the discussion with this section:

      "On the other hand, many entries come from experts applying existing computational methods, including BLAST and HMM-based approaches, to identify protein function. Therefore, the data may be enriched for sequences with functions that are easily ascribable using these techniques which could limit the ability to estimate the added value of using an alternative alignment-free tool. An idealised dataset would involve training only on those sequences that have themselves been experimentally characterized, but at present too little data exists than would be needed for a fully supervised deep-learning approach."

      We have now added a sentence early in the manuscript reinforcing this point:

      "Despite its curated nature, SwissProt contains many proteins annotated only on the basis of electronic tools."

      We have also removed the phrase "rather than labels generated by a computational annotation pipeline" because we acknowledge that this could be read to imply that computational approaches are not used at all for SwissProt which would not be correct.

      While we agree that SwissProt contains many entries inferred via electronic means, we nevertheless think its curated nature makes an important difference. Curators as far as possible reconcile all known data for a protein, often looking for the presence of key residues in the active sites. There are proteins where electronic annotation would suggest functions in direct contradiction to experimental data, which are avoided due to this curation process. As one example, UniProt entry Q76NQ1 contains a rhomboid-like domain typically found in rhomboid proteases (IPR022764) and therefore inputting it into InterProScan results in a prediction of peptidase activity (GO:0004252). However this is in fact an inactive protein, as discovered by experiment, and so is not annotated with this activity in SwissProt. ProteInfer successfully avoids predicting peptidase activity as a result of this curated training data. (For transparency, ProteInfer is by no means perfect on this point: there are also cases in which UniProt curators have annotated single proteins as inactive but ProteInfer has not learnt this relationship, due to similar sequences which remain active).

      We had also attempted to address this point by comparing with phenotypes seen in a specific high-throughput experimental assay ("Comparison to experimental data" section).

      We have now added a new analysis in which we assess the recall of GO terms while excluding IEA annotation codes. We find that at the threshold that maximises F1 score in the full analysis, our approach is able to recall 60-75% (depending on ontology) of annotations. Inferring precision is challenging due to the fact that only a very small proportion of the possible function*gene combinations have in fact been tested, making it difficult to distinguish a true negative from a false negative.

      "We also tested how well our trained model was able to recall the subset of GO term annotations which are not associated with the "inferred from electronic annotation" (IEA) evidence code, indicating either experimental work or more intensely-curated evidence. We found that at the threshold that maximised F1 score for overall prediction, 75% of molecular function annotations could be successfully recalled, 61% of cellular component annotations, and 60% of biological process annotations."

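      The recall analysis described above can be illustrated with a minimal sketch. It assumes annotations are available as (protein, GO term, evidence code) triples and that the model yields one score per (protein, term) pair; the data, the function name `recall_excluding`, and the 0.5 threshold are all hypothetical and are not taken from the ProteInfer code.

```python
# Hypothetical toy annotations: (protein, GO term, evidence code).
annotations = [
    ("P1", "GO:0004252", "IDA"),
    ("P1", "GO:0016020", "IEA"),
    ("P2", "GO:0003677", "IMP"),
    ("P2", "GO:0005634", "ISS"),
    ("P3", "GO:0006260", "IEA"),
]

# Hypothetical model scores for each (protein, term) pair.
scores = {
    ("P1", "GO:0004252"): 0.91,
    ("P1", "GO:0016020"): 0.80,
    ("P2", "GO:0003677"): 0.35,
    ("P2", "GO:0005634"): 0.77,
    ("P3", "GO:0006260"): 0.66,
}


def recall_excluding(annotations, scores, threshold, excluded_codes=("IEA",)):
    """Recall over annotations whose evidence code is not excluded."""
    kept = [(p, t) for p, t, code in annotations if code not in excluded_codes]
    if not kept:
        return float("nan")
    # An annotation counts as recalled when its score reaches the threshold.
    hits = sum(1 for key in kept if scores.get(key, 0.0) >= threshold)
    return hits / len(kept)


non_iea_recall = recall_excluding(annotations, scores, threshold=0.5)
print(f"recall on non-IEA annotations: {non_iea_recall:.2f}")
```

      Only recall is computed here because, as noted above, untested function-gene combinations make it hard to tell a true negative from a false negative.
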
      Pooling GO DAGs together: It is unclear how the authors generate performance data over GO as a whole. GO is really 3 disjoint DAGs (molecular function ontology or MFO, Biological Process or BPO, Cellular component or CCO). Any assessment of performance should be over each DAG separately, to make biological sense. Pooling together the three GO DAGs which describe completely different aspects of the function is not informative. Interestingly enough, in the browser applications, the GO DAG results are distinctly separated into the respective DAGs.

      Thank you for this suggestion. To answer the question of how we were previously generating performance data: this was simply by treating all terms equivalently, regardless of their ontology.

      We agree that it would be helpful to the reader to split out results by ontology type, especially given clear differences in performance.

      We now provide PR-curve graphs split by ontology type.

      We have also added the following text:

      "The same trends for the relative performance of different approaches were seen for each of the directed acyclic graphs that make up the GO ontology (biological process, cellular component and molecular function), but there were substantial differences in absolute performance (Fig S10). Performance was highest for molecular function (max F1: 0.94), followed by biological process (max F1: 0.86) and then cellular component (max F1: 0.84)."

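      As a hedged illustration of a per-ontology maximum F1, the sketch below sweeps a decision threshold separately for each ontology. The labels and scores are invented toy values, and the helper names are hypothetical; this is not the evaluation code used for the manuscript.

```python
def precision_recall_f1(labels, scores, threshold):
    """Precision, recall and F1 when predicting every score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def max_f1(labels, scores):
    """Largest F1 over all thresholds that occur among the scores."""
    return max(precision_recall_f1(labels, scores, t)[2] for t in set(scores))


# Hypothetical (labels, scores) per ontology, kept separate rather than pooled.
toy = {
    "molecular_function": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "biological_process": ([1, 0, 1, 0], [0.9, 0.7, 0.6, 0.2]),
    "cellular_component": ([0, 1, 1, 0], [0.9, 0.8, 0.4, 0.3]),
}

fmax = {name: max_f1(labels, scores) for name, (labels, scores) in toy.items()}
for name, value in fmax.items():
    print(f"{name}: max F1 = {value:.2f}")
```
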
      Figure 3 and lack of baseline methods: the text refers to Figures 3A and 3B, but I could only see one figure with no panels. Is there an error here? It is not possible at this point to talk about the results in this figure as described. It looks like Figure 3A is missing, with Fmax scores. In any case, Figure 3(b?) has precision-recall curves showing that the performance of predictions is highest on isomerases and lowest on hydrolases. It is hard to tell the Fmax values, but they seem reasonably high. However, there is no comparison with a baseline method such as BLAST or Naive, and those should be inserted. It is important to compare Proteinfer with these baseline methods to answer the following questions: (1) Does Proteinfer perform better than the go-to method of choice for most biologists? (2) Does it perform better than what is expected given the frequency of these terms in the dataset? For an explanation of the Naive method, which answers the latter question, see: (https://www.nature.com/articles/nmeth.2340)

      We apologise for the errors in figure referencing in the text here. This emerged in part from the two versions of text required to support an interactive and legacy PDF version. We had provided baseline comparisons with BLAST in Fig. 5 of the interactive version (correctly referenced in the interactive version) and in Fig. S7 of the PDF version (incorrectly referenced as Fig 3B).

      We have now moved the key panel of Fig S7 to the main-text of the PDF version (new Fig 3B), as suggested also by the editor, and updated the figure referencing appropriately. We have also added a Naive frequency-count based baseline. This baseline would not appear in Fig 3B due to axis truncation, but is shown in a supplemental figure, new Fig S9. We thank the reviewer and the editor for raising these points.
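
      A frequency-count baseline of this kind can be sketched as follows. The sketch assumes the common formulation in which every query protein receives the same score for a term, namely the fraction of training proteins annotated with it; the exact baseline used in the manuscript may differ, and the data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical training annotations: protein -> set of GO terms.
train_labels = {
    "P1": {"GO:0003674", "GO:0005488"},
    "P2": {"GO:0003674"},
    "P3": {"GO:0003674", "GO:0005488"},
    "P4": {"GO:0003674", "GO:0003824"},
}


def naive_term_scores(train_labels):
    """Score each term by the fraction of training proteins that carry it."""
    counts = Counter(term for terms in train_labels.values() for term in terms)
    n_proteins = len(train_labels)
    return {term: count / n_proteins for term, count in counts.items()}


def naive_predict(term_scores, threshold):
    """The same prediction for any query sequence: all sufficiently common terms."""
    return {term for term, score in term_scores.items() if score >= threshold}


term_scores = naive_term_scores(train_labels)
predicted = naive_predict(term_scores, threshold=0.5)
print(sorted(predicted))
```

      Because the prediction ignores the query sequence entirely, this baseline measures how much performance comes from term frequency alone.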

      Reviewer #2 (Public Review):

      In this paper, Sanderson et al. describe a convolutional neural network that predicts protein domains directly from amino acid sequences. They train this model with manually curated sequences from the Swiss-Prot database to predict Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. This paper builds on previous work by this group, where they trained a separate neural network to recognize each known protein domain. Here, they train one convolutional neural network to identify enzymatic functions or GO terms. They discuss how this change can deal with protein domains that frequently co-occur and more efficiently handle proteins of different lengths. The tool, ProteInfer, adds a useful new tool for computational analysis of proteins that complements existing methods like BLAST and Pfam.

      The authors make three claims:

      1) "ProteInfer models reproduce curator decisions for a variety of functional properties across sequences distant from the training data"

      This claim is well supported by the data presented in the paper. The authors compare the precision-recall curves of four model variations. The authors focus their training on the maximum F1 statistic of the precision-recall curve. Using precision-recall curves is appropriate for this kind of problem.

      2) "Attribution analysis shows that the predictions are driven by relevant regions of each protein sequence".

      This claim is very well supported by the data and particularly well illustrated by Figure 4. The examples on the interactive website are also very nice. This section is a substantial innovation of this method. It shows the value of scanning for multiple functions at the same time and the value of being able to scan proteins of any length.

      3) "ProteInfer models create a generalised mapping between sequence space and the space of protein functions, which is useful for tasks other than those for which the models were trained."

      This claim is also well supported. The print version of the figure is really clear, and the interactive version is even better. It is a clever use of UMAP representations to look at the abstract last layer of the network. It was very nice how each sub-functional class clustered.

      The interactive website was very easy to use, with a good user interface. I expect it will be accessible to experimental and computational biologists.

      The manuscript has many strengths. The main text is clearly written, with high-level descriptions of the modeling. I initially printed and read the static PDF version of the paper. The interactive form is much more fun to read because of the ability to analyze my favorite proteins and zoom in on their figures (e.g. Figure 8). The new Figure 1 motivates the work nicely. The website has an excellent interactive graphic showing how the number of layers in the network and the kernel size change how data is pooled across residues. I will use this tool in my teaching.

      We are grateful for these comments. We are excited that the reviewer hopes to use this figure for teaching, which is exactly the sort of impact we hoped for this interactive manuscript. We agree that the interactive manuscript is by far the most compelling version of this work.

      The manuscript has only minor weaknesses. It was not clear if the interactive model on the website was the Single CNN model or the Ensemble CNN model.

      We thank the reviewer for pointing out the ambiguity here. The model shown on the website is a Single CNN model, and is chosen with hyperparameters that achieve good performance whilst being readily downloadable to the user's machine for this demonstration without use of excessive bandwidth. We have added additional sentences to address this better in the manuscript.

      "When the user loads the tool, lightweight EC (5MB) and GO (7MB) prediction models are downloaded and all predictions are then performed locally, with query sequences never leaving the user's computer. We selected the hyperparameters for these lightweight models by performing a tuning study in which we filtered results by the size of the model's parameters and then selected the best performing models. This approach uses a single neural network, rather than an ensemble. Inference in the browser for a 1500 amino-acid sequence takes < 1.5 seconds for both models."

      Overall, ProteInfer will be a very useful resource for a broad user base. The analysis of the 171 new proteins in Figure 7 was particularly compelling and serves as a great example of the utility and power of ProteInfer. It complements leading tools in a very valuable way. I anticipate adding it to my standard analysis workflows. The data and code are publicly available.

      Reviewer #3 (Public Review):

      In this work, the authors employ a deep convolutional neural network approach to map protein sequence to function. The rationales are that (i) once trained, the neural network would offer fast predictions for new sequences, facilitating exploration and discovery without the need for extensive computational resources, (ii) that the embedding of protein sequences in a fixed-dimensional space would allow potential analyses and interpretation of sequence-function relationships across proteins, and (iii) predicting protein function in a way that is different from alignment-based approaches could lead to new insights or superior performance, at least in certain regimes, thereby complementing existing approaches. I believe the authors demonstrate i and iii convincingly, whereas ii was left open-ended.

      A strength of the work is showing that the trained CNNs perform generally on par with existing alignment-based methods such as BLASTp, with a precision-recall tradeoff that differs from BLASTp. Because the method is more precise at lower recall values, whereas BLASTp has higher recall at lower precision values, it is indeed a good complement to BLASTp, as demonstrated by the top performance of the ensemble approach containing both methods.

      Another strength of the work is its emphasis on usability and interpretability, as demonstrated in the graphical interface, use of class activation mapping for sub-sequence attribution, and the analysis of hierarchical functional clustering when projecting the high-dimensional embedding into UMAP projections.

      We thank the reviewer for highlighting these points.

      However, a main weakness is the premise that this approach is new. For example, the authors claim that existing deep learning "models cannot infer functional annotation for full-length protein sequences." However, as the proposed method is a straightforward deep neural network implementation, there have been other very similar approaches published for protein function prediction. For example, Cai, Wang, and Deng, Frontiers in Bioengineering and Biotechnology (2020), the latter also being a CNN approach. As such, it is difficult to assess how this approach differs from or builds on previous work.

      We agree that there has been a great deal of exciting work looking at the application of deep learning to protein sequences. Our core code has been publicly available on GitHub since April 2019, and our preprint has now been available for more than a year. We regret the time taken to release a manuscript and for it to reach review: this was in part due to the SARS-CoV-2 pandemic, in whose scientific response the first author was heavily involved. Nevertheless, we believe that our work has a number of important features that distinguish it from much other work in this space.

      ● We train across the entire GO ontology. In the paper referenced by the reviewer, training is with 491 BP terms, 321 MF terms, and 240 CC terms. In contrast, we train with a vocabulary of 32,102 GO labels, and the majority of these are predicted at least once in our test set.

      ● We use a dilated convolutional approach. In the referenced paper the network used is instead of fixed dimensions. Such an approach means there is an upper limit on how large a protein can be input into the model, and also means that this maximum length defines the computational resources used for every protein, including much smaller ones. In contrast, our dilated network scales to any size of protein, but when used with smaller input sequences it performs only the calculations needed for this size of sequence.

      ● We use class-activation mapping to determine regions of a protein responsible for predictions, and therefore potentially involved in specific functions.

      ● We provide a TensorFlow.JS implementation of our approach that allows lightweight models to be tested without any downloads.

      ● We provide a command-line tool that provides easy access to full models.

      We have made some changes to bring out these points more clearly in the text:

      "Since natural protein sequences can vary in length by at least three orders of magnitude, this pooling is advantageous because it allows our model to accommodate sequences of arbitrary length without imposing restrictive modeling assumptions or computational burdens that scale with sequence length. In contrast, many previous approaches operate on fixed sequence lengths: these techniques are unable to make predictions for proteins larger than this sequence length, and use unnecessary resources when employed on smaller proteins."

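      The length-independence argument can be made concrete with a small sketch. It assumes a stack of dilated convolutions followed by pooling over the sequence axis; the kernel size, the dilation rates, and the choice of mean pooling are illustrative assumptions, not the published ProteInfer hyperparameters.

```python
def receptive_field(kernel_size, dilations):
    """Residues seen by one output position of stacked dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def mean_pool(per_residue_features):
    """Collapse an L x F list of per-residue features into one F-vector."""
    length = len(per_residue_features)
    width = len(per_residue_features[0])
    return [
        sum(row[j] for row in per_residue_features) / length
        for j in range(width)
    ]


# Hypothetical per-residue features for a 2-residue and a 4-residue sequence.
short_seq = [[1.0, 0.0], [3.0, 2.0]]
long_seq = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [2.0, 2.0]]

rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
short_embedding = mean_pool(short_seq)
long_embedding = mean_pool(long_seq)

print(rf, short_embedding, long_embedding)
```

      Both inputs yield an embedding of the same width even though their lengths differ, and the work done scales with the actual sequence length rather than with a fixed maximum.
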
      We have added a table that sets out the vocabulary sizes used in our work (5,134 for EC and 32,109 for GO):

      "Gene Ontology (GO) terms describe important protein functional properties, with 32,109 such terms in Swiss-Prot (Table S6) that cover the molecular functions of proteins (e.g. DNA-binding, amylase activity), the biological processes they are involved in (e.g. DNA replication, meiosis), and the cellular components to which they localise (e.g. mitochondrion, cytosol)."

      A second weakness is that it was not clear what new insights the UMAP projections of the sequence embedding could offer. For example, the authors mention that "a generalized mapping between sequence space and the space of protein functions...is useful for tasks other than those for which the models were trained." However, such tasks were not explicitly explained. The hierarchical clustering of enzymatic proteins shown in Fig. 5 and the clustering of non-enzymatic proteins in Fig. 6 are consistent with the expectation of separability in the high-dimensional embedding space that would be necessary for good CNN performance (although the sub-groups are sometimes not well-separated. For example, only the second level and leaf level are well-separated in the enzyme classification UMAP hierarchy). Therefore, the value-added of the UMAP representation should be something like using these plots to gain insight into a family or sub-family of enzymes.

      We thank the reviewer for highlighting this point. There are two types of embedding which we discuss in the paper. The first is the high-dimensional representation of the protein that the neural network constructs as part of the prediction process. This is the embedding we feel is most useful for downstream applications, and we discuss a specific example of training the EC-number network to recognise membrane proteins (a property on which it was not trained): "To quantitatively measure whether these embeddings capture the function of non-enzyme proteins, we trained a simple random forest classification model that used these embeddings to predict whether a protein was annotated with the intrinsic component of membrane GO term. We trained on a small set of non-enzymes containing 518 membrane proteins, and evaluated on the rest of the examples. This simple model achieved a precision of 97% and recall of 60% for an F1 score of 0.74. Model training and data-labelling took around 15 seconds. This demonstrates the power of embeddings to simplify other studies with limited labeled data, as has been observed in recent work (43, 72)."
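
      The reported F1 score follows from the quoted precision and recall, since F1 is their harmonic mean. A quick check, using only the two figures given in the text above:

```python
precision = 0.97  # precision reported for the membrane-protein classifier
recall = 0.60     # recall reported for the same classifier

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")
```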

      As the reviewer points out, there is a second embedding created by compressing this high-dimensional representation down to two dimensions using UMAP. This embedding can also be useful for understanding the properties seen by the network, for example the GO terms highlighted in Fig. 7, but in general it will contain less information than the higher-dimensional embedding.

      The clear presentation, ease of use, and computationally accessible downstream analytics of this work make it of broad utility to the field.

    1. Author response

      We would like to thank the reviewers for their valuable input. In the new version, we have tried to incorporate all of the comments made by Yulia Karmanova and Richmond Dzekoe. As a result, we feel that the quality of the paper has improved substantially. Below, we discuss for each comment (that required revision) which actions were taken to address the reviewer's concerns.

      First, we will address the comments of Yulia Karmanova, Research Centre Kairos:

      1. I suggest that the authors should involve more assessors in their future research. Two lecturer-researchers and three senior students were involved in the process, which I assume is not enough for such large-scale research. A bigger team of professional assessors could make a valuable contribution when analysing the data and resolving emerging research questions.

      Although it was indeed a huge job to assess the entire corpus with only five people, working with a small team also had its advantages in terms of reliability and validity of the research. It helped us address and overcome one of the main difficulties mentioned in qualitative research, viz. the perceived subjectivity of the assessment process (O'Connor & Joffe, 2020). By keeping the team within manageable proportions, we could ensure a like mindset by increasing the inter-rater reliability through calibration sessions. This concern also gave rise to a vast field of research on automated assessment tools. (See "Reply to Comment 1" in the manuscript)

      O'Connor, C., & Joffe, H. (2020). Intercoder Reliability in Qualitative Research: Debates and Practical Guidelines. International Journal of Qualitative Methods. doi:10.1177/1609406919899220

      2. I would also recommend providing the manuscript with brief comments on the meanings of the parameters in column 4 (Tables 3, 4, 5, 6) for readers' clarity. What do t, p and n.s. stand for?

      We thank the reviewer for this suggestion. It might indeed help to point out these statistical concepts for a better understanding of the figures in the Results section. We have added footnotes with short clarifying definitions to Table 3, first table in the Results section. These footnotes contain the following information:

      In statistics, the t-value measures the size of the difference relative to the variation in your sample data. In other words, T is the calculated difference represented in units of standard error. The greater the magnitude of T, the greater the evidence against the null hypothesis, viz. the assumption that there is no difference in language use between blogs scoring high vs. low in perceived level of ICC.

      A p-value is a statistical measurement used to validate a hypothesis against observed data. A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true. The lower the p-value, the greater the statistical significance of the observed difference. A p-value of 0.05 or lower is generally considered statistically significant, meaning that the null hypothesis can be rejected. 'N.s.' is simply short for 'not significant'; in other words, a p-value above 0.05.
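
      To illustrate the t-value as a difference expressed in units of standard error, here is a minimal sketch. The manuscript does not state which t-test variant was used, so Welch's unequal-variance form is an assumption, and the two samples are invented values standing in for word-category frequencies in blogs with high and low perceived ICC.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Difference in means divided by its standard error (Welch's form)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical frequencies (e.g. percentage of I-words per blog).
high_icc = [6.0, 7.0, 8.0, 7.0]
low_icc = [4.0, 5.0, 6.0, 5.0]

t_value = welch_t(high_icc, low_icc)
print(f"t = {t_value:.3f}")
```

      The larger the magnitude of t, the less compatible the data are with the null hypothesis of no difference; converting t into a p-value additionally requires the t distribution with the appropriate degrees of freedom.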

      3. I believe that the manuscript would benefit from correcting minor inaccuracies. I would recommend to:
      - Replace «his» with gender neutral «their», page 6: In these blogs, the language use of students serves as a vehicle of information on the students' development of ICC, offering the reader concrete cues – henceforth referred to as linguistic markers – of his reflective learning process.
      - Add a space between that and are, page 19: In order to bring more focus to our research, we initially focused on word categories thatare characteristic of properties that can be linked to ICC and cultural sensitivity, such as openness, self-relativity, curiosity and reflection or analytical thinking.
      - Add missing parentheses, page 22: Deardorff, D. 2006. Identification and Assessment of Intercultural Competence as a Student Outcome of Internationalisation. Journal of Studies in International Education, 10 (3), 241-266.

      We have corrected all the above-mentioned typos and inaccuracies concerning gender in the text.

      Secondly, we will address the comments of Richmond Dzekoe, Iowa State University:

      4. Theoretical Background and Literature review: These sections need a major revision. Move the discussion on studies on the importance of reflection to the literature review section and provide more current references. These sections also read more like annotations. It will be better to focus on particular insights from the studies you cited and the implications of those insights for framing your current study.

      We thank the reviewer for his advice. Taking into account his other comments (see comments 8, 15 and 17) we decided to rewrite the introduction to focus more clearly, and from the beginning, on the main aim of our study: to look for linguistic markers of ICC in reflective writing. We hope that, by framing the introduction in a different way, the structure of Section 2 becomes more transparent for the reader, and will be easier to follow.

      5. Theoretical Background and Literature review: You mention Byram's (1997) intercultural speaker model and go on to say, "Like most of the current ICC frameworks, Byram's model offers a holistic approach." What are some of the current ICC frameworks you are referring to? Giving some examples here will be helpful for your readers.

      We agree with the reviewer that "most of the current ICC frameworks" is a vague and somewhat confusing reference to ICC frameworks in general, and more specifically the ones we already referred to in preceding paragraphs. After reviewing the paragraph 'Language in relation to ICC', we decided to omit the relevant sentence, as it turned out to be superfluous for our reasoning.

      6. Theoretical Background and Literature review: The information in Table 2 should be added to your description of the Corpus.

      After trying to summarize this information in running text, we concluded that a table is the best way to provide numerical details on the different sub-corpora in a neat and orderly manner. Therefore, we have retained the table in the new version of the text.

      7. Theoretical Background and Literature review: In order to support the claim that the use of many "I-words" indicates a more open, curious, and involved stance, it is important to explain more clearly how you differentiate the "I-words" which are descriptive from "I-words" that are reflective in your analysis.

      The analysis of our corpus supports our claim that more open, curious and involved authors – sign of high level of ICC – more frequently use I-words. There is no difference, however, in the type, nor the significance (person of reference) of these I-words between the two sub-corpora. In other words, we do not differentiate between descriptive and reflective I-words. We have marked the relevant sections in the manuscript with "Reply to Comment 7".

      8. Literature review: Beginning the Literature review with the sub-section "Language in relation to ICC" might provide a better flow of ideas in your lit. review.

      Since we have rewritten our introduction to immediately focus on 'linguistic markers for ICC in reflective writing assignments' as our narrative hook (in response to Comments 4, 16 and 17), we think it also becomes easier to understand the structure and flow of ideas of Section 2, Theoretical Background. Therefore, we decided to discard this suggestion.

      9. Methods, Results, Discussion: Explain the strengths and limitations of the integrated approach you are using. What does each model add to your integrated framework, and why is this integrated approach the best way to frame your study?

      We thank the reviewer for this clear observation. The added value of our combined approach is often suggested in the text but never explicitly stated. In section 3.2 we explain how we combined a holistic approach (determining the level of ICC for each blog based on a rubric) with a textual analysis of each blog (using a semi-automated approach based on the LIWC lists). By adding a textual analysis to a holistic rubric based on the ICC frameworks of Byram, Deardorff and Pinto, we intend to make the holistic claims (that is, blog perceived as high ICC versus blog perceived as low ICC) more tangible. By focusing at word level on the use of quantifiers, I-words and insight words, teachers can 'materialize' their holistic claims, help students become more nuanced, curious, reflective and open-minded writers, and help them develop their global mindset.

      We added the following sentence to the first paragraph of section 4: "By adding a textual analysis to a holistic rubric, we intend to make the perceived level of ICC more tangible. By focusing on language use, teachers can substantiate their holistic claims and help students become more nuanced, curious, reflective and open-minded writers and, consequently, help them develop their intercultural competences". (See "Reply to Comment 9" in the manuscript)

      10. Methods, Results, Discussion: In describing the use of the rubrics to score the blogs, you mention calculating inter-rater reliability. How was this reliability calculated, and what was it?

      In section 3.2, we mention 'inter-rater reliability' twice. The first time in relation to the use of a rubric: Instead of letting the five assessors freely determine the perceived level of ICC for each blog on the basis of their own knowledge and insights, we have created a rubric (attached to the article): a scoring tool or set of criteria with associated descriptions of certain scores. The use of a rubric is known to increase 'inter-rater reliability'.

      The second time we mention 'inter-rater reliability' is when we refer to the calibration sessions we organized to discuss and fine-tune our evaluations based on the rubric, to enhance our (common) understanding of the rubric and ensure or increase our inter-rater reliability. We did not, however, perform calculations based on our (possibly differing) scores to exactly 'calculate' our inter-rater reliability, as in other published studies, e.g., the one by Lucas et al. (2017). Since we do not claim to have made this calculation, we did not change the text in section 3.2.

      Lucas, Ch. et al. (2017). Inter-rater reliability of a reflective rubric to assess pharmacy students' reflective thinking. Currents in Pharmacy Teaching and Learning, 9, 989-995.

      11. Methods, Results, Discussion: The strong evidence of intercultural competence comes from your analysis of the "Insight words." There is, however, a problem with the analysis of "I-words." As you explain the use of "I-words" as indicators of reflective writing, it will be good to explain more clearly how you differentiate the reflective and descriptive functions of "I-words" in your analysis.

      See the above-mentioned comment on the use of I-words. We do not distinguish between descriptive and reflective I-words. The difference between blogs with a higher perceived level of ICC and a lower level lies in the frequency with which they use I-words. When we then look further into the type of verbs that follow the personal pronoun I, we notice that the I's in the corpus of high ICC are more frequently combined with verbs marking an analytical approach. These verbs are part of the dictionary of 'insight words' (according to Pennebaker). So, in both cases (combined or not with an 'insight word'), the I's solely refer to a more personal and involved stance. The more frequently authors use an I-word, the more involved, curious and open-minded they are. The combination with insight words merely adds to the blog's perceived level of ICC: a more involved, curious and open-minded stance using I-words, plus proof of 'insight' or 'analysis' by the use of insight words, both add up to a higher level of perceived ICC. Please see the highlighted sections in reply to comment 7 in the manuscript.

      12. Methods, Results, Discussion: The discussion is weak. Besides a list of limitations, the discussion lacks an insightful engagement with conclusions drawn by previous research. Contextualizing the discussion within already reported insights on this topic from studies such as Belz (2003), Byram (1997), Chan, Wong, & Luo (2020), Deardorff (2006), Elola & Oskoz (2008), Hoefnagels & Schoenmakers (2018) will help you address the main aim of your study which is identifying linguistic markers of ICC in order to provide teachers and other supervisors with tangible cues to help students develop ICC.

      Thank you for this critical remark, which – we think − mostly relates to paragraph 4 of Section 4. In order to link our results more explicitly to former research and the gaps we have identified in the previous sections of the text, we have added information that should elucidate the added value of our research to former publications and insights in the domain of ICC. (See "Reply to Comment 12" in the manuscript)

      13. Abstract: In the abstract, it might be a good idea to mention how many students were involved in the study and their level of linguistic proficiency in English.

      The blogs were written by a mixed group of students, of which approximately 80% are native Dutch and 20% speak another language. We have no information about the specific level of English each of them has. I have added the number of blogs (1,635) and students (672) to the abstract. (See "Reply to Comment 13" in the manuscript)

      14. Abstract: You used the expression "a more analytical approach." Please be more specific and mention that approach by name and what makes it more analytical.

      In the abstract we mention "a more analytical approach at text and word level". To be more precise we have changed this into "a text-analytical approach at word level". (See "Reply to Comment 14" in the manuscript)

      15. Introduction: In the introduction, please provide more substantial evidence from the literature to support the claim that a "successful career path increasingly depends on ICC." One example from Linked in is not enough.

      Since we decided to rewrite our introduction and directly focus on reflective writing to enhance ICC (in response to Comments 4, 8, 16 and 17), we have skipped the 1st paragraph which focused on the importance of ICC in contemporary education and the work field. We did, however, find a more recent source, stating that "intercultural competence plays a crucial role in modern working life, which indicates that the sphere of working life has expanded outside land borders and across cultural boundaries" (Pylväs & Nokelainen, 2021). We would also like to refer to Hoefnagels & Schoenmakers (2018) who – more specifically for the hospitality industry − state that "in a globalized industry, hospitality managers must be able to manage cultural diversity at many different levels. (...) Not only must hospitality managers be effective in their daily interactions with culturally and linguistically diverse guests, but also in communicating with their multicultural domestic staff. And over and above that, hospitality managers might just as well be working for an international hotel group or investor with headquarters on a different continent than their own, thus adding another level of cultural challenge to their working environment." (See "Reply to Comment 15" in the manuscript)

      Pylväs, L., & Nokelainen, P. (2021). Academics' perceptions of intercultural competence and professional development after international mobility. International Journal of Intercultural Relations, 80, 336-348.

      16. Introduction: Please provide a clearer definition of ICC. Besides the mention of Deardorff's (2006) definition, the reader is lost as to what ICC really means in this study and how that definition informs the framing and findings of the study.

      Given the fact that we have slightly changed the scope of our introduction and have shifted the focus on ICC and the different theoretical models to Section 2, we have connected Deardorff's definition better to the study at the beginning of the section. (See "Reply to Comment 16" in the manuscript)

      17. Introduction: There seems to be a jump from a discussion on "reflective writing" to the "role of language as a source of information for students learning" and a justification of the use of the "Linguistic Inquiry and Word Count framework" to study the use of "I-words" by President Nixon during the Watergate scandal. This structure makes the introduction a bit confusing. Please revise the introduction. Explaining the use of LIWC in other corpus analysis studies for Word Counts might help you provide a stronger justification for using this framework than the Watergate research you cited in the introduction. For current studies that use LIWC, please see Dudău DP and Sava FA (2021), Performing Multilingual Analysis With Linguistic Inquiry and Word Count 2015 (LIWC2015).

      We thank the reviewer for the reference to Dudău & Sava (2021). We have read the paper and included the reference in our revised introduction (also see comments 4, 8 and 15) to underline the interest in the LIWC2015 framework in recent scientific literature. (See "Reply to Comment 17" in the manuscript)

      Dudău, D.P., & Sava, F.A. (2021). Performing Multilingual Analysis With Linguistic Inquiry and Word Count 2015 (LIWC2015). An Equivalence Study of Four Languages. Frontiers in Psychology, 12, article 570568. [doi: 10.3389/fpsyg.2021.570568]

      We hope that this letter provides sufficient clarification of the modifications we have made in response to the reviewers' comments. We will upload the new version on Preprints.org and notify the different reviewers in response to the changes they had suggested. If you have any further questions or comments, do not hesitate to contact us.

      Referee response:

      I appreciate the revision the authors have made to the introduction. Rewriting the introduction helps set a clearer focus for the rest of the paper. However, I still have some reservations about the methodology and how data was collected and analyzed.

      I refer specifically to two of my previous comments:

      7. “Theoretical Background and Literature review: In order to support the claim that the use of many “I-words” indicates a more open, curious, and involved stance, it is important to explain more clearly how you differentiate the “I-words” which are descriptive from “I-words” that are reflective in your analysis.”

      I still find the lack of distinction between “I-words” that might be purely descriptive and “I-words” that are reflective problematic. To be able to claim that the use of many “I-words” indicates a more open, curious, and involved stance, it is important to code the data in a way that separates descriptive “I-words” (e.g., I am an American) from reflective “I-words” (e.g., I realized that I needed to engage more in cross-cultural communication). The lack of such differentiation will mean all “I-words” in the corpora are reflective.

      9. Methods, Results, Discussion: Explain the strengths and limitations of the integrated approach you are using. What does each model add to your integrated framework, and why is this integrated approach the best way to frame your study?

      The response the authors gave describes what they intended to do rather than actually providing an answer to the question of the integrated framework's strengths and limitations.

      The manuscript needs to address these areas effectively in order to support its central claim and conclusion.

      Author response

      Dear Mr. Dzekoe

      Thank you for clarifying your comments. Please allow us to further address them in the text below.

      I appreciate the revision the authors have made to the introduction. Rewriting the introduction helps set a clearer focus for the rest of the paper. However, I still have some reservations about the methodology and how data was collected and analyzed.

      I refer specifically to two of my previous comments:

      7. “Theoretical Background and Literature review: In order to support the claim that the use of many “I-words” indicates a more open, curious, and involved stance, it is important to explain more clearly how you differentiate the “I-words” which are descriptive from “I-words” that are reflective in your analysis.”

      I still find the lack of distinction between “I-words” that might be purely descriptive and “I-words” that are reflective problematic. To be able to claim that the use of many “I-words” indicates a more open, curious, and involved stance, it is important to code the data in a way that separates descriptive “I-words” (e.g., I am an American) from reflective “I-words” (e.g., I realized that I needed to engage more in cross-cultural communication). The lack of such differentiation will mean all “I-words” in the corpora are reflective.

      From your comment we understand that you would like us to make a distinction between descriptive I-words and reflective I-words (when combined with a cognitive verb). This is not possible when using the LIWC framework, however, as entries in the LIWC dictionaries do not contain information about surrounding words. We therefore interpret the significant difference we observed in the frequency of I-words (regardless of the verb that followed them) in accordance with Pennebaker’s claim that a higher frequency of I-words is a sign of a more involved and curious author, two traits that are also important in the theoretical models for ICC. This quantitative outcome thus allows us to link I-words to ICC.

      Since our approach combines a quantitative and a qualitative analysis of the blogs, we dug further into the data, looking for extra evidence of that involved stance at text level. There we noticed that these same I-words, in the blogs with a high ICC level, were also often combined with a cognitive verb. These cognitive verbs (part of Pennebaker’s dictionary of Insight Words) are a sign of more reflection, and as we will see further in the analysis (Section 3.3.2), this can also be linked to a higher level of ICC.

      In other words, we understand the difference between descriptive and reflective I-words, and we address this difference by conducting a qualitative follow-up analysis on combinations of I-words with reflective / cognitive verbs, rather than incorporating the difference into our quantitative analysis (which would be practically impossible given the nature of the LIWC dictionaries). As a consequence, we can state that the link between the use of I-words and ICC, a sign of a more involved stance, is sometimes strengthened or corroborated by an extra link, viz. the one between insight words (amongst which cognitive verbs) and ICC, a sign of more reflection. (See Reply to Comment #7, in the manuscript)
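
The two-step procedure described above can be illustrated with a short sketch. It is not the authors' pipeline and it does not use the LIWC software: `I_WORDS` and `INSIGHT_VERBS` are small hypothetical stand-ins for the LIWC dictionaries, and the rule "an I directly followed by an insight verb" is an assumed simplification of the qualitative follow-up step.

```python
import re

# Hypothetical stand-ins for LIWC categories (the real dictionaries are far larger).
I_WORDS = {"i", "me", "my", "mine", "myself"}
INSIGHT_VERBS = {"realize", "realized", "think", "thought", "understand",
                 "understood", "know", "knew", "learned", "noticed"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def i_word_rate(text):
    """Step 1 (quantitative): I-words as a percentage of all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100.0 * sum(t in I_WORDS for t in tokens) / len(tokens)


def i_plus_insight(text):
    """Step 2 (follow-up): count 'I' directly followed by an insight verb."""
    tokens = tokenize(text)
    return sum(1 for a, b in zip(tokens, tokens[1:])
               if a == "i" and b in INSIGHT_VERBS)


# The two example sentences come from the referee's comment.
descriptive = "I am an American"
reflective = "I realized that I needed to engage more in cross-cultural communication"

for blog in (descriptive, reflective):
    print(round(i_word_rate(blog), 1), i_plus_insight(blog))
```

In this toy example both sentences contain I-words, but only the second pairs one with an insight verb, which is the kind of difference a word-count dictionary alone cannot see.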

      9. Methods, Results, Discussion: Explain the strengths and limitations of the integrated approach you are using. What does each model add to your integrated framework, and why is this integrated approach the best way to frame your study?

      The response the authors gave describes what they intended to do rather than actually providing an answer to the question of the integrated framework's strengths and limitations.

      We acknowledge that the added value of our approach still remains implicit in the text. Therefore, we added the following sentences at the end of Section : “This integrated approach, combining a quantitative and qualitative text analysis, allows us to analyze a large corpus of texts in a targeted and fast manner. By adding a second, qualitative step to the statistical outcomes, we are able to interpret the results and to link them, in this case, to the differences in ICC score.” (See Reply to Comment 9, in the manuscript)

      The manuscript needs to address these areas effectively in order to support its central claim and conclusion.

      We sincerely hope that we were able to clarify the last ambiguities and doubts.

      reviewer response

      Thank you very much for responding to my concerns and adding a qualitative analysis that helps to speak to the conclusions you draw about the use of the “I-words.” I appreciate you adding this additional step because the LIWC framework itself does not provide this depth of analysis and insight. Also, Pennebaker’s claims were closely tied to his analysis of emotions rather than the cross-cultural factors you investigate in your study. So, again, adding your own qualitative analysis effectively addresses the concerns I had.

      I also appreciate the explicit explanation you added on why you adopt an integrative framework.

      My final comment is Verified manuscript: The content is scientifically sound, only minor amendments

    1. Stable Diffusion was trained on 5 billion image-text pairs from datasets prepared by non-party LAION, a German entity that works in conjunction with and is sponsored by Stability AI. Upon information and belief, Stability AI provided LAION with both funding and significant computing resources to produce its datasets in furtherance of Stability AI’s infringing scheme.

      Role of LAION

      LAION, from their website: a non-profit organization providing datasets, tools and models to liberate machine learning research. By doing so, we encourage open public education and a more environment-friendly use of resources by reusing existing datasets and models.

      Wikipedia: The Large-scale Artificial Intelligence Open Network (LAION) is a German non-profit with a stated goal "to make large-scale machine learning models, datasets and related code available to the general public". It is best known for releasing a number of large datasets of images and captions scraped from the web which have been used to train a number of high-profile text-to-image models, including Stable Diffusion and Imagen.

    1. As part of this effort, we invite educators and others to share any feedback they have on our feedback form as well as any resources that they are developing or have found helpful (e.g. course guidelines, honor code and policy updates, interactive tools, AI literacy programs, etc).

      I wonder how this information will be shared back so that other educators can benefit from it. I maintain a resource list for educators at https://wac.colostate.edu/repository/collections/ai-text-generators-and-teaching-writing-starting-points-for-inquiry/

    1. I use ShareX on Windows and I don't think there's any better tool, so I searched for "run sharex on linux" and there is indeed a guide - https://github.com/ShareX/ShareX/issues/6531 - maybe you can get it to work? I believe it can do all of the things you want. Certainly area capture, remembered area capture, fullscreen capture, all bound to different hotkeys. Mine saves with the name = the timestamp but you can probably config it to be an incrementing index. It's incredibly full-featured. I also have hotkeys for "capture current pixel's hex code" and "measure bounded box in pixels." When you take a capture you can also annotate it including showing labeled steps. After capture you can do one or more of: save locally (to one or more places), upload (to one or more hosts), copy to clipboard, etc. That includes pastebin if you have text saved to your clipboard so I use this for that also.

      ShareX is indeed the only excellent screenshot tool of its kind.

    1. hollow web browser and an app called hollow core which is storing all the 00:08:28 data all the nodes all the whole ins all the hollow webs and it's all encrypted and you decrypt it with a QR code key

      hollow web browser

      networking stack webrtc

      Holo and Scuttlebutt

    1. % Set the constant
       alpha = 1;
       % Define the range of initial conditions
       x1 = -2:0.1:2;
       x2 = -2:0.1:2;
       % Generate the phase portrait (vector field on the grid)
       [X,Y] = meshgrid(x1,x2);
       DX = Y;
       DY = alpha*Y - alpha*X.^2 - X;
       quiver(X,Y,DX,DY);
       xlabel('x1'); ylabel('x2');
       % Define the system of equations (local functions go at the end of a script)
       function dxdt = sys(t,x,alpha)
           % The state has two components, x(1) and x(2); alpha is passed in explicitly
           dxdt = [x(2); alpha*x(2) - alpha*x(1)^2 - x(1)];
       end

      MEEN655 Q1 Code 2

    2. % Set the constants
       alpha = 1; beta = 2; k = 3; m = 4;
       % Define the range of initial conditions
       x1 = -2:0.1:2;
       x2 = -2:0.1:2;
       % Generate the phase portrait (vector field on the grid)
       [X,Y] = meshgrid(x1,x2);
       DX = Y;
       DY = -alpha*Y - beta*X./m - k*X./m;
       quiver(X,Y,DX,DY);
       xlabel('x1'); ylabel('x2');
       % Define the system of equations (local functions go at the end of a script)
       function dxdt = sys(t,x,alpha,beta,k,m)
           % The state has two components, x(1) and x(2); constants are passed in explicitly
           dxdt = [x(2); -alpha*x(2) - beta*x(1)/m - k*x(1)/m];
       end

      MEEN655 Q1 Code 1

    3. To generate phase portraits for the first system, you can use the following steps:
       - Define the system of equations in MATLAB as a function that can be passed to an ODE solver.
       - Use the ode45 or ode23 solver to integrate the system of equations over a range of initial conditions.
       - Plot the solutions in a phase portrait, with x1 on the x-axis and x2 on the y-axis.
       Similarly, for the second system, you can follow the same steps to generate a phase portrait. Here is an example MATLAB code for the first system:

      MEEN655 Q1
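
The three steps quoted above (define the system, integrate it from a range of initial conditions, collect the trajectories for plotting) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain Python with a hand-written fixed-step fourth-order Runge-Kutta integrator in place of MATLAB's adaptive ode45, it takes the linear system and constants from the snippet labelled "Code 1" (reading the state index as x(1)), and the names `rhs`, `rk4_step` and `trajectory` are hypothetical. Plotting is left out.

```python
# Constants taken from the snippet labelled "Code 1".
ALPHA, BETA, K, M = 1.0, 2.0, 3.0, 4.0


def rhs(x1, x2):
    """Right-hand side: x1' = x2, x2' = -alpha*x2 - (beta + k)/m * x1."""
    return x2, -ALPHA * x2 - (BETA + K) / M * x1


def rk4_step(x1, x2, h):
    """Advance the state by one classical Runge-Kutta step of size h."""
    k1 = rhs(x1, x2)
    k2 = rhs(x1 + 0.5 * h * k1[0], x2 + 0.5 * h * k1[1])
    k3 = rhs(x1 + 0.5 * h * k2[0], x2 + 0.5 * h * k2[1])
    k4 = rhs(x1 + h * k3[0], x2 + h * k3[1])
    x1 += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    x2 += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x1, x2


def trajectory(x1, x2, h=0.01, steps=2000):
    """Return the list of (x1, x2) points visited from one initial condition."""
    points = [(x1, x2)]
    for _ in range(steps):
        x1, x2 = rk4_step(x1, x2, h)
        points.append((x1, x2))
    return points


# One trajectory per initial condition on a coarse grid.
starts = [(a, b) for a in (-2.0, 0.0, 2.0) for b in (-2.0, 0.0, 2.0)]
portrait = {start: trajectory(*start) for start in starts}

for start, points in portrait.items():
    print(start, "->", tuple(round(v, 4) for v in points[-1]))
```

Each entry of `portrait` is a list of (x1, x2) points that could be drawn as one curve, with x1 on the horizontal axis and x2 on the vertical axis. Unlike the quiver plots, which only show the direction field, this follows actual solution curves.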

    1. - the regions will be responsible for organizing information campaigns on occupations and training programmes aimed at pupils and students, particularly in schools and universities. These missions will be carried out with the support of ONISEP. In this context, for a period of three years from 1 January 2019, the regions may benefit from the secondment of volunteer staff working in the departments and institutions under the minister responsible for national education. These may include, for example, national education psychologists holding the grade of CIO director (art. 18 of Law No. 2018-771, L6111-3 of the Labour Code).
    2. Consultation on the regional aspects of the map of higher education and research programmes. Drafting by the region of a regional plan for higher education, research and innovation (L214-2 of the Education Code)
    1. Remember to click Edit on CodePen so you can code along and play around

      The browser defines default styles for certain elements:

      This is often convenient, but we will sometimes need to override them if they do not suit our design.

    2. Let’s create a document called style.css (you can select a different name, but you need to keep the .css extension). In this file, we will write the code we had in our style tag:

      Let's try this on our HTML file by creating a second file, style.css.

    1. <!DOCTYPE html>
       DOCTYPE indicates that the markup language for your document content is HTML5.
       <html>
       html represents the root of an HTML document. All other elements must be descendants of this element. It’s the first node in our DOM. It is mandatory to close the tag at the very end of the document by typing </html>.
       <head>
       head defines an element that provides general information (metadata) about the document, including its title and links to its scripts and style sheets. Usually it contains:
       - <title> Defines the title of the document; there’s only one title element in the head element of an HTML document. This title contains only text and it is shown in a browser’s title bar or on the page’s tab.
       - <meta> Used to define metadata. This includes information about styles, scripts and data to help browsers use and render the page. One of the most common elements is the <meta charset="UTF-8"> in our example. This specifies the character encoding for the HTML document as UTF-8.
       <body>
       body is the element containing all the content of an HTML document. Every HTML component should be written between the opening and the closing body tag. As there can be only one entire body in a document, there can be only one <body> element.

      Let's go back over each of the MANDATORY elements of a web page in more detail:

      cf. the minimal code in the validator:

      ```html
      <!DOCTYPE html>
      <html>
        <head>
          <meta charset="utf-8">
          <title></title>
        </head>
        <body>
          coucou
        </body>
      </html>
      ```

    2. Now let’s take a look at the code in the given example and let’s explain some of the syntax:
       <!DOCTYPE html>
       <html>
         <head>
           <title>My first document</title>
           <meta charset="UTF-8" />
         </head>
         <body> ... </body>
       </html>

      The structure of a web page:

    1. Peer review report

      Title: Crossref as a source of open bibliographic metadata

      version: 2

      Referee: Simon Porter

      Institution: Digital Science

      email: s.porter@digital-science.com

      ORCID iD: https://orcid.org/0000-0002-6151-8423


      General assessment

      This is a clear paper that outlines a motivation (to assess the metadata completeness of the Crossref record for the purposes of scientometric analysis) and provides a set of useful metrics to assess the completeness of each metadata field.
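
To make the idea of a per-field completeness metric concrete, here is a minimal sketch. It assumes that completeness means the share of records in which a field is present and non-empty, which may differ from the paper's exact definitions. The records and field names are hypothetical and do not reproduce the Crossref schema.

```python
def field_completeness(records, fields):
    """Return, for each field, the share of records with a non-empty value."""
    total = len(records)
    result = {}
    for field in fields:
        # A field counts as filled when it is present and not None, "", [] or {}.
        filled = sum(1 for r in records if r.get(field) not in (None, "", [], {}))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical records; the keys only loosely mimic bibliographic metadata.
records = [
    {"title": "Paper A", "abstract": "Some text", "references": ["x", "y"]},
    {"title": "Paper B", "abstract": "", "references": []},
    {"title": "Paper C", "references": ["z"]},
    {"title": "Paper D", "abstract": "More text"},
]

shares = field_completeness(records, ["title", "abstract", "references"])
for field, share in shares.items():
    print(f"{field}: {share:.0%}")
```

Rerunning such a calculation on successive snapshots would give the kind of over-time comparison suggested further down in this report.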


      Essential revisions that are required to verify the manuscript

      No essential revisions identified


      Other suggestions to improve the manuscript

      Minor suggestions: Figures in the interactive version of the preprint do not have headings or captions, or a link back to the paper.

      On data availability: in the context of the paper, making available the code used to process Crossref’s XML Metadata Plus Snapshot would be a useful contribution, enabling scientometric analysis of the Crossref dataset.

      The following are offered as suggestions that could be added to the paper at the authors' discretion, but do not affect the content or the conclusions of the peer review.

      The authors have chosen to frame metadata completeness of Crossref records as a ‘good in itself,’ leaning on Waltman, L. (2020b) to do the work of setting this up.

      Within this framework, the analysis is offered as a set of observations to help publishers understand where they need to do better. It might be the case that publishers do not intrinsically understand why making certain metadata types available is valuable to the community.

      On the question of how Crossref can be used in scientometric analysis, readers are left to make up their own minds on what Crossref can be used for today, vs what it might be capable of providing in the future based on the evidence presented. It would be a stronger conclusion to highlight the types of scientometric analysis that are now possible with Crossref (for instance, bibliometric coupling) and those that require limits or caveats (analysis by affiliation or abstract). As this analysis lends itself to being rerun in the future, it would be useful to trace advances (hopefully!) not just in terms of the number of things, but also in terms of how scientometric analysis capability is progressing because of it.


      Decision

      Verified: The content is scientifically sound, only minor amendments (if any) are suggested.

    1. How I made my little image resizer is to combine two code snippets I got online from other tutorials which I plan on crediting if I can figure out where I got them.

      You need to be less sloppy