  1. Last 7 days
    1. ABSTRACT: The workflow management system Nextflow, together with the nf-core community, forms an essential ecosystem in bioinformatics. However, ensuring the correctness and reliability of large and complex pipelines is challenging, since a unified and automated unit-style testing framework specific to Nextflow is still missing. To provide this crucial component to the community, we developed the testing framework nf-test. It introduces a modular approach that enables pipeline developers to test individual process blocks, workflow patterns and entire pipelines in isolation. nf-test is based on a syntax similar to Nextflow DSL 2 and provides unique features such as snapshot testing and smart testing, which saves resources by testing only changed modules. We show on different pipelines that these improvements minimize development time, reduce test execution time by up to 80% and enhance software quality by identifying bugs and issues early. Already adopted by dozens of pipelines, nf-test improves the robustness and reliability of pipeline development.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf130), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Jose Espinosa-Carrasco

      The article presents nf-test, a new modular and automated testing framework designed specifically for Nextflow workflows, a widely used workflow management system in bioinformatics. nf-test aims to help developers improve the reliability and maintainability of complex Nextflow pipelines. The framework includes very useful features such as snapshot testing, which assesses the computational repeatability of the results produced by the execution of a pipeline or its components, and smart testing, which optimises computational resources by only executing tests on the parts of the pipeline that were modified, reducing overall run time. Notably, nf-test can be integrated into CI workflows and has already been adopted by the nf-core community, demonstrating its utility and maturity in real-world scenarios.
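      As a purely illustrative aside: nf-test's actual snapshots are declared in its Groovy-based DSL, so the Python sketch below only mirrors the general snapshot-testing idea described above, and every file name in it is hypothetical.

      ```python
      # Illustrative only: compare current output checksums against a stored snapshot.
      # On the first run the snapshot is written; later runs fail if outputs drift.
      import hashlib, json
      from pathlib import Path

      def file_md5(path: Path) -> str:
          return hashlib.md5(path.read_bytes()).hexdigest()

      def check_snapshot(outputs: list[Path], snapshot_path: Path) -> bool:
          current = {p.name: file_md5(p) for p in outputs}
          if not snapshot_path.exists():
              snapshot_path.write_text(json.dumps(current, indent=2, sort_keys=True))
              return True                    # snapshot created, nothing to compare yet
          stored = json.loads(snapshot_path.read_text())
          return current == stored           # any changed checksum signals a regression
      ```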

      General comments:

      The manuscript could benefit from reordering some sections to follow a more consistent structure and from removing redundant explanations. I think it would be nice to include one limitation of nf-test: the fact that reproducing previous results does not necessarily imply biological correctness. This point is not entirely clear in the current version of the manuscript (see my comment below). Another aspect that could improve the manuscript is the inclusion of at least one reference or explanation of how nf-test can be applied outside nf-core pipelines, as all the provided examples are currently restricted to nf-core.

      Specific comments:

      On page 3, the sentence "Thus, maintenance requires substantial time and effort to manually verify that the pipeline continues to produce scientifically valid results" could be more precise. I would argue that identical results across versions do not guarantee scientific validity; they merely confirm consistency with previous outputs. True scientific validity requires comparison against a known ground truth or standard.

      On page 4, in the sentence "It is freely available, and extensive documentation is provided on the website", I think it would be nice to include the link to the documentation.

      In the "Evaluation and Validation" section (page 8), it would be helpful to briefly state the goal of each evaluated test, as is done with the nf-gwas example. ou could include something similar for the nf-core/fetchngs and modules examples (e.g. to assess resource optimization through smart testing). Also, the paragraph references the "--related-tests" option, which could benefit from a short explanation of what it does. Lastly, the order in which the pipelines are presented in this section differs from the order in the Results, which makes the structure a bit confusing.

      The sections titled "Unit testing in nf-test", "Test case execution", "Smart testing and parallelization", "Snapshot testing", and "Extensions for bioinformatics" seem more appropriate for the Materials and Methods section, as they describe the design and functionality of nf-test rather than reporting actual results. Please ignore this comment if the current structure follows specific journal formatting requirements that I may not be aware of.

      The Snapshot testing discussion in the Results section feels somewhat repetitive with its earlier explanation. Consider combining both discussions or restructuring the content to reduce duplication.

      On page 11, the sentence "In these cases, MD5 sums cannot be used and validating the dynamic output content can be time-intensive" is not entirely clear to me, does it mean that it is time consuming to implement the test for this kind of files or that the validation of the files is time consuming?

      On page 12, the sentence "Second, we analyzed the last 500 commits..." is confusing because this is actually the third point in the "Evaluation and Validation" section, as mentioned before. Reordering would improve clarity.

      On page 14, the authors state "However, changes (b) and (c) lead to incorrect output results without breaking the pipeline. Thus, these are the worst-case scenarios for a pipeline developer." While this is mostly true, I would also add that a change in parameters may produce different, but not necessarily incorrect, results—some may even be more biologically meaningful. I suggest acknowledging this.

      Typos:

      In the abstract: "Build on a similar syntax as Nextflow DSL2" should be corrected to "Built on a similar syntax as Nextflow DSL2".

      In the legend of Figure 2 (page 19): "nf-tet" should be "nf-test".

      In the legend of Table 2: "Time savings areis calculated..." should be "Time savings are calculated..."

      Recommendation:

      Given the relevance and technical contributions of the manuscript, I recommend its publication after addressing the minor revisions summarized above.

    1. AI’s emergence in medicine offers potential solutions to the efficiency-quality trade-off.

      AI allows for the existence of both efficiency and quality, which could drastically change health care and other components of life.

    2. For instance, AI demonstrates dermatological diagnostic accuracy through image analysis that matches or exceeds board-certified dermatologists (Leachman & Merlino, 2017).

      AI can be greater and smarter than humans, but with the drawback that it also makes mistakes it must first learn from in order not to repeat them.

    3. However, most studies conceptualize efficiency and quality as isolated dimensions, rarely examining how AI assistance affects both dimensions simultaneously.

      People are not being made aware of how AI is negatively affecting the workplace; AI has been glorified as a do-no-wrong machine that helps you get done what you need to get done with no drawbacks.

    4. Therefore, this study redirects scholarly attention from patient to physician behaviors, systematically examining AI’s effects on both workflow efficiency and clinical quality.

      This states the topic of the article: the effects of AI in the workplace, along with clinical quality.

    1. Abstract: Cryogenic electron microscopy (cryoEM) has revolutionized structural biology by enabling atomic-resolution visualization of biomacromolecules. To automate atomic model building from cryoEM maps, artificial intelligence (AI) methods have emerged as powerful tools. Although high-quality, task-specific datasets play a critical role in AI-based modeling, assembling such resources often requires considerable effort and domain expertise. We present CryoDataBot, an automated pipeline that addresses this gap. It streamlines data retrieval, preprocessing, and labeling, with fine-grained quality control and flexible customization, enabling efficient generation of robust datasets. CryoDataBot’s effectiveness is demonstrated through improved training efficiency in U-Net models and rapid, effective retraining of CryoREAD, a widely used RNA modeling tool. By simplifying the workflow and offering customizable quality control, CryoDataBot enables researchers to easily tailor dataset construction to the specific objectives of their models, while ensuring high data quality and reducing manual workload. This flexibility supports a wide range of applications in AI-driven structural biology.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf127), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 3: Nabin Giri

      The paper presents a flexible, integrated framework for filtering and generating customizable cryo-EM training datasets. It builds upon previously available strategies for preparing cryo-EM datasets for AI-based methods, extending them with a user-friendly interface that allows researchers to enter query parameters, interact directly with the Electron Microscopy Data Bank (EMDB), extract and parse relevant metadata, apply quality control measures, and retrieve associated structural data (cryo-EM maps and atomic models).

      While the manuscript improves upon Cryo2StructData and similar data pipelines used in ModelAngelo/DeepTracer, the innovation claim would be strengthened by a deeper technical comparison, for example quantifying the performance impact of each quality control step in isolation. Some filtering and preprocessing concepts (e.g., voxel resampling, redundancy handling) are not entirely new, so a more explicit discussion of how CryoDataBot's implementations differ from prior work and why these differences matter would improve the manuscript. I do not think it is challenging to change the resampling or the grid division parameter in the scripts provided in the Cryo2StructData GitHub repo or in the scripts available in the ModelAngelo GitHub repo.

      The benchmarking is mainly limited to ribosome datasets. While this choice is understandable for demonstration purposes, the generalizability to other macromolecules (e.g., membrane proteins, small complexes) is not shown. A small-scale test on a different class of structures (e.g., predicting protein C-alpha positions, backbone atom positions or, more challenging, amino acid types) could strengthen the claim of broad applicability. Since the technical innovation is limited, this would help improve the paper.

      The authors state that CryoDataBot ensures reproducibility and provides datasets for AI-method benchmarking. However, EMDB entries can be updated over time (e.g., through reprocessing, resolution improvements, model re-fitting, or correction of atomic coordinates). In my opinion, in the strict sense, reproducibility (producing identical datasets) depends on versioning of EMDB/PDB entries. Without version locking, CryoDataBot ensures procedural reproducibility but not data immutability. The manuscript should either explain how reproducibility is maintained (e.g., version control, archived snapshots) or clarify that reproducibility refers to the workflow, not necessarily the exact dataset content, unless versioned datasets are provided, as done in Cryo2StructData.
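      As a purely illustrative sketch of the version-locking idea raised above, one could record checksums and retrieval dates alongside accession IDs in a dataset manifest. The field names and file layout below are assumptions for illustration, not part of CryoDataBot.

      ```python
      # Hypothetical manifest entry "locking" a downloaded map/model pair by checksum.
      import hashlib, json, datetime
      from pathlib import Path

      def lock_entry(emdb_id: str, pdb_id: str, map_file: Path, model_file: Path) -> dict:
          digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
          return {
              "emdb_id": emdb_id,
              "pdb_id": pdb_id,
              "map_sha256": digest(map_file),
              "model_sha256": digest(model_file),
              "retrieved": datetime.date.today().isoformat(),
          }

      # Example (paths and IDs are placeholders):
      # manifest = [lock_entry("EMD-0000", "0XXX", Path("map.mrc"), Path("model.cif"))]
      # Path("dataset_manifest.json").write_text(json.dumps(manifest, indent=2))
      ```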

      Some other concerns:

      (1) The "Generating Structural Labels" section is missing technical details. Please provide more information on how the labels are generated, including labeling radius selection, and how ambiguities are resolved if any are encountered. A suggestion on how the user should determine the radius, and also the grid size (64^3 or other), would be beneficial (an illustrative labelling sketch follows below).

      (2) The manuscript states, on the adaptive density normalization part: "This method is more flexible and removes more noise than the fixed-threshold approaches commonly used in prior studies." What do "noise" and "signal" mean here? There is a separate body of AI-based work developed for reducing noise, such as DeepEMhancer and EMReady, to name a few. Is there any metric to support this claim?

      (3) The manuscript states: "To assess dataset redundancy, we analyzed structural similarity between entries based on InterPro (IPR) domain annotations." Is this a new approach introduced here, or an established practice? How does it compare with sequence-based similarity measures, or with structure-based similarity such as Foldseek?

      (4) The statement "underscoring the dataset's superior quality and informativeness" is strong. Is it possible to provide more concrete, quantitative evidence to support this, ideally beyond the U-Net training metrics?

      (5) Is there a case where there are multiple PDB IDs for a cryoEM density map? If so, how is a specific atomic model chosen in such cases?
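      Referring to point (1): as a purely hypothetical illustration of radius-based voxel labelling on a cubic grid, the sketch below assigns every voxel within a chosen radius of an atom that atom's class. The grid size, voxel size, radius and label encoding are assumptions for illustration only, not CryoDataBot's documented behaviour.

      ```python
      # Toy radius-based labelling of a cubic grid; ties are not resolved here
      # (a real implementation might keep the nearest atom's label).
      import numpy as np

      def label_grid(atom_coords, atom_labels, grid_shape=(64, 64, 64),
                     voxel_size=1.0, radius=2.0, background=0):
          """atom_coords: (N, 3) array in Angstrom, already shifted to the grid origin;
          atom_labels: (N,) integer class per atom (e.g. helix/sheet/coil/RNA)."""
          labels = np.full(grid_shape, background, dtype=np.int8)
          r_vox = int(np.ceil(radius / voxel_size))
          for (x, y, z), cls in zip(atom_coords, atom_labels):
              i, j, k = (int(round(c / voxel_size)) for c in (x, y, z))
              for di in range(-r_vox, r_vox + 1):
                  for dj in range(-r_vox, r_vox + 1):
                      for dk in range(-r_vox, r_vox + 1):
                          ii, jj, kk = i + di, j + dj, k + dk
                          inside = (0 <= ii < grid_shape[0] and
                                    0 <= jj < grid_shape[1] and
                                    0 <= kk < grid_shape[2])
                          if inside and (di * di + dj * dj + dk * dk) * voxel_size ** 2 <= radius ** 2:
                              labels[ii, jj, kk] = cls
          return labels
      ```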

    2. Abstract: Cryogenic electron microscopy (cryoEM) has revolutionized structural biology by enabling atomic-resolution visualization of biomacromolecules. To automate atomic model building from cryoEM maps, artificial intelligence (AI) methods have emerged as powerful tools. Although high-quality, task-specific datasets play a critical role in AI-based modeling, assembling such resources often requires considerable effort and domain expertise. We present CryoDataBot, an automated pipeline that addresses this gap. It streamlines data retrieval, preprocessing, and labeling, with fine-grained quality control and flexible customization, enabling efficient generation of robust datasets. CryoDataBot’s effectiveness is demonstrated through improved training efficiency in U-Net models and rapid, effective retraining of CryoREAD, a widely used RNA modeling tool. By simplifying the workflow and offering customizable quality control, CryoDataBot enables researchers to easily tailor dataset construction to the specific objectives of their models, while ensuring high data quality and reducing manual workload. This flexibility supports a wide range of applications in AI-driven structural biology.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf127), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Dong Si

      This paper discusses CryoDataBot, which creates cryoEM datasets for training, with the ability to filter based on redundancy, map-model fitness (MMF) and other user-defined parameters. Here are some comments:

      • The data labeling just has helix, sheet, coil, and RNA. The labeling should also consider DNA and other structures.

      • The introduction of a Volume Overlap Fraction (VOF) score to validate map-model fitness (MMF) is a novel method to assess global alignment. However, VOF relies on summing and binarizing 2D projections, which may have limitations. It is not clear how sensitive the VOF score is to the binarization process or how it handles complex, non-globular shapes. The paper would be strengthened if the authors could provide more justification for this specific metric over other global 3D correlation scores. An analysis of specific examples of map-model pairs that were discarded by the VOF score but not by the Q-score would be informative (a rough, hypothetical sketch of a projection-overlap score follows this list).

      • The authors acknowledge the trade-off between higher precision and lower recall that results from overly stringent filtering. While increased precision clearly benefits tasks like model refinement, the resulting reduced recall could significantly hinder de novo modeling, which depends upon capturing the entirety of a structure, even with lower confidence. This point could be elaborated on. Is this an area for future work, e.g. developing pre-configured filtering settings for various downstream tasks, like a precision-vs-recall bias setting? This might increase utility based on application.

      • The retraining of CryoREAD is a practical validation of the pipeline's utility for RNA modeling; however, the experimental dataset used is exclusively from ribosomes. Ribosomes were selected because they contain both protein and RNA and are abundant in the EMDB, but they may not represent the full diversity of RNA structures. The authors rightly note that training set composition affects performance. It would be helpful to further discuss the potential shortcomings of an exclusively ribosome-based training set and the possible impact on the retrained CryoREAD model's use for other classes of RNA.

      • The authors should consider benchmarking against other SOTA protein/RNA/DNA modeling tools. Right now it is only benchmarked against their own CryoREAD, which is just an RNA/DNA modeling tool.

      • I tried installing CryoDataBot and it looks like it requires Python version 3.8 or higher, but this isn't specified anywhere in the paper or on the site.

      • Many references and citations are off and wrong.
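      Relating to the VOF comment above, the following is a loose, hypothetical sketch of a projection-based overlap score; the binarization threshold and the exact definition are assumptions, not the authors' published metric.

      ```python
      # Toy projection-overlap score: binarize summed 2D projections of an experimental
      # map and a model-derived map along each axis, then report the mean fraction of
      # model projection pixels covered by the map projection.
      import numpy as np

      def projection_overlap(map3d: np.ndarray, model3d: np.ndarray, thresh: float = 0.5) -> float:
          fracs = []
          for axis in range(3):
              p_map = map3d.sum(axis=axis)
              p_model = model3d.sum(axis=axis)
              b_map = p_map > thresh * p_map.max()
              b_model = p_model > thresh * p_model.max()
              covered = np.logical_and(b_map, b_model).sum()
              fracs.append(covered / max(int(b_model.sum()), 1))
          return float(np.mean(fracs))
      ```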

    3. Abstract: Cryogenic electron microscopy (cryoEM) has revolutionized structural biology by enabling atomic-resolution visualization of biomacromolecules. To automate atomic model building from cryoEM maps, artificial intelligence (AI) methods have emerged as powerful tools. Although high-quality, task-specific datasets play a critical role in AI-based modeling, assembling such resources often requires considerable effort and domain expertise. We present CryoDataBot, an automated pipeline that addresses this gap. It streamlines data retrieval, preprocessing, and labeling, with fine-grained quality control and flexible customization, enabling efficient generation of robust datasets. CryoDataBot’s effectiveness is demonstrated through improved training efficiency in U-Net models and rapid, effective retraining of CryoREAD, a widely used RNA modeling tool. By simplifying the workflow and offering customizable quality control, CryoDataBot enables researchers to easily tailor dataset construction to the specific objectives of their models, while ensuring high data quality and reducing manual workload. This flexibility supports a wide range of applications in AI-driven structural biology.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf127), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Ashwin Dhakal

      The authors introduce CryoDataBot, a GUI-driven pipeline for automatically curating cryo-EM map/model pairs into machine-learning-ready datasets. The study is timely and addresses a real bottleneck in AI-driven atomic model building. The manuscript is generally well written and includes benchmarking experiments (U-Net and CryoREAD retraining). Nevertheless, several conceptual and presentation issues should be resolved before the work is suitable for publication:

      1. All quantitative tests focus on ribosome maps in the 3-4 Å range. Because ribosomes are unusually large and RNA-rich, it is unclear whether the curation criteria (especially Q-score ≥ 0.4 and VOF ≥ 0.82) generalise to smaller or lower-resolution particles. Please include at least one additional macromolecule class (e.g. membrane proteins or spliceosomes) or justify why the current benchmark is sufficient.

      2. The manuscript adopts fixed thresholds (Q-score 0.4; 70% similarity; VOF 0.82) yet does not show how sensitive downstream model performance is to these values. A short ablation (e.g. sweeping the Q-score from 0.3 to 0.6) would help readers reuse the tool sensibly.

      3. Table 1 claims CryoDataBot "addresses omissions" of Cryo2StructData, but no quantitative head-to-head benchmarking is provided (e.g. training the same U-Net on Cryo2StructData). Please add such a comparison or temper the claim.

      4. For voxel-wise classification, F1 scores are affected by severe class imbalance (Nothing ≫ Helix/Sheet/Coil/RNA). Report per-class support (number of positive voxels) and consider complementary instance-level or backbone-trace metrics (a brief per-class metrics sketch follows this list).

      5. In Fig. 4 the authors show that poor recall/precision partly stems from erroneous deposited models. Quantify how often this occurs across the 18-map test set and discuss implications for automated QC inside CryoDataBot.

      6. The authors note improved precision but slightly reduced recall in CryoDataBot-trained models. This is explained, but strategies to mitigate this trade-off are not discussed. Could ensemble learning, soft labeling, or multi-resolution data alleviate the recall drop?
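      Relating to point 4, here is a minimal sketch of reporting per-class F1 alongside per-class support with scikit-learn; the class list and the random stand-in label arrays are placeholders, not the paper's data.

      ```python
      # Per-class precision/recall/F1 plus support for an imbalanced voxel classification task.
      import numpy as np
      from sklearn.metrics import precision_recall_fscore_support

      classes = ["Nothing", "Helix", "Sheet", "Coil", "RNA"]      # assumed class names
      y_true = np.random.randint(0, len(classes), size=100_000)   # stand-in flattened voxel labels
      y_pred = np.random.randint(0, len(classes), size=100_000)   # stand-in model predictions

      prec, rec, f1, support = precision_recall_fscore_support(
          y_true, y_pred, labels=list(range(len(classes))), zero_division=0)
      for name, p, r, f, s in zip(classes, prec, rec, f1, support):
          print(f"{name:8s} P={p:.3f} R={r:.3f} F1={f:.3f} support={s}")
      ```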

    1. Abstract. Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods, resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline. Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed. Conclusions: tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf126), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jérôme Salignon

      This manuscript presents tcga-data-nf, a Nextflow-based pipeline for downloading, preprocessing, and analyzing TCGA multi-omic data, with a focus on gene regulatory network (GRN) inference. The workflow integrates established bioinformatics tools (PANDA, DRAGON, and LIONESS) and adheres to best practices for reproducibility through containerization (Docker, Conda, and Nextflow profiles). The authors demonstrate the utility of their pipeline by applying it to colorectal cancer subtypes, identifying potential regulatory interactions in TGF-β signaling. The manuscript is well-written and well-structured and provides sufficient methodological details, as well as Jupyter notebooks, for reproducibility. However, there are some areas that require clarification and improvement for acceptance in GigaScience, particularly regarding the scope of the tool, the quality of the inferred regulatory networks, the case study figure, benchmarking, statistical validation, and parameters.

      Major comments:

      • While the pipeline is well designed and executed, the overall impact of the tool feels somewhat limited, especially for a journal like GigaScience, due to its rather specific application to building GRNs from TCGA, the relatively small number of parameters, the support for only two omics types, and the lack of novel algorithms. To increase the impact of this tool I would recommend adding functionalities, such as:

      o Supporting additional tools. A great strength of the pipeline is the integration with the Network Zoo (NetZoo) ecosystem. However, only three tools from NetZoo are included. Including additional tools would likely increase the scope of users interested in using the pipeline. In particular, an important weakness of the current pipeline is that it is not possible to conduct differential analysis between different networks, which prevents users from identifying the most significant differences between two networks of interest (e.g., CMS2 vs CMS4). The NetZoo contains different tools to conduct such analyses, such as Alpaca [1] or Crane [2]; these could be implemented to make the pipeline more useful to a broader user base.

      o Adding parameters. A strength of the pipeline is the ability to customize it using various parameters. However, as it stands the pipeline does not offer many parameters. It would be beneficial to make the pipeline a bit more customizable. For example, new parameters could add options for excluding selected samples, using different batch correction methods, different methods to map CpGs to genes, additional normalization methods, and additional quality controls (e.g., PCA for methylation samples, md5sum checks). These are just examples and do not all need to be implemented, but adding some extra parameters would help make the pipeline more appealing and customizable to various users.

      • The quality of the inferred regulatory networks is hard to judge. There are no direct comparisons with any other tools.

      o For instance, it is mentioned in the text that GRAND networks were derived using a fixed set of parameters, but it could be helpful to show a direct comparison between GRNs built from your tools with those from GRAND. This could reveal how the ability to customize GRNs using the pipeline's parameters helps in getting better biological insights.

      o Alternatively, or in addition, one could compare how networks built by your method fare in comparison to networks built by other methods, like RegEnrich [3] or NetSeekR [4], in terms of biological insights, accuracy, scalability, speed, functionalities and/or memory usage.

      o Another angle to judge the regulatory networks would be to check, in a case study, whether the predicted gene interactions between disease and control networks are enriched in disease and gene-gene interaction databases, such as DisGeNet [5].

      • Figure 2 needs re-work:

      o Panel A and C: text is too small. "tf" should be written TF. "oi" should have another name. These panels might be moved to the supplements.

      o Panel D is confusing. Without significance it is hard to understand what the point of this panel is. I can see that certain TFs are cited in the main text, but without information about significance these may seem like cherry-picking. The legend states: "Annotation of all TFs in cluster D (columns) to the Reactome parent term. 'Immune system' and 'Cellular responses to stimuli' are more consistently involved in cluster D, in comparison to cluster A." However, this is a key result which should be shown in a main figure, not in Figure S6. I would also recommend using a -log scale when displaying the p-values to highlight the most significant entries.

      o Panel E is quite confusing. First, the color coding is unclear: for instance, what do the blue, purple and red colors represent? Second, what do the edges' widths represent? I would recommend using different shapes for the methylation and expression nodes to reduce the number of colors, and adding a color legend. I would also consider merging the two graphs and representing the difference in the edge values in color so the reader can directly see the key differences.

      • Benchmarking analysis could be included to show the runtime and memory requirement for each pipeline step. It would also be beneficial to analyze a larger dataset than colon cancer to assess the scalability.

      • Statistical analysis: If computationally feasible, permutation testing could be implemented to quantify the robustness of inferred regulatory interactions. Also, in the method section, it should be clarified that FDR correction was applied for pathway enrichment analysis.
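      As a rough illustration of the permutation-testing suggestion above, the sketch below compares an observed co-expression edge weight against a label-permutation null distribution. The real pipeline's network scores (PANDA/DRAGON/LIONESS) would replace the simple correlation used here; this is a conceptual stand-in only.

      ```python
      # Toy permutation test for the robustness of a single co-expression edge.
      import numpy as np

      def edge_permutation_pvalue(x: np.ndarray, y: np.ndarray, n_perm: int = 1000, seed: int = 0) -> float:
          rng = np.random.default_rng(seed)
          observed = abs(np.corrcoef(x, y)[0, 1])
          null = np.empty(n_perm)
          for i in range(n_perm):
              null[i] = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
          return (1 + np.sum(null >= observed)) / (n_perm + 1)   # add-one avoids p = 0
      ```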

      Minor comments:

      • I am not sure why duplicate samples are discarded in the pipeline. Why not sum the counts for RNA-Seq and average the beta values? I would expect that to yield more robust results.

      • It is a bit unclear in what context the NetworkDataCompanion tool could be used outside the workflow. It is also unclear how it helps with quality controls. Please clarify these aspects.

      • The manuscript is well-written, but words are sometimes missing or misspelled; it needs a careful re-read.

      • The expression '"same-same"' is unclear to me.

      • Regarding the sentence "Some of "same-same" genes (STAT5A, CREB3L1"…: I am not sure in which table or figure I can find this result.

      • Text is too small in the Directed Acyclic Graph, especially in Figure S4. Also, I would recommend adding the Directed Acyclic Graphs from Figure S1-S4 to the online documentation.

      • Regarding the code, I was puzzled to see a copyConfigFiles process. Also, there are files in bin/r/local_assets; these should be located in assets. And the container for the Singularity and Docker profiles is likely the same; this should be clarified in the code.

      • It is recommended to remove the "defaults" channel from the list of channels declared in the containers/conda_envs/analysis.yml file. Please see information about that here https://www.anaconda.com/blog/is-conda-free and here https://www.theregister.com/2024/08/08/anaconda_puts_the_squeeze_on/.

      Additional comments (which do not need to be addressed):

      • Future work may consider enabling the use of the pipeline to build GRNs from other data sources than TCGA (i.e., nf-netzoo). Recount3 data is already being parsed for GTEx and TCGA samples, so it might be relatively easy to adapt the pipeline so that it can be used on any arbitrary recount3 dataset. Similarly, it could be useful if one could specify a dataset from the recountmethylation database [6] to build GRNs. While these unimodal datasets could not be used with the DRAGON method, they would still benefit from all other features of the pipeline.

      • Using an nf-core template would enable a better structure of the code and increase the visibility of the tool. Also, using multiple containers is usually easier to maintain and update than a single large container, especially when a single tool needs to be updated or when modifying part of the pipeline. Another comment is that the code contains many comments that do not explain the code but read more like quick drafts, which makes the code harder for others to read.

      References

      1. Padi, M., and Quackenbush, J. (2018). Detecting phenotype-driven transitions in regulatory network structure. npj Syst Biol Appl 4, 1-12. https://doi.org/10.1038/s41540-018-0052-5.
      2. Lim, J.T., Chen, C., Grant, A.D., and Padi, M. (2021). Generating Ensembles of Gene Regulatory Networks to Assess Robustness of Disease Modules. Front. Genet. 11. https://doi.org/10.3389/fgene.2020.603264.
      3. Tao, W., Radstake, T.R.D.J., and Pandit, A. (2022). RegEnrich gene regulator enrichment analysis reveals a key role of the ETS transcription factor family in interferon signaling. Commun Biol 5, 1-12. https://doi.org/10.1038/s42003-021-02991-5.
      4. Srivastava, H., Ferrell, D., and Popescu, G.V. (2022). NetSeekR: a network analysis pipeline for RNA-Seq time series data. BMC Bioinformatics 23, 54. https://doi.org/10.1186/s12859-021-04554-1.
      5. Hu, Y., Guo, X., Yun, Y., Lu, L., Huang, X., and Jia, S. (2025). DisGeNet: a disease-centric interaction database among diseases and various associated genes. Database 2025, baae122. https://doi.org/10.1093/database/baae122.
      6. Maden, S.K., Walsh, B., Ellrott, K., Hansen, K.D., Thompson, R.F., and Nellore, A. (2023). recountmethylation enables flexible analysis of public blood DNA methylation array data. Bioinformatics Advances 3, vbad020. https://doi.org/10.1093/bioadv/vbad020.

    2. Abstract. Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods, resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline. Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed. Conclusions: tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf126), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Xi Chen

      Fanfani et al. present tcga-data-nf, a Nextflow pipeline that streamlines the download, preprocessing, and network inference of TCGA bulk data (gene expression and DNA methylation). Alongside this pipeline, they introduce NetworkDataCompanion (NDC), an R package designed to unify tasks such as sample filtering, identifier mapping, and normalization. By leveraging modern workflow tools—Nextflow, Docker, and conda—they aim to provide a platform that is both reproducible and transparent. The authors illustrate the pipeline's utility with a colon cancer subtype example, showing how multi-omics networks (inferred via PANDA, DRAGON, and LIONESS) may help pinpoint epigenetic factors underlying more aggressive tumor phenotypes. Overall, this work addresses a clear need for standardized approaches in large-scale cancer bioinformatics. While tcga-data-nf promises a valuable resource, the following issues should be addressed more thoroughly before publication:

      1. While PANDA, DRAGON, and LIONESS form a cohesive system, they were all developed by the same research group. To strengthen confidence, please include head-to-head comparisons with other GRN inference methods (e.g., ARACNe, GENIE3, Inferelator). A small benchmark dataset with known ground-truth (or partial experimental validation) would be especially valuable.

      2. Although the manuscript identifies intriguing TFs and pathways, it lacks confirmation through orthogonal data or experiments. If available, consider including ChIP-seq or CRISPR-based evidence to reinforce at least a subset of inferred regulatory interactions. Even an in silico overlap with known TF-binding sites or curated gene sets would help validate the predictions.

      3. PANDA and DRAGON emphasize correlation/partial correlation, so they may overlook nonlinear or combinatorial regulation. If feasible, please provide any preliminary steps taken to capture nonlinearities or discuss approaches that could be integrated into the pipeline.

      4. LIONESS reconstructs a network for each sample in a leave-one-out manner, which can be demanding for large cohorts. The paper does not mention runtime or memory requirements. Adding a Methods subsection with approximate CPU/memory benchmarks (e.g., "On an HPC cluster with X cores, building LIONESS networks for 500 samples took Y hours") is recommended to guide prospective users (a small conceptual sketch of this leave-one-out construction follows these comments).

      5. Currently, the pipeline only covers promoter methylation and standard gene expression, yet TCGA and related projects include other data types (e.g., miRNA, proteomics, histone modifications). If possible, offer a brief example or instructions on adding new omics layers, even conceptually.

      6. Recent methods often target single-cell RNA-seq, but tcga-data-nf is geared toward bulk datasets. Please clarify limitations and potential extensions for single-cell or multi-region tumor data. This would help readers understand whether (and how) the pipeline could be adapted to newer high-resolution profiles.

      Minor points:

      1. Provide clear guidance on cutoffs for low-expressed genes, outlier samples, and methylation missing-value imputation.

      2. Consider expanding the supplement with a "quick-start" guide, offering step-by-step usage examples.

      3. Ensure stable version tagging in your GitHub repository so that readers can reproduce the exact pipeline described in the manuscript.
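      As a hedged illustration of the leave-one-out construction referenced in point 4, the sketch below uses plain Pearson co-expression as the aggregate network model; the pipeline itself uses PANDA, so this is only a conceptual stand-in, but it also makes clear why runtime and memory grow with cohort size.

      ```python
      # LIONESS-style single-sample networks from a genes x samples expression matrix,
      # following e_q = N * (net_all - net_without_q) + net_without_q, with a simple
      # correlation network standing in for the aggregate model.
      import numpy as np

      def lioness_networks(expr: np.ndarray) -> np.ndarray:
          n_samples = expr.shape[1]
          net_all = np.corrcoef(expr)                        # genes x genes, all samples
          nets = np.empty((n_samples,) + net_all.shape)
          for q in range(n_samples):
              net_minus_q = np.corrcoef(np.delete(expr, q, axis=1))
              nets[q] = n_samples * (net_all - net_minus_q) + net_minus_q
          return nets                                        # one network per sample
      ```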

    1. ABSTRACT: Nanopore sequencing is a widespread and important method in genomics science. The raw electrical current signal data from a typical nanopore sequencing experiment is large and complex. This can be stored in two alternative file formats that are presently supported: POD5 is a signal data file format used by default on instruments from Oxford Nanopore Technologies (ONT); SLOW5 is an open-source file format originally developed as an alternative to ONT’s previous file format, which was known as FAST5. The choice of format may have important implications for the cost, speed and simplicity of nanopore signal data analysis, management and storage. To inform this choice, we present a comparative evaluation of POD5 vs SLOW5. We conducted benchmarking experiments assessing file size, analysis performance and usability on a variety of different computer architectures. SLOW5 showed superior performance during sequential and non-sequential (random access) file reading on most systems, manifesting in faster, cheaper basecalling and other analysis, and we could find no instance in which POD5 file reading was significantly faster than SLOW5. We demonstrate that SLOW5 file writing is highly parallelisable, thereby meeting the demands of data acquisition on ONT instruments. Our analysis also identified differences in the complexity and stability of the software libraries for SLOW5 (slow5lib) and POD5 (pod5), including a large discrepancy in the number of underlying software dependencies, which may complicate the pod5 compilation process. In summary, many of the advantages originally conceived for SLOW5 remain relevant today, despite the replacement of FAST5 with POD5 as ONT’s core file format.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf118), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jan Voges

      Comments to Author:

      Synopsis: The manuscript builds on the authors' previous work introducing the SLOW5 format for Oxford Nanopore signal data as an improvement over the FAST5 format. Since then, Oxford Nanopore Technologies (ONT) has introduced its own new format, POD5. This paper directly compares SLOW5 and POD5. The authors claim that SLOW5 provides higher reading speeds for both sequential and random access, writing speeds sufficient to keep pace with data acquisition in sequencing machines, comparable file sizes with no significant storage penalty, and a simpler implementation with fewer dependencies. The paper is clearly written, includes extensive supplementary information, and references the source code for all tools used in the experiments.

      Comments:

      - Sequential access performance: To me it is unclear whether SLOW5's advantage in sequential access originates from its file layout or from the use of mmap I/O versus traditional I/O. A small ablation study, forcing both SLOW5 and POD5 tools to use the same I/O method on platforms with currently large performance differences, would clarify where the performance gain originates (a rough illustrative sketch of such an I/O comparison follows these comments).

      - Figure 4: While POD5's dependency structure is indeed more complex than that of slow5lib, the current tree representation exaggerates this complexity. Many common packages (e.g., Python, zlib) appear multiple times as dependencies of multiple other packages. A dependency graph where each package appears only once would be a more informative representation.

      - Figure 5: POD5 versions prior to 0.1.0 appear to be preview releases (and are even marked as such on GitHub). Breaking changes during early previews are normal, so including them in the same visual space as stable versions risks being misleading.

      - Figure 5, breaking change at version 0.1.12: The timeline indicates a breaking change at POD5 version 0.1.12, which seems particularly relevant as the latest breaking change after version 0.1.0. However, this change is not reflected in the POD5 compatibility matrix on the right. An explanation of what type of breaking change occurred would clarify its impact and help readers assess compatibility risk.

      - Random access "walker strategy": A brief explanation comparing it to SLOW5's index-file approach would improve accessibility without requiring readers to consult external documentation.
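      As a rough, self-contained illustration of the I/O ablation suggested in the first comment (not a reproduction of slow5lib or pod5 internals; the file path and chunk size are placeholders), one could time memory-mapped versus ordinary buffered sequential reads of the same file and compare across platforms:

      ```python
      # Compare sequential read time using buffered I/O versus mmap on the same file.
      # Page caches should be dropped between runs for a fair comparison.
      import mmap, time
      from pathlib import Path

      def time_buffered(path: Path, chunk: int = 8 << 20) -> float:
          start = time.perf_counter()
          with open(path, "rb") as f:
              while f.read(chunk):
                  pass
          return time.perf_counter() - start

      def time_mmap(path: Path, chunk: int = 8 << 20) -> float:
          start = time.perf_counter()
          with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
              for off in range(0, len(mm), chunk):
                  mm[off:off + chunk]        # touch pages sequentially
          return time.perf_counter() - start
      ```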

    2. ABSTRACT: Nanopore sequencing is a widespread and important method in genomics science. The raw electrical current signal data from a typical nanopore sequencing experiment is large and complex. This can be stored in two alternative file formats that are presently supported: POD5 is a signal data file format used by default on instruments from Oxford Nanopore Technologies (ONT); SLOW5 is an open-source file format originally developed as an alternative to ONT’s previous file format, which was known as FAST5. The choice of format may have important implications for the cost, speed and simplicity of nanopore signal data analysis, management and storage. To inform this choice, we present a comparative evaluation of POD5 vs SLOW5. We conducted benchmarking experiments assessing file size, analysis performance and usability on a variety of different computer architectures. SLOW5 showed superior performance during sequential and non-sequential (random access) file reading on most systems, manifesting in faster, cheaper basecalling and other analysis, and we could find no instance in which POD5 file reading was significantly faster than SLOW5. We demonstrate that SLOW5 file writing is highly parallelisable, thereby meeting the demands of data acquisition on ONT instruments. Our analysis also identified differences in the complexity and stability of the software libraries for SLOW5 (slow5lib) and POD5 (pod5), including a large discrepancy in the number of underlying software dependencies, which may complicate the pod5 compilation process. In summary, many of the advantages originally conceived for SLOW5 remain relevant today, despite the replacement of FAST5 with POD5 as ONT’s core file format.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf118), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Wouter De Coster

      The authors describe the SLOW5 format and its benefits compared to the standard POD5 format for storing raw sequencing data from nanopore sequencers. The paper is well written and easy to understand. The advantages of the SLOW5 format are clear, and the comparison is adequately executed and described. However, the developers seem unable to persuade others to adopt it widely, and change might need to come from ONT themselves, who may be most concerned about disrupting their existing workflows, especially for parallel writing during sequencing. Nevertheless, the authors seem to have also addressed that issue, as demonstrated with a simulation experiment.

      Please find my specific suggestions below.

      Sincerely, Wouter De Coster

      Major: While I understand that the software name SLOW5 was an initial variation of the FAST5 format, I don't think that the words 'slow' or the number '5' are particularly appropriate descriptions or helpful in making a case for using the file format, as it is neither slow nor related to HDF5. However, once a name is chosen, I understand the reluctance to change it. Additionally, it seems the evaluations are conducted using the binary BLOW5 format. Wouldn't it then make more sense to emphasize BLOW5 in the text and title?

      Minor: I would italicize the 'make' tool for users unfamiliar with build tools in the Usability section, as it is a rather strange sentence if reading 'make' as a verb, not a tool. Perhaps the same could be applied to other dependencies in that section for consistency. Then again, the primary target audience will probably understand what 'make' means in this context.

      There is a typo in the benchmarking procedure section: 'confoudning'.

    1. Abstract. Background: Single-cell RNA-seq suffers from unwanted technical variation between cells, caused by its complex experiments and shallow sequencing depths. Many conventional normalization methods try to remove this variation by calculating the relative gene expression per cell. However, their choice of the Maximum Likelihood estimator is not ideal for this application. Results: We present GTestimate, a new normalization method based on the Good-Turing estimator, which improves upon conventional normalization methods by accounting for unobserved genes. To validate GTestimate we developed a novel cell targeted PCR-amplification approach (cta-seq), which enables ultra-deep sequencing of single cells. Based on this data we show that the Good-Turing estimator improves relative gene expression estimation and cell-cell distance estimation. Finally, we use GTestimate’s compatibility with Seurat workflows to explore three common example data-sets and show how it can improve downstream results. Conclusion: By choosing a more suitable estimator for the relative gene expression per cell, we were able to improve scRNA-seq normalization, with potentially large implications for downstream results. GTestimate is available as an easy-to-use R-package and compatible with a variety of workflows, which should enable widespread adoption.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf084), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Amichai Painsky

      This paper introduces a Good-Turing (GT) estimation scheme for relative gene expression estimation and cell-cell distance estimation. The proposed method, GTestimate, claims to improve upon conventional normalization methods by accounting for unobserved genes. The idea behind this contribution is fairly straightforward: since the relative gene expression has a large alphabet, a GT estimator is expected to perform better than a naive ML approach. However, I am not convinced that the authors applied it correctly. First, the proposed GT estimator (as it appears in (GT) in the text) assigns a zero estimate to unobserved genes (Cg = 0). This contradicts the entire essence of using a GT estimator. Second, it makes no sense to use this expression for every Cg > 0. In fact, any reasonable GT-based estimator applies GT for relatively small Cg and an ML estimator for large Cg. See [1] for a thorough discussion. The choice of a threshold between "small" and "large" Cg's has been the subject of many studies (for example [2], [1]), but it makes no sense to use the above expression for every Cg. Finally, notice that if N_{Cg} > 0 for some g but N_{Cg+1} = 0, the proposed estimator is not defined. There exist several smoothing solutions for such cases (for example [3]), but they need to be properly discussed. To conclude, I am not sure what the effect of these issues is on the experiments in the paper, which makes it difficult to assess the results. (A toy sketch of the small-count GT / large-count ML combination described here follows the references below.)

      REFERENCES

      [1] A. Painsky, "Convergence guarantees for the good-turing estimator," Journal of Machine Learning Research, vol. 23, no. 279, pp. 1-37, 2022. [2] E. Drukh and Y. Mansour, "Concentration bounds for unigram language models." Journal of Machine Learning Research, vol. 6, no. 8, 2005. [3] W. A. Gale and G. Sampson, "Good-Turing frequency estimation without tears," Journal of quantitative linguistics, vol. 2, no. 3, pp. 217-237, 1995.
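      As a hedged, toy illustration of the combination recommended above (Good-Turing adjusted counts for small frequencies, plain maximum-likelihood counts above a threshold, with probability mass reserved for unseen genes), consider the sketch below. The switch threshold, the handling of undefined N_{r+1} = 0 cases, and the renormalisation are illustrative choices only and are not the paper's method.

      ```python
      # Toy small-count Good-Turing / large-count ML estimator for one cell's gene counts.
      from collections import Counter
      import numpy as np

      def good_turing_probs(counts: np.ndarray, switch_at: int = 5) -> np.ndarray:
          n_total = counts.sum()
          freq_of_freq = Counter(int(c) for c in counts if c > 0)   # N_r: genes seen exactly r times
          adjusted = counts.astype(float).copy()
          for r in range(1, switch_at + 1):
              n_r, n_r1 = freq_of_freq.get(r, 0), freq_of_freq.get(r + 1, 0)
              if n_r > 0 and n_r1 > 0:               # only adjust where the GT ratio is defined
                  adjusted[counts == r] = (r + 1) * n_r1 / n_r
          observed_mass = 1.0 - freq_of_freq.get(1, 0) / n_total    # reserve ~N1/N for unseen genes
          return adjusted / adjusted.sum() * observed_mass
      ```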

    2. Abstract. Background: Single-cell RNA-seq suffers from unwanted technical variation between cells, caused by its complex experiments and shallow sequencing depths. Many conventional normalization methods try to remove this variation by calculating the relative gene expression per cell. However, their choice of the Maximum Likelihood estimator is not ideal for this application. Results: We present GTestimate, a new normalization method based on the Good-Turing estimator, which improves upon conventional normalization methods by accounting for unobserved genes. To validate GTestimate we developed a novel cell targeted PCR-amplification approach (cta-seq), which enables ultra-deep sequencing of single cells. Based on this data we show that the Good-Turing estimator improves relative gene expression estimation and cell-cell distance estimation. Finally, we use GTestimate’s compatibility with Seurat workflows to explore three common example data-sets and show how it can improve downstream results. Conclusion: By choosing a more suitable estimator for the relative gene expression per cell, we were able to improve scRNA-seq normalization, with potentially large implications for downstream results. GTestimate is available as an easy-to-use R-package and compatible with a variety of workflows, which should enable widespread adoption.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf084), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Gregory Schwartz

      In this manuscript, Fahrenberger et al. propose a new scRNA-seq normalization method to more accurately report UMI counts of individual cells. They specifically use a Good-Turing estimator, compared with a more commonly used Maximum Likelihood estimator, to adjust raw UMI counts. Using their own cta-seq, a cell targeted PCR-amplification strategy, as ground truth, they compare their estimator with a traditional size-corrected estimator. Furthermore, they illustrate downstream changes using their method, including changes to clustering results and spatial transcriptomic readouts. The manuscript was a clear read and presents an interesting alternative solution to an often overlooked, but important, problem. However, there are some aspects of the manuscript that need to be addressed. Some major content missing includes comparisons with more widely-used normalization methods throughout the manuscript, and better ground truth data sets in their downstream analysis. Specific comments are as follows:

      l. 34: To my knowledge, most groups do not use a single division by total UMI count as the only normalization. Seurat has NormalizeData, but also heavily promotes scTransform, a completely different method. Many use log transform (as I believe was done here), some use quantile transform, others use regression techniques etc. It was odd to see these standard normalizations missing in comparisons. The authors should use such standard procedures to demonstrate the superiority of GT.
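      For context, Seurat's default NormalizeData ("LogNormalize") scales each cell by its total count, multiplies by a scale factor (10,000 by default) and applies a log1p transform; a minimal sketch, assuming a genes-by-cells count matrix, is given below. Whether this matches the manuscript's baseline exactly is for the authors to state.

      ```python
      # LogNormalize-style baseline: per-cell relative expression times a scale factor, then log1p.
      import numpy as np

      def log_normalize(counts: np.ndarray, scale: float = 1e4) -> np.ndarray:
          per_cell_total = counts.sum(axis=0, keepdims=True)   # one total per cell (column)
          return np.log1p(counts / per_cell_total * scale)
      ```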

      l. 42: Is there a justification for the successor function being applied within the frequency ((cg + 1) / total) instead of outside ((cg / total) + 1) as is expected with the Good-Turing estimation?

      Furthermore, there is typically a smoothing function for erratic N_cg values, which I would expect with single-cell data. In the methods there is a brief mention of linear smoothing, but that would imply that the GT equation is misleading and oversimplified. The actual equation should be included in the main text to avoid confusion.
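      For reference, the textbook Good-Turing form the comment alludes to is the adjusted-count expression below, with r = c_g the observed count of gene g, N the total count in the cell, and N_r the number of genes observed exactly r times; whether the manuscript's equation is meant to match it is for the authors to clarify.

      ```latex
      \[
        r^{*} = (r + 1)\,\frac{N_{r+1}}{N_{r}}, \qquad
        \hat{p}_{\mathrm{GT}}(g) = \frac{r^{*}}{N}, \qquad
        \hat{P}_{0} = \frac{N_{1}}{N} \quad \text{(total mass reserved for unseen genes).}
      \]
      ```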

      l. 58: Compared to an average of 16,965 reads per cell, what is the equivalent for the ultra-deep sequencing (not 23 million reads, as that is not a 7.4-fold increase)?

      I am not entirely convinced on the use of cta-seq as a ground-truth for the cells, especially in comparison with ML. The authors should show that cta-seq has similar UMI and gene count distributions to more popular scRNA-seq technologies (e.g. 10x Chromium) or the application may be specific to cta-seq only.

      l. 110: Instead of using unknown classification data sets, there are existing cell-sorted data sets with ground truths (many even on the 10x website). The authors should use these data sets to compare downstream analysis.

      l. 125: The spatial transcriptomic results were very subjective, with no statistical hypotheses. The entire manuscript is missing any sort of statistics when comparing methods, which is a major flaw and should be rectified. Here specifically, the color scale stops at 3, but does this carry over to the relative differential expression? The claim is that it is constant, but if they are all greater than 3 then they must be quite variable, so it is surprising to see such a constant value of 0. Maybe the complete color scale should be shown on all figures to clarify this.

      From my understanding of the manuscript, the 18 cells for analysis and comparison were chosen based on a typical Seurat analysis. This technique introduces a range of biases into the comparison and makes the argument a bit circular.

      For a bias example, the top 2000 most variable genes were used, suggesting that entire classes of genes may be ignored even when highly or lowly expressed, such as housekeeping genes.

      There also appear to be many steps that were not entirely justified outside of a "typical analysis", for example excluding a cluster from the analysis (just because it was not that large?), only selecting 18 cells (why 6 from each cluster?), and removing cells with fewer than 1000 expressed genes or over 8% mitochondrial reads (this may be an issue, as it can remove specific cell types or proliferating cells; this should be a bivariate removal with justification). All of these filterings reduce the generalizability of GT.

      Supplementary Figure references in the text hyperlink to the main figures, which is confusing. More importantly, the captions of the Supplementary Figures read "Figure" rather than "Supplementary Figure".

    1. What if your biggest competitive asset is not how fast AI helps you work, but how well you question what it produces?

      The idea that AI isn't all-knowing, but rather we should doubt it and apply ourselves as it was made by humans after all.

    2. One recent study found that in 40 per cent of tasks, knowledge workers — those who turn information into decisions or deliverables, like writers, analysts and designers — accepted AI outputs uncritically, with zero scrutiny.

      If workers accept the word of AI blindly, and the owners in turn accept what the workers give them, we will end up in a world completely run by AI.

    3. One study found that users have a tendency to follow AI advice even when it contradicts their own judgment, resulting in a decline in confidence and autonomous decision-making.

      This is concerning: we constantly go against our own judgment, the one thing we believe we can trust, because a chatbot or AI tells us otherwise.

    4. Such shifts can affect how people make decisions, calibrate trust and maintain psychological safety in AI-mediated environments.

      AI is far stronger than we realize, even affecting humans on a psychological level: weakening our ability to think critically, making us more dependent on it, and making us lazier.

    5. One recent emerging study tracked professionals’ brain activity over four months and found that ChatGPT users exhibited 55 per cent less neural connectivity compared to those working unassisted. They struggled to remember the essays they had just co-authored moments later, as well as showing reduced creative engagement.

      Even the act of using AI consistently is actively weakening the neural connectivity of the brain.

    6. As we are starting to see, the drive for efficiency will not decide which firms are most successful; the ability to interpret and critically assess AI outputs will.

      This is how to truly use AI for good in the workplace, maximizing its abilities and usage.

    7. As researchers who study AI, psychology, human-computer interaction and ethics, we are deeply concerned with the hidden effects and consequences of AI use.

      Time and time again, AI is being perceived as a potential threat to mankind. The fact that we continue to pursue it could be our downfall, the vaulting ambition of our race.

    8. If people don’t set these defaults, tools like AI will instead.

      Incredibly short yet powerful on how AI will impact the job market and the lives of workers.

    9. Most organizational strategies focus on AI’s short-term efficiencies, such as automation, speed and cost saving.

      Companies, despite using AI for many minimal tasks, are not looking at the big picture as to how AI could be applied to more difficult and advanced tasks, whether it be drug synthesis or ideas for marketing.

    10. But in the rush to adopt AI, some organizations are overlooking the real impact it can have on workers and company culture.

      AI is impacting all of us immensely, both visibly and invisibly, from taking jobs from citizens to creating new jobs for others.

    1. Dickinson uses the Fly as a metaphor for the interruptions and uncertainties of death. In the lines, “There interposed a Fly With Blue uncertain stumbling Buzz” she shows how something small and ordinary can disrupt a profound, emotional moment, emphasizing the unpredictability and even the mundane reality present at the time of death.

  2. docdrop.org
    1. The maxim "less contact, less learning" succinctly summarizes the arguments supporting students' exposure to quality language models and instruction. To learn a language well, one must have sustained interactions with educated native speakers of English, as well as good language instruction. Students can only learn the new language in the style to which they are exposed. If an English-language learner lives and talks daily with English speakers in a boarding school in London, she will learn a very different kind of English and sound very different than if she had been immersed in a public school in Atlanta, Sydney, or Toronto. Likewise, someone hoping to improve their Spanish-speaking skills will sound very different after an extended study-abroad stay in Madrid, Mexico City, Santo Domingo, or Buenos Aires.

      “Less contact, less learning.” The key to learning a language lies not in mere classroom hours or memorized vocabulary, but in sustained interaction with high-quality language input. In other words, language proficiency is shaped within authentic contexts, not through isolated grammar drills. The example illustrates how different English or Spanish learning environments cultivate entirely distinct linguistic styles and pronunciation traits, revealing the social and contextual nature of language acquisition. From an educational perspective, this passage reminds teachers that language instruction cannot rely solely on textbooks or exams. Instead, educators should create rich communicative situations that allow students to truly “immerse” themselves in the language and culture. Simultaneously, it reflects the structural inequality faced by immigrant students in language learning—if they lack sustained interaction with native speakers, they are effectively deprived of the conditions necessary for language development.

    2. Clearly, if we are to expect newcomer students to learn English, as they and we would like them to, our schools need to do a better job of developing educational contexts that will make it happen. Our focus at the beginning of the study was very student-centered; we considered the resources the students brought with them, the engagement they brought to the task, as well as the educational contexts they encountered. But while these factors certainly contribute to language acquisition, the schools also play a fundamental role in whether students learn English. Our findings parallel those of Gary Orfield, Guadalupe Valdes, Laurie Olsen, and others who have insightfully described the intense physical and linguistic segregation that many newcomer immigrant students encounter. While there have been some attempts to address the needs of students coming in at the elementary level, there has been a lamentable and disconcerting absence of efforts to meet the needs of English-language learners arriving at the secondary school level. This gap absolutely needs to be addressed if we wish to harness the energies of all of our newcomer students.

      Immigrant students require at least seven to ten years of high-quality learning environments to truly master “academic English,” yet current education policies demand they pass standardized tests within three years. This unrealistic expectation not only creates psychological pressure but also systematically produces “losers.” It reveals how U.S. education policies prioritize “measurable outcomes” over fairness and growth within the long-term learning process. This “time violence” exemplifies how the education system sacrifices marginalized groups under the logic of efficiency. When annotating this passage, one might reflect on whether educational assessment should shift toward “developmental support” rather than “elimination-based screening.”

    3. Today, immigration is once again a momentous social force, compelling Americans to face the challenge and opportunity of integrating and harnessing the energy of the greatest number of immigrants in the nation's history. By 2005 there were well over 35 million immigrants in the United States, some 12.4 percent of the U.S. population.

      American society has long harbored cultural anxieties and identity insecurities regarding immigration. The author notes that Americans' concerns over whether immigrants “are willing to learn English” are not a new phenomenon, but rather a recurring “political discourse” that resurfaces during periods of economic and social upheaval. At its core, this anxiety stems from fears about national identity and cultural purity. Learning English here is treated as a symbol measuring “loyalty” and “degree of Americanization,” rather than a matter of linguistic ability. This reflects how language is politically employed as an “assimilation tool,” maintaining the stability of social power structures by creating distinctions between “good immigrants” and “bad immigrants.” When annotating this passage, consider: Is learning English truly an educational goal, or an institutionalized social expectation?

    1. My parents tried to talk to my teacher about it, but it was kind of hard. They don’t really speak much English and my teacher wasn’t much of a help either. She cancelled a couple meetings with them and, you know, they were taking time off work to go, so they felt bad, like she wasn’t respecting their time. When they finally met she really scared them with stories about teachers being attacked by students and that she didn’t feel safe there. They ended up taking me out of school a couple weeks later.

      Parents struggled to communicate with teachers due to limited English proficiency. Hoping to understand the situation through face-to-face interaction, they were further marginalized by the teacher's negligence and fear-mongering narrative. The teacher's repeated cancellations not only reflect a disregard for immigrant families' time and effort but also reveal the system's implicit exclusion of non-native English-speaking parents. More alarmingly, when this teacher used the story of “students attacking teachers” to intimidate parents, she effectively transformed the educational space into a realm of distrust and fear, misleading parents into believing their children were unsafe at school. Ultimately, the student's forced withdrawal from school reveals how structural discrimination, through the accumulation of everyday interactions, quietly deprives immigrant families of educational opportunities. This narrative prompts reflection: true inclusive education occurs not only within the classroom but hinges on whether teachers are willing to listen to every family with respect and equality.

    2. What would be most beneficial for the successful transitions of undocumented immigrant students are school structures and cultures that facilitate positive interactions between students, teachers, and staff, allowing those at all levels to develop school-based social capital and build relationships of trust so critical to their success. By investing in a baseline of support for all students, schools could develop support structures necessary to facilitate more targeted outreach to undocumented students. This is not only a social justice issue, but an economic imperative for the nation

      Institutional support and social capital play a pivotal role in the educational transition of undocumented immigrant students. The author argues that relying solely on individual teachers' compassion or students' personal efforts is insufficient; true change stems from systemic adjustments to school structures and cultures. When schools foster an atmosphere that encourages interaction, trust, and inclusion, the connections formed among students, teachers, and administrators create a “school-based social capital” that prevents undocumented students from remaining isolated. Notably, the author elevates this issue to the levels of social justice and economics, arguing that supporting undocumented students is not only a moral obligation but also vital to the nation's future development. This framing transcends narrow humanitarian perspectives on immigrant education, instead proposing a broader vision for structural reform. It reminds us that educational equity and societal prosperity are interdependent.

    3. Together with six siblings and her two parents, she came to the U.S. when she was just nine years old. Flor’s formative years were difficult and shaped in her a sense of ambivalence about the future. She realized from an early age that her lack of papers—papeles—would keep her from the good jobs she dreamed of as a child. She also felt like an outsider at school, internalizing a belief that no one was looking out for her—that she was on her own.

      Flor realized at a young age that “lack of papers” was not merely a legal issue but a form of enduring social exclusion, fostering a sense of “ambivalence” about her future. This internalized feeling of ‘invisibility’ led her to develop a survival strategy of “isolating herself” in school—believing she must face everything alone. This narrative reveals how immigrant status shapes one's self-perception and social positioning at a psychological level, while also exposing the profound impact of institutional exclusion (such as immigration restrictions) on educational opportunities and life aspirations. Flor's story is not an isolated case, but rather a microcosm of the struggles faced by countless undocumented students navigating the American education system.

    1. We see this counter-narrative as a crucial element in the development of a systematic analysis of the racism, classism, and linguicism that permeate much of urban education as well as in the development of culturally relevant curricula

      Racism, classism, and linguicism are pervasive in urban education, and schools' overemphasis on “monolingual literacy standards” perpetuates these inequalities. By demonstrating how families and communities serve as children's “invisible classrooms,” the author calls on teachers to redefine their roles—not merely as knowledge transmitters, but as cultural bridge-builders. By acknowledging and leveraging students' home literacy experiences—such as religious practices, games, and bilingual storytelling—teachers can make education truly inclusive and socially just.

    2. We came to understand that there is a distinction between places as the actual locations while spaces are constructed by human actors who are, in turn, shaped by those spaces in fluid and reciprocal processes.

      This passage reveals the theoretical significance of the author's adoption of the “spatial turn”—she distinguishes between ‘place’ and “space.” Place refers to physical existence, while space is a product of social and cultural actions. In other words, literacy spaces are not naturally occurring; they are co-created by family members through daily interactions, language, objects, and emotions. For instance, Benny's bedroom or Miguel's library experience are not merely “places,” but learning “spaces” imbued with meaning through their engagement. This reminds educators that literacy development occurs not only in classrooms but also within children's daily lives. Those seemingly ordinary corners—the dining table, the church pew, the computer desk—are all vital educational settings.

    3. We planned to investigate both the places outside of school, in their homes and communities, where the two children and their families accessed literacy resources and the formal and informal literacy interactions that they constructed there. In this way, we hoped to problematize the common privileging of school-centered literacy and education, challenge the discriminatory

      The author explicitly states that her research does not aim to replicate the conventional narrative of “resource scarcity in impoverished families,” but rather to construct a counter-narrative revealing how low-income Latino families proactively create literacy opportunities. The key term here is “agency”—meaning families and children are not passive recipients but active knowledge constructors. This perspective overturns the previous school-centered, standardized literacy view rooted in white middle-class norms. It also prompts us to rethink the true meaning of “educational equity”: equity does not mean having every child learn in the same way, but ensuring that every culture's learning methods are seen and respected.

    4. other important volumes were kept on a high shelf. As there were no book stores in his neighborhood, his grandmother took him to secondhand stores to purchase books, looking especially for ones with maps, one of his passions. Both boys owned a DS (dual-screen hand-held game console) and other electronic toys and games

      This paragraph provides a vivid counterexample to the stereotype that low-income families lack educational resources or do not value literacy. The detailed descriptions of the boys’ homes—filled with books, newspapers, maps, magazines, and even technology like iPods and GPS devices—show how these families actively create literacy-rich environments that reflect their interests, cultures, and daily lives. I believe this approach is absolutely correct. My mother was also born in northern China, an area with scarce educational resources. Yet her mother relentlessly pushed every child in the family to study hard, sending them all to university. That's why I can now enjoy a quality education in a great city. In their eyes, education truly changed their destiny—all because of those old books sold one by one at street stalls.

    5. we focused on the strengths and resources of the children and their families, rather than their needs and alleged deficits as often described in the dominant discourse (Arzubiaga, Ceja, & Artiles, 2000). We knew that many Latino children had rich literacy lives—often invisible to teachers in urban schools or dismissed as irrelevant to school learning—and that they and their families possessed expertise and funds of knowledge (González, Moll, & Amanti, 2005; Long et al., 2007; Spencer et al., 2010) that could serve as the basis for a culturally relevant curriculum (Boardman et al., 2014; Gay, 2010)

      I believe diversity in education is crucial. As mentioned in the article, Latino children possess remarkable reading aptitude, yet this talent is often overlooked by teachers. In elementary school, I was a student with severe academic imbalances—I struggled immensely with math, consistently ranking near the bottom of the class. However, I possessed a natural aptitude for both English and Chinese. When given sufficient time to develop my ideas, my compositions were even selected by teachers to be read aloud to the entire class. Consequently, I always believed I had strengths during that time. Because the subjects I excelled in were valued by my teachers, I became even more motivated to study those particular subjects diligently.

    1. However, Arturo is failing as a reader in both English and Spanish. Ms. Stewart, Arturo’s English teacher, views him as a disengaged reader, not making progress, and not having the English vocabulary to engage with the chapter books that they are reading. Arturo is placed in the group with the lowest reading level. The stories they read are not complex, and the work in the group is mostly about vocabulary buildup. Ms. Stewart blames Arturo’s slow progress on his Spanish. Similarly, Ms. Medina, Arturo’s Spanish teacher, believes that he does not have sufficient Spanish-language vocabulary to make sense of the Spanish-language chapter books. For Ms. Medina, raised and educated in Colombia through university, Arturo’s Spanish is simply deficient

      A shift in educators' perspectives can profoundly impact students' reading abilities. Initially, teachers evaluated Arturo's English and Spanish skills separately, concluding he “failed in both languages.” However, when educators began creating “cross-language spaces” in the classroom—allowing students to freely switch between English and Spanish for performances and discussions—Arturo demonstrated rich critical thinking and cultural insight. This transformation underscores the pivotal role of teacher attitudes in language education—students' “proficiency” is often not lacking, but obscured by narrow assessment methods. The author uses this case to remind us: educational equity lies not merely in offering bilingual programs, but in whether teachers can genuinely understand, respect, and enter students' linguistic worlds.

    2. I start with Paco, the 3-year-old bilingual child whose mother is a U.S.-born Latina woman and whose father is a U.S.-born white man. The mother grew up in a bilingual home, the father in a monolingual one, but he studied Spanish in high school. The family is comfortable in a translanguaging space, where their use of English and Spanish is unbounded, dynamic, and fluid and adapts to meet the communicative expectations of the many different people who enter the home.

      Paco's example vividly demonstrates the naturalness of multilingual practices in early childhood language development. While reading Jorge el Curioso, he freely mixed English and Spanish, using gestures and sounds to express the story—a behavior encouraged and praised in the home environment rather than corrected. This illustrates that language learning itself is multimodal, emotionally charged, and physically engaged, rather than a rigid accumulation of grammar rules. When annotating this passage, note the author's implicit critique: formal schooling often stifles such free expression, transforming children from “language creators” into “language conformists.” Paco's multilingual reading practice at home reminds us that authentic language education should center on comprehension and expression, not solely on linguistic correctness.

    3. In this article, I argue that the act of reading does not depend on the language of the written text or even on the concept of a named language such as English or Spanish. Rather, the act of reading is about readers assembling all their meaning-making resources and acting on them to read themselves.

      The process of reading does not depend on the “named language” used in the text (such as English or Spanish), but rather on how readers utilize their entire linguistic repertoire to comprehend the text. This perspective challenges the assumption of “language compartmentalization” in traditional language education, proposing a more fluid and authentic approach to understanding. For Hispanic bilingual students, this cross-linguistic perspective holds profound significance, as it acknowledges their natural switching between two cultures and languages as a strength rather than a flaw. It also prompts reflection on the drawbacks of an educational system overly fixated on “linguistic purity”—where schools often view language mixing as “distraction,” when in fact it embodies the very essence of bilingual thinking and creativity.

    1. But told to whom? Who is the reader I’m addressing when I am writing in English?

      My question: I wonder how writers from countries at war can tell their true stories when they write in English, which is not their first language. Do they lose part of their real voice? Or maybe writing in English helps them reach more people and fight back against silence. Can writing in another language be a kind of power, or does it lose the originality of the story?

      This question makes me think about how translation and writing can change how stories are heard and understood.

    2. All the life squeezed out of them so that they fit into one headline. Sentences become coffins too small to contain all the multitudes of grief.

      Why this truth is important: This line tells a hard truth: that the news often makes stories of war and pain too small. When we read about people suffering, the headlines don’t show how big and real their pain is. The image of “sentences as coffins” means that sometimes writing can hide people’s emotions instead of showing them. It reminds me that we must use words carefully, because they can give life or take it away.

    3. To translate a text is to enter into the most intimate relationship with it possible. It is the translator’s body, almost more so than the translator’s mind, that is the vessel of transfer.

      Why it’s beautiful to me: This line feels beautiful because it turns the act of translation into something alive and human. Mounzer describes translation not as a mechanical task but as a relationship of empathy and feeling, almost like giving life to someone else’s experience inside your own body. As a reader, I find that image powerful because it shows that language connects people emotionally, and not just intellectually.

    4. When you say the word catastrophe, no one need ever ask which one it is you mean
      1. A place in the article where you have a question - try to make the question relevant to things we've been talking about in class, or relevant to your own life and interests.

      One of the most significant interests I have is colonial studies. To paraphrase a famous quote from Malcolm X, I find it incredibly interesting to examine the wound left by the knife of colonialism, and how it still affects the global south, in spite of the fact that many people refuse to admit that there is a wound. Through this interest I have learned a decent amount of history about many countries, like Botswana, Egypt, and Chile; but what's funny to me (as someone who is Arab) is that I have a huge gap in knowledge when it comes to the history (in particular the post-Ottoman history) of the Arab world, especially the Levant. So the entire time I was reading this article I was searching my brain for any particular conflict in the region she could be referencing (unfortunately there is a nearly infinite number of those), but I couldn't put my finger on it.

      All this to say, I am very interested to know which conflicts she has personally experienced in the region.

    5. There is a violence in undoing someone’s words and reconstituting them in a vocabulary foreign to them, a vocabulary of your own choosing
      1. A sentence, expression, or paragraph that you felt told a very important and deep truth - what makes this truth important or special to you?

      I think this particular quote really hits at something I consider to be very true, and does so in a very literarily rich way. Everyone in our class is bilingual, but I'm not sure how many people in our class have dual identities like I do. English and Arabic are not just languages to me; they represent two very different parts of my identity and my life, and so her description of the sometimes visceral nature of translation resonates with me. I experience it every day: I would say about 50% of the Arabic I understand, I don't understand through the Arabic language itself; it has to be filtered through and translated into English in my head for me to properly understand it. As for when I am speaking, I would say 80% of the Arabic I speak does not come from words or feelings that naturally come to me from the Arabic language; they come to me in English and I have to translate them. In the process, I feel like the words lose their ability to express my emotions, and this stripping of their true meaning is what this quote captures very well.

    6. They were light in English, yes, but also cumbersome and huge. Giant styrofoam shapes
      1. A sentence, expression, or paragraph that you found beautiful - why is it personally beautiful to you?

      I found this particular quote beautiful because, for whatever reason, styrofoam is one of those things that is very tactilely memorable to me. It's one of those things whose feeling I can instantly imagine once it's called to my attention, and the way the author uses it here is really beautiful in my opinion. It's such a great way to convey this unique feeling she is describing, where something is light but is still a burden and awkward to move with.

    1. The girls rejected mainstream spaces where they often felt marginalized and isolated, such as the ‘Main Street,’ a popular place to sit during lunch, recess, and after school. ‘Main Street’ was a ‘big hallway’ with tall ceilings and many windows located near the main school entrance. It reflected the racial, ethnic, and class diversity of Maple High. It was packed with many groups of students who often sat together based on race, class, and/or gender.

      They perceive the “Main Street” corridor in the main building as representing the school's social hierarchy and aesthetic power center—a sphere to which they do not belong. This rejection is not merely an avoidance of campus social structures but a symbolic critique of society: they refuse to conform to mainstream definitions of ‘attractiveness’ or “popularity,” instead choosing self-defined communities. By actively withdrawing from mainstream spaces, they forge new meaning and security within the “non-mainstream.” This behavior reveals how adolescents express social identity and cultural resistance through seemingly simple “spatial choices” in everyday campus life.

    2. The girls also co-invented a pan-Asian fused language in which Japanese functioned as an Esperanto, an international language. It was their version of ‘language crossing’ (rampton, 1995), using a language that did not ‘belong’ to them. Early in my fieldwork, I was surprised to hear the students use some Japanese words among themselves. While there were no Japanese students or teachers at Maple High, the school offered Japanese as a general language course, and many of the girls took it. Those who had fairly high Japanese skills through taking classes and/or actively watching Japanese dramas, movies, and anime took an active role in using Japanese words such as ‘nani’ (what?), ‘genki?’ (how are you?), and ‘onegai’ (please) with their friends. As the only proximal native Japanese speaker, they happily used a mix of English and Japanese when communicating with me and asked me to teach them Japanese. I often saw the girls carry binders, notebooks, and post-it notes with Japanese words (e.g. their names in Japanese) on them. One day after school, Mino and her basement friends spent time together at a nearby mall writing words and drawing pictures on Meli’s arms, hands, and legs. Mino later showed me a picture she drew on Meli’s arm: a cute rabbit face, which she called an ‘Asian face,’ with the Japanese word ‘kawaii’ written above it

      These girls have created a hybrid language blending elements of Japanese, English, Tagalog, and even Korean to express intimacy and identity among themselves. This linguistic practice demonstrates that they are not passively absorbing mainstream English culture, but actively constructing a multi-layered “pan-Asian cultural identity.” Simultaneously, it reveals the power dynamics underlying language—their choice of Japanese partly stems from Japanese culture's elevated status in global trends. This “cultural borrowing” serves as both a means of self-expression and a reflection of global cultural inequalities. This complexity lies at the heart of the tension inherent in cultural hybridity.

    3. ‘We dominate the basement!’ Gina, a 15-year-old Chinese American girl, proudly proclaims. This article, based on two years of ethnographic fieldwork, examines how a group of Asian American1 immigrant high school girls (Filipina, Vietnamese, Chinese, and Indian) construct this basement into a community, which they name the ‘Basement Group.’ While this group comprises students with diverse backgrounds, I specifically focus on the perspectives, voices, and experiences of a group of Asian American girls who are its founders and core members

      The basement is not merely a physical space; it symbolizes how Asian girls marginalized by mainstream society reclaim agency in the “borderlands.” They reject mainstream social spaces like dining halls and hallways, choosing a dimly lit, overlooked place as their “home.” Phrases like “We rule the basement” express their pride and sense of control. This behavior reflects their resistance to and redefinition of power structures, revealing that belonging and strength can emerge even in seemingly excluded spaces. Marginality does not equate to weakness; it can foster new cultural creations and self-identity.

    4. Since the main goal of this study was to capture the experiences of Asian American girls, I did not include most of the other Basement Group students in my research. There may be gender, ethnic, and/or racial differences that are not reflected in this study. As an exception, I talked with Savannah and Meli, two Salvadoran immigrant girls who were close friends with the Asian American girls and part of the core members of the Basement Community. Their perspectives helped deepen my understanding of the experiences of the main participants

      The author focuses mainly on Asian American girls but includes insights from two Salvadoran immigrant girls to broaden the perspective. This shows an effort to include diverse voices and recognize that gender and ethnicity can shape school experiences in different ways.

    1. I give it most of the credit for the fact that ours is the wealthiest, most technologically advanced, and most socially just society in human history, not to mention the fact that we have with ease become a military superpower .... The rest of the world is quite rightly impressed with us, and it is thus no accident that the United States of America has become the biggest single exporter of public law in the history of humankind.

      I can't help but think that parts of this attitude expressed by Calabresi are debatable, not just in light of the condition of the US in the present day, but even when he made these comments in 1998. Many would certainly disagree that the US is or was the "most socially just society in human history," or that the US "with ease [became] a military superpower."


    1. Purpose and Problem Solved

      Overall, your understanding of the Finalizer seems inadequate. The purpose of a Finalizer is threefold: first, to convert EVM words to Circom words, second, to generate a circuit witness (which will be converted into a proof in the backend), and third, to analyze the chain of symbols and generate a permutation.

      I think this part needs to be rewritten.

    2. Raw placement data from execution can be inefficient (redundant wires, unused connections)

      This has nothing to do with efficiency. We have to do this so that Circom can deal with the data values produced by the Synthesizer. If we could avoid it, performance would be improved.

    3. Large circuits slow down proving time

      This is true, but it is not the point here. Splitting wires actually slows down the proving time, but we have to do it.

    4. Purpose and Problem Solved

      The Finalizer bridges the gap between symbolic execution and concrete circuit generation:

      Problem 1: Symbolic → Concrete Conversion. During execution, the Synthesizer works with symbolic pointers (e.g., StackPt, MemoryPt), while the backend prover needs concrete numerical wire connections. Solution: the Finalizer converts all symbolic references into actual wire indices and constraint equations.

      Problem 2: Circuit Optimization. Raw placement data from execution can be inefficient (redundant wires, unused connections), large circuits slow down proving time, and the EVM uses 256-bit values while Circom's finite field is 254-bit (field overflow risk). Solution: PlacementRefactor optimizes wire sizes, removes unnecessary connections, and splits 256-bit values into two 128-bit limbs for field compatibility.

      Problem 3: Backend Integration. The frontend and backend use different data structures, and the backend needs a standardized JSON format for circuit loading. Solution: the Permutation class generates JSON files that match the backend's expected schema.

      Problem 4: Witness Data Management. The circuit needs both structure (permutation) and concrete values (witness), and witness data must align with circuit wire indices. Solution: generates permutation.json (structure) and placement-specific witness files.

      I think this introduction can be moved to the "Execution Flow" section.
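      As a purely illustrative aside (not the project's actual code), a minimal Python sketch of the 256-bit-to-128-bit limb split mentioned above; the function names are hypothetical:

      ```python
      # Hypothetical sketch: splitting a 256-bit EVM word into two 128-bit limbs
      # so that each limb fits comfortably inside Circom's ~254-bit scalar field.
      LIMB_BITS = 128
      LIMB_MASK = (1 << LIMB_BITS) - 1

      def split_word(word: int) -> tuple[int, int]:
          """Return the (low, high) 128-bit limbs of a 256-bit EVM word."""
          assert 0 <= word < (1 << 256), "value must fit in 256 bits"
          return word & LIMB_MASK, word >> LIMB_BITS

      def join_limbs(low: int, high: int) -> int:
          """Recombine two 128-bit limbs into the original 256-bit word."""
          return (high << LIMB_BITS) | low

      word = 2**256 - 1                      # largest possible EVM word
      low, high = split_word(word)
      assert join_limbs(low, high) == word   # round-trips without loss
      ```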

    1. Generate

      "Utilize a combination of subcircuits to derive a new symbol to represent the true value in the EVM memory, ​​from existing symbols in MemoryPt."

    1. This technique can get up to 108 ideas from six participants in just 30 minutes, and it’s great if you want to encourage every participant to generate ideas – especially if your team is predominantly introverts.

      I'm curious about the quality vs. quantity trade-off here. While 108 ideas in 30 minutes sounds impressive, I wonder if this rapid-fire approach actually leads to more superficial thinking?

    2. Storyboarding is about arranging and categorizing ideas and solutions in a linear format and order. It’s best done after brainstorming to generate ideas. Gather previously brainstormed ideas and solutions on post-it notes on the wall or coloured cards on the floor or large table.

      I am curious about storyboards for two reasons. One, this is a technique I will have to use when building the prototype of my immersive college and career coaching platform. Two, it is interesting that this has to be done once there are already some other brainstormed ideas available. It seems like more of an add-on to some of the other ideas.

    3. especially if your team is predominantly introverts. Give each participant a sheet of paper and ask them to generate three ideas in five minutes. Pass all papers to the right. Ask each participant to build on his colleague’s ideas, improving them or using them as inspiration to generate another three ideas. Continue passing papers to the right until they reach their original participant.

      This is very useful to me because, as an introvert, I do not always come up with ideas very quickly. However, once I have something to work from, the ideas do not stop flowing. I have never done this before, but I would love to try this method.

    4. Ask each participant to build on his colleague’s ideas, improving them or using them as inspiration to generate another three ideas.

      If the idea of this is to eliminate fear in sharing, won't the lack of anonymity contribute to the introverts' fear of expressing their ideas? Also, if this ends in a group discussion, doesn't that mean that the discussion will likely be dominated by a select few? I'm just not sure that this is a reliable method to make sure everyone is included.

    5. Mind maps are visual diagrams used to represent words, ideas, tasks or other items linked to and arranged around a central keyword or idea.

      This technique has been used in Dr. Ha's classes and is very useful. It has helped me create ideas and flows that are linked to the main idea. This structure has been the most effective for me in ideation creation.

    6. This technique can get up to 108 ideas from six participants in just 30 minutes, and it’s great if you want to encourage every participant to generate ideas – especially if your team is predominantly introverts.

      For me this many ideas seems like way too much to process and discuss. To see which ideas would be the most useful would be a long and exhausting process.

    1. Most industries have an orthodoxy – a set of deeply-held, unspoken beliefs that everyone follows when it comes to “how we do things around here.”

      This concept really resonates with me because it explains why breakthrough innovations often come from outsiders to an industry.

    2. First, create a statement that clearly defines what your creative objective is.

      Totally agree! Whether we’re brainstorming ideas or designing a course, it’s so important to fully understand the objective before moving forward.

    3. Next, randomly combine one word from each list and spend time brainstorming around the mini-story they suggest.

      I really like how this strategy pushes people to think outside the box. But I do have concerns about how efficient it is because it depends on “randomly combining” ideas.

    4. Next, randomly combine one word from each list and spend time brainstorming around the mini-story they suggest.

      I love how this strategy pushes people to think outside the box. But I do have concerns about how efficient it is because it depends on "randomly combining" ideas.

    5. Semantic intuition is a technique that can inject fresh energy into a group that is starting to feel brain dead toward the end of a brainstorming session, according to Mattimore. It prompts participants to create new ideas by having them combine several categories of key words to create a name for a new idea – even though they have no idea what the newly-named idea IS yet. The first step is to select the three categories of words that are related to your challenge. For a consumer product, Mattimore suggests that three possibilities would be places in a store, kinds of promotional appeals and benefits of the product or needs of the customer. Next, generate variations on each of these category words. Next, randomly combine one word from each list and spend time brainstorming around the mini-story they suggest. Mattimore points out there are no “rules” to using this technique. Don’t be afraid to let the keyword prompts take you far afield from them. And don’t be concerned if you generated an idea that only uses two of the three words. The point of semantic intuition is simply to get you to think differently.

      This is very useful, especially for a group project when we are ideating together. It would be playful, which helps to reengage.

    6. This technique works surprisingly well because it tends to mentally disarm brainstorming participants.

      Not sure this is very effective in a regular setting where people don't feel pressured. Maybe use this and ask: "What's the worst idea for our business? What can we do to fail as soon as possible?" Then turn it around.

    7. Next, pick three of the most interesting words in the opportunity statement and generate creative alternatives for each of them. Mattimore recommends using words that represent the 5W’s and H – who, what, when, where, why and how – of your challenge. Once you have generated your three lists of alternative words, place them in a table, with the original words at the top of each column and the alternatives you have brainstormed arranged in columns below them.

      Structured and randomized. It is easy to follow and sparks surprising and creative results. Try it out!

    8. When each “pass” takes place, Mattimore points out, the facilitator can suggest different ideation techniques or triggers.

      This is a bit curious to me because I am wondering if this can become somewhat confusing for the team members. I'd like to see this one in action to see how the switching tactics could help.

    9. Let your imagination run wild – the crazier the ideas, the better. Don’t restrict your thinking at all.

      I find this strategy useful because many times it is easy to just come up with really wacky ideas. It brings a child-like creativity to the brainstorming process.

    10. When each “pass” takes place, Mattimore points out, the facilitator can suggest different ideation techniques or triggers. This helps people who may not be able to think of any new ideas and may help them to see the ideas their colleagues have written in a new light. It also helps the team generate a wider diversity of ideas.

      I really like this because it highlights the benefit of working with a team and sharing ideas. I think that in academia (and general work environments), learning from others can be overlooked by producing original ideas, so I like the focus on growing from various ideas and diversifying team outcomes and work.

    11. For example, “How can WE SELL more insurance to CATHOLICS?” could become “How can we get FRIENDS OF CATHOLICS to BE INCENTIVIZED to sell life insurance to CATHOLIC GRANDPARENTS?”

      This technique is a little confusing and seems like it's solving completely different issues that will still require another brainstorming session to figure out the new problem statement.

    12. The solution was for Fraser and his team to question every facet of their business

      I thought this comment was very useful: if you begin to question every aspect of your business, you can always search for improvements and more effective ways to do things to move your business forward.

    1. Be sure not to put this off. The above is what has to be communicated

      This is a very straightforward but very accurate and effective letter. He makes all the best arguments, pointing out that the English are either openly selling drugs out of their enormous greed, or are too unwilling or inefficient to control what their own traders are doing. It's funny how clearly it is just a kindly worded diss. He's pretty much saying that the English can either stop being evil, or else China will not give them awesome stuff.

    2. This is the source fromwhich your country has become known for its wealth

      Wow, basically saying the Chinese gave the British their bag. It's so funny how in documents like these people are just throwing shade and talking smack.

    3. renowned for his competence in administering fiscal matters and public works, and his skill at governance.

      Likely also a philosopher / political philosopher. China had a very unique focus on the learning and philosophy of governance. I should consult my reading from PS10, but as I recall there was a rich history of political philosophy and development. Confucius himself was a statesman. Even the Dao which advocates for a solitary life gives governing advice. That is all to say that the Chinese had a rich history of political philosophy as well as great respect for good statesmen. There was a very different, service oriented, attitude which contrasts with the European binary of government by tyrant or republic.


    1. The term ‘judicial review’ describes the power of courts to declare legislation or actionsof the executive in violation of the constitution.

      DEFINITION

    2. Semi-presidential systems are particularly problematicwhen, in a multi-party system, divided minority governments result, in which neither theparty of the president nor of the prime minister enjoys a majority in the legislature.

      ARGUES THAT semi-presidential systems are particularly problematic when in a multi-party system, divided minority govs are created

    3. Though the President is by far the stronger of the two offices, the President and Prime Minister to some degree share executive power.

      president stronger than PM but they to some degree share executive power

    4. the South African President is actually selected by the parliament rather than by direct election,

      South African president selected by the parliament rather than by direct election

    5. England has a bicameral legislature, consisting of the House of Commons andthe House of Lords.

      England= bicameral legislature, House of Commons and House of Lords

    6. Many Latin American presidents had the power of ‘line-item veto’, and greater independent authority to appoint federal and state officials.

      Reason: Many latin american presidents had the power of line-item veto and greater independent authority to appoint federal and state officials

    7. presidency and reduced authority in the legislature and courts.

      Reason: Scholars noticed that latin american cons provided greater powers in the office of the presidency and reduced authority in the legislature and courts

    8. There are a variety of different theories for why this might be so, but a dominant one is the idea that when the president does not enjoy the support of a majority of the legislature

      The reason for troubled democracy in Latin America could be that when the president does NOT enjoy the support of the majority of the legislature, it can lead to constitutional breakdown

    9. Observers divide most constitutional systems into presidential (typified by the United States), parliamentary (typified by the United Kingdom), and semi-presidential (typified by France).

      Constitutional System Types:

      1) Presidential 2) Parliamentary 3) Semi-Presidential

    10. Accordingly, he argued that the powers of government should be divided amongdifferent persons or bodies, which would act as a check on each other.

      Montesquieu's argument

    11. despotic.

      A despotism is a government in which a single ruler governs without laws or constraints, according to their own will.

      Principle: Fear — subjects obey out of terror of punishment.

      There are no formal checks on power, and the ruler is above the law.

      Example: Absolute autocracies or tyrannies (Montesquieu often cited the Ottoman Empire as an example).

    12. monarchical,

      A monarchy is a government in which a single person (the monarch) rules, but according to fixed and established laws.

      Principle: Honor — the motivation of nobles to serve the king and maintain hierarchy.

      The monarch’s power is limited by tradition, law, or institutions (like courts or parliaments).

      Example: France under Louis XIV, or Britain under a constitutional monarchy

    13. republican (either democratic or aristocratic),

      A republic is a government in which the people (or a portion of them) hold sovereign power. It can take two main forms:

      Democratic republic: Power is held by the whole people — citizens rule directly or through elected representatives.

      Principle: Virtue (citizens’ love of equality and the common good).

      Example: Ancient Athens or modern democracies.

      Aristocratic republic: Power is held by a select group of citizens — often the nobility or elite.

      Principle: Moderation (restraint and fairness among the ruling class).

      Example: The Roman Republic, Venice.

    14. That Act also strengthened judicial independence by requiring that judges should remain in office during good behavior and could only be removed by parliament.

      The Act strengthened judicial independence by requiring that judges should remain in office during good behavior and could only be removed by parliament

    15. The English Bill of Rights Act of 1689 established some of the central principles of Britain's constitutional monarchy by declaring that ‘the pretended power of suspending the laws or the execution of laws by regal authority without consent of Parliament is illegal’ and that parliamentary consent was required to raise revenue or maintain a standing army.

      English Bill of Rights Act of 1689: established that the pretended power of suspending laws, or the execution of laws, without the consent of Parliament was illegal; this ultimately gave Parliament a lot of power

    16. The constitutional struggles between the king and parliament in England in the seventeenth century gave rise to the related, but distinct, idea of a functional separation of powers, which is the core of the modern doctrine.

      Constitutional struggle between king & parliament in England in the 17th century gave rise to the idea of a functional separation of powers

    17. the idea that dividing power will inhibit government action and therefore tyranny; the idea that different types of government bodies are more or less competent at certain tasks; and the idea that certain allocations of authority will help ensure democratic legitimacy for government policies.

      Arguments for why separation of powers is considered normatively desirable: 1) idea that dividing power will inhibit gov action and therefore tyranny 2) the idea that different types of gov bodies are more or less competent at certain tasks 3) the idea that certain allocations of authority will help ensure democratic legitimacy for government policies

    1. This... this line is chilling. Palantir, Curtis Yarvin, etc. It also ignores that Jim Crow wasn't passed by majority public opinion; it was enacted into law by a small group of elected officials elected by the majority of the white voting public. Most people don't vote, and in many areas the population that wasn't White was greater than or equal to the White population, but still didn't have a say. The system is not actually the majority opinion. Additionally, information is still filtered by someone, with all their biases, and the biases they built into the computer.


    1. Each daughter cell represents one outcome of all possible combinations of maternal and paternal chromosomes.

      e.g., 1 pair of homologs (2 inherited chromosomes) gives 2 possible daughter-cell combinations, as shown by 2^n, where n = the number of homolog pairs.
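      A small worked example of the 2^n count; the human value of n = 23 homolog pairs is assumed here purely for illustration:

      ```python
      # 2**n possible maternal/paternal chromosome combinations per daughter cell,
      # where n is the number of homolog pairs.
      def combinations(n_pairs: int) -> int:
          return 2 ** n_pairs

      print(combinations(1))   # 1 pair   -> 2 combinations
      print(combinations(23))  # 23 pairs -> 8,388,608 combinations (human example, assumed)
      ```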


    1. Ethical frameworks must generally be tailored to the ethical issues and challenges at hand. Hence, although they may appeal to similar ethical principles, there are likely to be different ethical frameworks for questions related to public health surveillance and for individual treatment decisions

      The process of assigning or tailoring ethical principles to situations sounds complicated and almost like it needs its own selection framework...!

    2. This is another case of a moral conflict – between the freedom to relocate and associate freely and the need to improve the health of some of the most vulnerable people.

      An interesting example of ethical conflict in public health that I hadn't considered - the tension between allowing free movement of people across borders and keeping skilled workers local.

    3. While access to good health may be thought to be a vitally important ethical principle, it remains unavailable to most people.

      I agree that this is true. I've only highlighted the sentence because it's both sobering and depressing to read in 2025.

    4. Research ethics committees perform the important role of assessing the potential risks and benefits involved in research. In some cases, such committees may decide that the risks of the study are not justified by the potential benefits and decide not to allow the research to go ahead.

      These committees act as an ethical check on upcoming research. However, who makes up these ethics committees and what safeguards are in place to prevent conflicts of interest and other issues?

    1. The monks and scholars at these monasteries used a new writing system called Carolingian Miniscule that, unlike earlier script, began using lower-case letters, spaces between words, and punctuation. This allowed them to transcribe three or four pages in a day, rather than just one. As a result, over a thousand volumes survive using the new technique, versus just fifty from the previous Merovingian system.

      This is also very interesting. I cannot believe that over a thousand volumes survive.

    1. The money used along the Silk Road included Abbasid gold dinars, Tang silver ingots, and Samanid (Bukhara) silver dirhams. Copper coins were not used in long-distance trade because they were too heavy.

      I did not know this. I guess, coins are heavy and if there is a long ride ahead, you do not want yourself or the animals to tire very fast.

    1. "House of Wisdom" where  Muhammad al-Khwarizmi published a book on mathematics now known as "Algebra"

      This is interesting. Is it the Algebra we know of today? I wonder how much it changed if it did.

    2. The foes of Charles the Fat and his uncle and predecessor, Charles the Bald, that helped end the Carolingian dynasty were the Vikings, a seafaring people originally from Scandinavia.

      It’s crazy to see that the Vikings didn’t just raid but reshaped Europe and helped bring an end to the powerful dynasty.

    3. By 750, the Arab rulers of the Umayyad Caliphate had spread their empire and Islam beyond the Arabian peninsula, to Iberia and North Africa in the west, the edge of the Byzantine Empire in the north, and as far as the Indus River Valley in the east.

      It’s interesting how by 750, the Umayyad Caliphate had created one of the largest empires in history.

    1. My own personal experience of civic engagement was volunteering and helping to start an afterschool ecology program at PS 126’s urban farm in Manhattan on the Lower East Side.

      Q- Why would someone decide to do something like this without knowing the possible outcome?

    2. Understanding from across communities and participation by residents across different communities, encourages more participation, raising levels of equity and inclusion.

      L- love the involvement of the community.

    3. Active participation can be measured with the help of a survey or questionnaires given to students. These tools can determine how often they get involved within their community and provide a sample of what types of activities they involve themselves in that meet the requirement of civic engagement. Multiple choice questions can be used to gauge a student’s level of active participation and there’s the option of adding open ended questions which will provide more information since a student would have to think back to any prior experience they may have had. Providing open ended questions allows a student to open up and get a closer look and understanding if they know what active participation means.  Another option could be to study a focus group of students as this can show if there are any similarities between them about how they view active participation. This could be used to get a sense of how students are thinking about active participation.

      I- found it interesting how they take note of participation

    1. The ways of your culture are familiar to you, often so deeply ingrained that they come naturally. Culture itself feels like home.

      Reading how this paragraph describes culture reminds me of the word ethnocentrism that we talked about in class. In a way, ethnocentrism and culture can be distinguishable to people growing up in an isolated place.

    2. Dominant ideas about work, gender, marriage, parenting, hospitality, and status all shape the places we call home.

      Houses are built to accommodate all of human needs. All these factors make everything about our homes unique to each family that inhabits it.

    3. In Bourdieu’s analysis, the Kabyle house was divided into two realms: a dark, low realm associated with animals and natural activities (sleeping, sex, childbirth, and death) and a lighter, higher realm associated with humans and cultural activities (weaving, cooking, brides, and guests).

      This is very interesting; these ideas are very similar to how we view our living rooms and bedrooms.

    4. With the loom and the hearth, the main area of human activity in the house was associated with the work of women.

      Women worked mostly in the house during this time period, so it makes sense that they occupied the nicest parts.

    1. An academic coach/advisor uses GenAI to draft a tailored study plan for a student struggling in STEM courses. Then, the coach reviews and edits the plan to ensure fit. They also two strategies appropriate for the student, which GenAI missed, and have worked well for other STEM students.

      I think the last sentence is missing words or was edited and no longer makes sense.

    1. Who it’s for (students, faculty, staff) What the task is (announcement, summary, email, syllabus)

      The learning flow could benefit from switching these. To me it seems more natural to say: who am I, what am I building, and who am I building it for. Reading it that way in this section would reinforce that order of operations.

      I typically would say, I am < insert who I am here > building a < insert task here > for < insert audience here >

    1. You have free access to Microsoft Copilot through the university's Microsoft 365 subscription

      A link or a Resources section to take users to these resources would be good!

    1. eLife Assessment

      The authors present a set of wrappers around previously developed software and machine-learning toolkits, and demonstrate their use in identifying endogenous sterols binding to a GPCR. The resulting pipeline is potentially useful for molecular pharmacology researchers due to its accessibility and ease of use. However, the evidence supporting the GPCR-related findings remains incomplete, as the machine-learning model shows indications of overfitting, and no direct ligand-binding assays are provided for validation.

    2. Reviewer #1 (Public review):

      This is a re-review following an author revision. I will go point-by-point in response to my original critiques and the authors' responses. I appreciate the authors taking the time to thoughtfully respond to the reviewer critiques.

      Query 1. Based on the authors' description of their contribution to the algorithm design, it sounds like a hyperparameter search wrapped around existing software tools. I think that the use of their own language to describe these modules is confusing to potential users as well as unintentionally hides the contributions of the original LigBuilder developers. The authors should just explain the protocol plainly using language that refers specifically to the established software tools. Whether they use LigBuilder or something else, at the end of the day the description is a protocol for a specific use of an existing software rather than the creation of a new toolkit.

      Query 2. I see. Correct me if I am mistaken, but it seems as though the authors are proposing using the Authenticator to identify the best distributions of compounds based on an in silico oracle (in this case, Vina score), and train to discriminate them. This is similar to training QSAR models to predict docking scores, such as in the manuscript I shared during the first round of review. In principle, one could perform this in successive rounds to create molecules that are increasingly composed of features that yield higher docking scores. This is an established idea that the authors demonstrate in a narrow context, but it also raises concern that one is just enriching for compounds with e.g., an abundance of hydrogen bond donors and acceptors. Regarding points (4) and (5), it is unclear to me how the authors perform train/test splits on unlabeled data with supervised machine learning approaches in this setting. This seems akin to a Y-scramble sanity check. Finally, regarding the discussion on the use of experimental data or FEP calculations for the determination of HABs and LABs, I appreciate the authors' point; however, the concern here is that in the absence of any true oracle the models will just learn to identify and/or generate compounds that exploit limitations of docking scores. Again, please correct me if I am mistaken. It is unclear to me how this advances previous literature in CADD outside of the specific context of incorporating some ideas into a GPCR-Gprotein framework.

      Query 3. The authors mention that the hyperparameters for the ML models are just the package defaults in the absence of specification by the user. It would be helpful to know specifically what the hyperparameters were for the benchmarks in this study; however, I think a deeper concern is still that these models are almost certainly far overparameterized given the limited training data used for the models. It is unclear why the authors did not just build a random forest classifier to discriminate their HABs and LABs using ligand- or protein-ligand interaction fingerprints or related ideas.

      Query 4. It is good, and expected, that increasing the fraction of the training set size in a random split validation all the way to 100% would allow the model to perfectly discriminate HABs and LABs. This does not demonstrate that the model has significant enrichment in prospective screening, particularly compared to simpler methods. The concern remains that these models are overparameterized and insufficiently validated. The authors did not perform any scaffold splits or other out-of-distribution analysis.

      Query 5. The authors contend that Gcoupler uniquely enables training models when data is scarce and ultra-large screening libraries are unavailable. Today, it is rather straightforward to dock thousands of compounds at a minimum. Using tools such as QuickVina2-GPU (https://pubs.acs.org/doi/10.1021/acs.jcim.2c01504), it is possible to quite readily dock millions in a day with a single GPU and obtain the AutoDock Vina score. GPU-accelerated Vina has been combined with cavity detection tools likely multiple times, including here (https://arxiv.org/abs/2506.20043). There are multiple cavity detection tools, including the ones the authors use in their protocol.

      Query 6. The authors contend that the simulations are converged, but they elected not to demonstrate stability in the predicted MM/GBSA binding energies with block averaging across the trajectory. This could have been done with the existing trajectories without additional simulation.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Query: In this manuscript, the authors introduce Gcoupler, a Python-based computational pipeline designed to identify endogenous intracellular metabolites that function as allosteric modulators at the G protein-coupled receptor (GPCR) - Gα protein interface. Gcoupler is comprised of four modules:

      I. Synthesizer - identifies protein cavities and generates synthetic ligands using LigBuilder3

      II. Authenticator - classifies ligands into high-affinity binders (HABs) and low-affinity binders (LABs) based on AutoDock Vina binding energies

      III. Generator - trains graph neural network (GNN) models (GCM, GCN, AFP, GAT) to predict binding affinity using synthetic ligands

      IV. BioRanker - prioritizes ligands based on statistical and bioactivity data

      The authors apply Gcoupler to study the Ste2p-Gpa1p interface in yeast, identifying sterols such as zymosterol (ZST) and lanosterol (LST) as modulators of GPCR signaling. Our review will focus on the computational aspects of the work. Overall, we found the Gcoupler approach interesting and potentially valuable, but we have several concerns with the methods and validation that need to be addressed prior to publication/dissemination.

      We express our gratitude to Reviewer #1 for their concise summary and commendation of our work. We sincerely apologize for the lack of sufficient detail in summarizing the underlying methods employed in Gcoupler, as well as its subsequent experimental validations using yeast, human cell lines, and primary rat cardiomyocyte-based assays.

      We wish to state that substantial improvements have been made in the revised manuscript; every section has been elaborated upon to enhance clarity. Please refer to the point-by-point response below and the revised manuscript.

      Query: (1) The exact algorithmic advancement of the Synthesizer beyond being some type of application wrapper around LigBuilder is unclear. Is the grow-link approach mentioned in the methods already a component of LigBuilder, or is it custom? If it is custom, what does it do? Is the API for custom optimization routines new with the Synthesizer, or is this a component of LigBuilder? Is the genetic algorithm novel or already an existing software implementation? Is the cavity detection tool a component of LigBuilder or novel in some way? Is the fragment library utilized in the Synthesizer the default fragment library in LigBuilder, or has it been customized? Are there rules that dictate how molecule growth can occur? The scientific contribution of the Synthesizer is unclear. If there has not been any new methodological development, then it may be more appropriate to just refer to this part of the algorithm as an application layer for LigBuilder.

      We appreciate Reviewer #1's constructive suggestion. We wish to emphasize that

      (1) The LigBuilder software comprises various modules designed for distinct functions. The Synthesizer in Gcoupler strategically utilizes two of these modules: "CAVITY" for binding site detection and "BUILD" for de novo ligand design.

      (2) While both modules are integral to LigBuilder, the Synthesizer plays a crucial role in enabling their targeted, automated, and context-aware application for GPCR drug discovery.

      (3) The CAVITY module is a structure-based protein binding site detection program, which the Synthesizer employs for identifying ligand binding sites on the protein surface.

      (4) The Synthesizer also leverages the BUILD module for constructing molecules tailored to the target protein, implementing a fragment-based design strategy using its integrated fragment library.

      (5) The GROW and LINK methods represent two independent approaches encompassed within the aforementioned BUILD module.

      Author response image 1.

      Schematic representation of the key strategy used in the Synthesizer module of Gcoupler.

      Our manuscript details the "grow-link" hybrid approach, which was implemented using a genetic algorithm through the following stages:

      (1) Initial population generation based on a seed structure via the GROW method.

      (2) Selection of "parent" molecules from the current population for inclusion in the mating pool using the LINK method.

      (3) Transfer of "elite" molecules from the current population to the new population.

      (4) Population expansion through structural manipulations (mutation, deletion, and crossover) applied to molecules within the mating pool.

      Please note that the outcome of this process is not fixed, as it is highly dependent on the target cavity topology and the constraint parameters employed for population evaluation. Synthesizer customizes generational cycles and optimization parameters based on cavity-specific constraints, with the objective of either generating a specified number of compounds or comprehensively exploring chemical diversity against a given cavity topology. A minimal illustrative sketch of such a generational loop is shown below.
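
      For readers who want to see the generational structure concretely, the following is a purely illustrative Python sketch of such a grow/select/elite/mutate loop. The fragment library, molecule representation, and fitness function are placeholders invented for demonstration and do not reflect the actual LigBuilder or Synthesizer code.

      ```python
      # Illustrative genetic-algorithm loop only; all names and scoring are toy stand-ins.
      import random

      FRAGMENTS = list(range(50))          # hypothetical fragment-library IDs

      def fitness(candidate):
          # stand-in for a cavity-specific scoring / constraint evaluation
          return -abs(sum(candidate) - 100)

      def grow(seed, n):
          # stage 1: initial population generated around a seed structure
          return [seed + [random.choice(FRAGMENTS)] for _ in range(n)]

      def mutate(candidate):
          # stage 4: structural manipulation (mutation / deletion / crossover-like edit)
          child = candidate[:]
          if child and random.random() < 0.5:
              child[random.randrange(len(child))] = random.choice(FRAGMENTS)
          else:
              child.append(random.choice(FRAGMENTS))
          return child

      def evolve(seed, pop_size=30, generations=20, n_elite=3):
          population = grow(seed, pop_size)
          for _ in range(generations):
              ranked = sorted(population, key=fitness, reverse=True)
              elite = ranked[:n_elite]                 # stage 3: elites carried over
              mating_pool = ranked[: pop_size // 2]    # stage 2: parent selection
              children = [mutate(random.choice(mating_pool))
                          for _ in range(pop_size - n_elite)]
              population = elite + children
          return max(population, key=fitness)

      print(evolve(seed=[1, 2, 3]))
      ```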

      While these components are integral to LigBuilder, Synthesizer's innovation lies

      (1) in its programmatic integration and dynamic adjustment of these modules.

      (2) Synthesizer distinguishes itself not by reinventing these algorithms, but by their automated coordination, fine-tuning, and integration within a cavity-specific framework.

      (3) It dynamically modifies generation parameters according to cavity topology and druggability constraints, a capability not inherently supported by LigBuilder.

      (4) This renders Synthesizer particularly valuable in practical scenarios where manual optimization is either inefficient or impractical.

      In summary, Synthesizer offers researchers a streamlined interface, abstracting the technical complexities of LigBuilder and thereby enabling more accessible and reproducible ligand generation pipelines, especially for individuals with limited experience in structural or cheminformatics tools.

      Query: (2) The use of AutoDock Vina binding energy scores to classify ligands into HABs and LABs is problematic. AutoDock Vina's energy function is primarily tuned for pose prediction and displays highly system-dependent affinity ranking capabilities. Moreover, the HAB/LAB thresholds of -7 kcal/mol or -8 kcal/mol lack justification. Were these arbitrarily selected cutoffs, or was benchmarking performed to identify appropriate cutoffs? It seems like these thresholds should be determined by calibrating the docking scores with experimental binding data (e.g., known binders with measured affinities) or through re-scoring molecules with a rigorous alchemical free energy approach.

      We again express our gratitude to Reviewer #1 for these inquiries. We sincerely apologize for the lack of sufficient detail in the original version of the manuscript. In the revised manuscript, we have ensured the inclusion of a detailed rationale for every threshold utilized to prioritize high-affinity binders. Please refer to the comprehensive explanation below, as well as the revised manuscript, for further details.

      We would like to clarify that:

      (1) The Authenticator module is not solely reliant on absolute binding energy values for classification. Instead, it calculates binding energies for all generated compounds and applies a statistical decision-making layer to define HAB and LAB classes.

      (2) Rather than using fixed thresholds, the module employs distribution-based methods, such as the Empirical Cumulative Distribution Function (ECDF), to assess the overall energy landscape of the compound set. We then applied multiple statistical tests to evaluate the HAB and LAB distributions and determine an optimal, data-specific cutoff that balances class sizes and minimizes overlap.

      (3) This adaptive approach avoids rigid thresholds and instead ensures context-sensitive classification, with safeguards in place to maintain adequate representation of both classes for downstream model training. In this way, the framework prioritizes robust statistical reasoning over arbitrary energy cutoffs and reduces the risks associated with direct reliance on Vina scores alone. A minimal illustrative sketch of this thresholding idea is given at the end of this response.

      (4) To assess the necessity and effectiveness of the Authenticator module, we conducted a benchmarking analysis where we deliberately omitted the HAB and LAB class labels, treating the compound pool as a heterogeneous, unlabeled dataset. We then performed random train-test splits using the Synthesizer-generated compounds and trained independent models.

      (5) The results from this approach demonstrated notably poorer model performance, indicating that arbitrary or unstructured data partitioning does not effectively capture the underlying affinity patterns. These experiments highlight the importance of using the statistical framework within the Authenticator module to establish meaningful, data-driven thresholds for distinguishing High- and Low-Affinity Binders. The cutoff values are thus not arbitrary but emerge from a systematic benchmarking and validation process tailored to each dataset.

      Please note: While calibrating docking scores with experimental binding affinities or using rigorous methods like alchemical free energy calculations can improve precision, these approaches are often computationally intensive and reliant on the availability of high-quality experimental data, a major limitation in many real-world screening scenarios.

      In summary, the primary goal of Gcoupler is to enable fast, scalable, and broadly accessible screening, particularly for cases where experimental data is sparse or unavailable. Incorporating such resource-heavy methods would not only significantly increase computational overhead but also undermine the framework’s intended usability and efficiency for large-scale applications. Instead, our workflow relies on statistically robust, data-driven classification methods that balance speed, generalizability, and practical feasibility.
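
      As a purely illustrative sketch of the distribution-driven thresholding described in points (2) and (3) above, the Python snippet below orders the energies (as for an ECDF), restricts candidate cutoffs to a quantile window that keeps both classes adequately populated, and places the cutoff at the widest gap in the energy distribution. The gap criterion and the toy data are assumptions made for demonstration; the actual statistical tests and criteria used by the Authenticator may differ.

      ```python
      # Minimal sketch of a data-driven HAB/LAB cutoff on 1-D docking energies
      # (kcal/mol; more negative = stronger). Not the actual Authenticator code.
      import numpy as np

      def choose_cutoff(energies, min_class_frac=0.2):
          x = np.sort(np.asarray(energies, dtype=float))   # sorted energies (ECDF x-axis)
          lo = int(min_class_frac * len(x))
          hi = int((1 - min_class_frac) * len(x))
          gaps = np.diff(x[lo:hi + 1])          # spacing between neighbouring energies
          i = lo + int(np.argmax(gaps))         # widest gap = natural break in the landscape
          return (x[i] + x[i + 1]) / 2.0

      rng = np.random.default_rng(0)
      toy = np.concatenate([rng.normal(-8.5, 0.5, 300), rng.normal(-6.5, 0.5, 300)])
      cut = choose_cutoff(toy)
      hab = toy[toy <= cut]   # high-affinity binders (more negative energies)
      lab = toy[toy > cut]    # low-affinity binders
      print(round(cut, 2), len(hab), len(lab))
      ```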

      Query: (3) Neither the Results nor Methods sections provide information on how the GNNs were trained in this study. Details such as node features, edge attributes, standardization, pooling, activation functions, layers, dropout, etc., should all be described in detail. The training protocol should also be described, including loss functions, independent monitoring and early stopping criteria, learning rate adjustments, etc.

      We again thank Reviewer #1 for this suggestion. We would like to mention that in the revised manuscript, we have added all the requested details. Please refer to the points below for more information.

      (1) The Generator module of Gcoupler is designed as a flexible and automated framework that leverages multiple Graph Neural Network architectures, including Graph Convolutional Model (GCM), Graph Convolutional Network (GCN), Attentive FP, and Graph Attention Network (GAT), to build classification models based on the synthetic ligand datasets produced earlier in the pipeline.

      (2) By default, Generator tests all four models using standard hyperparameters provided by the DeepChem framework (https://deepchem.io/), offering a baseline performance comparison across architectures. This includes pre-defined choices for node features, edge attributes, message-passing layers, pooling strategies, activation functions, and dropout values, ensuring reproducibility and consistency. All models are trained with binary cross-entropy loss and support default settings for early stopping, learning rate, and batch standardization where applicable.

      (3) In addition, Generator supports model refinement through hyperparameter tuning and k-fold cross-validation (default: 3 folds). Users can either customize the hyperparameter grid or rely on Generator’s recommended parameter ranges to optimize model performance. This allows for robust model selection and stability assessment of tuned parameters.

      (4) Finally, the trained models can be used to predict binding probabilities for user-supplied compounds, making it a comprehensive and user-adaptive tool for ligand screening.

      Based on Reviewer #1's suggestion, we have now added a detailed description of the Generator module of Gcoupler and provided relevant citations for the DeepChem workflow. An illustrative sketch of this default training setup is shown below.
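
      The snippet below is a minimal sketch of training a single DeepChem graph model on a toy HAB/LAB-labelled SMILES set with mostly default hyperparameters. The model class (GraphConvModel), featurizer, toy molecules, and labels are choices made here for illustration only and may differ from the exact architectures and settings used by the Generator.

      ```python
      # Illustrative DeepChem classification sketch; toy data, default settings.
      import numpy as np
      import deepchem as dc

      smiles = ["CCO", "CCC", "c1ccccc1O", "CC(=O)O"]   # toy molecules
      labels = np.array([1, 0, 1, 0])                    # 1 = HAB, 0 = LAB (toy labels)

      featurizer = dc.feat.ConvMolFeaturizer()
      X = featurizer.featurize(smiles)
      dataset = dc.data.NumpyDataset(X=X, y=labels)

      # classification model with DeepChem's default architecture settings
      model = dc.models.GraphConvModel(n_tasks=1, mode="classification")
      model.fit(dataset, nb_epoch=20)

      metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
      print(model.evaluate(dataset, [metric]))
      ```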

      Query: (4) GNN model training seems to occur on at most 500 molecules per training run? This is unclear from the manuscript. That is a very small number of training samples if true. Please clarify. How was upsampling performed? What were the HAB/LAB class distributions? In addition, it seems as though only synthetically generated molecules are used for training, and the task is to discriminate synthetic molecules based on their docking scores. Synthetic ligands generated by LigBuilder may occupy distinct chemical space, making classification trivial, particularly in the setting of a random split k-folds validation approach. In the absence of a leave-class-out validation, it is unclear if the model learns generalizable features or exploits clear chemical differences. Historically, it was inappropriate to evaluate ligand-based QSAR models on synthetic decoys such as the DUD-E sets - synthetic ligands can be much more easily distinguished by heavily parameterized ligand-based machine learning models than by physically constrained single-point docking score functions.

      We thank reviewer #1 for these detailed technical queries. We would like to clarify that:

      (1) The recommended minimum for the training set is 500 molecules, but users can add as many synthesized compounds as needed to thoroughly explore the chemical space related to the target cavity.

      (2) Our systematic evaluation demonstrated that expanding the training set size consistently enhanced model performance, especially when compared to AutoDock docking scores. This observation underscores the framework's scalability and its ability to improve predictive accuracy with more training compounds.

      (3) The Authenticator module initially categorizes all synthesized molecules into HAB and LAB classes. These labeled molecules are then utilized for training the Generator module. To tackle class imbalance, the class with fewer data points undergoes upsampling. This process aims to achieve an approximate 1:1 ratio between the two classes, thereby ensuring balanced learning during GNN model training.

      (4) The Authenticator module's affinity scores are the primary determinant of the HAB/LAB class distribution, with a higher cutoff for HABs ensuring statistically significant class separation. This distribution is also indirectly shaped by the target cavity's topology and druggability, as the Synthesizer tends to produce more potent candidates for cavities with favorable binding characteristics.

      (5) While it's true that synthetic ligands may occupy distinct chemical space, our benchmarking exploration for different sites on the same receptor still showed inter-cavity specificity along with intra-cavity diversity of the synthesized molecules.

      (6) The utility of random k-fold validation shouldn't be dismissed outright; it provides a reasonable estimate of performance under practical settings where class boundaries are often unknown. Nonetheless, we agree that complementary validation strategies like leave-class-out could further strengthen the robustness assessment.

      (7) We agree that using synthetic decoys like those from the DUD-E dataset can introduce bias in ligand-based QSAR model evaluations if not handled carefully. In our workflow, the inclusion of DUD-E compounds is entirely optional and only considered as a fallback, specifically in scenarios where the number of low-affinity binders (LABs) synthesized by the Synthesizer module is insufficient to proceed with model training.

      (8) The primary approach relies on classifying generated compounds based on their derived affinity scores via the Authenticator module. However, in rare cases where this results in a heavily imbalanced dataset, DUD-E compounds are introduced not as part of the core benchmarking, but solely to maintain minimal class balance for initial model training. Even then, care is taken to interpret results with this limitation in mind. Ultimately, our framework is designed to prioritize data-driven generation of both HABs and LABs, minimizing reliance on synthetic decoys wherever possible.

      Author response image 2.

      Scatter plots depicting the segregation of High/Low-Affinity Metabolites (HAM/LAM) (indicated in green and red) identified using Gcoupler workflow with 100% training data. Notably, models trained on lesser training data size (25%, 50%, and 75% of HAB/LAB) severely failed to segregate HAM and LAM (along Y-axis). X-axis represents the binding affinity calculated using IC4-specific docking using AutoDock.

      Based on Reviewer #1's suggestion, we have now added all these technical details in the revised version of the manuscript.
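
      As a minimal illustration of the upsampling step described in point (3) above, the sketch below uses scikit-learn's resample to bring a toy minority class up to the size of the majority class. The data and the exact balancing procedure are placeholders, not Gcoupler internals.

      ```python
      # Toy example of minority-class upsampling to roughly a 1:1 ratio.
      import numpy as np
      from sklearn.utils import resample

      hab = np.arange(400)             # indices of HAB examples (toy majority class)
      lab = np.arange(400, 550)        # indices of LAB examples (toy minority class)

      # resample the minority class with replacement to match the majority class size
      lab_upsampled = resample(lab, replace=True, n_samples=len(hab), random_state=0)
      balanced = np.concatenate([hab, lab_upsampled])
      print(len(hab), len(lab_upsampled), len(balanced))   # 400 400 800
      ```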

      Query: (5) Training QSAR models on docking scores to accelerate virtual screening is not in itself novel (see here for a nice recent example: https://www.nature.com/articles/s43588-025-00777-x), but can be highly useful to focus structure-based analysis on the most promising areas of ligand chemical space; however, we are perplexed by the motivation here. If only a few hundred or a few thousand molecules are being sampled, why not just use AutoDock Vina? The models are trained to try to discriminate molecules by AutoDock Vina score rather than experimental affinity, so it seems like we would ideally just run Vina? Perhaps we are misunderstanding the scale of the screening that was done here. Please clarify the manuscript methods to help justify the approach.

      We acknowledge the effectiveness of training QSAR models on docking scores for prioritizing chemical space, as demonstrated by the referenced study (https://www.nature.com/articles/s43588-025-00777-x) on machine-learning-guided docking screen frameworks.

      We would like to mention that:

      (1) Such protocols often rely on extensive pre-docked datasets across numerous protein targets, or utilize a highly skewed input distribution, training on as little as 1-10% of ligand-protein complexes and testing on the remainder in iterative cycles.

      (2) While powerful for ultra-large libraries, this approach can introduce bias towards the limited training set and incur significant overhead in data curation, pre-computation, and infrastructure.

      (3) In contrast, Gcoupler prioritizes flexibility and accessibility, especially when experimental data is scarce and large pre-docked libraries are unavailable. Instead of depending on fixed docking scores from external pipelines, Gcoupler integrates target-specific cavity detection, de novo compound generation, and model training into a self-contained, end-to-end framework. Its QSAR models are trained directly on contextually relevant compounds synthesized for a given binding site, employing a statistical classification strategy that avoids arbitrary thresholds or precomputed biases.

      (4) Furthermore, Gcoupler is open-source, lightweight, and user-friendly, making it easily deployable without the need for extensive infrastructure or prior docking expertise. While not a complete replacement for full-scale docking in all use cases, Gcoupler aims to provide a streamlined and interpretable screening framework that supports both focused chemical design and broader chemical space exploration, without the computational burden associated with deep learning docking workflows.

      (5) Practically, even with computational resources, manually running AutoDock Vina on millions of compounds presents challenges such as format conversion, binding site annotation, grid parameter tuning, and execution logistics, all typically requiring advanced structural bioinformatics expertise.

      (6) Gcoupler's Authenticator module, however, streamlines this process. Users only need to input a list of SMILES and a receptor PDB structure, and the module automatically handles compound preparation, cavity mapping, parameter optimization, and high-throughput scoring. This automation reduces time and effort while democratizing access to structure-based screening workflows for users without specialized expertise.

      Ultimately, Gcoupler's motivation is to make large-scale, structure-informed virtual screening both efficient and accessible. The model serves as a surrogate to filter and prioritize compounds before deeper docking or experimental validation, thereby accelerating targeted drug discovery.

      Query: (6) The brevity of the MD simulations raises some concerns that the results may be over-interpreted. RMSD plots do not reliably compare the affinity behavior in this context because of the short timescales coupled with the dramatic topological differences between the ligands being compared; CoQ6 is long and highly flexible compared to ZST and LST. Convergence metrics, such as block averaging and time-dependent MM/GBSA energies, should be included over much longer timescales. For CoQ6, the authors may need to run multiple simulations of several microseconds, identify the longest-lived metastable states of CoQ6, and perform MM/GBSA energies for each state weighted by each state's probability.

      We appreciate Reviewer #1's suggestion regarding simulation length, as it is indeed crucial for interpreting molecular dynamics (MD) outcomes. We would like to mention that:

      (1) Our simulation strategy varied based on the analysis objective, ranging from short (~5 ns) runs for preliminary or receptor-only evaluations to intermediate (~100 ns) and extended (~550 ns) runs for receptor-ligand complex validation and stability assessment.

      (2) Specifically, we conducted three independent 100 ns MD simulations for each receptor-metabolite complex in distinct cavities of interest. This allowed us to assess the reproducibility and persistence of binding interactions. To further support these observations, a longer 550 ns simulation was performed for the IC4 cavity, which reinforced the 100 ns findings by demonstrating sustained interaction stability over extended timescales.

      (3) While we acknowledge that even longer simulations (e.g., in the microsecond range) could provide deeper insights into metastable state transitions, especially for highly flexible molecules like CoQ6, our current design balances computational feasibility with the goal of screening multiple cavities and ligands.

      (4) In our current workflow, MM/GBSA binding free energies were calculated by extracting 1000 representative snapshots from the final 10 ns of each MD trajectory. These configurations were used to compute time-averaged binding energies, incorporating contributions from van der Waals, electrostatic, polar, and non-polar solvation terms. This approach offers a more reliable estimate of ligand binding affinity compared to single-point molecular docking, as it accounts for conformational flexibility and dynamic interactions within the binding cavity.

      (5) Although we did not explicitly perform state-specific MM/GBSA calculations weighted by metastable state probabilities, our use of ensemble-averaged energy estimates from a thermally equilibrated segment of the trajectory captures many of the same benefits. We acknowledge, however, that a more rigorous decomposition based on metastable state analysis could offer finer resolution of binding behavior, particularly for highly flexible ligands like CoQ6, and we consider this a valuable direction for future refinement of the framework.
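
      As a minimal numerical illustration of the ensemble averaging described in point (4), and of the block averaging suggested by the reviewer, the numpy sketch below uses synthetic per-snapshot MM/GBSA energies in place of the real per-frame values extracted from a trajectory.

      ```python
      # Ensemble mean and block averages over per-snapshot binding energies (kcal/mol).
      import numpy as np

      rng = np.random.default_rng(1)
      dg = rng.normal(-45.0, 3.0, 1000)   # hypothetical per-snapshot MM/GBSA energies

      def block_averages(values, n_blocks=10):
          # split the per-snapshot energies into consecutive blocks and average each block
          blocks = np.array_split(values, n_blocks)
          return np.array([b.mean() for b in blocks])

      means = block_averages(dg)
      print("ensemble mean :", round(dg.mean(), 2))
      print("block means   :", np.round(means, 2))
      print("block std err :", round(means.std(ddof=1) / np.sqrt(len(means)), 3))
      ```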

      Reviewer #2 (Public review):

      Summary:

      Query: Mohanty et al. present a new deep learning method to identify intracellular allosteric modulators of GPCRs. This is an interesting field for e.g. the design of novel small molecule inhibitors of GPCR signalling. A key limitation, as mentioned by the authors, is the limited availability of data. The method presented, Gcoupler, aims to overcome these limitations, as shown by experimental validation of sterols in the inhibition of Ste2p, which have been shown to be relevant molecules in human and rat cardiac hypertrophy models. They have made their code available for download and installation, which can easily be followed to set up the software on a local machine.

      Strengths:

      Clear GitHub repository

      Extensive data on yeast systems

      We sincerely thank Reviewer #2 for their thorough review, summary, and appreciation of our work. We highly value their comments and suggestions.

      Weaknesses:

      Query: No assay to directly determine the affinity of the compounds to the protein of interest.

      We thank Reviewer #2 for raising these insightful questions. During the experimental design phase, we carefully accounted for validating the impact of metabolites in the rescue response by pheromone.

      We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI Fluorometry-based)

      b. Cell growth assay

      c. FUN1<sup>TM</sup>-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot.

      i. And via computational techniques.

      Concerning the in vitro interaction studies of Ste2p and metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminal. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic-based interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Author response image 3.

      (a) Affinity purification of Ste2p from Saccharomyces cerevisiae. Western blot analysis using anti-His antibody showing the distribution of Ste2p in various fractions during the affinity purification process. The fractions include pellet, supernatant, wash buffer, and sequential elution fractions (1–4). Wild-type and ste2Δ strains served as positive and negative controls, respectively. (b) Optimization of Ste2p extraction protocol. Ponceau staining (left) and Western blot analysis using anti-His antibody (right) showing Ste2p extraction efficiency. The conditions tested include lysis buffers containing different concentrations of CHAPS detergent (0.5%, 1%) and glycerol (10%, 20%).

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      We request Reviewer #2 to kindly refer to the assays conducted on the point mutants created in this study, as these experiments offer robust evidence supporting our claims.

      Query: In conclusion, the authors present an interesting new method to identify allosteric inhibitors of GPCRs, which can easily be employed by research labs. Whilst the authors have made efforts to characterize the compounds in yeast cells in order to confirm their findings, it would be beneficial if they showed their compounds are active in a simple binding assay.

      We express our gratitude and sincere appreciation for the time and effort dedicated by Reviewer #2 in reviewing our manuscript. We are confident that our clarifications address the reviewer's concerns.

      Reviewer #3 (Public review):

      Summary:

      Query: In this paper, the authors introduce the Gcoupler software, an open-source deep learning-based platform for structure-guided discovery of ligands targeting GPCR interfaces. Overall, this manuscript represents a field-advancing contribution at the intersection of AI-based ligand discovery and GPCR signaling regulation.

      Strengths:

      The paper presents a comprehensive and well-structured workflow combining cavity identification, de novo ligand generation, statistical validation, and graph neural network-based classification. Notably, the authors use Gcoupler to identify endogenous intracellular sterols as allosteric modulators of the GPCR-Gα interface in yeast, with experimental validations extending to mammalian systems. The ability to systematically explore intracellular metabolite modulation of GPCR signaling represents a novel and impactful contribution. This study significantly advances the field of GPCR biology and computational ligand discovery.

      We thank Reviewer #3 for investing time and effort in reviewing our manuscript and for appreciating our work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage the authors to address the points raised during revision to elevate the assessment from "incomplete" to "solid" or ideally "convincing." In particular, we ask the authors to improve the justification for their methodological choices and to provide greater detail and clarity regarding each computational layer of the pipeline.

      We are grateful for the editors' suggestions. We have incorporated significant revisions into the manuscript, providing comprehensive technical details to prevent any misunderstandings. Furthermore, we meticulously explained every aspect of the computational workflow.

      Reviewer #2 (Recommendations for the authors):

      Query: Would it be possible to make the package itself pip installable?

      Yes, it previously existed on the TestPyPI repository and has now been migrated to the main PyPI index. Please access it here: https://pypi.org/project/gcoupler/

      Query: I am confused by the binding free energies reported in Supplementary Figure 8. Is the total DG reported that of the protein-ligand complex? If that is the case, the affinities of the ligands would be extremely high. They are also very far off from the reported -7 kcal/mol active/inactive cut-off.

      We thank Reviewer #2 for this query. We would like to mention that we have provided a detailed explanation in the point-by-point response to Reviewer #2's original comment. Briefly, to clarify, the -7 kcal/mol active/inactive cutoff mentioned in the manuscript refers specifically to the docking-based binding free energies (ΔG) calculated using AutoDock or AutoDock Vina, which are used for compound classification or validation against the Gcoupler framework.

      In contrast, the binding free energies reported in Supplementary Figure 8 are obtained through the MM-GBSA method, which provides a more detailed and physics-based estimate of binding affinity by incorporating solvation and enthalpic contributions. It is well-documented in the literature that MM-GBSA tends to systematically underestimate absolute binding free energies when compared to experimental values (10.2174/1568026616666161117112604; Table 1).

      Author response image 4.

      Scatter plot comparing the predicted binding affinity calculated by Docking and MM/GBSA methods, against experimental ΔG (10.1007/s10822-023-00499-0)

      Our use of MM-GBSA is not to match experimental ΔG directly, but rather to assess relative binding preferences among ligands. Despite its limitations in predicting absolute affinities, MM-GBSA is known to perform better than docking for ranking compounds by their binding potential. In this context, a more negative MM-GBSA energy value still reliably indicates stronger predicted binding, even if the numerical values are much larger in magnitude than typical experimental or docking-derived cutoffs.

      Thus, the two energy values, docking-based and MM-GBSA, serve different purposes in our workflow. Docking scores are used for classification and thresholding, while MM-GBSA energies provide post hoc validation and a higher-resolution comparison of binding strength across compounds.

      Query: To corroborate their findings, can the authors include direct binding affinity assays for yeast and human Ste2p? This will help in establishing whether the observed phenotypic effects are indeed driven by binding of the metabolites.

      We thank Reviewer #2 for raising these insightful questions. During the experimental design phase, we carefully accounted for validating the impact of metabolites in the rescue response by pheromone.

      We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI Fluorometry-based)

      b. Cell growth assay

      c. FUN1<sup>TM</sup>-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot.

      i. And via computational techniques.

      Concerning the in vitro interaction studies of Ste2p and metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminal. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic-based interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      We request Reviewer #2 to kindly refer to the assays conducted on the point mutants created in this study, as these experiments offer robust evidence supporting our claims.

      Query: Did the authors perform expression assays to make sure the mutant proteins were expressed at levels similar to the wild type?

      We thank reviewer #2 for this comment. We would like to mention that:

      (1) In our mutants (S75A, T155D, L289K)-based assays, all mutants were generated using integration at the same chromosomal TRP1 locus under the GAL1 promoter and share the same C-terminal CYC1 terminator sequence used for the reconstituted wild-type (rtWT) construct, thus reducing the likelihood of strain-specific expression differences.

      (2) Furthermore, all strains were grown under identical conditions using the same media, temperature, and shaking parameters. Each construct underwent the same GAL1 induction protocol in YPGR medium for identical durations, ensuring uniform transcriptional activation across all strains and minimizing culture-dependent variability in protein expression.

      (3) Importantly, both the rtWT and two of the mutants (T155D, L289K) retained α-factor-induced cell death (PI and FUN1-based fluorometry and microscopy; Figure 4c-d) and MAPK activation (western blot; Figure 4e), demonstrating that the mutant proteins are expressed at levels sufficient to support signalling.

      Reviewer #3 (Recommendations for the authors):

      My comments that would enhance the impact of this method are:

      (1) While the authors have compared the accuracy and efficiency of Gcoupler to AutoDock Vina, one of the main points of Gcoupler is the neural network module. It would be beneficial to have it evaluated against other available deep learning ligand generative modules, such as the following: 10.1186/s13321-024-00829-w, 10.1039/D1SC04444C.

      Thank you for the observation. To clarify, our benchmarking of Gcoupler’s accuracy and efficiency was performed against AutoDock, not AutoDock Vina. This choice was intentional, as AutoDock is one of the most widely used classical techniques in computer-aided drug design (CADD) for obtaining high-resolution predictions of ligand binding energy, binding poses, and detailed atomic-level interactions with receptor residues. In contrast, AutoDock Vina is primarily optimized for large-scale virtual screening, offering faster results but typically with lower resolution and limited configurational detail.

      Since Gcoupler is designed to balance accuracy with computational efficiency in structure-based screening, AutoDock served as a more appropriate reference point for evaluating its predictions.

      We agree that benchmarking against other deep learning-based ligand generative tools is important for contextualizing Gcoupler’s capabilities. However, it's worth noting that only a few existing methods focus specifically on cavity- or pocket-driven de novo drug design using generative AI, and among them, most are either partially closed-source or limited in functionality.

      While PocketCrafter (10.1186/s13321-024-00829-w) offers a structure-based generative framework, it differs from Gcoupler in several key respects. PocketCrafter requires proprietary preprocessing tools, such as the MOE QuickPrep module, to prepare protein pocket structures, limiting its accessibility and reproducibility. In addition, PocketCrafter’s pipeline stops at the generation of cavity-linked compounds and does not support any further learning from the generated data.

      Similarly, DeepLigBuilder (10.1039/D1SC04444C) provides de novo ligand generation using deep learning, but the source code is not publicly available, preventing direct benchmarking or customization. Like PocketCrafter, it also lacks integrated learning modules, which limits its utility for screening large, user-defined libraries or compounds of interest.

      Additionally, tools like AutoDesigner from Schrödinger, while powerful, are not publicly accessible and hence fall outside the scope of open benchmarking.

      Author response table 1.

      Comparison of de novo drug design tools. SBDD refers to Structure-Based Drug Design, and LBDD refers to Ligand-Based Drug Design.

      In contrast, Gcoupler is a fully open-source, end-to-end platform that integrates both Ligand-Based and Structure-Based Drug Design. It spans from cavity detection and molecule generation to automated model training using GNNs, allowing users to evaluate and prioritize candidate ligands across large chemical spaces without the need for commercial software or advanced coding expertise.

      (2) In Figure 2, the authors mention that IC4 and IC5 potential binding sites are on the direct G protein coupling interface ("This led to the identification of 17 potential surface cavities on Ste2p, with two intracellular regions, IC4 and IC5, accounting for over 95% of the Ste2p-Gpa1p interface (Figure 2a-b, Supplementary Figure 4j-n)..."). Later, however, in Figure 4, when discussing which residues affect the binding of the metabolites the most, the authors didn't perform MD simulations of mutant STE2 and just Gpa1p (without metabolites present). It would be beneficial to compare the binding of G protein with and without metabolites present, as these interface mutations might be affecting the binding of G protein by itself.

      Thank you for this insightful suggestion. While we did not perform in silico MD simulations of the mutant Ste2-Gpa1 complex in the absence of metabolites, we conducted experimental validation to functionally assess the impact of interface mutations. Specifically, we generated site-directed mutants (S75A, L289K, T155D) and expressed them in a ste2Δ background to isolate their effects.

      As shown in the Supplementary Figure, these mutants failed to rescue cells from α-factor-induced programmed cell death (PCD) upon metabolite pre-treatment. This was confirmed through fluorometry-based viability assays, FUN1<sup>TM</sup> staining, and p-Fus3 signaling analysis, which collectively monitor MAPK pathway activation (Figure 4c–e).

      Importantly, the induction of PCD in response to α-factor in these mutants demonstrates that G protein coupling is still functionally intact, indicating that the mutations do not interfere with Gpa1 binding itself. However, the absence of rescue by metabolites strongly suggests that the mutated residues play a direct role in metabolite binding at the Ste2p–Gpa1p interface, thus modulating downstream signaling.

      While further MD simulations could provide structural insight into the isolated mutant receptor–G protein interaction, our experimental data supports the functional relevance of metabolite binding at the identified interface.

      (3) While the experiments performed by the authors do support the hypothesis that metabolites regulate GPCR signaling, there are no direct biophysical measurements (e.g., dissociation constants are estimated only in silico).

      We thank Reviewer #3 for raising these insightful comments. We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI Fluorometry-based)

      b. Cell growth assay

      c. FUN1<sup>TM</sup>-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot.

      i. And via computational techniques.

      Concerning the direct biophysical measurements of Ste2p and metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminal, with the goal of performing Microscale Thermophoresis (MST) and Isothermal Titration Calorimetry (ITC) measurements. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic-based interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      (4) The authors do not discuss the effects of the metabolites at their physiological concentrations. Overall, this manuscript represents a field-advancing contribution at the intersection of AI-based ligand discovery and GPCR signaling regulation.

      We thank reviewer #3 for this comment and for recognising the value of our work. Although direct quantification of intracellular free metabolite levels is challenging, several lines of evidence support the physiological relevance of our test concentrations.

      - Genetic validation supports endogenous relevance: Our genetic screen of 53 metabolic knockout mutants showed that deletions in the biosynthetic pathways for these metabolites consistently disrupted α-factor-induced cell death, with the vast majority of strains (94.4%) resisting it; notably, a subset even displayed accelerated growth in the presence of α‑factor. This suggests that endogenous levels of these metabolites normally provide some degree of protection, supporting their physiological role in GPCR regulation.

      - Metabolomics confirms in vivo accumulation: Our untargeted metabolomics analysis revealed that α-factor-treated survivors consistently showed enrichment of CoQ6 and zymosterol compared to sensitive cells. This demonstrates that these metabolites naturally accumulate to protective levels during stress responses, validating their biological relevance.

    1. Explicit language planning and policy making in the United States, when it does occur, tends to be done at the state, local, or institutional levels

      decentralization leads to inconsistent approaches and reinforces SAE as the default standard

    2. one striking feature is the absence of a guiding overarching explicit national educational language policy.

      lack of a national policy allows for a fragmented approach, can lead to inequities in multilingual education, and reinforces monolingual norms

    1. Fourth Amendment simply does not apply to eavesdropping

      I see a clear difference between simply eavesdropping on a conversation in a public space and purposefully using technology to tap a telephone call. I think that a textualist approach to the Constitution in this case is challenging due to the advancements in technology that have happened since the writing of the Constitution.

    2. unlike a field,

      What does the Constitution say about people being non-consensually filmed in a public field and that footage being used in court?

    3. too likely to be subtly influenced by the familiar shortcomings of hindsight judgment.

      This makes so much sense! Of course if you find out through the investigation that the suspect is doing bad stuff, you are going to concede that the prior suspicion had due cause.

    4. physical penetration of the telephone booth

      This argument makes no sense. Of course they didn't burst into the telephone booth; that doesn't give them the right to listen to any calls.

    5. "right to privacy."

      The difference between a general "right to privacy" and what the court argues is outlined in the 4th amendment is very interesting to me. What privacy does the 4th amendment protect and from who does it protect citizens? The government can't invade our privacy, but can other citizens? What does this mean about private investigators?

    6. Fourth Amendment

      Fourth Amendment: protects people from unreasonable searches and seizures. This means that the government cannot search through your belongings without a warrant. This entire case affirms that the Fourth Amendment protects the people of America against unreasonable searches and seizures, even in Katz's case.


  3. social-media-ethics-automation.github.io
    1. Elon Musk [@elonmusk]. Trashing accounts that you hate will cause our algorithm to show you more of those accounts, as it is keying off of your interactions. Basically saying if you love trashing *that* account, then you will probably also love trashing *this* account. Not actually wrong lol. January 2023. URL: https://twitter.com/elonmusk/status/1615194151737520128 (visited on 2023-12-07).

      This is a very interesting algorithm choice by Elon Musk, as I find it strange that he made it so that interacting with accounts you dislike causes you to see more of them. The basic concept of it makes any normal person assume that this would deter people from his app "X", but it actually makes sense when you think about how much drama, controversy, and hate is prevalent within that app. I think he is using this strategy to basically "rage bait" people into engaging more with the app by causing them to try to win internet battles, etc.

    2. Elon Musk [@elonmusk]. Trashing accounts that you hate will cause our algorithm to show you more of those accounts, as it is keying off of your interactions. Basically saying if you love trashing *that* account, then you will probably also love trashing *this* account. Not actually wrong lol. January 2023. URL: https://twitter.com/elonmusk/status/1615194151737520128 (visited on 2023-12-07).

      This source is criticizing how Elon Musk is trying to control X by trashing specific accounts that he hates, while the X algorithm just recommends more of the things he hates, thus creating a hate-filled trash fest of recommendations that he helped create.

    3. Systemic bias. November 2023. Page Version ID: 1185361788. URL: https://en.wikipedia.org/w/index.php?title=Systemic_bias&oldid=1185361788 (visited on 2023-12-07).

      This article talks about systemic bias, which is basically a tendency to operate in ways that result in certain social groups being favored and others being devalued. This is something I had studied in history class, as we learned about historic examples such as when U.S. criminal sentencing guidelines imposed harsher sentences for the cheaper form of cocaine more common in Black communities and among people of color than for the more expensive form used in white populations, in order to criminalize people of color more heavily. This is also something I have seen on social media, as recently I have seen far more coverage of illegal immigrants committing crimes on the news and social media platforms than of crimes committed by white people.

    4. Echo chamber (media). December 2023. Page Version ID: 1188142141. URL: https://en.wikipedia.org/w/index.php?title=Echo_chamber_(media)&oldid=1188142141#Echo_chambers_vs_epistemic_bubbles (visited on 2023-12-07).

      Echo chambers are often the cause of the creation and execution of extremist ideologies and actions. That said, the ethics of moderating such groups could be problematic, as for some ethical frameworks this could violate a common pillar, that of freedom of speech. In other frameworks, however, the need to stop the potential dangers caused by these echo chambers might outweigh the need to maintain freedom of speech. This then brings into question the extent of control and moderation a governing body should have over any group.

    5. Petter Törnberg. How digital media drive affective polarization through partisan sorting. Proceedings of the National Academy of Sciences, 119(42):e2207159119, October 2022. URL: https://www.pnas.org/doi/10.1073/pnas.2207159119 (visited on 2023-12-07), doi:10.1073/pnas.2207159119.

      It struck me how relevant this paper is to the chapter’s point that recommendation algorithms don’t just serve content but shape what we see and how we interpret it. The study shows how digital media can drive affective polarization via “partisan sorting” — which nicely connects to the chapter’s warning that algorithms can deepen divisions by reinforcing “you vs them” dynamics.

    6. BBC. YouTube aids flat earth conspiracy theorists, research suggests. BBC, February 2019. URL: https://www.bbc.com/news/technology-47279253 (visited on 2023-12-07).

      YouTube plays a big part in spreading the idea of the flat Earth to people online. YouTube is full of information but also misinformation, and its algorithm makes it all too easy to funnel users down a conspiracy theory rabbit hole. After interviewing people at flat-earth conventions, researchers found that many of them got the idea from YouTube videos. They propose that the only way to fight misinformation on YouTube is to make accurate, informative videos, which I would argue is happening a lot on YouTube today.

    7. Elon Musk [@elonmusk]. Trashing accounts that you hate will cause our algorithm to show you more of those accounts, as it is keying off of your interactions. Basically saying if you love trashing *that* account, then you will probably also love trashing *this* account. Not actually wrong lol. January 2023. URL: https://twitter.com/elonmusk/status/1615194151737520128 (visited on 2023-12-07).

      I find this loophole funny. By interacting with accounts you don't like, you will see more of them. I also find it interesting that no solutions are being made to resolve this, although I think the solution would be for the algorithm to scan what you post, which would be weird.

    8. Systemic bias. November 2023. Page Version ID: 1185361788. URL: https://en.wikipedia.org/w/index.php?title=Systemic_bias&oldid=1185361788 (visited on 2023-12-07).

      This article basically talks about the definition and the impact of systemic bias. This bias usually comes from the education and occupation fields. It impacts a lot of people even when they have done nothing to deserve it.

    9. Kashmir Hill. How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. Forbes, February 2012. URL: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/ (visited on 2023-12-07).

      This article shocked me. It talks about how Target keeps a profile on customers to keep track of what they buy and to notice trends in their purchases. They do this tracking so that they can do targeted advertising and send coupons to people for things they think the person may be interested in buying in the future. Specifically, this article talks about how Target can figure out when someone is pregnant and then sends coupons for baby items to the person's house. They did this with a high school girl, and her dad got upset about it because he thought it was insensitive to send baby items to a high schooler. It turned out the girl was pregnant and her father did not know yet. It is crazy to think that Target was able to figure out that she was pregnant before her dad did, and that was how he found out. I think it is really weird, though, that they keep track of people's purchases this way; there is no privacy anymore.

    10. Kelsey D. Atherton [@AthertonKD]. Oh, you're experiencing a structural problem? Have you ever considered trying different personal choices instead? April 2019. URL: https://twitter.com/AthertonKD/status/1120376944061583360 (visited on 2023-12-07).

      Kelsey D. Atherton's tweet from 2019 stuck with me. I love his sarcasm and point of view on structural problems. I am tying in my own experience with these kinds of misguided views that frame structural problems as personal issues. I grew up in a low-income family that worked in agriculture for very little pay, where raises were nonexistent despite the rising prices of everything imaginable. So it was very difficult to cover groceries, the mortgage, taxes, insurance, bills, etc. I agree with Kelsey: it is definitely not a personal issue, nor can it be fixed by the individual.

    11. Fiona Tapp. Digital Reminders of a Lost Pregnancy. The Atlantic, November 2018. URL: https://www.theatlantic.com/family/archive/2018/11/digital-reminders-miscarriages/575050/ (visited on 2023-12-07).

      I have always believed that apps or algorithms related to a person's health, both mental and physical, should be considered twice as carefully before release, since they can directly affect a person. If someone doesn't like a social media app, they can just delete it, and it will not affect their personal life much. In this case, the app kept resurfacing the user's painful memories, which significantly affected the user's life and mental health.

    12. Systemic bias. November 2023. Page Version ID: 1185361788. URL: https://en.wikipedia.org/w/index.php?title=Systemic_bias&oldid=1185361788 (visited on 2023-12-07).

      Systemic bias is a tendency toward particular outcomes that favor or disfavor specific social groups. The term is interchangeable with institutional bias and structural bias. Most of the time, systemic bias works against minorities, and it can lead to institutional racism. When systemic bias occurs in organizations, it causes mistreatment in human resources and decreases the productivity and viability of those organizations.

    13. Kashmir Hill. How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. Forbes, February 2012. URL: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/ (visited on 2023-12-07).

      This article outlines how retail companies, in this example, Target, collect customers' data and make assumptions about their personal lives based on what they are buying. This information has shocked and scared customers based on the accuracy of the data and the non-consensual nature of the way these companies are collecting this data.

    14. Lauren Feiner. DOJ settles lawsuit with Facebook over allegedly discriminatory housing advertising. CNBC, June 2022. URL: https://www.cnbc.com/2022/06/21/doj-settles-with-facebook-over-allegedly-discriminatory-housing-ads.html (visited on 2023-12-07).

      Many people think that algorithms are just "automatically running numbers", without emotions or biases. However, in this news story, Facebook was investigated by the US Department of Justice for its advertising recommendation system being suspected of discrimination, and later reached a settlement. The real irony is that no one was sitting there manually excluding certain groups; it was just the platform's "optimization logic" that automated, scaled, and executed this bias at a low cost. In other words, discrimination doesn't require a bad person; it only needs an "efficiency-first, click-through rate-king" advertising system. On the surface, it seems to be helping businesses find "the most likely people to see the ads", but in reality, it has set up invisible thresholds in the real world, preventing certain groups from ever seeing the same opportunities. As a result, "technological neutrality" has become a nice-sounding but empty slogan.

    15. [k5]

      In this tweet, Elon Musk is saying that algorithms are based on your interactions, so if you interact with things that make you upset, then the algorithm will show you more of that. I think that he is right (which is the first time anyone has said that ever); people do enjoy getting riled up by things on social media. The thing he fails to understand is that some people have a right to be angry and express that, and the fact that they can't stop themselves does not make them weaker.

    16. Lauren Goode. I Called Off My Wedding. The Internet Will Never Forget. Wired, 2021. URL: https://www.wired.com/story/weddings-social-media-apps-photos-memories-miscarriage-problem/ (visited on 2023-12-07).

      This article highlights how the internet is forever even if we don't want it to be. It is interesting to me because although I have always been careful what I post on social media it had never really occurred to me that something could unintentionally go viral and remind me of a bad time in my life forever.

    1. What strategies do you think might work to improve how social media platforms use recommendations?

      Some strategies I think social media platforms could use to improve recommendations include an optional section where you could list some of your interests. A different strategy could be consensually analyzing the user and their data to better recommend things they could be interested in.

    2. Content recommendations can go well when users find content they are interested in. Sometimes algorithms do a good job of it and users are appreciative. TikTok has been mentioned in particular as providing surprisingly accurate recommendations, though Professor Arvind Narayanan argues [k11] that TikTok’s success with its recommendations relies less on advanced recommendation algorithms, and more on the design of the site making it very easy to skip the bad recommendations and get to the good ones.

      The majority of the time, people stay on one video for longer because they are interested in the content of the video. However, what if people keep playing a video just because they are doing something else and not paying attention to it? I'm curious about whether the platform would count that as interesting content. Does the company only evaluate the amount of time a user spends on one video, or are there other factors in the evaluation?

    1. Some recommendation algorithms can be simple such as reverse chronological order, meaning it shows users the latest posts (like how blogs work, or Twitter's "See latest tweets" option). They can also be very complicated taking into account many factors, such as:
       - Time since posting (e.g., show newer posts, or remind me of posts that were made 5 years ago today)
       - Whether the post was made or liked by my friends or people I'm following
       - How much this post has been liked, interacted with, or hovered over
       - Which other posts I've been liking, interacting with, or hovering over
       - What people connected to me or similar to me have been liking, interacting with, or hovering over
       - What people near you have been liking, interacting with, or hovering over (they can find your approximate location, like your city, from your internet IP address, and they may know even more precisely). This perhaps explains why sometimes when you talk about something out loud it gets recommended to you (because someone around you then searched for it). Or maybe they are actually recording what you are saying and recommending based on that.
       - Phone numbers or email addresses (sometimes collected deceptively [k1]) can be used to suggest friends or contacts.
       - And probably many more factors as well!
       Now, how these algorithms precisely work is hard to know, because social media sites keep these algorithms secret, probably for multiple reasons:

      Recommendation algorithms have always been sort of a mystery to me. I'm always seeing people on TikTok posting videos such as "How to beat the algorithm" and "how to go viral", so this really shows that it can often be a strategic thing to have the algorithm recommend your content. However, one thing I do find is that a lot of these algorithms primarily rely on engagement, and on how often users are searching for that content, in order to recommend it more widely to others (see the toy scoring sketch after this list).

    2. What experiences do you have of social media sites making particularly good recommendations for you?

      I often use social media such as Instagram to recommend ads for different clothing brands that I might be interested in, instead of seeking them out myself. I have found that when doing my own research for clothing, it has been difficult to find brands that I actually like. However, with the amount of ads, along with the algorithm understanding my interests, they often do show me brands and items that I quite like and would purchase.

    3. What experiences do you have of social media sites making particularly good recommendations for you?

      I like how Reddit recommends posts for me based on the topics and communities I follow, as well as what posts within those communities are trending.

    4. Some recommendation algorithms can be simple such as reverse chronological order, meaning it shows users the latest posts (like how blogs work, or Twitter’s “See latest tweets” option). They can also be very complicated taking into account many factors, such as:

      These algorithms help users get a better experience. Without the algorithms, users might find the app or platform boring. Some of the algorithms can be quite complicated, and they gather most of the information and data on the platform to provide a better experience.

    5. Time since posting (e.g., show newer posts, or remind me of posts that were made 5 years ago today)

      My experience with this explanation of what algorithms can do is exactly this. I know with Snapchat specifically, I have watched their recommendation algorithm grow significantly over the years. Their saved gallery section now has "x year(s) ago today", which is awesome; I like the recommendation algorithm Snapchat has going. Instagram now has something similar, but that idea did come from Snapchat's algorithm.

    6. Some recommendation algorithms can be simple such as reverse chronological order, meaning it shows users the latest posts (like how blogs work, or Twitter’s “See latest tweets” option).

      This section helped me understand how recommendation algorithms decide what content I see online. I was surprised by how many personal factors, like my location and interactions, can influence what gets shown to me. It also makes me think about how much data these platforms collect and how that affects my privacy.

    7. What people near you have been liking, interacting with, or hovering over (they can find your approximate location, like your city, from your internet IP address, and they may know even more precisely) This perhaps explains why sometimes when you talk about something out loud it gets recommended to you (because someone around you then searched for it). Or maybe they are actually recording what you are saying and recommending based on that. Phone numbers or email addresses (sometimes collected deceptively [k1]) can be used to suggest friends or contacts. And probably many more factors as well!

      I find it really surprising that social media companies can use the search history from the people around me to recommend me content. By constantly finding content that is relevant to what I was talking about in real life, social media websites can keep me engaged. It's honestly a little dystopian; it makes it seem like every single social media company knows everything about your life and is manipulating you into engaging with their platform.

    8. recommendation algorithm, which is an algorithm (a series of steps or rules, such as in a computer program) that recommends posts for users to see, people for users to follow, ads for users to view, or reminders for users.

      This recommendation algorithm may seem like something helpful, but I think it is part of, and facilitates, the evil side of social media. The evil side of social media that I have found is the addictive side, and the way social media companies go about creating their apps is to make them addictive. The more time users spend on their app, the more ads they can sell, so they want to keep people's attention as long as possible. This recommendation algorithm aids in this by recommending posts to users that it thinks will keep them on the app and hold their attention the longest.

    9. The method of determining what is shown to users is called a recommendation algorithm, which is an algorithm (a series of steps or rules, such as in a computer program) that recommends posts for users to see, people for users to follow, ads for users to view, or reminders for users.

      This is so interesting to me. The idea that recommendation algorithms use our data to show us what they think we want to see/buy. In my experience with recommendation algorithms they are fairly accurate and that is a little scary.

    1. In a recent paper, researchers from Harvard and the University of Pisa reported that “U.S. data centers produced 105 million tons CO2 equivalent gasses in the past year with a carbon intensity 48 percent higher than the national average.”

      why is this?

    1. Sometimes though, individuals are still blamed for systemic problems. For example, Elon Musk, who has the power to change Twitters recommendation algorithm, blames the users for the results: Fig. 11.4 A tweet [k5] from current Twitter owner Elon Musk blaming users for how the recommendation algorithm interprets their behavior

      This tweet by Elon is interesting because, while this could just be me, it feels like Elon is in favor of this "you get more accounts that you hate" mechanic on the site. It makes sense, since hate and malice are what get people to stay on sites longer, but still, it's funny how the person with the most power in this situation is actively blaming the user for outcomes completely within his control.

    2. Individuals still have responsibility with how they behave, but the system itself may be set up so that individual efforts cannot overcome the problems in the system.

      We always emphasize how these social media algorithms let people become addicted to scrolling information streams, wasting all our time. However, we should reflect on who is responsible for managing our own time, and the answer is very clear: ourselves. We should also take responsibility for the consequences of what we post on social media, even though the algorithm spreads the information. Therefore, I believe it is important both to improve the algorithms and to regulate our own behavior online in order to create a friendlier and more ethical virtual society on these platforms.

    3. when these guidelines were followed, they have had racially biased (that is, racist) outcomes regardless of intent or bias of the individual judges.

      The most heart-wrenching aspect of this statement lies in the fact that it reveals the problem does not stem from a single "bad judge", but rather the entire system itself is inherently biased. In other words, sometimes, without anyone intentionally discriminating, the rules themselves will automatically enforce discrimination, and even make people believe that this is a "normal" or "neutral" procedure.

    1. 11.4.1. Filter Bubbles

      Companies are almost incentivized to put people in echo chambers because of how that increases time spent in the app. People like to have their beliefs validated. I feel like this could be limiting because of how one's beliefs would never be challenged.

    1. As with kairotic space, the stakes of a situation—that is, the potential for harm or benefit—are always different for different actors; are not perceived the same way by different actors; and, in the case of a bodymind event, are governed by differing knowledges of time.

      Potential for harm or benefit different for every actor

    1. Hydrogen bonds have about a tenth of the strength of an average covalent bond, and are constantly broken and reformed in liquid water. If you liken the covalent bond between the oxygen and hydrogen to a stable marriage, the hydrogen bond has "just good friends" status.

      Relation to bond

    1. The following table provides a summary of the list methods shown above. The column labeled result gives an explanation as to what the return value is as it relates to the new value of the list. The word mutator means that the list is changed by the method but nothing is returned (actually None is returned). A hybrid method is one that not only changes the list but also returns a value as its result. Finally, if the result is simply a return, then the list is unchanged by the method. Be sure to experiment with these methods to gain a better understanding of what they do.

       | Method  | Parameters     | Result     | Description                                       |
       |---------|----------------|------------|---------------------------------------------------|
       | append  | item           | mutator    | Adds a new item to the end of a list              |
       | insert  | position, item | mutator    | Inserts a new item at the position given          |
       | pop     | none           | hybrid     | Removes and returns the last item                 |
       | pop     | position       | hybrid     | Removes and returns the item at position          |
       | sort    | none           | mutator    | Modifies a list to be sorted                      |
       | reverse | none           | mutator    | Modifies a list to be in reverse order            |
       | index   | item           | return idx | Returns the position of first occurrence of item  |
       | count   | item           | return ct  | Returns the number of occurrences of item         |
       | remove  | item           | mutator    | Removes the first occurrence of item              |

      Save for reference

    1. idiosyncratic

      Peculiar to an individual or group; characterized by unique, personal, or quirky traits that deviate from the norm or standard — often in behavior, thinking, language, or style.

    2. interdisciplinary

      Involving two or more academic disciplines or fields of study that integrate concepts, methods, theories, or tools to address a common problem, question, or phenomenon — going beyond the boundaries of a single discipline.

    3. It is important to remember the range of symbol systems considered in deriving these conclusions. They include gesture; oral language; written language; number systems; mathematical notation; systems for inscription (e.g., graphs, maps); and, to a lesser degree, other systems. The multiplicity of symbol systems considered in the volume certainly gives greater weight and credibility to the editors' conclusions. This multiplicity is also powerful for us as early literacy researchers, a point to which we now turn

      different symbol systems

    4. SSSS model is intended to apply similarly to other symbolic systems. It would be interesting to see early literacy researchers apply this framework to their own data

      finding out what the SSSS model is intended to do

    1. “Malware” is short for “malicious software.” Malware is typically installed on a user’s device for the purpose of stealing personal information.

      Question: How do you know whether your device, such as a phone or tablet, has malware?

    2. Clear cookies from your browser.

      Question: I have done this before to free up space on my computer, but I still see pop-ups related to my searches even after I clear my cache and cookies. Will cookies always generate personalized ads based on what I searched, even after being cleared?

    3. Cookies—small pieces of data with a unique ID placed on your device by websites—are online tracking tools that enable this to happen. Cookies can store your website-specific browsing behaviour and any site-specific customization

      Comment: I never really considered cookies to be the reason why I see relevant topics on other websites. For example, when I Google something and then see it on my TikTok two minutes later, I always joke that "our phones can hear us", but no, it's actually us looking it up and the cookies carrying it across platforms.

    1. Collaboration: How will teachers collaborate to support students? How will students collaborate with each other?

      does the tool provide ways for teachers to collaborate?

    2. Existing Digital Resources: What are we currently using? Who has licenses? Is teacher training required and/or has it taken place?

      This is something that is often overlooked. I've been in organizations that have multiple tools that ultimately do the same thing.

    1. Democracies thrive when politicians believe they are better off playing by the rules of that game — even when they lose elections — because that’s the way to maximize their self-interest over time.

      But, what changed?

    1. Come up with at least two different theoretical sets of rules (recommendation algorithms) for what would make a “good” social media post to recommend.
      1. A post should be shown to a user if they have designated in a list selection that those kinds of posts are something they want to see. If a video falls outside those parameters, hide it from them.

      2. Show a post to a user if it is similar to posts that they have interacted with in the past, whether in subject, tag, creator, etc.

    1. ethical discipline

      here, and again below, it is "ethical discipline", not just "discipline", that translates tshul khrims/śīla. So please make the glossary entry tag the whole term "ethical discipline".

    1. eLife Assessment

      This study provides valuable insights into the evolutionary conservation of sex determination mechanisms in ants by identifying a candidate sex-determining region in a parthenogenetic species. It uses solid, well-executed genomic analyses based on differences in heterozygosity between females and diploid males. While the candidate locus awaits functional validation in this species, the study provides convincing support for the ancient origin of a non-coding locus implicated in sex determination.

    2. Reviewer #1 (Public review):

      The authors have implemented several clarifications in the text and improved the connection between their findings and previous work. As stated in my initial review, I had no major criticisms of the previous version of the manuscript, and I continue to consider this a solid and well-written study. However, the revised manuscript still largely reiterates existing findings and does not offer novel conceptual or experimental advances. It supports previous conclusions suggesting a likely conserved sex determination locus in aculeate hymenopterans, but does so without functional validation (i.e., via experimental manipulation) of the candidate locus in O. biroi. I also wish to clarify that I did not intend to imply that functional assessments in the Pan et al. study were conducted in more than one focal species; my previous review explicitly states that the locus's functional role was validated in the Argentine ant.

    3. Reviewer #3 (Public review):

      The authors have made considerable efforts to conduct functional analyses to the fullest extent possible in this study; however, it is understandable that meaningful results have not yet been obtained. In the revised version, they have appropriately framed their claims within the limits of the current data and have adjusted their statements as needed in response to the reviewers' comments.