22 Matching Annotations
  1. Last 7 days
    1. At

      I think a bit more introductory background is needed on the simplifying assumptions, e.g.: 1) there is only one type of infection, 2) there is only a single time point at which decisions for any given patient are made, etc.

    2. however, the resistance profile of a potential infection is controlled by the array of AMR_LeakyBalloon models in the environment, and the resistance profile determines the likelihood that a given antibiotic treatment will be effective.

      I'd reverse the order - first describe what the resistance profile is, and then how it is controlled.
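
      If it would help make the mechanism concrete, something like the following sketch could illustrate the idea (the class name comes from the paper, but the dynamics and parameter names here are my guesses, not the actual model):

      ```python
      # Hypothetical sketch of a "leaky balloon" resistance dynamic: prescribing an
      # antibiotic "inflates" the resistance level for that drug, and resistance
      # "leaks" (decays) toward zero when the drug is not used. Update rules and
      # parameters are illustrative guesses, not the paper's actual model.

      class LeakyBalloon:
          def __init__(self, inflate_rate: float = 0.05, leak_rate: float = 0.01):
              self.level = 0.0          # resistance level in [0, 1]
              self.inflate_rate = inflate_rate
              self.leak_rate = leak_rate

          def step(self, prescribed: bool) -> float:
              """Advance one time step; return the updated resistance level."""
              if prescribed:
                  self.level += self.inflate_rate * (1.0 - self.level)
              else:
                  self.level -= self.leak_rate * self.level
              return self.level

          def treatment_success_prob(self) -> float:
              """Higher resistance -> lower chance the antibiotic works."""
              return 1.0 - self.level


      # One balloon per antibiotic; together they form the resistance profile.
      profile = {drug: LeakyBalloon() for drug in ("amoxicillin", "cipro", "ceftriaxone")}
      ```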

    3. For our RL agents, we used PPO (proximal policy optimization) implementations from the Python package stable-baselines3. The specific agent architectures used — flat memoryless, flat recurrent, hierarchical memoryless, and hierarchical recurrent — are described in Section 3.3.

      I would leave this out of this section - I think it's a bit too specific.
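
      That said, if it stays, a minimal snippet showing the setup could save readers a trip to Section 3.3. A sketch, assuming the recurrent variants use sb3-contrib's RecurrentPPO (the paper doesn't say), with "CartPole-v1" as a stand-in for the actual environment:

      ```python
      # Minimal sketch of the PPO setup described: stable-baselines3's PPO for the
      # memoryless agents and (my assumption) sb3-contrib's RecurrentPPO for the
      # recurrent variants; the hierarchical agents presumably compose these.
      # "CartPole-v1" is just a stand-in for the actual AMR environment.
      import gymnasium as gym
      from stable_baselines3 import PPO
      from sb3_contrib import RecurrentPPO

      env = gym.make("CartPole-v1")

      flat_memoryless = PPO("MlpPolicy", env, verbose=0)
      flat_memoryless.learn(total_timesteps=50_000)

      flat_recurrent = RecurrentPPO("MlpLstmPolicy", env, verbose=0)
      flat_recurrent.learn(total_timesteps=50_000)
      ```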

    4. A fixed prescribing rule (see Section 3.5), which emulates how real-world prescribers make antibiotic treatment decisions using observable patient and resistance information. Reinforcement learning (RL) agents (see Section 3.3), which learn policies by interacting with the environment over many training episodes, using feedback from the reward signal to discover policies that maximize cumulative long-term reward. Unlike the fixed rule, RL agents are not pre-programmed with a specific decision logic; instead, they discover effective prescribing strategies through trial-and-error exploration.

      Again - this looks like two papers: the first does #1, and the second does #2 and compares results to #1.
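
      For what it's worth, a toy sketch of what such a fixed rule could look like might help readers here (entirely hypothetical; not the paper's Section 3.5 rule):

      ```python
      # Illustrative (hypothetical) fixed prescribing rule of the kind described:
      # given observed per-drug resistance estimates for a patient's infection,
      # prescribe the drug with the lowest observed resistance, or withhold
      # treatment if everything looks too resistant.

      def fixed_prescribing_rule(observed_resistance: dict[str, float],
                                 max_acceptable: float = 0.9) -> str | None:
          """Return the drug to prescribe, or None to withhold treatment."""
          drug, level = min(observed_resistance.items(), key=lambda kv: kv[1])
          return drug if level <= max_acceptable else None

      # Example: cipro has the lowest observed resistance, so it is prescribed.
      print(fixed_prescribing_rule({"amoxicillin": 0.6, "cipro": 0.2, "ceftriaxone": 0.4}))
      ```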

    5. By comparing the performance of the RL algorithms against the fixed prescribing rule baseline, we demonstrate that our simulator enables quantitative comparative policy analysis, allowing us to characterize the magnitude and nature of gains achievable by adaptive prescribing strategies relative to static prescribing rules.

      This has two messages (and again makes me wonder if there are two papers). One is that the simulator allows comparative policy analysis. The second is that RL can outperform static prescription rules.

    6. We note here that this application of simulation and reinforcement learning to the area of antimicrobial resistance is relatively novel.

      I wouldn't be so explicit about noting 'novelty' - better to leave that for reviewers/editors to decide. It could rub some folks the wrong way, particularly if they have published a paper that they think is even more novel. I'd rather emphasize how it builds on/extends existing literature and findings, addresses gaps, etc.

    7. We then compared the performance of two types of prescribing algorithms in these scenarios: a ‘baseline’ fixed prescribing rule that emulates how real-world prescribers make decisions about antibiotic treatment, and reinforcement learning (RL) algorithms. Because the consequences of prescribing decisions unfold over time — treatment decisions made today can drive resistance that undermines antibiotic efficacy months later — antibiotic prescribing is fundamentally a sequential decision-making problem. Reinforcement learning is a subfield of machine learning focused on designing algorithms to optimize sequences of decisions in order to maximize cumulative long-term outcomes, making it a useful approach for this problem (see Section 3.1 for further details).

      Great paragraph
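
      One small suggestion: for readers less familiar with RL, it may help to state the objective explicitly here, e.g. the expected discounted return (this is the textbook formulation, not anything specific to this paper):

      ```latex
      % Standard RL objective: find a policy \pi that maximizes expected discounted return
      \pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right], \qquad 0 \le \gamma < 1
      ```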

    8. Given these challenges, it is difficult to directly quantify the long-term effects of antibiotic stewardship program interventions using observational or interventional studies alone (Bertollo et al. 2018; Schweitzer et al. 2019). When real-world systems lack sufficient data to support direct modeling, simulation can offer a complementary framework in which researchers are able to directly and explicitly specify the underlying ground truth of key mechanisms, enabling controlled investigation of trade-offs that are difficult or impossible to observe in real-world settings.

      Great paragraph for motivating the benefit (and novelty) of AS simulation.

    9. A fundamental challenge to performing quantitative impact evaluation of these programs is the pervasive partial observability of key system components (Laxminarayan et al. 2013). Even in resource-rich settings where antibiotic prescription records are available, true selection pressure on pathogen populations is incompletely observed due to unmeasured sources of antibiotic exposure, including agricultural use and environmental contamination (Van Boeckel et al. 2015). Measurement of AMR itself presents an even greater challenge: while initiatives such as the WHO’s Global Antimicrobial Resistance and Use Surveillance System (GLASS) have sought to standardize surveillance, participation remains voluntary and coverage incomplete, and no universally adopted scalar metric of resistance at the community level yet exists (Organization 2022; Leth and Schultsz 2023).

      I would be careful about highlighting these as challenges early in the intro as it sets the paper up as a 'solution' to these challenges. I would rather emphasize why simulation is useful by itself (even if you had perfect observation / mechanistic understanding). Then you can introduce the limitations mentioned here as constraints (rather than challenges) that need to be incorporated into simulations.

    10. evaluating the impact of stewardship strategies

      Feels like you have mixed messages here. On the one hand you are 'evaluating the impact of stewardship strategies' (which sounds like you are focusing on evaluating current strategies) and on the other you are developing novel RL strategies. I'm still of the mind these are two separate papers! But I'll keep reading :)

    11. benchmark prescribing policies discovered by reinforcement learning agents against a clinically-realistic fixed prescribing rule across four sets of experiments of increasing complexity, where each set of experiments varied in type and degree of observed information degradation.

      This is quite a mouthful. I'd break it down more. Why is RL used? What types of experiments are you running? What type of information is observed and degraded?
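
      On that last question: if "degradation" means something like partially masked observations, a short sketch could make it concrete for readers (hypothetical; I'm guessing at the setup, not describing the paper's actual experiments):

      ```python
      # One concrete (hypothetical) way to implement "observed information
      # degradation": a Gymnasium wrapper that randomly zeroes out observation
      # entries with some probability, so agents see an incomplete picture.
      import numpy as np
      import gymnasium as gym

      class MaskObservations(gym.ObservationWrapper):
          def __init__(self, env: gym.Env, mask_prob: float = 0.3):
              super().__init__(env)
              self.mask_prob = mask_prob

          def observation(self, obs: np.ndarray) -> np.ndarray:
              mask = np.random.random(obs.shape) < self.mask_prob
              degraded = obs.copy()
              degraded[mask] = 0.0   # masked entries are zeroed out
              return degraded

      # Usage: wrap a stand-in environment with 30% of observations masked.
      env = MaskObservations(gym.make("CartPole-v1"), mask_prob=0.3)
      ```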