Hypothesis

22 Matching Annotations

Last 7 days
019dad20-3915-9e7c-382c-36ba60262202.share.connect.posit.cloud 019dad20-3915-9e7c-382c-36ba60262202.share.connect.posit.cloud

Reinforcement Learning for Antibiotic Stewardship: Optimizing Prescribing Policies Under Antimicrobial Resistance Dynamics

22
1. sethblumberg 12 May 2026
  
  in Public
  
  At
  
  I think a bit more introductory background is needed. I.e. patients can either be 1) There is only one type of infection, 2) There is only a single time-point in which decisions for any given patient are made, etc.
2. sethblumberg 12 May 2026
  
  in Public
  
  partial observability
  
  Can just say 'observability'
3. sethblumberg 12 May 2026
  
  in Public
  
  however, the resistance profile of a potential infection is controlled by the array of AMR_LeakyBalloon models in the environment, and the resistance profile determines the likelihood that a given antibiotic treatment will be effective.
  
  I'd reverse the order - first describe what the resistance profile is, and then how it is controlled.
4. sethblumberg 12 May 2026
  
  in Public
  
  clinical benefit and failure multipliers
  
  Need to explain these a bit more.
5. sethblumberg 12 May 2026
  
  in Public
  
  Figure 1:
  
  I would consider removing the components that aren't relevant for the particular step. E.g. for (1), can just have 'patient generator' and 'patient'
6. sethblumberg 12 May 2026
  
  in Public
  
  For our RL agents, we used PPO (proximal policy optimization) implementations from the Python package stable-baselines3. The specific agent architectures used — flat memoryless, flat recurrent, hierarchical memoryless, and hierarchical recurrent — are described in Section 3.3.
  
  I would leave this out of this section - I think it's a bit too specific
7. sethblumberg 12 May 2026
  
  in Public
  
  A fixed prescribing rule (see Section 3.5), which emulates how real-world prescribers make antibiotic treatment decisions using observable patient and resistance information Reinforcement learning (RL) agents (see Section 3.3), which learn policies by interacting with the environment over many training episodes, using feedback from the reward signal to discover policies that maximize cumulative long-term reward. Unlike the fixed rule, RL agents are not pre-programmed with a specific decision logic; instead, they discover effective prescribing strategies through trial-and-error exploration.
  
  Again - looks like two papers. First to do #1. And second to do #2 and compare results to #1.
8. sethblumberg 12 May 2026
  
  in Public
  
  (Lee 2026) and companion preprint (Lee and Blumberg 2026)
  
  Can just cite both at once
9. sethblumberg 12 May 2026
  
  in Public
  
  These experiments
  
  I wouldn't introduce experiments quite yet - as it's a lot for a first-time reader to absrob. Can you just say 'Simulations'?
10. sethblumberg 12 May 2026
  
  in Public
  
  By comparing the performance of the RL algorithms against the fixed prescribing rule baseline, we demonstrate that our simulator enables quantitative comparative policy analysis, allowing us to characterize the magnitude and nature of gains achievable by adaptive prescribing strategies relative to static prescribing rules.
  
  This has two messages (and again makes me wonder if there are two papers). One is that the simulator allow comparative policy analysis. The second is that RL can out-perforam static presecription rules.
11. sethblumberg 12 May 2026
  
  in Public
  
  We note here that this application of simulation and reinforcement learning to the area of antimicrobial resistance is relatively novel.
  
  I wouldn't be so explicit about noting 'novelty'- can lead that for reviewers/editors to decide. Could rub some folks the wrong way - particularly if they have published a paper that they think is even more novel. I'd rather emphasize how it builds/extends existing literature and findings, adresses gaps, etc.
12. sethblumberg 12 May 2026
  
  in Public
  
  We then compared the performance of two types of prescribing algorithms in these scenarios: a ‘baseline’ fixed prescribing rule that emulates how real-world prescribers make decisions about antibiotic treatment, and reinforcement learning (RL) algorithms. Because the consequences of prescribing decisions unfold over time — treatment decisions made today can drive resistance that undermines antibiotic efficacy months later — antibiotic prescribing is fundamentally a sequential decision-making problem. Reinforcement learning is a subfield of machine learning focused on designing algorithms to optimize sequences of decisions in order to maximize cumulative long-term outcomes, making it a useful approach for this problem (see Section 3.1 for further details).
  
  Great paragraph
13. sethblumberg 12 May 2026
  
  in Public
  
  A detailed description of the simulator’s architecture and functionality is provided in a separate manuscript (Lee and Blumberg 2026).
  
  Move to methods
14. sethblumberg 12 May 2026
  
  in Public
  
  Antibiotic Stewardship
  
  FYI - I'm going to use AS for antibiotic stewardshp in my comments
15. sethblumberg 12 May 2026
  
  in Public
  
  Given these challenges, it is difficult to directly quantify the long-term effects of antibiotic stewardship program interventions using observational or interventional studies alone (Bertollo et al. 2018; Schweitzer et al. 2019). When real-world systems lack sufficient data to support direct modeling, simulation can offer a complementary framework in which researchers are able to directly and explicitly specify the underlying ground truth of key mechanisms, enabling controlled investigation of trade-offs that are difficult or impossible to observe in real-world settings.
  
  Great paragraph for motivating the benefit (and novelty) of AS simulation.
16. sethblumberg 12 May 2026
  
  in Public
  
  A fundamental challenge to performing quantitative impact evaluation of these programs is the pervasive partial observability of key system components (Laxminarayan et al. 2013). Even in resource-rich settings where antibiotic prescription records are available, true selection pressure on pathogen populations is incompletely observed due to unmeasured sources of antibiotic exposure, including agricultural use and environmental contamination (Van Boeckel et al. 2015). Measurement of AMR itself presents an even greater challenge: while initiatives such as the WHO’s Global Antimicrobial Resistance and Use Surveillance System (GLASS) have sought to standardize surveillance, participation remains voluntary and coverage incomplete, and no universally adopted scalar metric of resistance at the community level yet exists (Organization 2022; Leth and Schultsz 2023).
  
  I would be careful about highlighting these as challenges early in the intro as it sets the paper up as a 'solution' to these challenges. I would rather emphasize why simulation is useful by itself (even if you had perfect observation / mechanistic understanding). Then you can introduce the limitations mentioned here as constraints (rather than challenges) that need to be incorporated into simulations.
17. sethblumberg 12 May 2026
  
  in Public
  
  to saturation
  
  Not sure what this
18. sethblumberg 12 May 2026
  
  in Public
  
  dring
  
  typo
19. sethblumberg 12 May 2026
  
  in Public
  
  hierarchical RL agents outperformed flat RL agents
  
  briefly describe the distinction
20. sethblumberg 12 May 2026
  
  in Public
  
  evaluating the impact of stewardship strategie
  
  Feels you have mixed messages here. On the one hand you are 'evaluating the impact of stewardship strategies' (which sounds like you are focusing on evaluating current strategies0 and on the other you are developing novel RL strategies. I'm still of the mind these are two separate papers! But I'll keep reading :)
21. sethblumberg 12 May 2026
  
  in Public
  
  benchmark prescribing policies discovered by reinforcement learning agents against a clinically-realistic fixed prescribing rule across four sets of experiments of increasing complexity, where each set of experiments varied in type and degree of observed information degradation.
  
  This is quit a mouthful. I'd break it down more. Why is RL used? What types of experiments are you running? What type of information is observed and degraded?
22. sethblumberg 12 May 2026
  
  in Public
  
  Optimizing Prescribing Policies Under Antimicrobial Resistance Dynamics
  
  I would lead with this, and then follow with the approach
Visit annotations in context

Annotators

sethblumberg

URL

019dad20-3915-9e7c-382c-36ba60262202.share.connect.posit.cloud/

Annotators

URL