Reviewer #1: Title: Probabilistic Forecasting of Monthly Dengue Cases Using Epidemiological and Climate Signals: A BiLSTM–Naive Bayes Model Versus Mechanistic and Count-Model Baselines.
Manuscript Number: PGPH-D-25-03170
This manuscript presents a rigorous comparative study of probabilistic forecasting models for monthly dengue incidence in Freetown, Sierra Leone, covering the period 2015–2025. It evaluates four major model classes—NB-GLM, INGARCH-NB, Renewal-NB, and BiLSTM-NB—under a leakage-safe rolling-origin evaluation. The article demonstrates strong methodological maturity, careful control of data leakage, and thorough probabilistic evaluation using proper scoring rules, interval coverage, sharpness metrics, PIT diagnostics, and Diebold–Mariano tests.
The manuscript is generally well-written, technically sound, and addresses an important operational public health problem. It positions itself as one of the few works offering aligned comparisons of mechanistic, statistical, and deep-learning models under realistic constraints for West African dengue surveillance.
While the work is comprehensive and technically strong, several critical issues affect its accessibility, interpretability, and broader applicability.
Strengths
The study excels in methodological rigor. Its strict leakage safeguards, careful feature-timing rules, and use of expanding-window rolling-origin evaluation significantly strengthen reliability. The inclusion of proper scoring rules, interval coverage, sharpness metrics, PIT histograms, and Diebold–Mariano tests provides a complete probabilistic evaluation rarely seen in dengue forecasting studies. The horizon-specific findings—INGARCH-NB outperforming at 1–2 months and BiLSTM-NB excelling at 3 months—are well supported by aligned comparisons and statistical significance tests. The transparency of data, code, and alignment artefacts enhances reproducibility and credibility. Additionally, the manuscript offers practical guidance for operational forecasting, including a realistic “light climate” input strategy suitable for resource-limited settings.
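To make the evaluation protocol praised here concrete for readers less familiar with it, the following is a minimal sketch (not the authors' code) of an expanding-window rolling-origin evaluation scored with a sample-based CRPS and randomised PIT values. The synthetic counts and the placeholder Poisson forecast are illustrative assumptions only; the manuscript's actual models and data differ.

```python
# Illustrative sketch of leakage-safe, expanding-window rolling-origin
# evaluation with sample-based CRPS and randomised PIT (for count data).
# The data and the Poisson "forecast" are placeholders, not the paper's models.
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(lam=20, size=60)            # 60 months of synthetic counts

def crps_from_samples(samples, obs):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - obs))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def pit_value(samples, obs):
    """Randomised PIT for discrete outcomes: U ~ Unif(F(y-1), F(y))."""
    samples = np.asarray(samples)
    lo = np.mean(samples < obs)
    hi = np.mean(samples <= obs)
    return rng.uniform(lo, hi)

crps_scores, pit_values = [], []
for origin in range(36, len(y)):            # forecast origins, window expands
    train = y[:origin]                      # only past data: leakage-safe
    # Placeholder 1-step-ahead predictive distribution around the train mean
    forecast = rng.poisson(lam=train.mean(), size=1000)
    crps_scores.append(crps_from_samples(forecast, y[origin]))
    pit_values.append(pit_value(forecast, y[origin]))

print(f"mean CRPS over {len(crps_scores)} origins: {np.mean(crps_scores):.2f}")
```

If the predictive distributions are well calibrated, the collected PIT values should be approximately uniform, which is what the manuscript's PIT histograms check.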
Limitations
Despite its strengths, the manuscript is heavily technical, with extensive mathematical exposition in the main text; this may limit accessibility for the public-health practitioners who form part of the target audience. The mechanistic renewal model is presented as a baseline but is arguably underspecified: the short, fixed 3-month kernel may not realistically capture dengue's generation-interval dynamics and likely contributes to the model's poor performance, limiting the interpretive value of the mechanistic comparison. This limitation should be addressed.
The study’s climate treatment, while intentionally conservative, may underexploit important environmental drivers; although operationally justified, this constraint forecloses exploration of potentially meaningful lag structures and seasonal climate anomalies. The analysis is also limited to a single city and to monthly data, raising questions about generalizability to geographies with different climate patterns and dengue transmission dynamics. Moreover, the monthly temporal resolution may obscure rapid outbreak shifts, possibly disadvantaging the mechanistic and hybrid models that rely on finer-grained dynamics. These points should be addressed.
The manuscript makes a valuable and original contribution to dengue forecasting, offering robust methodological innovations and practical insights for real-time surveillance systems. However, improved clarity, stronger justification for mechanistic assumptions, and expanded discussion of generalizability would enhance its usefulness and scholarly impact. With revisions to improve accessibility and contextual depth, the study is well positioned for publication and for informing operational forecasting practice in similar settings.
Reviewer #2: 1. What does "PIT" in the abstract stand for? The authors should avoid using abbreviations in the abstract.
2. The authors should provide additional analysis, such as experimenting with alternative or longer serial-interval kernels, or simple sensitivity checks (e.g., different window lengths or, if possible, finer temporal resolution).
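The kernel sensitivity check suggested here can be sketched as follows. This is an illustrative toy (not the manuscript's implementation): the renewal mean is computed under the short 3-month kernel criticised above and under a hypothetical longer alternative; the kernel weights, reproduction number, and case counts are all made-up placeholders.

```python
# Toy sensitivity check: renewal-model mean under alternative
# serial-interval kernels. All numbers below are illustrative placeholders.
import numpy as np

def renewal_mean(cases, kernel, R=1.2):
    """mu_t = R * sum_s w_s * y_{t-s}, with kernel weights w normalised to 1."""
    w = np.asarray(kernel, dtype=float)
    w = w / w.sum()
    hist = np.asarray(cases)[-len(w):][::-1]   # most recent month first
    return R * float(np.dot(w, hist))

y = np.array([12, 18, 25, 30, 22, 15])         # synthetic monthly counts

short_kernel = [0.6, 0.3, 0.1]                 # fixed 3-month kernel
long_kernel = [0.4, 0.3, 0.15, 0.1, 0.05]      # hypothetical 5-month kernel

print("3-month kernel mean:", renewal_mean(y, short_kernel))
print("5-month kernel mean:", renewal_mean(y, long_kernel))
```

Reporting how forecast scores move as the kernel length varies would show whether the renewal baseline's poor performance is intrinsic or an artefact of the fixed 3-month choice.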
3. Please justify the small climate feature set and mention any exploratory work with larger sets.
4. The authors should add a clearly labelled missing-data handling subsection that specifies the imputation method, the number of imputed months, how imputed values were used in training and evaluation, and any sensitivity analyses.
5. While the architecture, optimization, and calibration steps are described, the hyperparameter selection process is not fully audit-ready; please document the search space, the selection criterion, and any tuning seeds.
6. I recommend that the authors conduct an additional experiment to demonstrate the generalizability of the proposed model.