Reviewer #2 (Public review):
Summary:
This manuscript tackles an important and often neglected aspect of time-series analysis in ecology: the multitude of "small" methodological choices that can alter outcomes. The findings are solid, though their generalizability may be limited by the simplicity of the use case tested.
Strengths:
(1) Comprehensive Methodological Benchmarking:
The study systematically evaluates 30 test variants (5 correlation statistics × 6 surrogate methods), which is commendable and provides a broad view of methodological behavior.
(2) Important Practical Recommendations:
The manuscript provides valuable real-world guidance, such as the superiority of tailored lags over fixed lags, the risks of using shuffling-based nulls, and the importance of selecting appropriate surrogate templates for directional tests.
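The risk attributed to shuffling-based nulls can be illustrated with a minimal sketch (my own toy example, not the manuscript's actual tests; all parameter values are illustrative assumptions). Two independent but autocorrelated AR(1) series are tested for correlation against a shuffle null, which destroys autocorrelation, and against a circular time-shift surrogate null, which preserves it:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi=0.9):
    """Independent AR(1) series: strong autocorrelation, no coupling."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def pvalue(x, y, surrogate, n_surr=200):
    """One-sided p-value of |corr(x, y)| against a surrogate null for x."""
    obs = abs(np.corrcoef(x, y)[0, 1])
    null = [abs(np.corrcoef(surrogate(x), y)[0, 1]) for _ in range(n_surr)]
    return (1 + sum(v >= obs for v in null)) / (1 + n_surr)

shuffle = lambda x: rng.permutation(x)                  # destroys autocorrelation
shift = lambda x: np.roll(x, rng.integers(1, len(x)))   # preserves it

false_pos = {"shuffle": 0, "shift": 0}
for _ in range(50):  # 50 replicate pairs of uncoupled series
    x, y = ar1(300), ar1(300)
    for name, surr in (("shuffle", shuffle), ("shift", shift)):
        if pvalue(x, y, surr) < 0.05:
            false_pos[name] += 1
print(false_pos)  # the shuffle null typically flags far more spurious links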
(3) Novel Insights into System Dependence:
A key contribution is the demonstration that test results can vary dramatically with system state (e.g., initial conditions or abundance asymmetries), even when interaction parameters remain constant. This highlights a real-world issue for ecological inference.
(4) Clarification of Surrogate Template Effects:
The study uncovers a rarely discussed but critical issue: the choice of which variable to surrogate in directional tests (e.g., convergent cross mapping) can drastically affect false-positive rates.
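The template effect can be made concrete with a toy sketch (my assumptions, not taken from the manuscript: a minimal simplex-based cross-map, Sugihara-style coupled logistic maps with x driving y, and circular-shift surrogates). High skill in cross-mapping x from y's shadow manifold is read as evidence that x influences y; the same observed statistic is then tested against two nulls that differ only in which series is surrogated:

```python
import numpy as np

def embed(s, E=2, tau=1):
    """Time-delay embedding of a 1-D series."""
    n = len(s) - (E - 1) * tau
    return np.column_stack([s[i * tau : i * tau + n] for i in range(E)])

def ccm_skill(x, y, E=2, tau=1):
    """Simplex cross-map skill: predict x from y's shadow manifold."""
    My = embed(y, E, tau)
    target = x[(E - 1) * tau :]
    d = np.linalg.norm(My[:, None, :] - My[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, : E + 1]           # E+1 nearest neighbours
    rows = np.arange(len(My))[:, None]
    w = np.exp(-d[rows, nn] / (d[rows, nn[:, :1]] + 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    return np.corrcoef(target, (w * target[nn]).sum(axis=1))[0, 1]

# Toy coupled logistic maps in which x drives y (illustrative parameters).
n = 200
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

rng = np.random.default_rng(1)
obs = ccm_skill(x, y)

def pvalue(surrogate_x):
    """Null from circular shifts of either x (the cross-map target)
    or y (the library variable) -- the 'template' choice."""
    null = []
    for _ in range(200):
        k = int(rng.integers(1, n))
        null.append(ccm_skill(np.roll(x, k), y) if surrogate_x
                    else ccm_skill(x, np.roll(y, k)))
    return (1 + sum(v >= obs for v in null)) / 201

p_surr_x, p_surr_y = pvalue(True), pvalue(False)
print(p_surr_x, p_surr_y)
```

The sketch only shows that the two template choices define different tests with potentially different p-values; which template yields the better-calibrated null is precisely what the manuscript investigates.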
(5) Lag Selection Analysis:
The comparison of lag selection methods is a valuable addition, offering a clear takeaway that fixed-lag strategies can severely inflate false positives and that tailored-lag approaches are preferred.
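A tailored lag is often chosen empirically, for instance as the lag maximising the absolute cross-correlation between the two series. A minimal sketch of that idea (an illustrative helper, not the manuscript's procedure):

```python
import numpy as np

def best_lag(x, y, max_lag=10):
    """Tailored lag: the shift of y relative to x (in samples) that
    maximises |Pearson cross-correlation|, scanned over +/- max_lag."""
    def r(k):
        a, b = (x[: len(x) - k], y[k:]) if k >= 0 else (x[-k:], y[: len(y) + k])
        return abs(np.corrcoef(a, b)[0, 1])
    return max(range(-max_lag, max_lag + 1), key=r)

# Synthetic check: y is x delayed by 3 samples plus a little noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(500)
print(best_lag(x, y))  # recovers the true lag of 3
```

An arbitrary fixed lag (say, always testing at lag 1) would miss this delay entirely, which is the kind of mismatch the authors argue inflates error rates.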
(6) Transparency and Reproducibility Focus:
The authors advocate for full methodological transparency, encouraging researchers to report all analytical choices and test multiple methods.
Weaknesses / Areas for Improvement:
(1) Limited Model Generality:
The study relies solely on two-species systems and two types of competitive dynamics. This limits the ecological realism and generalizability of the findings. It's unclear how well the results would transfer to more complex ecosystems or interaction types (e.g., predator-prey, mutualism, or chaotic systems).
(2) Method Description Clarity:
Some method descriptions are too terse, and table references are mislabeled (e.g., Table 1 vs. Table 2 confusion). This reduces reproducibility and clarity for readers unfamiliar with the specific tests.
(3) Insufficient Discussion of Broader Applicability:
While the pairwise test setup justifies two-species models, the authors should more explicitly address whether the observed test sensitivities (e.g., effect of system state, template choice) are expected to hold in multi-species or networked settings.
(4) Lack of Practical Summary:
The paper offers valuable insights, but its recommendations are currently spread throughout the text. A dedicated "Best Practices" section or table would make them more accessible and easier for practitioners to apply.
(5) No Real-World Validation:
The work is based entirely on simulation. Including or referencing an empirical case study would help illustrate how these methodological choices play out in actual ecological datasets.