Reviewer #2 (Public Review):
The authors consider the application of Granger causality (GC) analysis to calcium imaging data and identify several challenges therein and provide methodological approaches to address them. In particular, they consider case studies involving fluorescence recordings from the motoneurons in embryonic zebrafish and the brainstem and hindbrain of larval zebrafish to demonstrate the utility of the proposed solutions in removing the spurious links that the naive GC identifies.
The paper is well-written and the results on the chosen case studies are compelling. However, the proposed work would benefit from discussing the contributions of this work in the context of existing and relevant literature and clarifying some of the methodological points that require more rigorous treatment. I have the following comments:
Major comments:
1) I would like to point out recent literature that adapts the classical GC for both electrophysiology data and calcium imaging data:
[1] A. Sheikhattar et al., "Extracting Neuronal Functional Network Dynamics via Adaptive Granger Causality Analysis", PNAS, Vol. 115, No. 17, E3869-E3878, 2018.
[2] N. A. Francis et al., "Small Networks Encode Decision-Making in Primary Auditory Cortex", Neuron, Vol. 97, No. 4, 2018.
[3] N. A. Francis et al., "Sequential Transmission of Task-Relevant Information in Cortical Neuronal Networks", Cell Reports, Vol. 39, No. 9, 110878, 2022.
In reference [1], a variation of GC based on GLM log-likelihoods is proposed that addresses the issues of non-linearity, non-stationarity, and non-Gaussianity of electrophysiology data. In [2] and [3], a variation of GC using sparse multi-variate models is introduced with application to calcium imaging data. In particular, all three references use the sparse estimation of the MVAR parameters in order to mitigate overfitting and also use corrections for multiple comparisons that also reduce the number of spurious links (see my related comments below). I suggest discussing these relevant references in the introduction (paragraphs 2 and 3) and discussion.
2) A major issue of GC applied to calcium imaging data is that the trials are typically limited in duration, which results in overfitting of the MVAR parameters when using least squares (See references [2] and [3] above, for example). The authors mention on page 4 that they use least squares to estimate the parameters. However, for the networks of ~10 neurons considered in this work, stationary trials of a long enough duration are required to estimate the parameters correctly. I suggest that the authors discuss this point and explicitly mention the trial durations and test whether the trial durations suffice for stable estimation of the MVAR parameters (this can be done by repeating some of the results on the synthetic data and using different trial lengths and then assessing the consistency of the detected GC links).
3) The definition of the "knee" of the average GC values as a function of the lag L needs to be a bit more formalized. In Fig. 2H using the synthetic data, the "knee" effect is more clear, but in the real data shown in Fig. 2I, the knee is not obvious, given that the confidence intervals are quite wide. Is there a way to quantify the "knee" by comparing the average GC values as well as their confidence bounds along the lag axis?
4) While the measures of W_{IC} and W_{RC} form suitable guiding principles for the pipeline presented in this work, it would be helpful if the authors discuss how such measures can be used for other applications of GC to calcium imaging data in which a priori information regarding the left/right symmetry or the rostrocaudal flow of information is missing.
5) Removing the "strange" neurons discussed in Section C5 is definitely an important pre-processing step in applying GC. However, the criterion for identifying the strange neurons seems a bit ad hoc and unclear. Could this be done by clustering the neurons into several categories (based on their time courses) and then removing a "strange" cluster? Please clarify.
6) Another key element of existing GC methods applied to large-scale networks is dealing with the issue of multiple comparisons: for instance, in Figures 2, 3, 4, 6, 7, and 8, it seems like all arrows corresponding to all possible links are shown, where the colormap indicates the GC value. However, when performing multiple statistical tests, many of these links can be removed by a correction such as the Benjamini-Hochberg procedure. It seems that the authors did not consider any correction of multiple comparisons; I suggest doing so and adding this to your pipeline.
7) The authors use TV denoising and also mention that it is a global operator, and changes the values of a time series at time t based on both the past and future values of the process. As such, it is not clear how TV denoising could affect the "causal" relations of the time series. In particular, TV denoising would significantly change the \Gamma_{ii} coefficients in Eq. (8). Is it possible to apply a version of TV denoising that only uses the information from the past to denoise the process at time t? In other words, using a "filter" as opposed to a "smoother". Please clarify.
8) The idea of using an adaptive threshold as in Section C8 is interesting; but this problem was previously considered in [30] (in the manuscript) and reference [1] above, in which new test statistics based on log-likelihoods are used that have well-known asymptotic null distributions (i.e., chi-square distributions). In particular, reference [1] above identifies and applies the required rescaling for the asymptotic null distributional assumptions to hold. I suggest discussing your work regarding the adaptive thresholds in the context of these existing results.
9) Related to the previous comment, given that the authors use a shuffling procedure to obtain the null, it is not clear why fitting the F-distribution parametrically and using its quantiles for testing would provide further benefits. In fact, as shown in Figure S9B, the rescaled F-distribution does not fully match the empirical null distribution, so it may be worth using the empirical null to obtain the non-parametric quantiles for testing. Please clarify.
10) In Figure 5C, the values of W_IC for the MV cases seem to be more than 1, whereas by definition they should be less than or equal to 1. Please clarify.
11) Is there evidence that the lateralized and rostrocaudal connectivity of the motoneurons occurs at the time-scale of ~750 ms? Given that this time scale is long enough for multiple synapses, it could be the case that some contralateral and non-rostrocaudal connections could be "real", as they reflect multi-hop synaptic connections. Please clarify.
12) While it is useful to see the comparison of the BV and MV cases shown in Figs. 1 and 2, given extensive evidence in the GC literature on the shortcomings of the BV version of GC, it seems unnecessary to report the BV results in Figs. 3 onward. I suggest discussing the shortcoming of the BV case when presenting figures 1 and 2 and removing the BV results from the subsequent results.