Reviewer #1 (Public review):
Summary:
This paper presents rare and unique recordings of single neurons, LFPs, and SEEG data from human patients performing reading and listening tasks. The authors identify single neurons in temporal and ventral occipito-temporal cortex that respond specifically to spoken and written language, and primarily encode either phonological or orthographic features of the stimuli. They also identify neurons in the middle temporal and inferior frontal cortex that respond to both modalities, which they interpret as amodal language responses. In general, neuronal population firing rates are correlated with both micro- and macro-scale broadband gamma responses, though the authors observe some dissociations, particularly at the macro-scale. The results are interpreted to support a model of modality-specific to amodal processing throughout many distributed brain areas for language.
Strengths:
(1) The data are truly unique, providing a large-scale characterization of single neuron responses from the human brain during written and spoken language processing.
(2) The task and stimulus conditions allow for examination of both low-level (e.g., orthographic/phonological) and higher-level (e.g., syntactic) encoding.
(3) Showing relationships between single neuron and multi-scale LFP recordings from the same sites helps bridge neuronal and meso/macroscale literatures.
Weaknesses:
(1) My main comment about the paper is that it feels like a collection of somewhat random descriptions of a very small number of hand-picked single neurons. I think that the task and stimulus design shown in Figure 1A sets up some clear hypotheses that could be tested rigorously across the full neuronal population, but instead, the authors pick a few neurons and fit encoding models that don't take advantage of the contrasts. I agree that encoding models are a powerful approach, but with only 508 total words and what appears to be limited variability across the various features, it's not clear to me that the stimuli, which were apparently designed as minimal pairs, provide enough power to find robust results. Perhaps this is why the majority of the results only show a very small number of units (most of which are actually buried in the supplement), but it's odd to me that they don't show the results of the minimal contrasts other than for length.
(2) Related to point (1), other than Figure 2H and Figure 6A-B, the results are only shown for a tiny number of units. This is great for demonstrating qualitatively what the effects look like, but there is no quantification of the findings across the population, which undermines the point in the abstract that 1000 neurons were recorded. This is acknowledged in some places, but as a reader, it leaves me wondering how seriously to take the interpretations if they seemingly cannot be replicated. I understand this is a challenge with human single neuron recordings, but as presented, the paper as a whole comes across as largely anecdotal.
(3) Some of the key claims rest on the idea that neurons were recorded from the superior temporal gyrus and fusiform gyrus. For the STG claim, I don't understand how this was done, or what specifically they mean by STG, since the microwire locations do not appear to be anywhere near the lateral surface. This makes sense given the profile of the Behnke-Fried electrodes, but if they want to claim that there are neurons from the STG, they need to be more specific and show where precisely these wires are. If they are more medial as it appears, they need to explain how they dissociated STG from Heschl's gyrus. Similarly, for the fusiform neurons, I can only see a couple of probes that appear to have their tips near where I would think this area is. Perhaps this is more of a visualization issue with Figure 1F, but overall, I am not convinced that the neurons are exactly where they say they are.
(4) Related to point (3), some of the authors have made strong claims in prior work about the precise coordinates of the VWFA, so it would help to know how many units are within this exact region. The ROIs marked in Figure 2 are quite large, and given results like Vinckier et al. 2007, it's important to know where along the hierarchy the recordings were actually performed. Similarly, given the framing in the intro around the VWFA as a key area, the idea that some of the best example neurons are from the right fusiform is a bit confusing. I don't think they can make the claims about visual hemifields since it does not appear that they recorded eye tracking to verify constant central fixation, and it may be a bit surprising to see such strong orthographic selectivity in the right hemisphere (though, as a result, it may suggest a more nuanced view of lateralization of reading at the single-neuron level).
(5) In many sections of the paper, there are vague and unquantified claims like "many neurons" or "a large number of units". This needs to be made explicit. It would also help to show where statistical threshold cutoffs are on plots like Figure 2H, since the "brain-score" is used to select units for many analyses.
(6) More detail on the TRF models is needed in the methods. At the very least, a complete list of the features in each group is necessary to evaluate claims about very broad sets of features like "syntax". It would also help to know how the features were coded, especially where there is a mixture of continuous and discrete features within the model.
(7) Depending on how exactly the features were defined, I'm skeptical of some of the claims, like position-specific "w". There are some obvious confounds that need to be controlled here, like whether word-initial "w" is strongly associated with shorter, higher frequency words (like "wh-" words). There are other examples, like whether specific forked letters tend to appear in certain syllables in English words. While it may be the case that these kinds of patterns are uniformly distributed, it needs to be established in this particular stimulus set.
(8) The claim that there is monotonic encoding of word length does not seem strongly supported in the data. In both PC1 and the single neuron examples, it seems like there may be a non-linear relationship, which could suggest that another correlated feature (e.g., word frequency) is involved.
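To make point (8) concrete, one possible check is sketched below: a rank correlation between word length and firing rate, followed by the same correlation after partialling out word frequency. This is only an illustration under my own assumptions, not the authors' analysis. The function names and the toy values for `length`, `log_freq`, and `rate` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """First-order partial rank correlation of x and y, controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


# Hypothetical toy data: one value per word (NOT from the paper).
length = [2, 3, 3, 4, 5, 6, 7, 8]                          # letters per word
log_freq = [6.1, 5.8, 5.0, 4.9, 4.1, 3.5, 3.6, 2.2]        # log word frequency
rate = [1.0, 1.4, 2.1, 2.0, 3.2, 3.9, 3.5, 5.0]            # mean firing rate

raw = spearman(length, rate)
controlled = partial_spearman(length, rate, log_freq)
print(f"length vs rate: {raw:.2f}; controlling for frequency: {controlled:.2f}")
```

If the length effect shrinks substantially once frequency is partialled out, that would suggest the apparent length encoding is at least partly carried by the correlated feature. Reporting this per unit across the population would also address points (2) and (5).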
Minor Points:
(1) What are "boundaries"? They are not described anywhere I could find, but they are a feature group that was used in the TRFs.
(2) The caption for Figure 6C says MTG and insula, but the text says MTG and IFG. Similar to the above comment about STG and fusiform, it's not clear to me how they achieved single-unit recordings with Behnke-Fried probes in these areas.
(3) The somewhat less robust correlations between firing rate and BGA in macro vs micro contacts are potentially interesting. However, did they verify that the closest macro contact was always in the gray matter of the same gyrus as the microwire?
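The verification asked for in minor point (3) could look something like the sketch below: for each microwire bundle, find the nearest macro contact and flag pairs where the macro contact carries a different anatomical label or sits in white matter. The data layout (the `name`, `xyz`, `region`, and `tissue` fields) and all coordinates are hypothetical placeholders of my own, not the authors' data format.

```python
import math


def nearest_contact(micro_xyz, macro_contacts):
    """Return the macro contact with the smallest Euclidean distance."""
    return min(macro_contacts, key=lambda c: math.dist(micro_xyz, c["xyz"]))


def check_pairing(micro, macro_contacts):
    """Summarize whether the nearest macro contact matches the microwire site."""
    contact = nearest_contact(micro["xyz"], macro_contacts)
    return {
        "micro": micro["name"],
        "macro": contact["name"],
        "distance_mm": math.dist(micro["xyz"], contact["xyz"]),
        "same_region": contact["region"] == micro["region"],
        "macro_in_gray": contact["tissue"] == "gray",
    }


# Hypothetical example (coordinates in mm, labels made up for illustration).
micro = {"name": "micro_A", "xyz": (-30.0, -20.0, -15.0), "region": "fusiform"}
macros = [
    {"name": "macro_1", "xyz": (-33.0, -20.0, -15.0),
     "region": "fusiform", "tissue": "gray"},
    {"name": "macro_2", "xyz": (-36.5, -20.0, -15.0),
     "region": "fusiform", "tissue": "white"},
    {"name": "macro_3", "xyz": (-40.0, -20.0, -15.0),
     "region": "inferior temporal", "tissue": "gray"},
]

report = check_pairing(micro, macros)
print(report)
```

A table of this kind for every microwire and macro contact pair, with the mismatched pairs excluded or analyzed separately, would help establish whether the weaker macro-scale correlations reflect spatial scale or simply anatomical mismatch.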