Reviewer #1 (Public review):
Wojcik et al. conducted a working memory (WM) experiment in which participants had to press the right or left button after being presented with a square (upright) or diamond stimulus. The response mapping ('context') depended on a colour cue presented at the start of each trial. This results in an XOR task, requiring participants to integrate colour and shape information. Importantly, multiple colours could map onto the same context, allowing the authors to disentangle the (neural) representations of context from those of colour.
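As a minimal sketch of the task structure described above: the number of colours, the colour names, and which context maps which shape to which button are all assumptions made for illustration, not details taken from the manuscript.

```python
# Hypothetical cue colours; the real colours and their number are assumptions.
# Two colours share each context, so colour and context can be dissociated.
COLOUR_TO_CONTEXT = {"red": 0, "blue": 0, "green": 1, "yellow": 1}
SHAPE_CODE = {"square": 0, "diamond": 1}
RESPONSES = ("left", "right")  # assumed assignment of XOR output to buttons


def correct_response(colour, shape):
    """Return the correct button as the XOR of context and shape."""
    context = COLOUR_TO_CONTEXT[colour]
    return RESPONSES[context ^ SHAPE_CODE[shape]]


if __name__ == "__main__":
    for colour in COLOUR_TO_CONTEXT:
        for shape in SHAPE_CODE:
            print(colour, shape, correct_response(colour, shape))
```

Neither colour (context) nor shape alone predicts the response in this mapping; only their combination does, which is what makes the task an XOR problem.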
The authors report that participants learn the appropriate context mappings quickly over the course of the experiment. Neural context representation is evident in the WM delay and emerges later in the experiment, unlike colour representation, which is present only during colour presentation and does not evolve over experimental time. The authors also report results on neural geometry (averaged cross-generalized decoding) and neural dimensionality (averaged decoding after shattering all task dimensions), which are somewhat harder to interpret.
Overall, the findings are likely Important, as they highlight the flexible and future-oriented nature of WM. The strength of support at the moment is incomplete: there are some loose ends on the context/colour generalization, and the evidence for the XOR neural representation is not (yet) well-established.
I have one (major) concern and several suggestions for improvement.
(1a) As the authors acknowledge in several places, the XOR dimension is strongly correlated with motor responses, at least toward the end of the task (and by definition on all correct trials). This should be dealt with properly. At present, the panel pairs in Figures 2g/i, 2h/j, 3e/g, and 3f/h are each highly similar because of this strong collinearity. I would remove the semi-duplicate graphs and/or address the collinearity explicitly through partial regression, trial selection, or a similar approach (and report these correlations).
(1b) Most worrisome in this respect is that one of the key results is that XOR decoding increases with learning. Task accuracy also increases with learning, so the proportion of correct trials grows and the XOR and motor regressors become more similar over experimental time. Consequently, any XOR classifier that picks up on motor signals will succeed more often later in the task than earlier. (In other words, the XOR regressor may be a noisy version of the motor regressor early on, and a more precise version later on.) Therefore, the increase in XOR decoding over experimental time may be (entirely) due to an increase in similarity between the XOR and motor dimensions. The authors should either rule out this explanation or remove or tone down the conclusions regarding the increase in XOR coding. (Fortunately, the takeaway regarding colour/context generalization does not depend on this analysis.) The absence of a change in motor decoding with learning (as reported on page 11) does not address this potential confound; if anything, it makes the confound more likely.
(2) Bayes factors would be valuable in several places, especially with null results (p. 5) or cases with borderline-significant p-values.
(3) The authors' interpretation of the key results implies that the abstract coding learned over the task should be relevant for behaviour. The current results do not show a particularly strong behavioural relevance of coding, to put it mildly. It might be worth exploring whether neural coding expresses itself in reaction times, rather than (in)correct responses, and reflecting on the (lack of) behavioural relevance in the Discussion.
(4) All data and experiment/analysis code should be made available, in public repositories (i.e., not "upon request").
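To make the concern in point (1b) concrete, here is a minimal simulation sketch. It involves no neural data; the accuracy levels (0.60 "early", 0.90 "late"), the trial count, and the function name are hypothetical choices of mine, and errors are assumed to be independent of trial type. It only shows how the correlation between the XOR label and the executed motor response rises with accuracy.

```python
import numpy as np


def xor_motor_correlation(accuracy, n_trials=10000, seed=0):
    """Correlation between the XOR label and the simulated motor response.

    Assumes balanced, independent context and shape, and errors that occur
    with probability (1 - accuracy) regardless of trial type.
    """
    rng = np.random.default_rng(seed)
    context = rng.integers(0, 2, n_trials)
    shape = rng.integers(0, 2, n_trials)
    xor_label = context ^ shape  # the correct response on each trial
    is_correct = rng.random(n_trials) < accuracy
    # On error trials the participant presses the other button.
    motor = np.where(is_correct, xor_label, 1 - xor_label)
    return np.corrcoef(xor_label, motor)[0, 1]


if __name__ == "__main__":
    for label, acc in [("early (hypothetical)", 0.60), ("late (hypothetical)", 0.90)]:
        print(f"{label}: accuracy={acc:.2f}, r={xor_motor_correlation(acc):.2f}")
```

Under these assumptions the expected correlation is 2 × accuracy − 1, so a decoder that reads out only motor signals would appear to show increasing "XOR" decoding as accuracy improves. Reporting the empirical XOR/motor correlation per learning stage, or restricting the analysis to trial sets in which it is matched, would address this.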