Reviewer #3 (Public review):
Summary:
The present manuscript investigates and proposes different mechanisms for the effects of two therapeutic approaches - a cognitive distancing technique and the use of antidepressants - on subjective ratings of happiness, confidence, and task engagement, and on the influence of such subjective experiences on choice behavior. Both approaches were found to be associated with changes in affective state dynamics in a choice task: cognitive distancing with reduced drift, and antidepressant use with an increased baseline. The results also suggest that cognitive distancing may reduce the weighting of recent expected values in the happiness model, while antidepressant use may reduce forgetting of choices and outcomes.
Strengths:
This is a timely topic and a significant contribution to ongoing efforts to improve our mechanistic understanding of psychopathology and devise effective novel interventions. The relevance of the manuscript's central question is clear, and the links to previous literature and the broader field of computational psychiatry are well established. The modelling approaches are thoughtful and rigorously tested, with appropriate model checks and persuasive evidence that modelling complements the theoretical argument and empirical findings.
Weaknesses:
Some vagueness in the theoretical mechanisms and in the interpretation of results leaves outstanding questions regarding (a) the specific links drawn between affective biases, the therapies aimed at mitigating them, and mental health function, and (b) the structure and assumptions of the modelling, and how they support the manuscript's central claims. Broadly, I do not fully understand how choice behavior vs. affect are impacted, separately or together, by cognitive distancing. Clarification on this point is needed, possibly through a more explicit proposal of a mechanism (or several alternative mechanisms) in the introduction, and a more explicit interpretation of the modelling results in the context of the cyclical choice-affect mechanism.
(1) Theoretical framework and proposed mechanisms
The link between affective biases and negative thinking patterns is a bit unclear. The authors seem to make a causal claim that "affective biases are precipitated and maintained by negative thinking patterns", but it is unclear what precisely these negative patterns are; earlier in the same paragraph, they state that affective biases "cause low mood" and possibly shift choices toward those that maintain low mood. The directionality of the mechanism is therefore unclear - explaining the cyclical nature of this mechanism in more detail, and clarifying what "negative thinking patterns" refers to, would be helpful.
More generally, this link between affect and choices, especially given the modelling results later on, should be clarified further. What is the mechanism by which these two impact each other? How do the models of choice and affect ratings in the RL task test this mechanism? I'm not quite sure the paper answers these questions clearly right now.
The authors also seem to implicitly make the claim that symptoms of mental ill-health are at least in part related to choice behavior. I find this a persuasive claim generally; however, it is understated and undersupported in the introduction, to the point where a reader may need to rely on significant prior knowledge to understand why mitigating the impact of affective biases on choice behavior would make sense as the target of therapeutic interventions. This is a core tenet of the paper, and it would be beneficial to clarify this earlier on.
It would be helpful to interpret more clearly the findings from 3.4 on decreased drift in all three subjective assessments in the cognitive distancing group. What is the proposed mechanism for this? The discussion mentions that "attenuated declines [...] over time, [add] to our previously reported findings that this psychotherapeutic technique alters aspects of reward learning" - but this is vague: if an explanation for how this happens is offered, I do not understand what that explanation is. Given the strong correlation of the drift with fatigue, is the explanation that cognitive distancing mitigates affect drift under fatigue? Or is this merely reporting the result without an interpretation around potential mechanisms?
(Relatedly, aside from possibly explaining the drift parameter, do the fatigue ratings link with choice behavior in any way? Is it possible that cognitive distancing was helping participants improve choices under fatigue?)
(2) Task structure and modelling
It is unclear what counted as a "rewarding" vs. "unrewarding" trial in the model. From my understanding of the task description, participants obtained positive or no reward (no losses), plus verbal feedback (Correct/Incorrect). But given the probabilistic nature of the task, it follows that even some correct choices likely had unrewarding results. Was the verbal feedback still "Correct" in those cases, but with no points shown? I did not see any discussion of whether it is the number of points earned or the verbal feedback that is considered the reward in the model. I am assuming the former, but based on previous literature, both likely play a role; so it would be interesting - and possibly necessary to strengthen the paper's argument - to see a model that assigns value to positive/negative feedback and earned points separately.
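To make the suggestion concrete, a minimal sketch of such a composite reward signal (all names and weights are hypothetical, not taken from the manuscript); note that a "Correct but zero points" trial would then still carry some positive subjective value:

```python
def composite_reward(points, feedback_correct, w_points, w_feedback):
    # Hypothetical composite reward: weight the earned points and the
    # binary verbal feedback ("Correct") as two separate components.
    return w_points * points + w_feedback * (1.0 if feedback_correct else 0.0)
```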
From a theory perspective, it is interesting that the authors chose to assume separate learning rates for rewarding and unrewarding trials. Why not, for example, separate reward sensitivity parameters? E.g., rather than a scaling parameter on the PE, a parameter modifying the r term inside the PE equation, perhaps assigning different values to positive and zero points? (While I think the math works out similarly at fitting time, this type of model would be less flexible in scaling the expected value and more flexible in scaling the actual number of points / the subjective experience of the obtained verbal feedback, which seems more in line with the theoretical argument made in the introduction.) The introduction explicitly states that negative biases "may cause low mood by making outcomes appear less rewarding" - which in modelling equations seems more likely to translate to different reward-perception biases than to different learning rates. Alternatively, one might incorporate a perseveration parameter (e.g., similar to Collins et al. 2014), which would also accomplish a negative bias. Either of these two mechanisms seems worth testing in a model - especially in a model that defines more clearly what rewarding vs. unrewarding may mean to the participant.
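The contrast between the two parameterisations could be sketched as follows (illustrative Q-learning updates; the parameter names are my own, not the manuscript's):

```python
def update_dual_alpha(q, r, alpha_pos, alpha_neg):
    # Manuscript-style update, as I read it: separate learning rates for
    # rewarding (r > 0) and unrewarding (r == 0) trials.
    pe = r - q
    alpha = alpha_pos if r > 0 else alpha_neg
    return q + alpha * pe

def update_reward_sensitivity(q, r, alpha, rho_pos, rho_zero):
    # Suggested alternative: a single learning rate, with sensitivity
    # parameters modifying the r term inside the PE equation, so that
    # positive and zero-point outcomes take different subjective values.
    r_subjective = rho_pos * r if r > 0 else rho_zero
    pe = r_subjective - q
    return q + alpha * pe
```

The second variant keeps the update speed fixed but lets a zero-point outcome take a non-zero (e.g., negative) subjective value, which seems closer to "making outcomes appear less rewarding".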
If I understand correctly, the affect ratings models assume that the Q-value and the PE independently impact ratings (i.e., they have different weights, w2 and w3), but there is no parameter allowing for a differential impact of perceived rewarding vs. unrewarding outcomes? (I may be misreading equations 4-5, but if not, Q-values and PEs enter the model via static rather than dynamic parameters.) Given the joint RL-affect fit, this seems to carry the assumption that any perceptual processing differences leading to different subjective perceptions of the reward associated with each outcome only impact choice behavior, but not affect (whereas affect, if I am understanding this correctly, is impacted more broadly, simply by the magnitude of the values and PEs). This is an interesting assumption, and the authors seem to have tested it a bit more in the Supplementary material, as shown in Figure S4. I wonder why this was excluded from the main text - the more flexible model appears to have found some potentially interesting differences that may be worth including, especially as they might shed additional insight into the influence of cognitive distancing on the proposed cyclical choice-affect mechanisms.
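For reference, my reading of the static-weight structure (a Rutledge-style momentary affect model; the weight names follow the w2/w3 convention above, but the rest is an illustrative sketch, not the manuscript's exact equations):

```python
def affect_ratings(q_values, pes, w1, w2, w3, gamma):
    # rating_t = w1 + w2 * sum_j gamma**(t-j) * Q_j + w3 * sum_j gamma**(t-j) * PE_j
    # The same static weights w2 and w3 apply on every trial, regardless of
    # whether the outcome was perceived as rewarding or unrewarding.
    ratings = []
    q_term, pe_term = 0.0, 0.0
    for q, pe in zip(q_values, pes):
        q_term = gamma * q_term + q      # exponentially decaying Q history
        pe_term = gamma * pe_term + pe   # exponentially decaying PE history
        ratings.append(w1 + w2 * q_term + w3 * pe_term)
    return ratings
```

A "dynamic" variant in the sense discussed above would, for instance, replace w3 with separate weights for positive and negative PEs.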
Minor comments:
If fatigue ratings were strongly associated with drift in the best-fitting model (as per page 13), I wonder whether it would make sense to use those fatigue ratings as a proxy rather than allowing the parameter to vary freely. (This does not in any way detract from the winning model's explanatory power, but if a parameter is strongly explained by a variable for which we have empirical data, it is not clear what extra benefit is gained by fitting that parameter freely.)
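In other words (a sketch of the suggestion; all names are hypothetical):

```python
def rating_free_drift(baseline, drift, t):
    # Winning model, as I understand it: a freely fitted per-participant
    # drift term that shifts ratings linearly with trial number t.
    return baseline + drift * t

def rating_fatigue_proxy(baseline, beta_fatigue, fatigue_t):
    # Suggested alternative: replace the free drift parameter with the
    # empirical fatigue rating at trial t, scaled by a single weight.
    return baseline + beta_fatigue * fatigue_t
```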