584 Matching Annotations
  1. Oct 2020
    1. episodic memory encoding and retrieval

      It 'recruits' the hippocampus?

    2. stimulus-locked P2-N2 complex and the response-locked ERN/CRN

      These are THE components! (ERN is also stimulus locked if you present a feedback screen?)

    3. may fail to communicate common features of these components

      Valid, but different goal?

    4. differentiated literature of action-monitoring ERPs


    5. may reflect disparate processes in unique cognitive circumstances, these aforementioned processes have all been specifically associated with ACC function during attention orientation and/or action selection

      This paper does not discredit the idea that different types of N2 and the reward positivity are epistemically different processes

    6. the integration of contextual cues with action selection to optimize goal-driven performance

      their conceptualization of ACC function


    1. These differences cannot be attributed to greater engagement of the NE system in the Active Experiment (rather than greater DA system engagement) because greater NE release would have produced a larger negativity to infrequent reward feedback

      This is how DA and NE are dissociated in terms of their predictions!

    2. in the Active Experiment, the raw N2 to infrequent reward feedback was significantly smaller than the raw N2 to infrequent reward feedback in both the Passive and Moderate Experiments, suggesting that greater DA system engagement resulted in a larger DA-associated positivity that attenuated the raw N2.

      So here DA is independent of NE

    3. interaction of reward condition and frequency condition such that the effect of reward was larger in the infrequent condition than in the frequent condition

      This is interesting! NE modulated DA so that it obscures N2 even more?

    4. dN2 in the Moderate Experiment trended toward being significantly larger than the dN2 in the Active Experiment,

      not significant


  2. Sep 2020
    1. identical task stimuli (colored faces) presented with identical task designs

      It is compelling that task set can yield such a difference -> it is definitely somewhat goal-related

    2. variable scalp distribution dependent on relative engagement of the different cortical areas giving rise to the ERP

      So then neither N2 nor P3 have a very easy to localize or 'centered' scalp distribution -> but the N2 does have a localizable distribution (acc?)

    3. LC refractory period coincides with P3 generation

      So this is the devil stirring the pot

    4. increase in amplitude of the N2

      But it has also been theorized to be the source of the P300? They are deflections in opposite directions, how??


    1. By taking into account P3b versus P3a effects and latency information, it may be possible to consider surprise in the context of other mental states contributing to goal-oriented behavior

      If ACC is very goal-heavy in its processing signature this could be something to look into!

    2. cannot determine whether the P300 modulation was purely due to the surprise conveyed by the visual stimuli, or whether it was related to the response selection on each trial

      P300 in passive viewing, or related to the giving of response (motor preparatory process could very well play a role! would not happen in the case of passive viewing)

    3. unexpected changes in the world within the context of a task

      This seems like a very RL model - esque claim though!

    4. P300 reflects the arrival of a phasic norepinephrine (NE) signal in cortical areas, which serves to increase signal transmission in the cortex

      This could be interesting, especially if it is an LC-NE system link! Pupillometry will add some info in that case, and read the papers about acc-Lc-Ne

    5. compared our model to an alternative measure of surprise based on the Kullback–Leibler divergence.

      KL-divergence is what is used in the O'Reilly study right? So there are some differences here. There, there is a clear behaviourally relevant model update (which might be reflected in some other process, although perhaps not even EEG measurable, since fERN seems more concerned with expectations?) Expectations of errors can of course be part of a task-model...

    6. This evidence was used to compare competing models defined in terms of the explanatory variables in Z1.

      Also set up model comparison for different regressors

    7. identity of trials on which participants responded erroneously, trials that were rejected during the preprocessing of the EEG data, and a constant term.

      nuisance modeled at the same time!

    8. regressor models variance related to stimulus probability within a block and does not take into account any learning

      Simply assume structure is known and use that to predict p300

    9. quantified in terms of how much they change posterior beliefs.

      So if prediction error is used to update the model, this is also intuitive -> in fact, in an RL setting, the notion of 'surprise' as prediction error, and 'model update', are identical.

    10. Harrison et al. (2006), where the current event depended on the previous

      Look at their model for inspiration?

    11. D_{j−1}, can be used to predict the probability of each event occurring

      This is actually what can make an (ordinal?) prediction about the P300 amplitude!

    12. n_kj refers to the number of occurrences of outcome k up until observation j

      Natural way of updating the multinomial is by registering the counts in the Dirichlet
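The count-based updating noted above can be sketched in a few lines (a minimal illustration assuming a symmetric prior alpha = 1; `predictive_probs` is a hypothetical helper, not the paper's code):

```python
# Sketch (not the paper's code): Dirichlet-multinomial updating by counts.
# With a symmetric prior alpha and counts n_k, the predictive probability of
# outcome k after j observations is (n_k + alpha) / (j + alpha * K).

def predictive_probs(counts, alpha=1.0):
    """Posterior predictive distribution over K outcomes given observed counts."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

counts = [0, 0, 0]                   # three outcomes, nothing observed yet
probs = predictive_probs(counts)     # uniform: [1/3, 1/3, 1/3]

counts[0] += 1                       # register one occurrence of outcome 0
probs = predictive_probs(counts)     # outcome 0 now more probable: [0.5, 0.25, 0.25]
```

The surprise measures discussed in these notes would then score each outcome by how strongly it shifts these predictive probabilities.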

    13. time point at which the averaged P300s were modulated maximally

      Define the amplitude of a single-trial P300 by the averaged peak, separately for participants.

    14. All stimuli occurred equally often over the course of the experiment

      Good counterbalancing

    15. not informed


    16. updating of task-relevant information in anticipation of subsequent events

      This seems very much like it could be an SR thing!

    17. P300 has commonly been linked to the revision of a participant’s expectation about the current task context

      Context -> do they mean structure or task-set (for reward outcome?)


    1. The present findings suggest that the variance in fERN amplitude across conditions results more from the effect of unpredicted positive feedback than from unpredicted negative feedback

      So negative prediction error does not yield the same attenuating impact on N200 measurement as positive prediction error does for enhancing it?

    2. essential difference between conditions is associated with neural activity on correct trials rather than neural activity on error trials

      This is an important distinction to make: the N200 as a component mostly shows responsivity to infrequency. The original definition of the fERN as a difference wave with correct trials was interpreted as distinct from the N200, assumed to reflect corrective processes after incorrect feedback. However, this study picks the N200 vs fRxx apart and shows that the difference between the difference wave and the N200 is caused by greater negativity on infrequent correct feedback trials

    3. error–oddball difference should be smaller than the correct–oddball difference

      Makes sense - If something else is happening on the correct feedbacks that makes them look N200 absent

    4. what causes the absence of the fERN/N200 on correct trials

      Exactly! The definition should then be the same as an oddball for the correct feedback

    5. infrequent targets in an oddball task and infrequent error feedback in a time estimation task both elicit a frontal or frontal-centrally distributed, negative-going component that reaches maximum amplitude at approximately 280–310 ms.

      Suggestive of coming from an identical source with identical function!

    6. infrequent oddball ERPs from the infrequent correct ERPs.

      N200 - correct hard condition -> also just an oddball in a sense

    7. subtracted the latency-corrected infrequent oddball ERPs from the infrequent error ERPs.

      n200 - fERN

    8. maximum negative value of the ERP recorded at channel FCz within a 150–350-ms window

      Latency correction between conditions

    9. virtual-ERNs

      Spatial PCA -> select fronto-central electrodes

    10. first by examining the ERPs directly, and second by performing a spatial principal components analysis (PCA)

      This is to get the scalp distribution - that allows comparing possible sources for the N200 and fERN

    11. difference between the ERP on correct trials and the N200 should be larger than the difference between the fERN and the N200

      If fERN == N200, subtractive difference between them should be very small. The difference with the correct trial ERP (fCRP) would be larger!

    12. it is equally possible that the difference between the ERPs on correct and incorrect trials arises from a process associated with correct trials rather than with error trials

      So that would mean the fERN is NOT some component that is elicited by errors specifically - actually it is elicited by the correct feedbacks?? => completely invalidates the idea of behavioural adjustment through ACC inhibitory release?

    13. entirely different ERP component as the source of the apparent variance in fERN amplitude.

      What exactly does this mean?

    14. fERN is elicited by unexpected negative feedback stimuli, but not by unexpected positive feedback stimuli

      So that is a clear distinction with the N200 - it is modulated by correct v incorrectness? It was also modulated by expectation of the feedback...

    15. N200 increases in proportion to the unexpectedness of the event

      So that is a good candidate for SR vector error magnitude?

    16. we refer here to the negative deflection that is seen in so-called oddball tasks

      N200 is typical oddball ERP


    1. confound fERN amplitude with the P300

      Also something to watch out for!

    2. resulted from an increase in the amplitude of the fERN, rather than from overlap with a different ERP component (such as the P300).

      Scalp distribution used to identify source (p300 vs ERN)

    3. interaction of valence with expectancy

      unexpected feedback leads to more modification of RT -> errors in easy biggest, correct in hard smallest

    4. no main effect of expectancy

      Pure expectancy of feedback does not modulate behaviour

    5. main effect of valence

      change in response time correlated with previous valence (correct v incorrect) -> duh because window size change?

    6. Participants made more errors in the hard condition (76%) than in the easy condition (23%)

      This is of course what is kind of ensured by that moving window approach! smart

    7. correct ERP in the easy condition from the error ERP in the hard condition

      Expected conditions subtracted -> again what is left?

    8. correct ERP in the hard condition from the error ERP in the easy condition

      In both cases this is the unexpected condition -> unexpectedness gets removed but what is left?

    9. correct feedback in the hard condition

      The ERN would be bigger for correct responses? -> That's funny, opposite of what the name indicates. But it is because it is modulated by what is expected => EXPECTATION error


    1. sensory events [80]

      OFC sensory prediction

    2. C wasassociated with X before any association with food

      And never after, so the SR error never propagated

    3. SR represents the association between the stimulus and food, and is also able to update the reward function of the food as a result of devaluation

      But the transitions are initially learned policy-dependent, which means inflexible! This requires SR-Dyna-style updates, or an SR formed through undirected exploration?

    4. A reduced acquisition of conditioned responding to C and D, compared to F, which was trained in compound with a novel stimulus

      A was already directly associated with X -> AD and AC to X showed 'blocking' compared to EF, which was preceded by a novel stimulus

    5. shifts in value (amount of reward) and identity (reward flavour).

      So it seems the common idea here is: Change in the reward stimulus, not the state-state (prior to reward) transitions. You can change both reward value and identity and see if it has a modulatory effect using this model!

    6. although the RPE correlate has famously been evident in single units, representation of these more complex or subtle prediction errors may be an ensemble property.

      Perhaps some pattern analysis with fMRI would be able to say things about this...

    7. First, it naturally captures SPEs, as we will illustrate shortly. Second, it also captures RPEs if reward is one of the features.

      It can incorporate both SPE of SR and the RPE into one error signal?? -> would allow for cool modeling and dissociation tricks!

    8. expected TD error is then proportional to the superposition of feature-specific TD errors, Σ_j δM_t(j).

      This is a strong assumption but it is functional - Maybe we should incorporate such an encoding in our model/paper too?
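The superposition idea in the quoted passage can be sketched as follows (a toy illustration in our own notation, assuming one-hot state features and a tabular SR matrix `M`; not the paper's code):

```python
# Sketch (our notation, not the paper's code): feature-specific TD errors for
# SR learning. Each feature j carries its own error
#   delta(j) = phi(j) + gamma * M[s_next][j] - M[s][j]
# and the scalar signal is assumed to be their superposition (sum).

def sr_td_errors(M, s, s_next, phi, gamma=0.9):
    """Per-feature TD errors for one transition; phi is the feature vector at s."""
    return [phi[j] + gamma * M[s_next][j] - M[s][j] for j in range(len(phi))]

n = 3
M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # untrained SR
phi = [1.0, 0.0, 0.0]                # one-hot features: currently in state 0
deltas = sr_td_errors(M, 0, 1, phi)  # [0.0, 0.9, 0.0] for this initialisation
scalar_signal = sum(deltas)          # the superposition from the quoted passage
```

If reward is appended as one of the features, the same vector of errors contains an RPE-like component alongside the SPE components, which is the dissociation trick hinted at above.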

    9. Although prediction errors are useful for updating estimates of the reward and transition functions used in model-based algorithms, these do not require a TD error.

      TD error is not useful for model-based updates, because these updates can be local and complete, at least as long as there are only local/short-term dependencies (e.g. Markov property in graph)

    10. building on the pioneering work of Suri [46], we argue that dopamine transients previously understood to signal RPEs may instead constitute the SPE signal used to update the SR

      Also a somewhat important reference!

    11. dopamine transients are necessary for learning induced by unexpected changes in the sensory features of expected rewards [37]

      Good reference

    12. sensitive to movement-related variables such as action initiation and termination

      What do these have to do with value? Modulation based on course of action could be a signal that options are present...! => option-specific value-function

    13. some dopamine neuronsrespond to aversive stimuli

      The exact opposite of what you would expect if response indicates appetite - Seems more like an (unsigned?) prediction error then

    14. anatomically segregated projection from midbrain to striatum

      So DA could have the state-state learning signal, but then it would be segregated from the value-projections which run into PFC / ACC?

    15. value is affected by novelty [21] or uncertainty [22]

      Could be an actual modulator of value, or at least how much to learn from its encounter


    1. Finally, to test if the differences in these measures are sufficiently robust to allow categorisation, we trained a support vector machine (SVM) classifier to accurately predict trajectories as either model-free, model-based or SR agents (Fig 6A-B). When the decoder was given data from the biological behaviour, the SVM classifier most frequently predicted those trajectories to be an SR agent

      This is a very cool idea! Train classifier on generated RL agent data, and let it then classify a batch of real data -> to what is it most similar?
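The decode-then-classify logic can be sketched with toy features (hypothetical 2-D summary statistics, e.g. stay-probability contrasts; a nearest-centroid rule stands in for the paper's SVM to keep the sketch dependency-free):

```python
# Sketch of the decode idea (toy features, not the paper's pipeline): train on
# summary statistics of simulated model-free (MF), model-based (MB) and SR
# trajectories, then ask which label best fits the observed behaviour. A
# nearest-centroid rule stands in for the paper's SVM classifier.

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_label(x, centroids):
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

# Hypothetical 2-D feature vectors (invented for illustration) per agent class.
train = {
    "MF": [[0.8, 0.1], [0.7, 0.2]],
    "MB": [[0.1, 0.8], [0.2, 0.9]],
    "SR": [[0.5, 0.6], [0.6, 0.5]],
}
centroids = {label: centroid(rows) for label, rows in train.items()}
label = nearest_label([0.55, 0.55], centroids)  # this behaviour decodes as "SR"
```

In practice one would swap the centroid rule for an actual SVM fit on many simulated trajectories per agent class, exactly as the quoted passage describes.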

    2. model-based algorithm was consistently the most successful

      But it was not most correlated with human/animal behaviour!

    3. attributes a role for dopamine in forming sensory prediction errors (Gardner et al., 2018) - similar to what has been observed experimentally (Menegas et al., 2017; Sharpe et al., 2017; Takahashi et al., 2017).

      This seems like some stuff to check out!


    1. This abs(RPE) signal correlated with a number of brain regions including an IPS locus anterior to where we found SPE correlates at p < 0.001 uncorrected (Figure S3). However, a direct comparison between SPE and abs(RPE) revealed a region of both posterior IPS that was significantly better explained by the SPE than by the abs(RPE) signal at p < 0.05 corrected, as well as a region of latPFC that showed a difference at p < 0.001 uncorrected

      So there is some degree of overlap: could it be dodgy?

    2. the degree to which pIPS encodes an SPE representation

      Behavioural relevance is established!

    3. participants had indeed acquired knowledge about the particular sequence of state transitions during the first session: 99.6% of permutation samples provided a poorer explanation of choices than the original (p = 0.004).

      So earlier latent learning CAN cause for state transition model to emerge -> very useful, do not need to instruct

    4. If participants’ beliefs about the transition probabilities were updated by error-driven model-based learning (with a fixed learning rate, as assumed in FORWARD), this may have left a bias toward the most recently experienced transitions

      A 'lingering effect' of recent learning, just like the successive contradictory feedback in Holroyd and Coles 2002

    5. exponential decay from FORWARD to SARSA

      transition from MB to MF could be expected!

    6. volunteers were first exposed to just the state space in the absence of any rewards, much as in a latent learning design. This provides a pure assessment of an SPE

      Is this similar to what Danesh did?

    1. “replay” trajectories that the animal has taken in the past, as well as to follow novel trajectories that the animal has never before taken

      consolidation and integration?

    2. online planning process, perhaps similar to tree search or to online dynamic programming

      It does include forms of planning apparently

    3. Theta sequences typically occur when an animal is moving

      So if theta synchronizes with the ACC, it can keep updating its environmental model according to task progression?

    4. monkeys use a joystick to control a digital predator pursuing digital prey

      This is in a sense also a foraging task because it is food-seeking? But I don't see effects like depletion and opportunity costs

    5. dlPFC activity correlates with a “state prediction error”

      but the unsigned version - so it is not goal-specific (and could it even be used to update SR in the right direction?)

    6. silencing OFC activity on a particular trial selectively impairs the influence of this expected outcome signal on learning but does not impair choosing.

      So the OFC really does seem to carry a learning signal as well!

    7. “inference”: the ability to combine separately-learned associations in order to form new associations between items that may never have been encountered together before

      This is basically association but for unseen combinations - shows a world model is abstracted from earlier knowledge

    8. associate stimuli or actions with specific expected outcomes

      This is something ACC does very well

    9. Multi-step planning can be thought of as a process that uses this map to guide an extended sequence of behavior towards a distant goal

      The cognitive map is related to the HPC, and this is why it keeps popping up in ACC research as well -> it is very tightly linked to decision making and model building of also distal relationships!

    10. other lines
      1. Structure learning
      2. Foraging

      Both rely on models that can look at further horizons


    1. Notably, these signals were signed based on the subject’s goal, consistent with a mechanism for determining how much to update beliefs and in which direction: toward confirmation (positive) or reconsideration of one’s rewarding goal (negative)

      This seems like something very important to how ACC does model updating as well!

    2. but may not store the model locally

      This would be in the HPC right

    3. extracted the feedback-locked BOLD response in left lOFC (at 6 s post-feedback onset) at trial t and regressed this against the (signed) change to hippocampal CSS (i.e., the change in the difference between LC and HC presentations in [ipsilateral] left hippocampus from the preceding block t − 0.5 to the subsequent block t + 0.5)

      Cool parametric analysis - How did (signed) prediction error influence the difference between LC and HC representations? (do they move further apart or closer together)

    4. the unsigned D_KL term, corresponding to the magnitude of the belief update, independent of its direction, instead recruited a dorsal frontoparietal network, consistent with previous findings related to unsigned state prediction errors during latent learning

      So signing the Dkl term is important for identifying the ACC!? This means we have to include some sort of goal perhaps?

    5. stimulus-outcome update effects in lateral OFC/ventrolateral prefrontal cortex (VLPFC) and also a distributed network including anterior cingulate cortex, inferior temporal cortex, and posterior cingulate cortex

      Relevant: ACC is included in this! Hurray the experiment might be saved :D

    6. reduction in the BOLD response for HC items when compared to LC items

      Because of accurate prediction -> previous activation? Or because in HC the next stimulus was already 'explained away' in a predictive processing sense!

    7. and the other inferred based on the subject’s knowledge of the inverse relationship between stimuli and outcomes dictated by the task structure

      There is counterfactual updating possible in this task! Assumes internal model is accurate (but very likely)

    8. In each CSS block, each stimulus-outcome transition was presented once

      So a suppressed representation is yielded for the outcome variable, both for the common and the rare transition

    9. first select the more desired gift card goal based on the current potential payouts and then reverse-infer the stimulus they believed would most likely lead to that desired outcome

      So the reward is given, pps have to 'calculate' the optimal path -> separate reward function (given, unlearned) applied over learned model (OR SR!)

    10. but not about the reward amount obtained on a gift card

      Isolated from reward prediction error updating (?)

    11. advantageous to learn the transition probabilities

      This is important - emphasizing the structure of the problem

    12. reward-size-independent stimulus-outcome associations

      We have to assess if this mimics successor representations in some way

    13. little is known about how these different signals are used in the brain

      There are many types of learning signals - prediction errors for all kinds of domains of cognition, not just reward or sensory. However, only for striatal dopamine (RPE) there is some knowledge of how it affects behaviour / other neural computation


    1. facilitate the updating of internal models from which future action is generated (55)

      Then could it also perform the function of clustering task sets based on outcomes? We have seen based on contextual cues it doesn't - that seems like a HPC thing

    2. The current findings suggest a specific computational function for the ACC: It is involved in updating internal models to facilitate future information processing.

      A one-off update to guide future information processing -> it could happen through short term plasticity mechanisms?

    3. “reset” signal for internal models

      So following this theory, the model updating should actually have a positive effect on the pupil diameter because increase of LC-NE activity?

    4. Participants’ internal models of the targets’ distribution could be expected to differ from the true (generative) distribution

      Exactly: So how do you determine the model used by participants?

    5. dwell time reflects updating

      it was also seen in the behavioural Behrens paper that pps were slower after jumps?

    6. no further behavioral cost on subsequent trials

      instant reprogramming?

    7. the two types could be easily distinguished

      minor confounding, difference could also draw some sort of bottom up attentional process?


    1. mPFC may track changes in activity patterns in regions with community-based representational similarity, providing a signal that could underlie parsing decisions.

      mPFC uses specifically the predictive structure to identify boundaries?

    2. three-layer neural network model (Fig. 6a). The network took input representing the current stimulus and was trained to predict which stimulus would occur next.

      predictive RNN -> should lead to successor rep?

    3. within the set of Hamiltonian paths, the probability of transitioning from one cluster boundary node (one of the pale nodes in Fig. 1a) to the adjacent one, if not yet visited, is always exactly

      otherwise the hamiltonian walk cannot be made

    4. passage into a new cluster significantly more often than at other times

      behavioural validation of latent learning!

    5. set of possible successor items on each step depends only on the current item, this uniformity in transition probabilities holds whether one takes into account only the most recent item or the n most recent items

      True uniformity! not some secret non-markov transition probability influence

    6. items will fall close together in representational space when they are preceded and followed by similar distributions of items in familiar sequence

      i.e. have similar successor representations.

    7. non-uniform transition probabilities.

      In surprise signals, elevation would happen when a rare transition occurs

    8. judgments are quite reliable

      People give very consistent answers to when events change -> pretty interesting?

    9. different account,

      They claim temporal community structure is DIFFERENT from surprise based event segmentation

    10. transient elevations in predictive uncertainty or surprise as the primary signal driving event segmentation.

      Surprise is potentially signaled in ACC -> HER Model?


    1. reduced the total number of stimulus–stimulus transitions and thereby increased statistical power

      Clever experimental protocol

    2. the next day, subjects were presented with object sequences in the scanner

      So only the second test session was in the scanner!

    3. instructedto remember which of two buttons to press for a particular object orientation

      This is a simple behavioural measure

    4. prediction errors in the orbitofrontal cortex during active learning predict later changes in hippocampal representations of the stored model (Boorman et al., 2016)

      So it is OFC and not ACC?

    5. Indeed, we note that neural signals can be recorded in frontal and parietal cortices, reflecting the ‘state-prediction errors’ that ensue when predicted state relationships are breached during behavioural control (Gläscher et al., 2010)

      This seems very important

    6. We hypothesised that implicit knowledge about the graph structure would influence response times, such that subjects would respond faster if a preceding object in the test sequence was closer on the graph structure underlying the train sequence. Indeed, we found that log-transformed response times were longer the further away the preceding object was on the graph (Figure 5C, D)

      Cool behavioural validation!

    7. communicability significantly distorts the graph structure by shortening links that form part of many paths around the graph structure and lengthening links that would be less frequently visited by a random navigator

      So it is like a random exploration SR representation, but no need to fit gamma

    8. it was the symmetrised version alone that predicted the fMRI suppression effect

      It is not the actual number of experienced transitions between graph nodes -> this is a good validation for the representation of an abstract relational structure, and that it's not some sort of experiential or temporal effect

    9. was also present behaviourally

      look at how they check this

    10. fMRI adaptation paradigm

      So it is different from representational similarity analysis!

    11. can be read out directly from functional magnetic resonance imaging (fMRI) data in the entorhinal cortex

      Readable coding from entorhinal cortex -> how to compare with goal as shown by Yoo et al (the neural basis of predictive pursuit)? Should be the same, but entorhinal vs dACC is different structure...?


    1. SR in complementary learning systems, especially in the medial PFC and the hippocampus

      so this encapsulates ACC within it as well

    2. relative balance between ‘state prediction errors’ and ‘reward prediction errors’ may be used for arbitration in an MB–MF hybrid learner

      Talk about state prediction errors in MF-MB literature?

    3. we chose to linearly combine the ratings from MB and SR algorithms

      So SR sort of becomes the replacement for MF -> explains why after proper learning / caching, increased alternative task demands don't have detrimental effect on MBness?

    4. response times were slower in the transition revaluation condition compared with both the reward revaluation condition (t57 = 2.08, P < 0.05) and the control condition (t57 = 4.04, P < 0.00

      This could be a signature that some model-based planning is grinding the participants' mental gears

    5. reward revaluation

      Now, they should prefer the other starting state

    6. transition revaluation condition

      So in SR this would not work! The long-run state cache is updated slowly

    7. indicate which starting state

      This is the actual performance -> Do they understand how starting state relates to reward at end of trajectory?

    8. indicate their preference

      So they had some mild agency / incentive to pay attention to the transitional structure! Maybe we can have specific states in some community (B) pay out a small reward, and later validate this with choices from community A -> to which would you transition? -> choose bottleneck to community B

      Preference check is attentional / structure learning check

    9. Experiment 1 used a passive learning task, which permitted the simplest possible test of the theory, removing the need to model action selection.

      Passive learning task of the SR

    10. Specifically, they learn and store a one-step internal representation or model of the short-term environmental dynamics: specifically, a state transition function T and a reward function R

      So learning T is substantially different from learning M. Learning T might not be more difficult, but using it for planning will take more time than using M. This is the benefit of SR -> computational time at decision time
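The decision-time cost difference noted above can be sketched directly: with a cached SR matrix M = (I − γT)^(-1), evaluation is one matrix-vector product V = Mr, whereas a one-step model must iterate V ← R + γTV at choice time (a toy illustration, not the paper's code):

```python
# Sketch (toy example, not the paper's code): value evaluation with a cached SR
# versus iterative evaluation with a one-step model (T, R).

def value_from_sr(M, r):
    """One pass: V = M r, using the cached long-run occupancy matrix M."""
    return [sum(m * ri for m, ri in zip(row, r)) for row in M]

def value_from_model(T, r, gamma=0.5, iters=100):
    """Decision-time iteration V <- R + gamma * T V with the one-step model."""
    V = [0.0] * len(r)
    for _ in range(iters):
        V = [r[i] + gamma * sum(t * v for t, v in zip(T[i], V)) for i in range(len(r))]
    return V

T = [[0.0, 1.0], [1.0, 0.0]]        # deterministic two-state flip
r = [1.0, 0.0]
M = [[4/3, 2/3], [2/3, 4/3]]        # cached SR: M = (I - 0.5*T)^(-1)
V = value_from_sr(M, r)             # single matrix-vector product
```

Both routes agree on V, but only the model-based route pays the iteration cost at decision time; the SR pays it earlier, during learning of M.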

    1. posterior dorsomedial striatum (Oh et al., 2014; Hintiryan et al., 2016), a region necessary for learning and expression of goal-directed action as assessed by outcome devaluation

      So learning signals could also come from here?

    2. Likewise, neuroimaging in a saccade task in which subjects constructed and updated a model of the location of target appearance observed ACC activation when subjects updated an internal model of where saccade targets were likely to appear (O’Reilly et al., 2013)

      This is the update v no update oddball task

    3. neuroimaging in the Daw two-step task has identified representation of model-based value in the BOLD signal in anterior- and mid-cingulate regions

      This is where ACC relevance in two-step task comes in explicitly!

    4. In humans, extensive training renders apparently model-based behaviour resistant to a cognitive load manipulation (Economides et al., 2015) which normally disrupts model-based control (Otto et al., 2013), suggesting that it is possible to develop automatized strategies which closely resemble planning.

      This seems like a possible SR influence as well!

    5. ACC inhibition on model-based control because the effects would not survive multiple comparison correction for the large number of model parameters.

      So do not conclusively cite this study on anything specific

    6. fixed and are known to be so by the human subjects

      So with fixed transition probabilities people will learn the interaction effect in the logistic regression. However, when participants also have to learn the transition structure, the logistic regression will show separate effects for transitions and rewards, and no interaction!
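
      The interaction check mentioned above is typically run as a logistic regression of stay/switch on previous reward, previous transition type, and their interaction. A minimal sketch of the underlying logic, using a deliberately caricatured model-based agent (the 0.7 common-transition probability and the deterministic stay rule are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Caricatured model-based agent in a fixed-transition two-step task:
# it repeats its first-step choice after reward+common or omission+rare
# transitions, and switches otherwise. Real analyses fit a logistic
# regression; here we read the reward x transition interaction straight
# off the conditional stay-probability table.
n_trials = 10_000
rewarded = rng.random(n_trials) < 0.5   # previous trial rewarded?
common = rng.random(n_trials) < 0.7     # previous transition common?
stay = np.where(rewarded == common, 1, 0)

for r in (True, False):
    for c in (True, False):
        mask = (rewarded == r) & (common == c)
        print(f"rewarded={r}, common={c}: P(stay)={stay[mask].mean():.2f}")
```

      The crossover in these stay probabilities is the interaction signature; a purely model-free agent would instead show a main effect of reward only.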

    7. The absence of transition-outcome interaction has been used in the context of the traditional Daw two-step task (Daw 2011) to suggest that behaviour is model-free. However, we have previously shown (Akam et al. 2015) that this depends on the subjects not learning the transition probabilities from the transitions they experience.

      So this might be a crucial theoretical problem for typical 2-step tasks?

    8. direct biases of choice

      Some patterns in choices are simple response biases, but can mimic MBRL to some extent

    9. Subjects learned the task in 3 weeks

      This is where the difference between human subjects and animals comes into play: humans can be instructed to instantly grasp the task. Will that be a major issue in performing a similar setup with humans?

    10. Reversals in which first-step action (high or low) had higher reward probability, could therefore occur either due to the reward probabilities of the second-step states reversing, or due to the transition probabilities linking the first-step actions to the second-step states reversing.

      That does seem like a convincing dissociation that the brain might want to keep separate!

    11. except on transitions to neutral blocks, 50% of which were accompanied by a change in the transition probabilities

      So sometimes a double update is required here?

    12. At block transitions, either the reward probabilities or the transition probabilities changed

      So you have a dissociation between updates from poke transitions and updates from the reward outcomes of these pokes; that's clever

    13. introduction of reversals in the transition probabilities mapping the first-step actions to the second-step states. This step was taken to preclude subjects developing habitual strategies consisting of mappings from second-step states in which rewards had recently been obtained to specific actions at the first step (e.g. rewards in state X → choose action x, where action x is that which commonly leads to state X). Such strategies can, in principle, generate behaviour that looks very similar to model-based control despite not using a forward model which predicts the future state given chosen action (

      Are these the Dezfouli and Balleine-style decisions they are referring to?

    14. block-based reward probability distribution

      Should promote task engagement how?

    15. in each second-step state there was a single action rather than a choice between two actions available, reducing the number of reward probabilities the subject must track from four to two

      This is how the 'burden' on participants is relieved - they can now focus more on learning the state transitions. But doesn't this identify the reached second stage directly with reward, hence allowing direct encoding of reward onto states to be a confound?

    16. model-based mechanisms which learn action-state transition probabilities and use these to guide choice.

      Slightly different representation of the SR (in the PLOS paper it would be the H matrix of SR-Dyna?)

    17. optogenetic silencing of ACC neurons on individual trials reduced the influence of the experienced state transition on subsequent choice without affecting the influence of the trial outcome


    18. in depth computational analysis (Akam et al., 2015


    19. developing a new version in which both the reward probabilities in the leaf states of the decision tree and the action-state transition probabilities change over time.

      This seems like a sensible approach - now transitions also have to be learned. My understanding was that no other experimentalists really did this because it would make the setup too involved and difficult for participants to grasp.

    20. (ACC), a region expected to be centrally involved

      ACC is expected to be centrally involved in the two-step decision task?

    21. representing task contingencies beyond model-free cached values

      This is ALSO an SR feature!

    22. Firstly, the ACC provides a massive input to posterior dorsomedial striatum (Oh et al., 2014; Hintiryan et al., 2016), a region critical for model-based control as assessed through outcome-devaluation

      Outcome devaluation sensitivity is also present under the SR, however!

    1. failures to flexibly update decision policies that are caused by caching of either the successor representation (as in SR-TD or SR-Dyna with insufficient replay) or a decision policy (as in SR-MB) should be accompanied by neural markers of non-updated future state occupancy predictions

      Good basis for some kind of experiment?

    2. unlike the hippocampus, parts of the PFC appear to be involved in action representation in addition to state representation

      This could be relevant for the SR-Dyna model, where there is an H(a,s) matrix

    3. value weights would be learned by neurons connecting the hippocampus to ventral striatum, in the same TD manner discussed in this paper

      Or perhaps a detectable error with ERP produced in ACC?

    4. fMRI measures of the representation of visual stimuli in tasks where such stimuli are presented sequentially

      Paradigms to measure SR with fMRI!

    5. [74] demonstrated that a sophisticated representation that includes reward history can produce model-based-like behavior in the two-step reward revaluation task

      Could we be able to decode first-stage action at second stage decision time from ACC potentially? Maybe when trained with an RNN?

    6. more sophisticated state representations

      This is always going to be an ill-defined potential confound in any RL research I believe

    7. prediction error related BOLD signals in humans

      How do we find this? :D

    8. instead stored in prefrontal cortex

      Would make sense as the seat of planning etc

    9. successor matrix updated by SR-Dyna might itself exist in the recurrent connections of hippocampal neurons

      There is already a paper on this?

    10. SR-Dyna can support rapid action selection by inspecting its lookup table

      It basically does tree search beforehand, and caches the values it finds from the tree search, so they can be applied directly at task time

    11. This simulation demonstrates that SR-Dyna can thus produce behavior identical to “full” model-based value iteration in this task

      And it is still computed using DA-plausible TD techniques. Only the representational space gets more and more complex, and the replay mechanism gets added.
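
      A toy sketch of that replay idea (heavily simplified to a state-level SR; the paper's SR-Dyna replays state-action transitions off-policy, and all numbers here are illustrative): stored transitions are replayed offline with the same TD-style update, so the cached successor matrix stays converged between decisions.

```python
import random
import numpy as np

n_states, alpha, gamma = 3, 0.2, 0.9
M = np.eye(n_states)               # successor matrix, initialized to identity
buffer = [(0, 1), (1, 2), (2, 2)]  # experienced (s, s') pairs for a toy chain

rng = random.Random(0)
for _ in range(5000):              # offline replay sweeps between decisions
    s, s_next = rng.choice(buffer)
    M[s] += alpha * (np.eye(n_states)[s] + gamma * M[s_next] - M[s])

# M now approximates the chain's successor matrix and can be read out
# instantly at choice time, e.g. V = M @ w for any reward vector w.
print(M.round(2))
```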

    12. recurrent neural networks offer a simple way to compute Mπ(s,:) based on spreading activation implementing Eq 11.

      This is beautiful information for us!
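
      A minimal sketch of that spreading-activation idea (our toy, not the paper's network): a recurrent loop that repeatedly applies m <- onehot(s) + gamma * m @ T converges to the successor row Mπ(s,:), because that row is the fixed point of the recurrence.

```python
import numpy as np

gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],   # toy 3-state cycle: 0 -> 1 -> 2 -> 0
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
s = 0

# Recurrent "spreading activation": each iteration injects the cue state and
# propagates the current activation one step along the transition structure.
m = np.zeros(3)
for _ in range(200):
    m = np.eye(3)[s] + gamma * m @ T

M_row = np.linalg.inv(np.eye(3) - gamma * T)[s]  # closed-form successor row
print(np.allclose(m, M_row, atol=1e-6))
```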

    13. SR-MB cannot solve the novel “policy” revaluation task

      But whether this holds also depends on its exploration strategy, i.e. whether exploration remains continuously active

    14. SR-TD and SR-MB are thus “on-policy” methods – their estimates of Vπ can be compared to the estimates of a traditional model-based approach

      So they do not necessarily generalize to other policies! Though M learned under a uniform, fully random policy should constitute the accurate transition model?

    15. SR-MB learns a one-step transition model, Tπ and uses it, at decision time, to derive a solution to Eq 9

      With Eq. 9 being the solution for M. So in essence it is a 'double' SR: it holds a model of the long-run expectancies in M, but one which is composed from the one-step expectancies learned in T
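
      That reading can be checked numerically: given a one-step Tπ, the long-run matrix falls out in closed form as M = (I − γT)⁻¹, i.e. the geometric series over γᵏTᵏ. A minimal sketch with a made-up 3-state chain (all values illustrative):

```python
import numpy as np

gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],   # toy chain: 0 -> 1 -> 2, with 2 absorbing
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# One-step model T composed into long-run expectancies M at decision time.
M = np.linalg.inv(np.eye(3) - gamma * T)

# Same thing as a truncated power series: sum over k of (gamma * T)^k.
M_series = sum(np.linalg.matrix_power(gamma * T, k) for k in range(200))
print(np.allclose(M, M_series, atol=1e-6))
```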

    16. combine the SR with function approximation and distributed representations

      Function approximation == neural networks; distributed representations == what we are looking into!

    17. Crucially, despite the functional similarity between this rule and the TD update prescribed to dopamine, we do not suggest that dopamine carries this second error signal

      So no error detectable with ERP technique will probably guide us in this direction of SR-TD? Depends on content of paper in ref 55.

    18. standard dopaminergic TD rule

      This speaks for the benefits of SR-TD

    19. agent is allowed to experience this change only locally

      Local exploration of introduced blockade -> correct representation only for entries that were experienced

    20. Because Mπ reflects long-run cumulative state occupancies, rather than the individual one-step transition distribution, P(s’|s,a), SR-TD cannot adjust its valuations to local changes in the transitions without first updating Mπ at different locations.

      A 'smarter' algorithm could update M completely once it learns about a local change, which could even be learned through TD (updating only the relevant vector s -> s').

    21. SR-TD can, without further learning, produce a new policy reflecting the shortest path to the rewarded location

      And then, using M(pi) learned from the random policy and the rewarded state given by direct placement, it can compute the shortest path from any random initialization!
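
      A minimal numerical sketch of that revaluation logic (toy corridor, assumed notation): M is computed under a random walk, a reward is then 'placed' in one state, and values follow instantly as V = M @ w without any relearning.

```python
import numpy as np

gamma = 0.95
n = 5
# Random-walk transition matrix on a 5-state corridor (reflecting ends).
T = np.zeros((n, n))
for s in range(n):
    T[s, max(s - 1, 0)] += 0.5
    T[s, min(s + 1, n - 1)] += 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)  # SR under the random policy

w = np.zeros(n)
w[4] = 1.0        # reward newly placed in the last state
V = M @ w         # instant revaluation: no relearning of M needed
print(V)          # values rise toward the rewarded end of the corridor
```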

    22. first explores the grid-world randomly, during which it learns the successor matrix Mπ corresponding to a random policy

      It can learn M(pi) according to a random policy without any reward entering the system ever!

    23. γMπ(s′,:)

      So it makes M(s,:) a little bit more similar to M(s',:)? But it will become an aggregate of all possible s', since you could transition to any of them (with a certain probability) -> weighted exactly by the transition structure! The weighting happens slowly over time, hence the source of inflexibility when transitions change.
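
      The update being described can be written out directly; a minimal sketch (notation assumed from the paper's SR-TD rule): after each observed transition s -> s', the row M(s,:) is pulled toward onehot(s) + γ·M(s',:).

```python
import numpy as np

n_states, alpha, gamma = 3, 0.1, 0.9
M = np.eye(n_states)   # each state initially predicts only itself

def sr_td_update(M, s, s_next):
    # delta = onehot(s) + gamma * M(s',:) - M(s,:)
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])
    return M

# Repeated experience of the chain 0 -> 1 -> 2 (episodes end at 2) slowly
# bakes the transition structure into M -- the slowness of this weighting is
# exactly the inflexibility noted above when transitions later change.
for _ in range(2000):
    M = sr_td_update(M, 0, 1)
    M = sr_td_update(M, 1, 2)
print(M[0])   # approaches [1.0, 0.9, 0.81]
```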

    24. The three models

      So actually, for the learning of M, three different models are proposed in this paper! There is not simply one learning algorithm involved in this SR thing.

    25. Tπ is the one-step state transition matrix that is dependent on π

      Policy dependence = on-policy?

    26. approximation will be correct when the weight w(s′) for each successor state corresponds to its one-step reward, averaged over actions in s

      Why average over the actions and not pick the max?

    27. learned reward values in ventral striatum [53]

      Already a learning signal for successor representations as potentially seen in HPC?

    28. Doya [51] introduced a circuit by which projections via the cerebellum perform one step of forward state prediction, which activates a dopaminergic prediction error for the anticipated state

      State prediction error coming from cerebellum?

    29. and neuroimaging of prediction error signals in human striatum [6]

      Detecting prediction errors with fMRI?

    30. thus perhaps involving analogous (striatal) computations operating over distinct (cortical) input representations [47].

      Would make sense; TD errors and other learning signals could be used by many systems for many different kinds of learning or updating

    31. Typically, model-based methods are off-policy (since having learned a one-step model it is possible to use Eq 2 to directly compute the optimal policy); whereas different TD learning variants can be either on- or off-policy.

      Model-based is considered off-policy because we have a fairly complete representation of the environment, which is not biased by the actions taken. If the particular mapping of actions taken makes a difference for the estimate, it would be considered on-policy.

    32. but also in the more flexible choice adjustments that seem to reflect model-based learning

      E.g. error updating from counterfactuals, which suggests model-based inference of reward that was not obtained!


    1. In previous versions, subjects at each stage chose between two symbols instead of two fixed actions and the symbols moved from side to side at each trial ensuring there was no consistent mapping between the button presses and the symbols

      This could be a major difference for the paradigm! Now the actions are mapped directly and not through an environmental link

    2. decisions that are insensitive to (i) the values of the outcomes [8] and (ii) the contingency between specific actions and their outcomes

      Clear statement that HRL sequences can become insensitive to the transition structure of the problem at hand

    3. habitual (when action sequences are selected) and goal-directed (when single actions are selected) action

      This is a different 'habitual' than the habit implied by MF-RL

    4. a first stage action is the best action

      Here they do analyze the difference in expected reward between different first-stage actions. The Danesh paper dismissed this as intractable for participants

    5. with a small probability (1:7), the rewarding probability of each key changed randomly to either the high or low probability.

      So not the slow drifting as in Daw and in the Danesh paper