584 Matching Annotations
  1. Sep 2020
    1. executed one after another without considering feedback from the environment during the sequence

      The hierarchical account states we do not take the second stage context into account and simply pick what we already decided before we even saw the second stage context.

    2. action control in stage two should not depend on stage one

      This is key to the whole setup of the two-step markov decision task right: Once we have arrived in the second stage, we do select based on the known context, regardless of MB or MF RL. The effect happens on the switching the next trial.

    3. Here we show that first stage habitual actions, explained by the model-free evaluation in previous work, can also be explained by assuming that first stage actions chunk with second stage actions

      So this does not actually account for the model-based behaviour, which we hope we can build

    4. may best be viewed as action sequence

      So a habit - something learned according to a model-free scheme - can trigger an action sequence


    1. participant could use this information to select the action that has a relatively high expected value on common transitions. Thus, a rare transition would lead to a state with lower expected value, yielding a negative RPE

      This seems very likely to me! Is there some natural way to correct for this? Current estimation of expected value as a covariate, regress it out, or something?

    2. analysing frequency of stay on the first step choice should reveal an interaction effect between transition type and second choice feedback outcome in the preceding trial on the frequency of repeating the first step choice

      Participants have selected one spaceship because they like its major planet -> move to minor planet -> obtain reward -> select alternative spaceship (win-shift)

      Move to minor planet -> obtain no reward -> select same old spaceship again (you want to go to major planet) = lose-stay
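To make the predicted interaction concrete, a toy numpy sketch (the generative rule and all names are my own, not from the paper): a planner stays after rewarded-common and unrewarded-rare trials, so the transition x reward interaction in stay probability picks this up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trial log: transition (0=common, 1=rare), second-stage reward (0/1),
# and whether the first-stage choice was repeated on the next trial.
n = 4000
transition = rng.integers(0, 2, n)
rewarded = rng.integers(0, 2, n)
# Toy MB-like rule: stay after rewarded-common and unrewarded-rare trials
# (outcomes are attributed via the transition model), i.e. stay when
# transition XOR reward is true.
p_stay = 0.3 + 0.5 * np.logical_xor(transition, rewarded)
stayed = rng.random(n) < p_stay

def stay_prob(t, r):
    mask = (transition == t) & (rewarded == r)
    return stayed[mask].mean()

# Transition x reward interaction in stay frequency: nonzero for a planner,
# near zero for a purely model-free agent.
interaction = (stay_prob(0, 1) - stay_prob(0, 0)
               - stay_prob(1, 1) + stay_prob(1, 0))
```

Under a purely model-free rule (stay after reward regardless of transition), the same interaction term would come out near zero.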

    3. These findings implicate the aMCC in the processing of the SPEs

      Fronto-medial theta has before already been 'localized' to the aMCC (likely)

    4. Predicted Response-Outcome (PRO)

      But this one model did say it should! (no data)

    5. Gläscher et al. (2010) conducted an fMRI experiment using a paradigm that featured common and uncommon transitions and found that the intraparietal sulcus and lateral PFC are sensitive to SPEs

      So empirical results show: aMCC does not do this

    6. transition model

      e.g. a successor representation (but not limited to this)
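Quick reminder-to-self of the SR math (standard definition, not from this paper): under a fixed policy with one-step transition matrix T, the SR is M = (I - γT)⁻¹, and values are linear in reward, V = M r.

```python
import numpy as np

# Successor representation for a fixed policy: M[s, s'] is the expected
# discounted number of future visits to s' starting from s.
gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],   # s0 -> s1
              [0.0, 0.0, 1.0],   # s1 -> s2
              [0.0, 0.0, 1.0]])  # s2 self-loops (toy absorbing state)
M = np.linalg.inv(np.eye(3) - gamma * T)

# Values follow linearly from any reward vector, which is what makes the
# SR attractive for fast revaluation.
r = np.array([0.0, 0.0, 1.0])
V = M @ r
```

The catch discussed later in these notes: M itself is policy-dependent, so a change of goals that implies a new policy also demands a new M.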


    1. ACC appears to encode hierarchical structure with distributed representations, which makes the parcellation problem even harder

      It is not immediately obvious how hierarchy can be encoded in continuous time distributed neural representations. This is a different beast from a symbolic AI algorithm that updates a and then draws b etc.

    2. Individual ACC neurons seem capable of responding to most task events, with particular mixtures of sensitivities within and across neurons continually reallocated according to changing task conditions [72]

      Very cool target for a modeling study?

    3. phasic bursts of norepinephrine, which may serve as a neural-interrupt signal [67], can reset network activity in ACC [68] and thus allow for module re-binding

      So the LC-NE system can 'reprogram' the communication through the ACC?

    4. Specifically, when task demands are high (e.g., after an error), ACC would send a synchronizing signal to lower-order modules, with consequent synchronization and thus improved communication between those lower-order modules

      ACC is now also the great orchestrator of communication everywhere - What is left for the dlPFC?

    5. ACC motivates sticking to a plan

      Framing the ACC for extended control of sequences thus states that it keeps track of how much of this cost of planning would likely still be worth it. This is basically the same idea as the 'expected value of control' theory, although the function of ACC is expanded upon much by HMB-HRL theory.

    6. At face value, such a self-regulating control mechanism is both computationally [48] and evolutionarily [49] maladaptive

      No! A self-regulating control system should sometimes turn itself off! This is the whole reason we have cost added into the mix.

    7. feedback-based control mechanisms constitute the bread-and-butter of control theory in engineering (Box 1), but these always concern the regulation of subordinate systems, never self-regulation

      So a theory in which the ACC adapts its own control by detecting conflict is not 'natural' from an engineering standpoint - it should modify subordinate systems?

    8. For example, a prominent computational model of ACC contains units that exhaustively predict all possible states of the task environment, generating prediction errors to unexpected transitions; though not explicitly used in the model for this purpose, in principle the prediction errors can provide learning signals for MB-RL [47]

      ACC-dlPFC theory as a super-learner for all unexpected events? Sounds a bit predictive-processing-ish

    9. ACC could use such models to plan over temporally extended action sequences

      So it would take on a planning function, which in much of the literature is associated with the HPC? How do the two interact? Seems like a very relevant question!


    1. can be useful in the context of multitask learning to extract useful, reusable policies

      The DR is some sort of generalized representation of shared structure of a task (family)?

    2. how it is learned

      There is no clear proposal yet as to how the DR would be learned - it could be very similar to the SR?

    3. default policy plays the role of prior over policy space and rewards play the role of the likelihood function

      Options also function as some sort of prior over action selection potentially?

    4. empirically underconstrained theoretical flexibility in specifying how a task's state space should be

      Exactly a problem of PFC research - empirically underconstrained in what should be represented

    5. finite decision problem

      So linear RL cannot be a general theory of open-ended lifelong learning

    6. distinguishing between terminal states (representing goals), and nonterminal states (those that may be traversed on the way to goals)

      How exactly is this implemented, and what does this mean for our RNN architecture, which has a goal-representation space separately?

    7. “control cost,” KL(π || π₀), which is increasing in the dissimilarity (KL divergence) between the chosen distribution π and some default distribution, π₀.

      Control cost inherent in the model! Can be linked to expected-value of control model very naturally
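Minimal sketch of that cost term (toy numbers my own): deviating from the default policy carries a KL cost; sticking to the default is free.

```python
import numpy as np

def kl_control_cost(pi, pi_default):
    """KL(pi || pi_default): cost of deviating from the default policy."""
    pi = np.asarray(pi, float)
    pi_default = np.asarray(pi_default, float)
    return float(np.sum(pi * np.log(pi / pi_default)))

default = np.array([0.5, 0.5])      # habitual / default action distribution
lazy = np.array([0.5, 0.5])         # no deviation -> zero control cost
controlled = np.array([0.9, 0.1])   # strong override -> positive cost
```

This is where the link to expected-value-of-control lives: the override is only worth paying for when the reward gained exceeds this KL term.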

    8. assumea one-to-one, deterministic correspondence between actions and successor states

      The next state is directly and exclusively dependent on the action we take

    9. The only way to find the latter using equation (2) is by iteratively re-solving the equation to repeatedly update π and S until they eventually converge to π*

      Policy iteration (solving through search)

    10. assuming that all choices are made followingpolicy 𝜋

      Since in practice the state transition function depends on the chosen actions, it is wise to write it as S(π), as the long-run state visits depend on action selection under this fixed policy

    11. The default policy and cost term introduced to make linear RL tractable offers a natural explanation for these tendencies, quantifies in units of common-currency reward how costly it is to overcome them in different circumstances, and relatedly offers a novel rationale and explanation for a classic problem in cognitive control: the source of the apparent costs of “control-demanding” actions

      Deviations from the default policy are what constitute 'control demanding' actions? => So the more we deviate from this, the more dACC activity we can expect, something like this??

    12. SR theory predicts grid fields must continually change to reflect updated successor state predictions as the animal’s choice policy evolves, which is inconsistent with evidence

      Entorhinal grid cells have this 'fourier-domain map of task space' but do not continuously change their representations to fit with different goals - as would be necessary under vanilla SR theory

    13. Fourier-domain map of task space

      Figure out what this means exactly...

    14. stable and useful even under changes in the current goalsand the decision policy they imply

      How exactly would they achieve this difference from the SR? Long-run state expectancies seem to be both the definition and the problem of the SR

    15. For instance, a change in goals implies a new optimal policy that visits a different set of states, and a different SR is then required to compute it.

      This is exactly what an option would look like!

    16. However, it simply assumes away the key interdependent optimization problem by evaluating actions under a fixed choice policy(implied by the stored state expectancies)for future steps.

      Assumes fixed, constant, probabilities of future state visits


    1. The present study confirms that the aMCC’s distributed code for temporal information is not sufficiently consistent across blocks and sequence types to be detectable using the ROI classification approach, revealing only a weak effect size in the generalization analysis. The two approaches therefore appear to provide complementary information

      Re-examine the methodology of the RSA in the aMCC study - is it very tailored?

    2. domain-general role to pars orbitalis in learning the relationship between environmental events and transition probabilities between various environmental states

      Is this successor representation learning??

    3. By contrast, inconsistent with the past literature, in the ROI analysis, we did not find evidence for involvement of the aMCC and hippocampus

      So basically our theory is already under heavy scrutiny? Exactly the opposite of what we want to see!

    4. whether the stir action was performed in a tea or a coffee sequence, irrespective of the sequence position of the stir action

      Only context, no temporal information

    5. discriminate between the first and second instance that the stir action was performed

      Temporal information -> progression through sequence information (regardless of coffee vs tea task)

    6. first defining functional or anatomical ROIs based on subject-specific data

      So it is very theoretically based, not like a random cluster search across all voxels

    7. serial rank order

      This is somewhat different from generic information maintenance in WM?

    8. contextual information

      So this is basically the same as Working Memory


    1. At the same time, we observed that dlPFC reinstatement of CTD positively scaled with the hippocampal pattern similarity between the two overlapping contexts

      So in dlPFC the CTD did have similarity over the two contexts, even though they were distinct in the HPC?

    2. hippocampal differentiation effect

      More similar task demands should yield increasingly DIFFERENT HPC representations? Because we separate them?

    3. congruency: match/mismatch between the CTD and the actual task demands on the trial)

      Some trials in context 1/2 will have task demand associated with 3/4. This is 'incongruent'

    4. Task-sets include additional instructions on “how”

      Task sets are conceptually different from the process that can identify the task set based on cueing - that is an associative / semantic process?

    5. proactively retrieves probabilistically likely task-sets

      The cueing of task sets - don't conceptualize this as being even higher in the hierarchy?


    1. Black dots indicate stable fixed points

      You can see in the DMC the RNN has created 3 stable states it can occupy - not only the fixation at the start, but also two for the sample->delay moment, dependent on the category of the sample stimulus!

    2. time (ms)

      You can see accurate decoding along a wider spectrum of time - stable maintenance of information!

    3. stable states associated with each category at the end of the sample period in the DMC task

      This really is maintenance of classificatory information after the first stimulus!

    4. test stimulus

      The second stimulus is the 'test stimulus'

    5. sample stimulus

      The first stimulus is called the 'sample stimulus'

    6. It is likely that this phenomenon is mediated by interactions among different brain regions involved in the OIC and DMC tasks. Indeed, LIP is connected with the dorsolateral prefrontal cortex (DLPFC)

      Cognitive Control area through dlPFC might be responsible for 'reprogramming' what goes on in LIP? Flexible readouts, different effect of recurrent encoding, etc?

    7. greater compression of activity among directions within categories in the DMC task

      Exactly what you would expect right, as direction does not matter for encoding category itself? We are talking about matching.

    8. compressing variability among directions within a category

      So it were mainly the response directions that were still encoded in the OIC task in LIP. In DMC this disappeared in favour of more population-level category coding.

    9. we evaluated the temporal stability of category decoding using SVM decoders that were trained at one time point and then tested at all other time points in the shared sample period

      Check the maintenance of information at time-point 1 by training a decoder on it and applying it to future time-points!
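Sketch of that cross-temporal generalization idea on a toy population, with a nearest-centroid decoder standing in for the paper's SVM (all numbers and names mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy activity: trials x timepoints x neurons, with a stable category
# signal appearing at "sample onset" (t >= 2).
n_trials, n_time, n_units = 200, 6, 20
labels = rng.integers(0, 2, n_trials)
pattern = rng.normal(size=n_units)
signal = np.zeros((n_trials, n_time, n_units))
for t in range(2, n_time):                   # category info appears at t=2
    signal[:, t, :] += np.outer(2 * labels - 1, pattern)
signal += rng.normal(scale=0.5, size=signal.shape)

def centroid_decode(train_t, test_t):
    """Train a nearest-centroid decoder at train_t, test it at test_t."""
    c0 = signal[labels == 0, train_t].mean(0)
    c1 = signal[labels == 1, train_t].mean(0)
    x = signal[:, test_t]
    pred = np.linalg.norm(x - c1, axis=1) < np.linalg.norm(x - c0, axis=1)
    return (pred == labels).mean()

# Full train-time x test-time accuracy matrix: off-diagonal accuracy
# within the signal period indicates a temporally stable code.
acc = np.array([[centroid_decode(tr, te) for te in range(n_time)]
                for tr in range(n_time)])
```

High off-diagonal accuracy within the post-onset window is exactly the "stable maintenance of information" signature noted above; training in the pre-onset window stays at chance.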

    10. category computations are supported by a common subpopulation of neurons in the early sample period and different subpopulations of neurons or different readout mechanisms in the late sample period

      Some perceptual mechanism is task-independent, simply information providing and encoding. Later readout can be flexible according to task demands!

    11. attractor dynamics appears to compress category-related information to a simpler, binary format by collapsing all directions within a category towards a single population state

      Working-Memory component induced in RNN as a function of task demand

    12. graded neural activity in the OIC

      Direct classification - more room for stimulus idiosyncratic representation?

    13. categorical encoding was more abstract with binary-like neural activityin the DMC

      Working-memory component


    1. contingency-degradation

      getting reward also when you pick random actions or don't pick anything - non-contingent rewards

    2. devaluation

      Sudden loss of reward or becoming aversive - MB-RL should instantly stop approaching, while MF-RL gradually

    3. assume a predefined state and action space

      It is really the representational structure that makes a HUGE difference in the effectiveness of any learning strategy deployed over it

    4. apply MF RL updates on retrospectively inferred latent states

      Have a model at the ready but don't do anticipatory planning - only retrospective evaluation according to MF-RL learning rules

    5. necessarily pressed into the singular axis of MF–MB

      So keep an open mind about such things - seems specifically aimed at HEURISTICS - form of wrongful MB-RL?

    6. Simple strategies that rely only on working memory,

      Looks exactly like MF

    7. For MB control to materialize, the agent must identify its goal, search its model for a path leading to that goal and then act on its plan

      Hard to model and understand the exact scope of the MB controller, so default to MF evidence if not accurate?

    8. features of trajectories in the environment

      Successor representations

    9. contextual information is used to segregate circumstances in which similar stimuli require different actions

      Again, a working-memory procedure?

    10. compound representations

      Essentially leveraging working memory to create apparently more 'flexible' behaviour, while in reality MF-RL is the only real 'learning' mechanism

    11. DAergic signals support both instrumental (action–value) and non-instrumental (state–value) learning in the striatum.

      The correct error signals are provided to facilitate any form of RL based learning. In Striatum?

    12. Computational RL theory built on the principles that animal behaviourists had distilled through experimentation, to develop the method of temporal difference (TD) learning (a MF algorithm)

      Origins of RL are purely associative learning - delta-rule style
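The delta-rule core of TD(0), for my own reference (standard textbook form, not code from this paper):

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: delta-rule update driven by the RPE."""
    rpe = r + gamma * V[s_next] - V[s]   # reward prediction error
    V[s] += alpha * rpe
    return rpe

# Toy chain s0 -> s1 -> end, reward 1.0 on leaving s1. Values converge to
# V(s1) = 1 and V(s0) = gamma * V(s1) = 0.9.
V = {"s0": 0.0, "s1": 0.0, "end": 0.0}
for _ in range(1000):
    td_update(V, "s0", 0.0, "s1")
    td_update(V, "s1", 1.0, "end")
```

The MF/associative point is visible here: the update needs no transition model at all, only the sampled successor state.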

    13. derive value estimates for the different states or actions available

      Practical difference between computing desired states and inferring best actions VS directly computing desired actions without explicit state values

    14. dimensionality of learning — the axes of variance that describe how individuals learn and make choices — is well beyond two

      It is not only the speed / accuracy that is being traded off - which is what MB/MF and all other two-systems seem to boil down to.

    15. System1/System2

      Many learning theories seem to boil down to this, and MF-RL / MB-RL is exactly this!


  2. Aug 2020
  3. Local file
    1. rostral ACC activity will predict the probability of switching strategies, whereas caudal ACC activity will predict the probability of staying within a strategy

      Univariate more activity in rostral (higher up the hierarchy) means switching? not perhaps more top-down control to STAY in the current strategy?

    2. important for hierarchical systems that integrateinformation over extended time periods

      In fact it might be the one fundamental reason such hierarchy is useful

    3. ill-equipped to simulate control processes that are inherently dynamic, such as the response delays introduced by switching between tasks

      Yes, or maybe also the continuous tasks as proposed by Hayden group!

    4. by extending previous work that integrated goals into RNNs [38]

      The goal-circuit model is an abstraction of the HRL-RNN model?

    5. [12,144,181]

      Relevant publications explicitly drawing from HRL-RNN theory for ACC

    6. cells in isolation, or univariate indicators of ACC function that average across the activity of entire cell populations

      Distributed patterns will not be picked up - the guiding function of ACC cannot be detected. Or perhaps, only the 'energizing' part of it

    7. caudal ACC and rostral ACC apply control signals that attenuate costs associated with the production of low-level actions

      Or is this in some way similar to a 'gating' mechanism, allowing stable representations for control to be 'updated' or amended to current needs, detected by a 'higher' system

    8. tonic dopamine levels in ACC stabilize the task in working memory

      Something like a hidden state / task set active encoding

    9. ACC damage does not interfere strongly with many of the putative functions that have been attributed to it

      Crucial aspect of the HRL theory: Everything can still happen, the ACC does not EXCLUSIVELY execute all of these functions, it simply strengthens / guides

    10. do large and sudden changes in the state space explain the conflict-likesignals that are commonly observed in ACC?

      So basically it is not necessarily an explicit encoding of error, rather the updating of the current context?

    11. distributedmanner

      What exactly is the meaning and the point of this?

    12. an arsenal of mathematical tools from dynamical systems analysis

      Reference 58 contains examples of nonlinear dynamical systems analysis for neural networks?! :o

    13. bidirectional connectivity between DLPFC and ACC

      Explicit recurrence - Learning to reinforcement learn in activity dynamics?

    14. the model will predict how hierarchical action sequences are representedat different levels of abstraction along the frontal midline

      increasingly abstract goals will be represented along a gradient of the spatial organization of the network - this might have something to do with connectivity between layers? cf. convolutional neural nets

    15. individual regions of ACC process both typesof information

      Cognitive and emotional functions of the ACC can be found in individual regions all throughout the area


    1. multiple regression problem, in which the different coding models were treated as predictor variables to the observed similarity matrix, the analysis enabled joint estimation of multiple coding models

      Super relevant: We can estimate the DEGREE to which different features are encoded on a trial-by-trial basis??
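Sketch of what that regression looks like (toy RDMs and names my own): vectorize the lower triangles of the model RDMs and the observed similarity matrix, then jointly fit the model weights with least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cond = 12

def random_rdm():
    """Toy symmetric model RDM with zero diagonal."""
    m = rng.random((n_cond, n_cond))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

model_a, model_b = random_rdm(), random_rdm()
# Toy "observed" matrix: a known mixture of the two models plus noise.
observed = (2.0 * model_a + 0.5 * model_b
            + rng.normal(scale=0.01, size=(n_cond, n_cond)))

# Vectorize lower triangles (off-diagonal) and regress jointly.
tri = np.tril_indices(n_cond, k=-1)
X = np.column_stack([model_a[tri], model_b[tri], np.ones(tri[0].size)])
y = observed[tri]
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The betas recover the degree to which each coding model contributes - the "joint estimation" the quote refers to, here in its simplest OLS form.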

    2. task-switching study using MVPA

      So the sharpening can also be seen using fMRI - single neuron recordings not necessary!

    3. increase in the gain (i.e., sharpening) of frontoparietal task-set coding

      This is exactly what is seen in the intracranial recording study Ebitz et al!

    4. by using a between-subjects RSA approach, the analysis was not optimized to capture finer-grained representational structure that could be subject-specific

      Of course between people the geometry can take on a distinct form, but even if the encoded information is the same? Shouldn't RSA abstract over that?

    5. No task-general control representations were detected

      So per task it was very uniquely constructed - not one level for WM, inhibition, etc.

    6. separately, across different sub-domains of negative affect

      Orthogonal but consistent representation!

    7. This exclusive focus on behavioral measures may be suboptimal for construct validation, as brain activity measures can provide more proximal, higher-dimensional readouts of the neural mechanisms of interest

      Is it expected that even though we cannot find common patterns in behaviour, we can still find common patterns in the brain? I thought even within the same task over different sessions, the control representation might differ...

    8. A construct validation approach is often used to address this issue.

      This is basically how IQ and the sub-measures of it are defined / measured

    9. Also relevant is the insight that RSA can be conducted in a time-dependent manner within fMRI, such that trials form the dimensions of the similarity matrices

      How about sub-trials (e.g. only the picking of sugar)?

    10. strength of conjunctive coding was robustly related to trial-by-trial response time

      So again - a scalar value strength type signal?

    11. For example, interference occurs when a goal-relevant task set and an irrelevant yet prepotent set are simultaneously active.

      Is this really a more elaborate model? Wouldn't the 'interference' simply move any RSA model closer to the midline between the color / word naming task sets? We won't have very clear access to neural firing strengths or anything, if that would be relevant.

    12. one-dimensional structure of the model

      The model only investigated the representational strength along the face-house attentional dimension (not an interesting representational geometry?)

    13. specifying and comparing representational models is more flexible within RSA

      So the benefit is that we get more insight into the geometry of the representation, not just its presence / decodability? Compare different computational models.

    14. classification-based decoding, which we simply refer to here as “classification”, and RSA

      Distributed patterns can be subdivided like this: decoding and encoding models

    15. type and form of informationencoded in LPFC and associated regions of the FPN and CO

      So there is relevant information encoded in task set control reps: but there will probably still be a relevant 'intensity' summarizable in a scalar value?

    16. independent of particular stimuli, responses, or other task information

      Abstract control related factors such as 'congruency' abstract control signals away from directly related stimulus signals

    17. highly abstracted, one-dimensional factors

      This is the main issue right - Setting up experiments to identify one 'factor' of cognitive control, e.g. 'Congruency' in stroop tasks. It is much more complex multidimensional than that.


    1. inferred in the current experiment because the future value of a patch is, by design, different from its past value

      dACC continuously learning a variable - the slope specifically? Or is it 'simply' trying to predict the prediction errors of lower layers. It seems like the latter would generalize less!

    2. The opposing time-linked signals observed do not suggest that dACC and the other regions integrate rewards to a simple mean estimate (as RL-simple would), but instead point towards a comparison of recent and past reward rates necessary for the computation of reward trends.

      But this was based on full-region regression weights over different time bins, computed from a particular choice moment. How can you determine separate representations from whole-area betas, with recent rewards increasing and past rewards decreasing the activity of the same whole region simultaneously?

    3. which was updated on every time step using a higher-order PE, PE* (that is, the difference of observed PE and expected PE)

      So there is some explicit hierarchy in the prediction errors - but is that really what goes on, or is there an explicit estimation of a trend, not necessarily based on the prediction errors themselves? It is mathematically identical!

    4. it is also possible that PEs may be used as a decision variable to guide decisions

      The ability of PEs to directly influence decision making, and not just learning, goes above and beyond simple-RL


    1. this relationship is heterogeneous; of these 58 neurons, 31.03% (n = 18/58) showed a positive slope and 18.97% (n = 11/58) showed a negative slope

      Distance to prey is an important variable, and it is the actual code for time of impending reward - but it is not encoded by an overall rise in activity (typical fMRI analysis assumption!) It is encoded distributed over the neurons (perhaps RSA, but could still mask it if very single-neuron heavy?)

    2. The encoding of the control-relevant variable becomes higher when the expected reward for controlling becomes higher!


  4. Jul 2020
    1. Finally, some research in deep RL proposes to tackle exploration by sampling randomly in the space of hierarchical behaviors (Machado et al., 2017; Jinnai et al., 2020; Hansen et al., 2020). This induces a form of directed, temporally extended, random exploration reminiscent of some animal foraging models (Viswanathan et al., 1999).

      Sampling from hierarchical behaviours?

    2. An example is prediction learning, in which the agent is trained to predict, on the basis of its current situation, what it will observe at future time steps (Wayne et al., 2018; Gelada et al., 2019).

      This might be what can get confused for successor representation signal from dACC?

    3. Song et al. (2017) trained a recurrent deep RL model on a series of reward-based decision making tasks that have been studied in the neuroscience literature, reporting close correspondences between the activation patterns observed in the network's internal units and neurons in dorsolateral prefrontal, orbitofrontal, and parietal cortices (Figure 2C).

      Relevant stuff!


    1. An important future goal is to create multiscale neural computational models that better predict more complex real world behaviors

      Is this something that we induce automatically with recurrence?

    2. participants made decisionsbased on the instantaneous reward rate and the reward rate trend

      Trend is captured at a higher level representation?

    3. conflict between short-term (safe options) and long-term (risky options) was mediated by the dorsal anterior cingulate cortex (dACC)

      dACC signals their conflict - does it do so by strengthening the representations of the one it favours?

    4. superimposition of computations at the shorter time scale (a trial) and the longer time scale (a block of trials)

      now, participants have to continuously evaluate which task they should be engaged in (which computations to perform?)

    5. human memoryforaging

      Memory retrieval / search = foraging?

    6. one simple decision: when to leave a patch

      The key inference to make in the Marginal Value Theorem
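The MVT leave rule, sketched on a toy exponentially depleting patch (all parameters mine): leave when staying longer no longer improves the overall reward rate; longer travel times between patches predict longer stays.

```python
import numpy as np

def optimal_leave_time(r0, decay, travel_time, dt=0.01, horizon=50.0):
    """Patch yields rate r0 * exp(-decay * t); return the stay duration
    that maximizes overall rate gain(t) / (t + travel_time)."""
    t = np.arange(dt, horizon, dt)
    gain = (r0 / decay) * (1 - np.exp(-decay * t))   # cumulative intake
    overall_rate = gain / (t + travel_time)
    return t[np.argmax(overall_rate)]

# MVT's classic prediction: costlier travel -> stay longer in each patch.
short_travel = optimal_leave_time(1.0, 0.2, travel_time=1.0)
long_travel = optimal_leave_time(1.0, 0.2, travel_time=10.0)
```

At the optimum, the instantaneous patch rate equals the long-run average rate of the environment - the "when to leave" decision in one scalar comparison.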

    7. One research area that examined decisions of multiple time scales is foraging theory

      Foraging == The study of the tradeoff between exploiting the current local habit / inertia VS finding a different niche?

    8. miss the crucial resources that could be available if the animal maintains larger-scale computations about the wider environment

      This is basically an explore-exploit tradeoff, but now not over one uniform action space but explicitly emphasizing local//global environment?

    9. summarized in the theories of hierarchical reinforcement learning

      !! Oh yeah baby

    10. shifts of anterior to posterior brain areas

      General organizational principle: The more something is habit-formed, the more posterior it shifts? The more control it requires, the more frontal it is?

    11. habit formation occurs when repetitive computations are streamlined

      Habits automatize local goals so agent can allocate processing to more complex global goals?

    12. simplest form of multiscale processing,but it is ubiquitous.

      Simple bias in favour of repeating previously rewarded action = simple operant conditioning?

    13. stable or slowly changing environments

      Requires less to no flexibility

    14. effectively modulatebetween local tasks while also considering multiple global goals and contextual factors

      Comes together nicely with a view on RL as central controller for 'homeostasis', or something like that. Would require highly hierarchical system of goals and representations.

    15. It has been historically assumed that the information processing in each trial is independent from information processing from other trials, and that once one trial completes, all the information processing is reset

      No inter-trial dependencies - but of course there are processing benefits / interferences and maybe trial-by-trial updates of response caution known as speed-accuracy tradeoff

    16. many experimental paradigms focus on short spatial or temporal scales.

      Also a problem for 'real' HRL or meta-learning?

    17. area-restrictedsearch

      Foraging strategy: Limit your attention to locations that were previously rewarding (?)

    18. An overarching analogy is foraging

      Foraging requires dynamic allocation and weighting of attention and evidence between multiple sources

    19. the degree to which working memory is considered

      An A. Collins & M. J. Frank paper: How much RL is actually WM?

    20. The requirement to integrate information over spatial and temporal scales in a wide variety of environments would seem to be a common feature underlying intelligent systems, and one whose performance has a profound impact on behavior [16–21]

      Exactly: Generalization over representations and over temporal grain of behaviour

    21. much of the progress made in the latest“AI spring” are, as we describe below, achievements of multiscale processing.

      Generalization and broader tasks are the hallmark of the current success of AI


  5. Jun 2020
    1. Cells that report choice independently of task should lie on the diagonal (i.e., an angle of π/4). Instead, the distribution of angles was significantly bimodal across all cells

      So the MFC has different cells specialized for different tasks (familiarity vs categorization)? Disappointing - hoped for generalized / remapping.

    2. Choice decoding in the MFC was strongest shortly after stimulus onset, well before the response was made

      So it is not the actual execution of an action which is the information that is picked up - it is the direction the contextual state-space representation heads towards?

    3. Decoding accuracy for choices was highest in the MFC

      MFC is more task-relevant for response selection

    4. In contrast, in the MFC (Fig. 3E, right), the relative positions of the four conditions were not preserved.

      MFC seems to rely on different representations. Familiarity-category pairs are not preserved over tasks, hence not really represented fundamentally. In the HPC it seems the determining factor (for Dim 1).

    5. In the HA, the ability to decode category was not significantly different between the two tasks

      HPC / Amyg encode stimulus category in memory & re-represent, regardless of task context. Part of stimulus representation in general.

    6. In the MFC, decoding accuracy for image category was significantly higher in the memory task

      MFC encodes task variable = stimulus category only during relevant context / task-set

    7. MS cell responses reflected a memory process: they strengthened over blocks as memories became stronger

      Identified memory-selective cells fire more and more strongly with repeated presentation of a stimulus & distinguish false from true negatives!

    8. We first trained a decoder to discriminate task type on trials where the subject was instructed to reply with a button press, and then we tested the performance of this decoder on trials where the subject was instructed to use saccades

      Decoder should generalize task classification across response modalities - does so in MFC (dACC and SMA)
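      The cross-modality test quoted above can be sketched in a few lines: fit a linear task-type decoder on button-press trials only, then score it on saccade trials. Everything below is simulated toy data with assumed variable names, not the authors' code or recordings.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n_neurons, n_trials = 50, 200

      # Simulated firing rates: task type shifts a fixed subset of neurons,
      # independently of response modality (the MFC-like case).
      task = rng.integers(0, 2, n_trials)       # 0 = memory, 1 = categorization
      modality = rng.integers(0, 2, n_trials)   # 0 = button press, 1 = saccade
      task_axis = rng.normal(size=n_neurons)
      X = rng.normal(size=(n_trials, n_neurons)) + np.outer(task, task_axis)

      # Minimal least-squares linear decoder (targets coded as ±1).
      def fit_decoder(X, y):
          Xb = np.column_stack([X, np.ones(len(X))])
          w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)
          return w

      def predict(w, X):
          Xb = np.column_stack([X, np.ones(len(X))])
          return (Xb @ w > 0).astype(int)

      train = modality == 0   # train on button-press trials only
      test = modality == 1    # test on held-out saccade trials
      w = fit_decoder(X[train], task[train])
      acc = (predict(w, X[test]) == task[test]).mean()
      print(f"cross-modality decoding accuracy: {acc:.2f}")
      ```

      High accuracy here only means the simulated task signal is modality-invariant by construction; the point is the train/test split across modalities, which is what makes the decoder generalization claim testable.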

    9. Cells showed significant modulation of their firing rate during the baseline period as a function of task type

      So already from instruction there is a reconfiguration - very complicated mapping from linguistic input to representations for decision making...

    10. Subjects indicated choices using either saccades (leftward or rightward eye movement) or button press while maintaining fixation at the center of the screen

      Different response modalities allow for disambiguation of coding towards a very specific execution - if the encoding is similar, it's really more cognitive/central!

    11. We found that neuronal populations within the MFC formed two separate decision axes

      So movement through state space in a unique but intra-task consistent direction?

    12. MFC [dorsal anterior cingulate cortex (dACC

      MFC = dACC = MCC

    13. insensitive to response modality

      So it's really about the abstract task demands, not the concrete action output

    14. The strength and geometry of representations of familiarity were task-insensitive in the HA but not in the MFC

      This is what creates the 'shadowing' pattern?

    15. whether an image was novel or familiar, or whether an image belonged to a given visual category

      recognition memory vs categorization: yes or no responses in both cases. Both use 'pictures', so stimuli can be the same across tasks

    16. phase-locking of MFC activity to oscillations in the HA

      HPC memory representations and dACC task set representations?


    1. pattern of conflict modulation during one correct response is orthogonal to the pattern during another correct response

      i.e. it is not a 'general boosting' effect -> only on average the activity of neurons can still increase, but it is all about upregulating the relevant neurons for this correct response

    2. higher when Ericksen conflict was present (Figure 2A)

      Yeah, in single neurons you can show the detection of general conflict this way, and it was not partitionable into different responses...

    3. representational geometry

      nice wording similar to RSA

    4. with Ericksen conflict than it was for trials without Ericksen

      what about simon?

      This does mean: Conflict increases representation shifting response toward correct action!

    5. AUC

      This axis has more predictive power when there is conflict than when there is no conflict (task is already so easy that the information is not needed, or at least a lot less?)

    6. G)

      Very clear effect! Suspicious? How exactly did they even select the pseudo-populations? It's not exactly clear to me from the methods.

    7. amplification hypothesis, conversely, does not predict a unified conflict detection axis in the population. Instead, it makes a prediction that is exactly contrary to the epiphenomenal view: that conflict should shift population activity along task-variable coding dimensions, but in the opposite direction. That is, conflict is predicted to amplify task-relevant neural responses

      conflict means more control will be exerted. Heavier representation of whatever info it is that dACC encodes that 'pushes' for the correct action. This function of dACC would be in line with the context layer!?
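      The two population-level predictions contrasted above reduce to the sign of a projection: score each trial along the coding dimension for the correct response and compare conflict vs no-conflict trials. The sketch below simulates the amplification scenario under assumed gains (1.0 vs 1.5); all names and numbers are illustrative, not from the paper.

      ```python
      import numpy as np

      rng = np.random.default_rng(3)
      n_neurons = 40

      # Unit-norm coding dimension for one correct response.
      coding_dim = rng.normal(size=n_neurons)
      coding_dim /= np.linalg.norm(coding_dim)

      base = rng.normal(size=(200, n_neurons))

      # Amplification scenario: conflict trials carry a stronger signal
      # along the coding dimension (assumed gain 1.5 vs 1.0).
      no_conflict = base[:100] + 1.0 * coding_dim
      conflict = base[100:] + 1.5 * coding_dim

      proj_nc = no_conflict @ coding_dim
      proj_c = conflict @ coding_dim

      # Amplification predicts a positive shift; the epiphenomenal view
      # predicts a negative one.
      shift = proj_c.mean() - proj_nc.mean()
      print(f"mean shift along coding dimension under conflict: {shift:.2f}")
      ```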

    8. At the population level, then, the epiphenomenon hypothesis predicts that conflict should decrease the amount of information about the correct response and shift neuronal population activity down along the axis in firing rate space that encodes this response

      Because a smaller share of the neurons 'fighting' for the correct response are active, at least in total.

    9. Neurons that were tuned for a specific correct response were often tuned to prefer the same Simon/Ericksen distractor response

      DLPFC is tuned to action-outcomes? -> in single neurons!

    10. In fact, the majority of conflict-sensitive dACC neurons were not selective for either correct response or distractor responses (66.7%

      So the conflict is represented separately, not having much to do with action-outcomes.

    11. did still signal either Ericksen or Simon conflict

      Simply the C-term in the ANOVA which is a binary coder for the general presence? Would also have more trials where its parameter is influential, does that influence estimation?

    12. neurons did not encode the distractor response

      So on trials with a unique distractor response, that action-outcome was not represented at all? It's interesting but then where does the actual conflict take place?

    13. significant proportion of neurons were selective for the correct response

      So desired action-outcome is represented. I think that was already known about dACC.

    14. separate pools of neurons corresponding to the two conflicting actions, and that conflict increases activity because it uniquely activates both pools

      more neurons activate for the different possible action outcomes = more activity overall --> conflict signal. Makes sense.

    15. Furthermore, the population of cells whose responses were significantly affected by Eriksen conflict was almost entirely non-overlapping with the population significantly affected by Simon conflict (specifically, only one cell was significantly modulated by both)

      Really separate representations for different aspects of the current task-set?

    16. additive model was a better fit to the data than other, more flexible models

      So separate statistical significance testing shows effect for Eriksen, not for Simon, but regression model shows through model comparison that it's best to ascribe to them the same effect...

    17. (n=15/145) neurons had significantly different firing rates between Simon and no-conflict trials

      No significant main effect but more single cells had a significant effect...? -> also directionality is not all positive, some positive some negative

    18. A small number of individual neurons also had different activity levels on Eriksen conflict and no conflict trials (8.2%, n=12/145 neurons, within-cell t-test)

      Note the difference between 'averaged over all neurons' (first report) and 'within one specific neuron' (this report)

    19. activity was higher on Ericksen conflict trials than on no conflict trials

      for Eriksen flankers there is a main effect of conflict (vs no-conflict). Simon was not statistically significant. Was it mainly a power issue?

    20. Ericksen

      So is it Ericksen or Eriksen??

    21. 12 task conditions

      Here they acknowledge 12 task conditions, not 9.

    22. Within each task condition (combination of correct response and distractor response), firing rates from separately recorded neurons were randomly drawn with replacement to create a pseudotrial firing rate vector for that task condition, with each entry corresponding to the activity of one neuron in that condition

      Definition of pseudotrial
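      The pseudotrial construction quoted above is easy to make concrete: within one condition, resample each neuron's rate independently (with replacement) from that neuron's own trials, since the neurons were recorded in separate sessions. A minimal sketch with simulated rates and assumed names:

      ```python
      import numpy as np

      rng = np.random.default_rng(1)

      # rates[i] -> 1-D array of firing rates recorded for neuron i in ONE
      # task condition; trial counts differ because recordings are separate.
      rates = [rng.poisson(lam=5 + i, size=rng.integers(20, 40))
               for i in range(10)]

      def make_pseudotrial(rates, rng):
          """One pseudotrial vector: one resampled rate per neuron."""
          return np.array([rng.choice(r) for r in rates])

      # Stack many pseudotrials into a trials-by-neurons pseudopopulation
      # matrix (the X used for the decoding analyses).
      X = np.stack([make_pseudotrial(rates, rng) for _ in range(100)])
      print(X.shape)  # (100, 10)
      ```

      The resampling destroys any real trial-by-trial noise correlations between neurons, which is the usual caveat of pseudopopulation analyses.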

    23. pseudotrial vector x

      one trial for all different neurons in the current pseudopopulation matrix?

    24. The separating hyperplane for each choice i is the vector (a) that satisfies: βi · a = 0. Meaning that βi is a vector orthogonal to the separating hyperplane in neuron-dimensional space, along which position is proportional to the log odds of that correct response: this is the coding dimension for that correct response

      Makes sense: If Beta is proportional to the log-odds of a correct response, a is the hyperplane that provides the best cutoff, which must be orthogonal. Multiplying two orthogonal vectors yields 0.
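      The orthogonality noted above can be checked numerically: for a linear decoder with weights beta and intercept b0, the log-odds at x is beta @ x + b0, so a step along beta changes the log-odds by t * ||beta||², while a step along any direction a with beta @ a == 0 (i.e. within the hyperplane) leaves it unchanged. Toy numbers only, not the paper's fitted weights:

      ```python
      import numpy as np

      beta = np.array([2.0, -1.0, 0.5])  # assumed decoder weights
      b0 = 0.3                           # assumed intercept

      def log_odds(x):
          return beta @ x + b0

      x = np.array([1.0, 0.0, -2.0])

      # Step along the coding dimension: log-odds changes linearly.
      t = 0.7
      delta_along = log_odds(x + t * beta) - log_odds(x)
      assert np.isclose(delta_along, t * (beta @ beta))

      # Step along a hyperplane direction (orthogonal to beta): no change.
      a = np.array([1.0, 2.0, 0.0])      # beta @ a = 2 - 2 + 0 = 0
      assert np.isclose(log_odds(x + a), log_odds(x))
      print("orthogonal step leaves log-odds unchanged")
      ```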

    25. X is the trials by neurons pseudopopulation matrix of firing rates

      So these pseudopopulations were random agglomerates of single neurons that were recorded, so many fits for random groups, and the best were kept?

    26. re-representing high-dimensional neural activity in a small number of dimensions that correspond to variables of interest in the data

      Essentially this is kind of like constructing dissimilarity matrices over large groups of voxels?

    27. 4917.0 (1) 5826.5 (1)*

      Additive model is the winner in single cell firing rates -> coding simply for the notion of conflict? cf. the population coding from dimensionality reduction!

    28. Subtracting this expectation from the observed pattern of activity left the residual activity that could not be explained by the linear co-activation of task and distractor conditions

      So this is what to analyze: If this still covaries with conflict in some way it means we go beyond epiphenomenal?

    29. Within each neuron, we calculated the expected firing rate for each task condition, marginalizing over distractors, and for each distractor, marginalizing over tasks.

      Distractor = specific stimulus / location (e.g. '1' or 'left')?

      Task = conflict condition (e.g. Simon or Ericksen)?
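      The marginalization-plus-residual step described in the two quotes above has a standard additive form: expected cell value = row marginal + column marginal - grand mean, and the residual is whatever that cannot explain. Whether this exact formula matches the paper's computation is an assumption; the sketch just shows the mechanics on a toy condition matrix:

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      # observed[i, j]: one neuron's mean rate for correct-response i and
      # distractor j (3 x 3 toy matrix of condition averages).
      observed = rng.normal(10, 2, size=(3, 3))

      row_means = observed.mean(axis=1, keepdims=True)  # marginal over distractors
      col_means = observed.mean(axis=0, keepdims=True)  # marginal over responses
      grand = observed.mean()

      # Additive expectation and the residual it leaves behind.
      expected = row_means + col_means - grand
      residual = observed - expected

      # By construction the residual has no additive structure left:
      # its row and column means are ~zero.
      print(np.allclose(residual.mean(axis=0), 0),
            np.allclose(residual.mean(axis=1), 0))
      ```

      Anything in the residual that still covaries with conflict is then, as the note above puts it, evidence beyond the epiphenomenal (purely additive) account.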

    30. condition-averaged within neurons (9 data points per neuron, reflecting all combinations of the 3 correct responses, 3 Ericksen distractors, and 3 Simon distractors)

      How do all combinations of 3 responses lead to only 9 data points per neuron? 3x2x2 = 12.