584 Matching Annotations
  1. Sep 2020
    1. executed one after another without considering feedback from the environment during the sequence

      The hierarchical account states we do not take the second stage context into account and simply pick what we already decided before we even saw the second stage context.

    2. action control in stage two should not depend on stage one

      This is key to the whole setup of the two-step markov decision task right: Once we have arrived in the second stage, we do select based on the known context, regardless of MB or MF RL. The effect happens on the switching the next trial.

    3. Here we show that first stage habitual actions, explained by the model-free evaluation in previous work, can also be explained by assuming that first stage actions chunk with second stage actions

      So this does not actually account for the model-based behaviour, which we hope we can build

    4. may best be viewed as action sequence

      So a habit - something learned according to a model-free scheme - can trigger an action sequence


    1. participant could use this information to select the action that has a relatively high expected value on common transitions. Thus, a rare transition would lead to a state with lower expected value, yielding a negative RPE

      This seems very likely to me! Is there some natural way to correct for this? Current estimation of expected value as a covariate, regress it out, or something?

    2. analysing frequency of stay on the first step choice should reveal an interaction effect between transition type and second choice feedback outcome in the preceding trial on the frequency of repeating the first step choice

      Participants have selected one spaceship because they like its major planet -> move to minor planet -> obtain reward -> select alternative spaceship (win-shift)

      Move to minor planet -> obtain no reward -> select same old spaceship again (you want to go to major planet) = lose-stay
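To make the predicted interaction concrete, a toy numpy sketch (the generative rule and all names are my own, not from the paper): a planner stays after rewarded-common and unrewarded-rare trials, so the transition x reward interaction in stay probability picks this up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trial log: transition (0=common, 1=rare), second-stage reward (0/1),
# and whether the first-stage choice was repeated on the next trial.
n = 4000
transition = rng.integers(0, 2, n)
rewarded = rng.integers(0, 2, n)
# Toy MB-like rule: stay after rewarded-common and unrewarded-rare trials
# (outcomes are attributed via the transition model), i.e. stay when
# transition XOR reward is true.
p_stay = 0.3 + 0.5 * np.logical_xor(transition, rewarded)
stayed = rng.random(n) < p_stay

def stay_prob(t, r):
    mask = (transition == t) & (rewarded == r)
    return stayed[mask].mean()

# Transition x reward interaction in stay frequency: nonzero for a planner,
# near zero for a purely model-free agent.
interaction = (stay_prob(0, 1) - stay_prob(0, 0)
               - stay_prob(1, 1) + stay_prob(1, 0))
```

Under a purely model-free rule (stay after reward regardless of transition), the same interaction term would come out near zero.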

    3. These findings implicate the aMCC in the processing of the SPEs

      Fronto-medial theta has before already been 'localized' to the aMCC (likely)

    4. Predicted Response-Outcome (PRO)

      But this one model did say it should! (no data)

    5. Gläscher et al. (2010) conducted an fMRI experiment using a paradigm that featured common and uncommon transitions and found that the intraparietal sulcus and lateral PFC are sensitive to SPEs

      So empirical results show: aMCC does not do this

    6. transition model

      e.g. a successor representation (but not limited to this)
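Quick reminder-to-self of the SR math (standard definition, not from this paper): under a fixed policy with one-step transition matrix T, the SR is M = (I - γT)⁻¹, and values are linear in reward, V = M r.

```python
import numpy as np

# Successor representation for a fixed policy: M[s, s'] is the expected
# discounted number of future visits to s' starting from s.
gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],   # s0 -> s1
              [0.0, 0.0, 1.0],   # s1 -> s2
              [0.0, 0.0, 1.0]])  # s2 self-loops (toy absorbing state)
M = np.linalg.inv(np.eye(3) - gamma * T)

# Values follow linearly from any reward vector, which is what makes the
# SR attractive for fast revaluation.
r = np.array([0.0, 0.0, 1.0])
V = M @ r
```

The catch discussed later in these notes: M itself is policy-dependent, so a change of goals that implies a new policy also demands a new M.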


    1. ACC appears to encode hierarchical structure with distributed representations, which makes the parcellation problem even harder

      It is not immediately obvious how hierarchy can be encoded in continuous time distributed neural representations. This is a different beast from a symbolic AI algorithm that updates a and then draws b etc.

    2. Individual ACC neurons seem capable of responding to most task events, with particular mixtures of sensitivities within and across neurons continually reallocated according to changing task conditions [72]

      Very cool target for a modeling study?

    3. phasic bursts of norepinephrine, which may serve as a neural-interrupt signal [67], can reset network activity in ACC [68] and thus allow for module re-binding

      So the LC-NE system can 'reprogram' the communication through the ACC?

    4. Specifically, when task demands are high (e.g., after an error), ACC would send a synchronizing signal to lower-order modules, with consequent synchronization and thus improved communication between those lower-order modules

      ACC is now also the great orchestrator of communication everywhere - What is left for the dlPFC?

    5. ACC motivates sticking to a plan

      Framing the ACC for extended control of sequences thus states that it keeps track of how much of this cost of planning would likely still be worth it. This is basically the same idea as the 'expected value of control' theory, although the function of ACC is expanded upon much by HMB-HRL theory.

    6. At face value, such a self-regulating control mechanism is both computationally [48] and evolutionarily [49] maladaptive

      No! A self-regulating control system should sometimes turn itself off! This is the whole reason we have cost added into the mix.

    7. feedback-based control mechanisms constitute the bread-and-butter of control theory in engineering (Box 1), but these always concern the regulation of subordinate systems, never self-regulation

      So a theory in which the ACC adapts its own control by detecting conflict is not 'natural' from an engineering standpoint - it should modify subordinate systems?

    8. For example, a prominent computational model of ACC contains units that exhaustively predict all possible states of the task environment, generating prediction errors to unexpected transitions; though not explicitly used in the model for this purpose, in principle the prediction errors can provide learning signals for MB-RL [47]

      ACC-dlPFC theory as a super-learner for all unexpected events? Sounds a bit predictive-processing-ish

    9. ACC could use such models to plan over temporally extended action sequences

      So it would take on a planning function, which in much of the literature is associated with the HPC? How do the two interact? Seems like a very relevant question!


    1. can be useful in the context of multitask learning to extract useful, reusable policies

      The DR is some sort of generalized representation of shared structure of a task (family)?

    2. how it is learned

      There is no clear proposal yet as to how the DR would be learned - it could be very similar to the SR?

    3. default policy plays the role of prior over policy space and rewards play the role of the likelihood function

      Options also function as some sort of prior over action selection potentially?

    4. empirically underconstrained theoretical flexibility in specifying how a task's state space should be

      Exactly a problem of PFC research - empirically underconstrained in what should be represented

    5. finite decision problem

      So linear RL cannot be a general theory of open-ended lifelong learning

    6. distinguishing between terminal states (representing goals), and nonterminal states (those that may be traversed on the way to goals)

      How exactly is this implemented, and what does this mean for our RNN architecture, which has a goal-representation space separately?

    7. “control cost,” KL(π || π₀), which is increasing in the dissimilarity (KL divergence) between the chosen distribution π and some default distribution, π₀.

      Control cost inherent in the model! Can be linked to expected-value of control model very naturally
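Minimal sketch of that cost term (toy numbers my own): deviating from the default policy carries a KL cost; sticking to the default is free.

```python
import numpy as np

def kl_control_cost(pi, pi_default):
    """KL(pi || pi_default): cost of deviating from the default policy."""
    pi = np.asarray(pi, float)
    pi_default = np.asarray(pi_default, float)
    return float(np.sum(pi * np.log(pi / pi_default)))

default = np.array([0.5, 0.5])      # habitual / default action distribution
lazy = np.array([0.5, 0.5])         # no deviation -> zero control cost
controlled = np.array([0.9, 0.1])   # strong override -> positive cost
```

This is where the link to expected-value-of-control lives: the override is only worth paying for when the reward gained exceeds this KL term.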

    8. assumea one-to-one, deterministic correspondence between actions and successor states

      The next state is directly and exclusively dependent on the action we take

    9. The only way to find the latter using equation (2) is by iteratively re-solving the equation to repeatedly update π and S until they eventually converge to π*

      Policy iteration (solving through search)

    10. assuming that all choices are made followingpolicy 𝜋

      Since in practice the state transition function depends on the chosen actions, it is wise to write it as S(π), as the long-run state visits depend on action selection under this fixed policy

    11. The default policy and cost term introduced to make linear RL tractable offers a natural explanation for these tendencies, quantifies in units of common-currency reward how costly it is to overcome them in different circumstances, and relatedly offers a novel rationale and explanation for a classic problem in cognitive control: the source of the apparent costs of “control-demanding” actions

      Deviations from the default policy are what constitute 'control demanding' actions? => So the more we deviate from this, the more dACC activity we can expect, something like this??

    12. SR theory predicts grid fields must continually change to reflect updated successor state predictions as the animal’s choice policy evolves, which is inconsistent with evidence

      Entorhinal grid cells have this 'fourier-domain map of task space' but do not continuously change their representations to fit with different goals - as would be necessary under vanilla SR theory

    13. Fourier-domain map of task space

      Figure out what this means exactly...

    14. stable and useful even under changes in the current goalsand the decision policy they imply

      How exactly would they achieve this difference from the SR? Long-run state expectancies seem to be both the definition and the problem of the SR

    15. For instance, a change in goals implies a new optimal policy that visits a different set of states, and a different SR is then required to compute it.

      This is exactly what an option would look like!

    16. However, it simply assumes away the key interdependent optimization problem by evaluating actions under a fixed choice policy(implied by the stored state expectancies)for future steps.

      Assumes fixed, constant, probabilities of future state visits


    1. The present study confirms that the aMCC’s distributed code for temporal information is not sufficiently consistent across blocks and sequence types to be detectable using the ROI classification approach, revealing only a weak effect size in the generalization analysis. The two approaches therefore appear to provide complementary information

      Re-examine the methodology of the RSA in the aMCC study - is it very tailored?

    2. domain-general role to pars orbitalis in learning the relationship between environmental events and transition probabilities between various environmental states

      Is this successor representation learning??

    3. By contrast, inconsistent with the past literature, in the ROI analysis, we did not find evidence for involvement of the aMCC and hippocampus

      So basically our theory is already under heavy scrutiny? Exactly the opposite of what we want to see!

    4. whether the stir action was performed in a tea or a coffee sequence, irrespective of the sequence position of the stir action

      Only context, no temporal information

    5. discriminate between the first and second instance that the stir action was performed

      Temporal information -> progression through sequence information (regardless of coffee vs tea task)

    6. first defining functional or anatomical ROIs based on subject-specific data

      So it is very theoretically based, not like a random cluster search across all voxels

    7. serial rank order

      This is somewhat different from generic information maintenance in WM?

    8. contextual information

      So this is basically the same as Working Memory


    1. At the same time, we observed that dlPFC reinstatement of CTD positively scaled with the hippocampal pattern similarity between the two overlapping contexts

      So in dlPFC the CTD did have similarity over the two contexts, even though they were distinct in the HPC?

    2. hippocampal differentiation effect

      More similar task demands should yield increasingly DIFFERENT HPC representations? Because we separate them?

    3. congruency: match/mismatch between the CTD and the actual task demands on the trial)

      Some trials in context 1/2 will have task demand associated with 3/4. This is 'incongruent'

    4. Task-sets include additional instructions on “how”

      Task sets are conceptually different from the process that can identify the task set based on cueing - that is an associative / semantic process?

    5. proactively retrieves probabilistically likely task-sets

      The cueing of task sets - don't conceptualize this as being even higher in the hierarchy?


    1. Black dots indicate stable fixed points

      You can see in the DMC the RNN has created 3 stable states it can occupy - not only the fixation at the start, but also two for the sample->delay moment, dependent on the category of the sample stimulus!

    2. time (ms)

      You can see accurate decoding along a wider spectrum of time - stable maintenance of information!

    3. stable states associated with each category at the end of the sample period in the DMC task

      This really is maintenance of classificatory information after the first stimulus!

    4. test stimulus

      The second stimulus is the 'test stimulus'

    5. sample stimulus

      The first stimulus is called the 'sample stimulus'

    6. It is likely that this phenomenon is mediated by interactions among different brain regions involved in the OIC and DMC tasks. Indeed, LIP is connected with the dorsolateral prefrontal cortex (DLPFC)

      Cognitive Control area through dlPFC might be responsible for 'reprogramming' what goes on in LIP? Flexible readouts, different effect of recurrent encoding, etc?

    7. greater compression of activity among directions within categories in the DMC task

      Exactly what you would expect right, as direction does not matter for encoding category itself? We are talking about matching.

    8. compressing variability among directions within a category

      So it were mainly the response directions that were still encoded in the OIC task in LIP. In DMC this disappeared in favour of more population-level category coding.

    9. we evaluated the temporal stability of category decoding using SVM decoders that were trained at one time point and then tested at all other time points in the shared sample period

      Check the maintenance of information at time-point 1 by training a decoder on it and applying it to future time-points!
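Sketch of that cross-temporal generalization idea on a toy population, with a nearest-centroid decoder standing in for the paper's SVM (all numbers and names mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy activity: trials x timepoints x neurons, with a stable category
# signal appearing at "sample onset" (t >= 2).
n_trials, n_time, n_units = 200, 6, 20
labels = rng.integers(0, 2, n_trials)
pattern = rng.normal(size=n_units)
signal = np.zeros((n_trials, n_time, n_units))
for t in range(2, n_time):                   # category info appears at t=2
    signal[:, t, :] += np.outer(2 * labels - 1, pattern)
signal += rng.normal(scale=0.5, size=signal.shape)

def centroid_decode(train_t, test_t):
    """Train a nearest-centroid decoder at train_t, test it at test_t."""
    c0 = signal[labels == 0, train_t].mean(0)
    c1 = signal[labels == 1, train_t].mean(0)
    x = signal[:, test_t]
    pred = np.linalg.norm(x - c1, axis=1) < np.linalg.norm(x - c0, axis=1)
    return (pred == labels).mean()

# Full train-time x test-time accuracy matrix: off-diagonal accuracy
# within the signal period indicates a temporally stable code.
acc = np.array([[centroid_decode(tr, te) for te in range(n_time)]
                for tr in range(n_time)])
```

High off-diagonal accuracy within the post-onset window is exactly the "stable maintenance of information" signature noted above; training in the pre-onset window stays at chance.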

    10. category computations are supported by a common subpopulation of neurons in the early sample period and different subpopulations of neurons or different readout mechanisms in the late sample period

      Some perceptual mechanism is task-independent, simply information providing and encoding. Later readout can be flexible according to task demands!

    11. attractor dynamics appears to compress category-related information to a simpler, binary format by collapsing all directions within a category towards a single population state

      Working-Memory component induced in RNN as a function of task demand

    12. graded neural activity in the OIC

      Direct classification - more room for stimulus idiosyncratic representation?

    13. categorical encoding was more abstract with binary-like neural activityin the DMC

      Working-memory component


    1. contingency-degradation

      getting reward also when you pick random actions or don't pick anything - non-contingent rewards

    2. devaluation

      Sudden loss of reward or becoming aversive - MB-RL should instantly stop approaching, while MF-RL gradually

    3. assume a predefined state and action space

      It is really the representational structure that makes a HUGE difference in the effectiveness of any learning strategy deployed over it

    4. apply MF RL updates on retrospectively inferred latent states

      Have a model at the ready but don't do anticipatory planning - only retrospective evaluation according to MF-RL learning rules

    5. necessarily pressed into the singular axis of MF–MB

      So keep an open mind about such things - seems specifically aimed at HEURISTICS - form of wrongful MB-RL?

    6. Simple strategies that rely only on working memory,

      Looks exactly like MF

    7. For MB control to materialize, the agent must identify its goal, search its model for a path leading to that goal and then act on its plan

      Hard to model and understand the exact scope of the MB controller, so default to MF evidence if not accurate?

    8. features of trajectories in the environment

      Successor representations

    9. contextual information is used to segregate circumstances in which similar stimuli require different actions

      Again, a working-memory procedure?

    10. compound representations

      Essentially leveraging working memory to create apparently more 'flexible' behaviour, while in reality MF-RL is the only real 'learning' mechanism

    11. DAergic signals support both instrumental (action–value) and non-instrumental (state–value) learning in the striatum.

      The correct error signals are provided to facilitate any form of RL based learning. In Striatum?

    12. Computational RL theory built on the principles that animal behaviourists had distilled through experimentation, to develop the method of temporal difference (TD) learning (a MF algorithm)

      Origins of RL are purely associative learning - delta-rule style
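The delta-rule core of TD(0), for my own reference (standard textbook form, not code from this paper):

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: delta-rule update driven by the RPE."""
    rpe = r + gamma * V[s_next] - V[s]   # reward prediction error
    V[s] += alpha * rpe
    return rpe

# Toy chain s0 -> s1 -> end, reward 1.0 on leaving s1. Values converge to
# V(s1) = 1 and V(s0) = gamma * V(s1) = 0.9.
V = {"s0": 0.0, "s1": 0.0, "end": 0.0}
for _ in range(1000):
    td_update(V, "s0", 0.0, "s1")
    td_update(V, "s1", 1.0, "end")
```

The MF/associative point is visible here: the update needs no transition model at all, only the sampled successor state.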

    13. derive value estimates for the different states or actions available

      Practical difference between computing desired states and inferring best actions VS directly computing desired actions without explicit state values

    14. dimensionality of learning — the axes of variance that describe how individuals learn and make choices — is well beyond two

      It is not only the speed / accuracy that is being traded off - which is what MB/MF and all other two-systems seem to boil down to.

    15. System1/System2

      Many learning theories seem to boil down to this, and MF-RL / MB-RL is exactly this!


  2. Aug 2020
  3. Local file
    1. rostral ACC activity will predict the probability of switching strategies, whereas caudal ACC activity will predict the probability of staying within a strategy

      Univariate more activity in rostral (higher up the hierarchy) means switching? not perhaps more top-down control to STAY in the current strategy?

    2. important for hierarchical systems that integrateinformation over extended time periods

      In fact it might be the one fundamental reason such hierarchy is useful

    3. ill-equipped to simulate control processes that are inherently dynamic, such as the response delays introduced by switching between tasks

      Yes, or maybe also the continuous tasks as proposed by Hayden group!

    4. by extending previous work that integrated goals into RNNs [38]

      The goal-circuit model is an abstraction of the HRL-RNN model?

    5. [12,144,181]

      Relevant publications explicitly drawing from HRL-RNN theory for ACC

    6. cells in isolation, or univariate indicators of ACC function that average across the activity of entire cell populations

      Distributed patterns will not be picked up - the guiding function of ACC cannot be detected. Or perhaps, only the 'energizing' part of it

    7. caudal ACC and rostral ACC apply control signals that attenuate costs associated with the production of low-level actions

      Or is this in some way similar to a 'gating' mechanism, allowing stable representations for control to be 'updated' or amended to current needs, detected by a 'higher' system

    8. tonic dopamine levels in ACC stabilize the task in working memory

      Something like a hidden state / task set active encoding

    9. ACC damage does not interfere strongly with many of the putative functions that have been attributed to it

      Crucial aspect of the HRL theory: Everything can still happen, the ACC does not EXCLUSIVELY execute all of these functions, it simply strengthens / guides

    10. do large and sudden changes in the state space explain the conflict-likesignals that are commonly observed in ACC?

      So basically it is not necessarily an explicit encoding of error, rather the updating of the current context?

    11. distributedmanner

      What exactly is the meaning and the point of this?

    12. an arsenal of mathematical tools from dynamical systems analysis

      Reference 58 contains examples of nonlinear dynamical systems analysis for neural networks?! :o

    13. bidirectional connectivity between DLPFC and ACC

      Explicit recurrence - Learning to reinforcement learn in activity dynamics?

    14. the model will predict how hierarchical action sequences are representedat different levels of abstraction along the frontal midline

      increasingly abstract goals will be represented along a gradient of the spatial organization of the network - this might have something to do with connectivity between layers? cf. convolutional neural nets

    15. individual regions of ACC process both typesof information

      Cognitive and emotional functions of the ACC can be found in individual regions all throughout the area


    1. multiple regression problem, in which the different coding models were treated as predictor variables to the observed similarity matrix, the analysis enabled joint estimation of multiple coding models

      Super relevant: We can estimate the DEGREE to which different features are encoded on a trial-by-trial basis??
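Sketch of what that regression looks like (toy RDMs and names my own): vectorize the lower triangles of the model RDMs and the observed similarity matrix, then jointly fit the model weights with least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cond = 12

def random_rdm():
    """Toy symmetric model RDM with zero diagonal."""
    m = rng.random((n_cond, n_cond))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

model_a, model_b = random_rdm(), random_rdm()
# Toy "observed" matrix: a known mixture of the two models plus noise.
observed = (2.0 * model_a + 0.5 * model_b
            + rng.normal(scale=0.01, size=(n_cond, n_cond)))

# Vectorize lower triangles (off-diagonal) and regress jointly.
tri = np.tril_indices(n_cond, k=-1)
X = np.column_stack([model_a[tri], model_b[tri], np.ones(tri[0].size)])
y = observed[tri]
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The betas recover the degree to which each coding model contributes - the "joint estimation" the quote refers to, here in its simplest OLS form.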

    2. task-switching study using MVPA

      So the sharpening can also be seen using fMRI - single neuron recordings not necessary!

    3. increase in the gain (i.e., sharpening) of frontoparietal task-set coding

      This is exactly what is seen in the intracranial recording study Ebitz et al!

    4. by using a between-subjects RSA approach, the analysis was not optimized to capture finer-grained representational structure that could be subject-specific

      Of course between people the geometry can take on a distinct form, but even if the encoded information is the same? Shouldn't RSA abstract over that?

    5. No task-general control representations were detected

      So per task it was very uniquely constructed - not one level for WM, inhibition, etc.

    6. separately, across different sub-domains of negative affect

      Orthogonal but consistent representation!

    7. This exclusive focus on behavioral measures may be suboptimal for construct validation, as brain activity measures can provide more proximal, higher-dimensional readouts of the neural mechanisms of interest

      Is it expected that even though we cannot find common patterns in behaviour, we can still find common patterns in the brain? I thought even within the same task over different sessions, the control representation might differ...

    8. A construct validation approach is often used to address this issue.

      This is basically how IQ and the sub-measures of it are defined / measured

    9. Also relevant is the insight that RSA can be conducted in a time-dependent manner within fMRI, such that trials form the dimensions of the similarity matrices

      How about sub-trials (e.g. only the picking of sugar)?

    10. strength of conjunctive coding was robustly related to trial-by-trial response time

      So again - a scalar value strength type signal?

    11. For example, interference occurs when a goal-relevant task set and an irrelevant yet prepotent set are simultaneously active.

      Is this really a more elaborate model? Wouldn't the 'interference' simply move any RSA model closer to the midline between the color / word naming task sets? We won't have very clear access to neural firing strengths or anything, if that would be relevant.

    12. one-dimensional structure of the model

      The model only investigated the representational strength along the face-house attentional dimension (not an interesting representational geometry?)

    13. specifying and comparing representational models is more flexible within RSA

      So the benefit is that we get more insight into the geometry of the representation, not just its presence / decodability? Compare different computational models.

    14. classification-based decoding, which we simply refer to here as “classification”, and RSA

      Distributed patterns can be subdivided like this: decoding and encoding models

    15. type and form of informationencoded in LPFC and associated regions of the FPN and CO

      So there is relevant information encoded in task set control reps: but there will probably still be a relevant 'intensity' summarizable in a scalar value?

    16. independent of particular stimuli, responses, or other task information

      Abstract control related factors such as 'congruency' abstract control signals away from directly related stimulus signals

    17. highly abstracted, one-dimensional factors

      This is the main issue right - Setting up experiments to identify one 'factor' of cognitive control, e.g. 'Congruency' in stroop tasks. It is much more complex multidimensional than that.


    1. inferred in the current experiment because the future value of a patch is, by design, different from its past value

      dACC continuously learning a variable - the slope specifically? Or is it 'simply' trying to predict the prediction errors of lower layers. It seems like the latter would generalize less!

    2. The opposing time-linked signals observed do not suggest that dACC and the other regions integrate rewards to a simple mean estimate (as RL-simple would), but instead point towards a comparison of recent and past reward rates necessary for the computation of reward trends.

      But this was based on full-region regression weights over different time bins, computed from a particular choice moment. How can you determine separate representations from whole-area betas, with recent rewards increasing and past rewards decreasing the activity of the same whole region simultaneously?

    3. which was updated on every time step using a higher-order PE, PE* (that is, the difference of observed PE and expected PE)

      So there is some explicit hierarchy in the prediction errors - but is that really what goes on, or is there an explicit estimation of a trend, not necessarily based on the prediction errors themselves? It is mathematically identical!

    4. it is also possible that PEs may be used as a decision variable to guide decisions

      The ability of PEs to directly influence decision making, and not just learning, goes above and beyond simple-RL


    1. this relationship is heterogeneous; of these 58 neurons, 31.03% (n = 18/58) showed a positive slope and 18.97% (n = 11/58) showed a negative slope

      Distance to prey is an important variable, and it is the actual code for time of impending reward - but it is not encoded by an overall rise in activity (typical fMRI analysis assumption!) It is encoded distributed over the neurons (perhaps RSA, but could still mask it if very single-neuron heavy?)

    2. The encoding of the control-relevant variable becomes higher when the expected reward for controlling becomes higher!


  4. Jul 2020
    1. Finally, some research in deep RL proposes to tackle exploration by sampling randomly in the space of hierarchical behaviors (Machado et al., 2017; Jinnai et al., 2020; Hansen et al., 2020). This induces a form of directed, temporally extended, random exploration reminiscent of some animal foraging models (Viswanathan et al., 1999).

      Sampling from hierarchical behaviours?

    2. An example is prediction learning, in which the agent is trained to predict, on the basis of its current situation, what it will observe at future time steps (Wayne et al., 2018; Gelada et al., 2019).

      This might be what can get confused for successor representation signal from dACC?

    3. Song et al. (2017) trained a recurrent deep RL model on a series of reward-based decision making tasks that have been studied in the neuroscience literature, reporting close correspondences between the activation patterns observed in the network's internal units and neurons in dorsolateral prefrontal, orbitofrontal, and parietal cortices (Figure 2C).

      Relevant stuff!


    1. An important future goal is to create multiscale neural computational models that better predict more complex real world behaviors

      Is this something that we induce automatically with recurrence?

    2. participants made decisionsbased on the instantaneous reward rate and the reward rate trend

      Trend is captured at a higher level representation?

    3. conflict between short-term (safe options) and long-term (risky options) was mediated by the dorsal anterior cingulate cortex (dACC)

      dACC signals their conflict - does it do so by strengthening the representations of the one it favours?

    4. superimposition of computations at the shorter time scale (a trial) and the longer time scale (a block of trials)

      now, participants have to continuously evaluate which task they should be engaged in (which computations to perform?)

    5. human memoryforaging

      Memory retrieval / search = foraging?

    6. one simple decision: when to leave a patch

      The key inference to make in the Marginal Value Theorem
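The MVT leave rule, sketched on a toy exponentially depleting patch (all parameters mine): leave when staying longer no longer improves the overall reward rate; longer travel times between patches predict longer stays.

```python
import numpy as np

def optimal_leave_time(r0, decay, travel_time, dt=0.01, horizon=50.0):
    """Patch yields rate r0 * exp(-decay * t); return the stay duration
    that maximizes overall rate gain(t) / (t + travel_time)."""
    t = np.arange(dt, horizon, dt)
    gain = (r0 / decay) * (1 - np.exp(-decay * t))   # cumulative intake
    overall_rate = gain / (t + travel_time)
    return t[np.argmax(overall_rate)]

# MVT's classic prediction: costlier travel -> stay longer in each patch.
short_travel = optimal_leave_time(1.0, 0.2, travel_time=1.0)
long_travel = optimal_leave_time(1.0, 0.2, travel_time=10.0)
```

At the optimum, the instantaneous patch rate equals the long-run average rate of the environment - the "when to leave" decision in one scalar comparison.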

    7. One research area that examined decisions of multiple time scales is foraging theory

      Foraging == The study of the tradeoff between exploiting the current local habit / inertia VS finding a different niche?

    8. miss the crucial resources that could be available if the animal maintains larger-scale computations about the wider environment

      This is basically an explore-exploit tradeoff, but now not over one uniform action space but explicitly emphasizing local//global environment?

    9. summarized in the theories of hierarchical reinforcement learning

      !! Oh yeah baby

    10. shifts of anterior to posterior brain areas

      General organizational principle: The more something is habit-formed, the more posterior it shifts? The more control it requires, the more frontal it is?

    11. habit formation occurs when repetitive computations are streamlined

      Habits automatize local goals so agent can allocate processing to more complex global goals?

    12. simplest form of multiscale processing,but it is ubiquitous.

      Simple bias in favour of repeating previously rewarded action = simple operant conditioning?

    13. stable or slowly changing environments

      Requires less to no flexibility

    14. effectively modulatebetween local tasks while also considering multiple global goals and contextual factors

      Comes together nicely with a view on RL as central controller for 'homeostasis', or something like that. Would require highly hierarchical system of goals and representations.

    15. It has been historically assumed that the information processing in each trial is independent from information processing from other trials, and that once one trial completes, all the information processing is reset

      No inter-trial dependencies - but of course there are processing benefits / interferences and maybe trial-by-trial updates of response caution known as speed-accuracy tradeoff

    16. many experimental paradigms focus on short spatial or temporal scales.

      Also a problem for 'real' HRL or meta-learning?

    17. area-restrictedsearch

      Foraging strategy: Limit your attention to locations that were previously rewarding (?)

    18. An overarching analogy is foraging

      Foraging requires dynamic allocation and weighting of attention and evidence between multiple sources

    19. the degree to which working memory is considered

      An A. Collins & M. J. Frank paper: How much RL is actually WM?

    20. The requirement to integrate information over spatial and temporal scales in a wide variety of environments would seem to be a common feature underlying intelligent systems, and one whose performance has a profound impact on behavior [16–21]

      Exactly: Generalization over representations and over temporal grain of behaviour

    21. much of the progress made in the latest“AI spring” are, as we describe below, achievements of multiscale processing.

      Generalization and broader tasks are the hallmark of the current success of AI


  5. Jun 2020
    1. Cells that report choice independently of task should lie on the diagonal (i.e., an angle of π/4). Instead, the distribution of angles was significantly bimodal across all cells

      So the MFC has different cells specialized for different tasks (familiarity vs categorization)? Disappointing - hoped for generalized / remapping.

    2. Choice decoding in the MFC was strongest shortly after stimulus onset, well before the response was made

      So it is not the actual execution of an action which is the information that is picked up - it is the direction the contextual state-space representation heads towards?

    3. Decoding accuracy for choices was highest in the MFC

      MFC is more task-relevant for response selection

    4. In contrast, in the MFC (Fig. 3E, right), the relative positions of the four conditions were not preserved.

      MFC seems to rely on different representations. Familiarity-category pairs are not preserved over tasks, hence not really represented fundamentally. In the HPC it seems the determining factor (for Dim 1).

    5. In the HA, the ability to decode category was not significantly different between the two tasks

      HPC / Amyg encode stimulus category in memory & re-represent, regardless of task context. Part of stimulus representation in general.

    6. In the MFC, decoding accuracy for image category was significantly higher in the memory task

      MFC encodes task variable = stimulus category only during relevant context / task-set

    7. MS cell responses reflected a memory process: they strengthened over blocks as memories became stronger

      Identified memory-selective cells fire more and more strongly with repeated presentation of a stimulus & distinguish false from true negatives!

    8. We first trained a decoder to discriminate task type on trials where the subject was instructed to reply with a button press, and then we tested the performance of this decoder on trials where the subject was instructed to use saccades

      Decoder should generalize task classification across response modalities - does so in MFC (dACC and SMA)
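      The cross-modality test quoted above can be sketched in a few lines: fit a linear task-type decoder on button-press trials only, then score it on saccade trials. Everything below is simulated toy data with assumed variable names, not the authors' code or recordings.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n_neurons, n_trials = 50, 200

      # Simulated firing rates: task type shifts a fixed subset of neurons,
      # independently of response modality (the MFC-like case).
      task = rng.integers(0, 2, n_trials)       # 0 = memory, 1 = categorization
      modality = rng.integers(0, 2, n_trials)   # 0 = button press, 1 = saccade
      task_axis = rng.normal(size=n_neurons)
      X = rng.normal(size=(n_trials, n_neurons)) + np.outer(task, task_axis)

      # Minimal least-squares linear decoder (targets coded as ±1).
      def fit_decoder(X, y):
          Xb = np.column_stack([X, np.ones(len(X))])
          w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)
          return w

      def predict(w, X):
          Xb = np.column_stack([X, np.ones(len(X))])
          return (Xb @ w > 0).astype(int)

      train = modality == 0   # train on button-press trials only
      test = modality == 1    # test on held-out saccade trials
      w = fit_decoder(X[train], task[train])
      acc = (predict(w, X[test]) == task[test]).mean()
      print(f"cross-modality decoding accuracy: {acc:.2f}")
      ```

      High accuracy here only means the simulated task signal is modality-invariant by construction; the point is the train/test split across modalities, which is what makes the decoder generalization claim testable.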

    9. Cells showed significant modulation of their firing rate during the baseline period as a function of task type

      So already from instruction there is a reconfiguration - very complicated mapping from linguistic input to representations for decision making...

    10. Subjects indicated choices using either saccades (leftward or rightward eye movement) or button press while maintaining fixation at the center of the screen

      Different response modalities allow for disambiguation of coding towards a very specific execution - if the encoding is similar, it's really more cognitive/central!

    11. We found that neuronal populations within the MFC formed two separate decision axes

      So movement through state space in a unique but intra-task consistent direction?

    12. MFC [dorsal anterior cingulate cortex (dACC

      MFC = dACC = MCC

    13. insensitive to response modality

      So it's really about the abstract task demands, not the concrete action output

    14. The strength and geometry of representations of familiarity were task-insensitive in the HA but not in the MFC

      This is what creates the 'shadowing' pattern?

    15. whether an image was novel or familiar, or whether an image belonged to a given visual category

      recognition memory vs categorization: yes or no responses in both cases. Both use 'pictures', so stimuli can be the same across tasks

    16. phase-locking of MFC activity to oscillations in the HA

      HPC memory representations and dACC task set representations?


    1. pattern of conflict modulation during one correct response is orthogonal to the pattern during another correct response

      i.e. it is not a 'general boosting' effect -> only on average the activity of neurons can still increase, but it is all about upregulating the relevant neurons for this correct response

    2. higher when Ericksen conflict was present (Figure 2A)

      Yeah, in single neurons you can show the detection of general conflict this way, and it was not partitionable into different responses...

    3. representational geometry

      nice wording similar to RSA

    4. with Ericksen conflict than it was for trials without Ericksen

      what about simon?

      This does mean: Conflict increases representation shifting response toward correct action!

    5. AUC

      This axis has more predictive power when there is conflict than when there is no conflict (task is already so easy that the information is not needed, or at least a lot less?)

    6. G)

      Very clear effect! Suspicious? How exactly did they even select the pseudo-populations? It's not exactly clear to me from the methods.

    7. amplification hypothesis, conversely, does not predict a unified conflict detection axis in the population. Instead, it makes a prediction that is exactly contrary to the epiphenomenal view: that conflict should shift population activity along task-variable coding dimensions, but in the opposite direction. That is, conflict is predicted to amplify task-relevant neural responses

      conflict means more control will be exerted. Heavier representation of whatever info it is that dACC encodes that 'pushes' for the correct action. This function of dACC would be in line with the context layer!?
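      The two population-level predictions contrasted above reduce to the sign of a projection: score each trial along the coding dimension for the correct response and compare conflict vs no-conflict trials. The sketch below simulates the amplification scenario under assumed gains (1.0 vs 1.5); all names and numbers are illustrative, not from the paper.

      ```python
      import numpy as np

      rng = np.random.default_rng(3)
      n_neurons = 40

      # Unit-norm coding dimension for one correct response.
      coding_dim = rng.normal(size=n_neurons)
      coding_dim /= np.linalg.norm(coding_dim)

      base = rng.normal(size=(200, n_neurons))

      # Amplification scenario: conflict trials carry a stronger signal
      # along the coding dimension (assumed gain 1.5 vs 1.0).
      no_conflict = base[:100] + 1.0 * coding_dim
      conflict = base[100:] + 1.5 * coding_dim

      proj_nc = no_conflict @ coding_dim
      proj_c = conflict @ coding_dim

      # Amplification predicts a positive shift; the epiphenomenal view
      # predicts a negative one.
      shift = proj_c.mean() - proj_nc.mean()
      print(f"mean shift along coding dimension under conflict: {shift:.2f}")
      ```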

    8. At the population level, then, the epiphenomenon hypothesis predicts that conflict should decrease the amount of information about the correct response and shift neuronal population activity down along the axis in firing rate space that encodes this response

      Because a smaller share of the neurons 'fighting' for the correct response are active, at least in total.

    9. Neurons that were tuned for a specific correct response were often tuned to prefer the same Simon/Ericksen distractor response

      DLPFC is tuned to action-outcomes? -> in single neurons!

    10. In fact, the majority of conflict-sensitive dACC neurons were not selective for either correct response or distractor responses (66.7%

      So the conflict is represented separately, not having much to do with action-outcomes.

    11. did still signal either Ericksen or Simon conflict

      Simply the C-term in the ANOVA which is a binary coder for the general presence? Would also have more trials where its parameter is influential, does that influence estimation?

    12. neurons did not encode the distractor response

      So on trials with a unique distractor response, that action-outcome was not represented at all? It's interesting but then where does the actual conflict take place?

    13. significant proportion of neurons were selective for the correct response

      So desired action-outcome is represented. I think that was already known about dACC.

    14. separate pools of neurons corresponding to the two conflicting actions, and that conflict increases activity because it uniquely activates both pools

      more neurons activate for the different possible action outcomes = more activity overall --> conflict signal. Makes sense.

    15. Furthermore, the population of cells whose responses were significantly affected by Eriksen conflict was almost entirely non-overlapping with the population significantly affected by Simon conflict (specifically, only one cell was significantly modulated by both)

      Really separate representations for different aspects of the current task-set?

    16. additive model was a better fit to the data than other, more flexible models

      So separate statistical significance testing shows effect for Eriksen, not for Simon, but regression model shows through model comparison that it's best to ascribe to them the same effect...

    17. (n=15/145) neurons had significantly different firing rates between Simon and no-conflict trials

      No significant main effect but more single cells had a significant effect...? -> also directionality is not all positive, some positive some negative

    18. A small number of individual neurons also had different activity levels on Eriksen conflict and no conflict trials (8.2%, n=12/145 neurons, within-cell t-test)

      Note the difference between 'averaged over all neurons' (first report) and 'within one specific neuron' (this report)

    19. activity was higher on Ericksen conflict trials than on no conflict trials

      for Eriksen flankers there is a main effect of conflict (vs no-conflict). Simon was not statistically significant. Was it mainly a power issue?

    20. Ericksen

      So is it Ericksen or Eriksen??

    21. 12 task conditions

      Here they acknowledge 12 task conditions, not 9.

    22. Within each task condition (combination of correct response and distractor response), firing rates from separately recorded neurons were randomly drawn with replacement to create a pseudotrial firing rate vector for that task condition, with each entry corresponding to the activity of one neuron in that condition

      Definition of pseudotrial
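      The pseudotrial construction quoted above is easy to make concrete: within one condition, resample each neuron's rate independently (with replacement) from that neuron's own trials, since the neurons were recorded in separate sessions. A minimal sketch with simulated rates and assumed names:

      ```python
      import numpy as np

      rng = np.random.default_rng(1)

      # rates[i] -> 1-D array of firing rates recorded for neuron i in ONE
      # task condition; trial counts differ because recordings are separate.
      rates = [rng.poisson(lam=5 + i, size=rng.integers(20, 40))
               for i in range(10)]

      def make_pseudotrial(rates, rng):
          """One pseudotrial vector: one resampled rate per neuron."""
          return np.array([rng.choice(r) for r in rates])

      # Stack many pseudotrials into a trials-by-neurons pseudopopulation
      # matrix (the X used for the decoding analyses).
      X = np.stack([make_pseudotrial(rates, rng) for _ in range(100)])
      print(X.shape)  # (100, 10)
      ```

      The resampling destroys any real trial-by-trial noise correlations between neurons, which is the usual caveat of pseudopopulation analyses.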

    23. pseudotrial vector x

      one trial for all different neurons in the current pseudopopulation matrix?

    24. The separating hyperplane for each choice i is the vector (a) that satisfies: βi · a = 0. Meaning that βi is a vector orthogonal to the separating hyperplane in neuron-dimensional space, along which position is proportional to the log odds of that correct response: this is the coding dimension for that correct response

      Makes sense: If Beta is proportional to the log-odds of a correct response, a is the hyperplane that provides the best cutoff, which must be orthogonal. Multiplying two orthogonal vectors yields 0.
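      The orthogonality noted above can be checked numerically: for a linear decoder with weights beta and intercept b0, the log-odds at x is beta @ x + b0, so a step along beta changes the log-odds by t * ||beta||², while a step along any direction a with beta @ a == 0 (i.e. within the hyperplane) leaves it unchanged. Toy numbers only, not the paper's fitted weights:

      ```python
      import numpy as np

      beta = np.array([2.0, -1.0, 0.5])  # assumed decoder weights
      b0 = 0.3                           # assumed intercept

      def log_odds(x):
          return beta @ x + b0

      x = np.array([1.0, 0.0, -2.0])

      # Step along the coding dimension: log-odds changes linearly.
      t = 0.7
      delta_along = log_odds(x + t * beta) - log_odds(x)
      assert np.isclose(delta_along, t * (beta @ beta))

      # Step along a hyperplane direction (orthogonal to beta): no change.
      a = np.array([1.0, 2.0, 0.0])      # beta @ a = 2 - 2 + 0 = 0
      assert np.isclose(log_odds(x + a), log_odds(x))
      print("orthogonal step leaves log-odds unchanged")
      ```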

    25. X is the trials by neurons pseudopopulation matrix of firing rates

      So these pseudopopulations were random agglomerates of single neurons that were recorded, so many fits for random groups, and the best were kept?

    26. re-representing high-dimensional neural activity in a small number of dimensions that correspond to variables of interest in the data

      Essentially this is kind of like constructing dissimilarity matrices over large groups of voxels?

    27. 4917.0 (1) 5826.5 (1)*

      Additive model is the winner in single cell firing rates -> coding simply for the notion of conflict? cf. the population coding from dimensionality reduction!

    28. Subtracting this expectation from the observed pattern of activity left the residual activity that could not be explained by the linear co-activation of task and distractor conditions

      So this is what to analyze: If this still covaries with conflict in some way it means we go beyond epiphenomenal?

    29. Within each neuron, we calculated the expected firing rate for each task condition, marginalizing over distractors, and for each distractor, marginalizing over tasks.

      Distractor = specific stimulus / location (e.g. '1' or 'left')?

      Task = conflict condition (e.g. Simon or Ericksen)?
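      The marginalization-plus-residual step described in the two quotes above has a standard additive form: expected cell value = row marginal + column marginal - grand mean, and the residual is whatever that cannot explain. Whether this exact formula matches the paper's computation is an assumption; the sketch just shows the mechanics on a toy condition matrix:

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      # observed[i, j]: one neuron's mean rate for correct-response i and
      # distractor j (3 x 3 toy matrix of condition averages).
      observed = rng.normal(10, 2, size=(3, 3))

      row_means = observed.mean(axis=1, keepdims=True)  # marginal over distractors
      col_means = observed.mean(axis=0, keepdims=True)  # marginal over responses
      grand = observed.mean()

      # Additive expectation and the residual it leaves behind.
      expected = row_means + col_means - grand
      residual = observed - expected

      # By construction the residual has no additive structure left:
      # its row and column means are ~zero.
      print(np.allclose(residual.mean(axis=0), 0),
            np.allclose(residual.mean(axis=1), 0))
      ```

      Anything in the residual that still covaries with conflict is then, as the note above puts it, evidence beyond the epiphenomenal (purely additive) account.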

    30. condition-averaged within neurons (9 data points per neuron, reflecting all combinations of the 3 correct responses, 3 Ericksen distractors, and 3 Simon distractors)

      How do all combinations of 3 responses lead to only 9 data points per neuron? 3x2x2 = 12.