70 Matching Annotations
  1. Jul 2020
    1. Finally, some research in deep RL proposes to tackle exploration by sampling randomly in the space of hierarchical behaviors (Machado et al., 2017; Jinnai et al., 2020; Hansen et al., 2020). This induces a form of directed, temporally extended, random exploration reminiscent of some animal foraging models (Viswanathan et al., 1999).

      Sampling from hierarchical behaviours?

    2. An example is prediction learning, in which the agent is trained to predict, on the basis of its current situation, what it will observe at future time steps (Wayne et al., 2018; Gelada et al., 2019).

      This might be what can get confused for successor representation signal from dACC?

    3. Song et al. (2017) trained a recurrent deep RL model on a series of reward-based decision making tasks that have been studied in the neuroscience literature, reporting close correspondences between the activation patterns observed in the network’s internal units and neurons in dorsolateral prefrontal, orbitofrontal, and parietal cortices (Figure 2C).

      Relevant stuff!


    1. An important future goal is to create multiscale neural computational models that better predict more complex real world behaviors

      Is this something that we induce automatically with recurrence?

    2. participants made decisions based on the instantaneous reward rate and the reward rate trend

      Trend is captured at a higher level representation?

    3. conflict between short-term (safe options) and long-term (risky options) was mediated by the dorsal anterior cingulate cortex (dACC)

      dACC signals their conflict - does it do so by strengthening the representations of the one it favours?

    4. superimposition of computations at the shorter time scale (a trial) and the longer time scale (a block of trials)

      now, participants have to continuously evaluate which task they should be engaged in (which computations to perform?)

    5. human memory foraging

      Memory retrieval / search = foraging?

    6. one simple decision: when to leave a patch

      The key inference to make in the Marginal Value Theorem

    7. One research area that examined decisions of multiple time scales is foraging theory

      Foraging == The study of the tradeoff between exploiting the current local habit / inertia VS finding a different niche?

    8. miss the crucial resources that could be available if the animal maintains larger scale computations about the wider environment

      This is basically an explore-exploit tradeoff, but now not over one uniform action space but explicitly emphasizing the local/global environment?

    9. summarized in the theories of hierarchical reinforcement learning

      !! Oh yeah baby

    10. shifts of anterior to posterior brain areas

      General organizational principle: The more something is habit-formed, the more posterior it shifts? The more control it requires, the more frontal it is?

    11. habit formation occurs when repetitive computations are streamlined

      Habits automatize local goals so agent can allocate processing to more complex global goals?

    12. simplest form of multiscale processing, but it is ubiquitous.

      Simple bias in favour of repeating previously rewarded action = simple operant conditioning?

    13. stable or slowly changing environments

      Requires less to no flexibility

    14. effectively modulate between local tasks while also considering multiple global goals and contextual factors

      Comes together nicely with a view on RL as central controller for 'homeostasis', or something like that. Would require highly hierarchical system of goals and representations.

    15. It has been historically assumed that the information processing in each trial is independent from information processing from other trials, and that once one trial completes, all the information processing is reset

      No inter-trial dependencies - but of course there are processing benefits / interferences and maybe trial-by-trial updates of response caution known as speed-accuracy tradeoff

    16. many experimental paradigms focus on short spatial or temporal scales.

      Also a problem for 'real' HRL or meta-learning?

    17. area-restricted search

      Foraging strategy: Limit your attention to locations that were previously rewarding (?)

    18. An overarching analogy is foraging

      Foraging requires dynamic allocation and weighting of attention and evidence between multiple sources

    19. the degree to which working memory is considered

      A Collins & MJ Frank paper: How much RL is actually WM?

    20. The requirement to integrate information over spatial and temporal scales in a wide variety of environments would seem to be a common feature underlying intelligent systems, and one whose performance has a profound impact on behavior [16–21]

      Exactly: Generalization over representations and over temporal grain of behaviour

    21. much of the progress made in the latest “AI spring” are, as we describe below, achievements of multiscale processing.

      Generalization and broader tasks are the hallmark of the current success of AI
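
The patch-leaving decision from the foraging notes above (items 6 and 7) can be made concrete with a rough sketch of the Marginal Value Theorem: leave a patch once the instantaneous intake rate has dropped to the long-run average rate, travel time included. The saturating gain function, the function names, and all parameter values below are illustrative assumptions and do not come from the annotated paper.

```python
import math

def gain(t, a=10.0, r=0.5):
    """Cumulative reward after t time units in a patch (assumed saturating form)."""
    return a * (1.0 - math.exp(-r * t))

def marginal_rate(t, a=10.0, r=0.5):
    """Instantaneous intake rate in the patch: derivative of gain() w.r.t. t."""
    return a * r * math.exp(-r * t)

def average_rate(t, travel, a=10.0, r=0.5):
    """Long-run reward rate if the forager always stays t units per patch."""
    return gain(t, a, r) / (t + travel)

def optimal_leave_time(travel, a=10.0, r=0.5, t_max=50.0, steps=50000):
    """Grid search for the residence time that maximizes the long-run rate."""
    best_t, best_rate = 0.0, 0.0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        rate = average_rate(t, travel, a, r)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate

if __name__ == "__main__":
    for travel in (1.0, 5.0):
        t_star, rate = optimal_leave_time(travel)
        # At the optimum the marginal rate approximately equals the average rate.
        print(travel, round(t_star, 2), round(rate, 3), round(marginal_rate(t_star), 3))
```

Under these assumptions, a longer travel time between patches pushes the optimal leaving time later, which is the local vs. global tradeoff the notes describe.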


  2. Jun 2020
    1. Cells that report choice independently of task should lie on the diagonal (i.e., an angle of π/4). Instead, the distribution of angles was significantly bimodal across all cells

      So the MFC has different cells specialized for different tasks (familiarity vs categorization)? Disappointing - hoped for generalized / remapping.

    2. Choice decoding in the MFC was strongest shortly after stimulus onset, well before the response was made

      So it is not the actual execution of an action that is picked up - it is the direction in which the contextual state-space representation is heading?

    3. Decoding accuracy for choices was highest in the MFC

      MFC is more task-relevant for response selection

    4. In contrast, in the MFC (Fig. 3E, right), the relative positions of the four conditions were not preserved.

      MFC seems to rely on different representations. Familiarity - category pairs are not preserved over tasks, hence not really represented fundamentally. In the HPC it seems to be the determining factor (for Dim 1).

    5. In the HA, the ability to decode category was not significantly different between the two tasks

      HPC / Amyg encode stimulus category in memory & re-represent, regardless of task context. Part of stimulus representation in general.

    6. In the MFC, decoding accuracy for image category was significantly higher in the memory task

      MFC encodes task variable = stimulus category only during relevant context / task-set

    7. MS cell responses reflected a memory process: they strengthened over blocks as memories became stronger

      memory-selective identified cells fire stronger and stronger with repeated presentation of stimulus & identify false from true negative!

    8. We first trained a decoder to discriminate task type on trials where the subject was instructed to reply with a button press, and then we tested the performance of this decoder on trials where the subject was instructed to use saccades

      Decoder should generalize task classification across response modalities - does so in MFC (dACC and SMA)

    9. Cells showed significant modulation of their firing rate during the baseline period as a function of task type

      So already from instruction there is a reconfiguration - very complicated mapping from linguistic input to representations for decision making...

    10. Subjects indicated choices using either saccades (leftward or rightward eye movement) or button press while maintaining fixation at the center of the screen

      Different response modalities allow for disambiguation of coding towards a very specific execution - if similar encoding it's really more cognitive/central!

    11. We found that neuronal populations within the MFC formed two separate decision axes

      So movement through state space in unique but intra-task consistent direction?

    12. MFC [dorsal anterior cingulate cortex (dACC

      MFC = dACC = MCC

    13. insensitive to response modality

      So it's really about the abstract task demands, not the concrete action output

    14. The strength and geometry of representations of familiarity were task-insensitive in the HA but not in the MFC

      This is what creates the 'shadowing' pattern?

    15. whether an image was novel or familiar, or whether an image belonged to a given visual category

      recognition memory vs categorization: yes or no responses in both cases. For 'pictures', so stimuli can be the same across tasks

    16. phase-locking of MFC activity to oscillations in the HA

      HPC memory representations and dACC task set representations?
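
The cross-modality test from item 8 above (train a task-type decoder on button-press trials, test it on saccade trials) can be sketched on simulated data. Everything here is an assumption for illustration: the simulated firing rates, the nearest-centroid decoder (the paper's actual decoder is not specified in these notes), and the choice to make the modality offsets orthogonal to the task axis, which is exactly the "abstract, modality-insensitive" coding scenario.

```python
import numpy as np

def simulate_trials(rng, n_trials, task_axis, modality_offset, noise=1.0):
    """Simulated rates: task type shifts activity along task_axis,
    response modality adds a fixed offset to every trial."""
    labels = rng.integers(0, 2, size=n_trials)   # 0 = memory, 1 = categorization
    signs = 2.0 * labels - 1.0
    rates = signs[:, None] * task_axis[None, :] + modality_offset[None, :]
    rates += noise * rng.standard_normal(rates.shape)
    return rates, labels

def fit_centroids(rates, labels):
    """Mean population vector per task type (row 0 and row 1)."""
    return np.stack([rates[labels == k].mean(axis=0) for k in (0, 1)])

def predict(centroids, rates):
    """Assign each trial to the nearest centroid (Euclidean distance)."""
    d = ((rates[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def orthogonalize(v, axis):
    """Remove the component of v that lies along axis."""
    return v - (v @ axis) / (axis @ axis) * axis

def cross_modality_accuracy(seed=0, n_cells=50, n_trials=200):
    rng = np.random.default_rng(seed)
    task_axis = 0.5 * rng.standard_normal(n_cells)
    # Assumed: modality changes the population state, but not along the task axis.
    button_offset = orthogonalize(rng.standard_normal(n_cells), task_axis)
    saccade_offset = orthogonalize(rng.standard_normal(n_cells), task_axis)
    train_x, train_y = simulate_trials(rng, n_trials, task_axis, button_offset)
    test_x, test_y = simulate_trials(rng, n_trials, task_axis, saccade_offset)
    centroids = fit_centroids(train_x, train_y)   # trained on button-press trials only
    return float((predict(centroids, test_x) == test_y).mean())

if __name__ == "__main__":
    print(cross_modality_accuracy())
```

If the modality offset instead had a large component along the task axis, the same decoder would fail to transfer, which is why successful transfer is read as evidence for abstract task coding.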


    1. pattern of conflict modulation during one correct response is orthogonal to the pattern during another correct response

      i.e. it is not a 'general boosting' effect -> only on average the activity of neurons can still increase, but it is all about upregulating the relevant neurons for this correct response

    2. higher when Ericksen conflict was present (Figure 2A)

      Yeah, in single neurons you can show the detection of general conflict this way, and it was not partitionable into different responses...

    3. representational geometry

      nice wording similar to RSA

    4. with Ericksen conflict than it was for trials without Ericksen

      what about simon?

      This does mean: Conflict increases representation shifting response toward correct action!

    5. AUC

      This axis has more predictive power when there is conflict than when there is no conflict (task is already so easy that the information is not needed, or at least a lot less?)

    6. G)

      Very clear effect! Suspicious? How exactly did they even select the pseudo-populations? It's not clear to me from the methods

    7. amplification hypothesis, conversely, does not predict a unified conflict detection axis in the population. Instead, it makes a prediction that is exactly contrary to the epiphenomenal view: that conflict should shift population activity along task-variable coding dimensions, but in the opposite direction. That is, conflict is predicted to amplify task-relevant neural responses

      conflict means more control will be exerted. Heavier representation of whatever info it is that dACC encodes that 'pushes' for the correct action. This function of dACC would be in line with the context layer!?

    8. At the population level, then, the epiphenomenon hypothesis predicts that conflict should decrease the amount of information about the correct response and shift neuronal population activity down along the axis in firing rate space that encodes this response

      Because less % of neurons 'fighting' for the correct response are active, at least in total.

    9. Neurons that were tuned for a specific correct response were often tuned to prefer the same Simon/Ericksen distractor response

      DLPFC is tuned to action-outcomes? -> in single neurons!

    10. In fact, the majority of conflict-sensitive dACC neurons were not selective for either correct response or distractor responses (66.7%

      So the conflict is represented separately, not having much to do with action-outcomes.

    11. did still signal either Ericksen or Simon conflict

      Simply the C-term in the ANOVA which is a binary coder for the general presence? Would also have more trials where its parameter is influential, does that influence estimation?

    12. neurons did not encode the distractor response

      So on trials with a unique distractor response, that action-outcome was not represented at all? It's interesting but then where does the actual conflict take place?

    13. significant proportion of neurons were selective for the correct response

      So desired action-outcome is represented. I think that was already known about dACC.

    14. separate pools of neurons corresponding to the two conflicting actions, and that conflict increases activity because it uniquely activates both pools

      more neurons activate for the different possible action outcomes = more activity overall --> conflict signal. Makes sense.

    15. Furthermore, the population of cells whose responses were significantly affected by Eriksen conflict was almost entirely non-overlapping with the population significantly affected by Simon conflict (specifically, only one cell was significantly modulated by both)

      Really separate representations for different aspects of the current task-set?

    16. additive model was a better fit to the data than other, more flexible models

      So separate statistical significance testing shows effect for Eriksen, not for Simon, but regression model shows through model comparison that it's best to ascribe to them the same effect...

    17. (n=15/145) neurons had significantly different firing rates between Simon and no- [bioRxiv preprint, version posted March 15, 2020, https://doi.org/10.1101/2020.03.14.991745]

      No significant main effect but more single cells had a significant effect...? -> also directionality is not all positive, some positive some negative

    18. A small number of individual neurons also had different activity levels on Eriksen conflict and no conflict trials (8.2%, n=12/145 neurons, within-cell t-test)

      Note the difference between 'averaged over all neurons' (first report) or 'within one specific neuron' (this report)

    19. activity was higher on Ericksen conflict trials than on no conflict trials

      for Eriksen flankers there is a main effect of conflict (vs no-conflict). Simon was not statistically significant. Was it mainly a power issue?

    20. Ericksen

      So is it Ericksen or Eriksen??

    21. 12 task conditions

      Here they acknowledge 12 task conditions, not 9.

    22. Within each task condition (combination of correct response and distractor response), firing rates from separately recorded neurons were randomly drawn with replacement to create a pseudotrial firing rate vector for that task condition, with each entry corresponding to the activity of one neuron in that condition

      Definition of pseudotrial

    23. pseudotrial vector x

      one trial for all different neurons in the current pseudopopulation matrix?

    24. The separating hyperplane for each choice i is the vector (a) that satisfies: [...] Meaning that βi is a vector orthogonal to the separating hyperplane in neuron-dimensional space, along which position is proportional to the log odds of that correct response: this is the coding dimension for that correct response

      Makes sense: If Beta is proportional to the log-odds of a correct response, a is the hyperplane that provides the best cutoff, which must be orthogonal. Multiplying two orthogonal vectors yields 0.

    25. X is the trials by neurons pseudopopulation matrix of firing rates

      So these pseudopopulations were random agglomerates of single neurons that were recorded, so many fits for random groups, and the best were kept?

    26. re-representing high-dimensional neural activity in a small number of dimensions that correspond to variables of interest in the data

      Essentially this is kind of like constructing dissimilarity matrices over large groups of voxels?

    27. 4917.0 (1) 5826.5 (1)*

      Additive model is the winner in single cell firing rates -> coding simply for the notion of conflict? cf. the population coding from dimensionality reduction!

    28. Subtracting this expectation from the observed pattern of activity left the residual activity that could not be explained by the linear co-activation of task and distractor conditions

      So this is what to analyze: If this still covaries with conflict in some way it means we go beyond epiphenomenal?

    29. Within each neuron, we calculated the expected firing rate for each task condition, marginalizing over distractors, and for each distractor, marginalizing over tasks.

      Distractor = specific stimulus / location (e.g. '1' or 'left')?

      Task = conflict condition (e.g. Simon or Ericksen)?

    30. condition-averaged within neurons (9 data points per neuron, reflecting all combinations of the 3 correct response, 3 Ericksen distractors, and 3 Simon distractors)

      How do all combinations of 3 responses lead to only 9 data points per neuron? 3x2x2 = 12.
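
Two of the analysis steps quoted above (the pseudotrial construction in item 22 and the residual activity in items 28 and 29) can be sketched as follows. This is only a guess at the mechanics: the function names are hypothetical, and the additive expectation used here (task marginal + distractor marginal - grand mean) is an assumed form, since the quoted text does not give the exact formula.

```python
import numpy as np

def make_pseudotrials(rates_by_neuron, n_pseudotrials, rng):
    """Build a pseudopopulation matrix for ONE task condition.

    rates_by_neuron: list with one 1-D array per separately recorded neuron,
    holding that neuron's single-trial firing rates in this condition.
    Each row of the result is a pseudotrial: one randomly drawn trial per
    neuron, sampled with replacement. Shape: (n_pseudotrials, n_neurons).
    """
    columns = [rng.choice(r, size=n_pseudotrials, replace=True)
               for r in rates_by_neuron]
    return np.column_stack(columns)

def additive_residual(mean_rates):
    """Residual activity for one neuron after removing additive effects.

    mean_rates: (n_tasks, n_distractors) condition-averaged firing rates.
    Assumed expectation: row marginal + column marginal - grand mean.
    """
    task_mean = mean_rates.mean(axis=1, keepdims=True)        # marginalize over distractors
    distractor_mean = mean_rates.mean(axis=0, keepdims=True)  # marginalize over tasks
    expected = task_mean + distractor_mean - mean_rates.mean()
    return mean_rates - expected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pools = [rng.poisson(5.0, size=20).astype(float) for _ in range(4)]
    print(make_pseudotrials(pools, n_pseudotrials=3, rng=rng).shape)
```

On this reading, a pseudopopulation is not a selection of the best-fitting random groups: every recorded neuron contributes one column, and only the pairing of trials across neurons is random. A purely additive neuron leaves a residual of zero, so any residual that still covaries with conflict goes beyond linear co-activation.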