683 Matching Annotations

Sep 2024
arxiv.org arxiv.org

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

2
1. guillefix 11 Sep 2024
  
  in Public
  
  Figure
  
  would be nice to see the error, to see if it follows a power law
2. guillefix 11 Sep 2024
  
  in Public
  
  Both MindEye2 and MindEye1 also map brain activityto the latent space of Stable Diffusion’s (Rombach et al.,2022) variational autoencoder (VAE) to obtain blurry recon-structions that lack high-level semantic content but performwell on low-level image metrics (e.g., color, texture, spatialposition), which get combined with the semantically richoutputs from the diffusion prior to return reconstructionsthat perform well across perceptual and semantic features.
  
  like spatial information?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2403.11207
Dec 2023
www.nature.com www.nature.com

Full-waveform inversion imaging of the human brain

2
1. guillefix 09 Dec 2023
  
  in Public
  
  Other ultrasound methodologies, employing higherfrequencies, are more severely impacted by signal attenuation,both in the skull and in soft tissue
  
  but maybe shallow imaging could work? hmm
2. guillefix 09 Dec 2023
  
  in Public
  
  Time-of-flight tomography uses a short-wavelength approx-imation, basing its analysis on the simplified physics of raytheory in which the effects of transmission through aheterogeneous medium are represented by a simplechange in travel time. For a finite wavelength, wavetransmitted through a medium that is heterogeneous onmany scales, such delay times are only sensitive to theproperties of the medium averaged over the dimensions ofthe first Fresnel zone 11 .
  
  hmm why?
Visit annotations in context

Annotators

guillefix

URL

nature.com/articles/s41746-020-0240-8.pdf
Nov 2023
arxiv.org arxiv.org

2303.03478.pdf

1
1. guillefix 21 Nov 2023
  
  in Public
  
  Upon receiving observations y, solving a Bayesian inverse problem involves sampling theconditional distribution of x given y (Tarantola, 2005)
  
  I guess adding the noise also helps in making the problem well-posed, i.e. having a solution
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2303.03478.pdf
www.nature.com www.nature.com

Full wave 3D inverse scattering transmission ultrasound tomography in the presence of high contrast

1
1. guillefix 16 Nov 2023
  
  in Public
  
  For the transmission inversion we use a model based large scale minimization at a particular angular fre-quency ωj based on an L 2 minimization.
  
  I think the Fourier transform they apply is in the spatial domain right?
Visit annotations in context

Annotators

guillefix

URL

nature.com/articles/s41598-020-76754-3.pdf
Oct 2023
arxiv.org arxiv.org

2308.12969.pdf

1
1. guillefix 02 Oct 2023
  
  in Public
  
  At test time, Gr is the sameas F(P ∗)
  
  I think this should be $G_p$ ?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2308.12969.pdf
Jul 2023
www.biorxiv.org www.biorxiv.org

A window to the brain: ultrasound imaging of human neural activity through a permanent acoustic window

1
1. guillefix 25 Jul 2023
  
  in Public
  
  Power Doppler intensity for each implant scenario over N=15 acquisitions. Stars indicate statistical differences (paired sampledt-test) between each scenario compared to the No implant scenario (* : p < 0.05 , **: p < 0.01 , ***: p < 0.001) (F) SNR attenuationfor each implant scenario as function of the depth (N=15 acquisitions).(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.The copyright holder for this preprintthis version posted June 15, 2023.;https://doi.org/10.1101/2023.06.14.544094doi:bioRxiv preprint
  
  would have been nice to compare with skull tissue, to get an idea of how much of an improvement this is!
Visit annotations in context

Annotators

guillefix

URL

biorxiv.org/content/10.1101/2023.06.14.544094v1.full.pdf
Oct 2022
distill.pub distill.pub

Feature Visualization

1
1. guillefix 16 Oct 2022
  
  in Public
  
  A good first guess is one that makes your data decorrelated and whitened.
  
  I guess the intuition here can be obtained by imagining doing gradient descent on a space where the manifold of images (the data) is very far from a ball (decorrelated and whitened): gradient descent is likely to move out of it
Visit annotations in context

Annotators

guillefix

URL

distill.pub/2017/feature-visualization
Sep 2022
forum.effectivealtruism.org forum.effectivealtruism.org

Mind Enhancement Cause Exploration - EA Forum

2
1. guillefix 15 Sep 2022
  
  in Public
  
  given legal tractability
  
  how are u going to deal with this?
2. guillefix 15 Sep 2022
  
  in Public
  
  A clinical mind enhancement research group could conduct healthy-volunteer clinical trials on effects and safety of MEIs to gather sufficient data to push almost deployable MEIs to deployable level.
  
  Yes we need more of this!
Visit annotations in context

Annotators

guillefix

URL

forum.effectivealtruism.org/posts/hGY3eErGzEef7Ck64/mind-enhancement-cause-exploration
Aug 2022
forum.effectivealtruism.org forum.effectivealtruism.org

Paul Christiano: Current work in AI alignment - EA Forum

4
1. guillefix 07 Aug 2022
  
  in Public
  
  you lose all the value of using AI if you won’t use unaligned systems
  
  isn't this just a matter of weighting values?
2. guillefix 07 Aug 2022
  
  in Public
  
  having an AI that is competent or intelligent
  
  which, is something I want
3. guillefix 07 Aug 2022
  
  in Public
  
  And if what I want is not measurable
  
  But it has to be, at some level? Or else how can we even objectively talk about it?
4. guillefix 07 Aug 2022
  
  in Public
  
  I might think it's really important for that new system to not mess up. I think this is a really important problem, but it's worth separating it from the problem of building AI that's trying to do what we want it to do.
  
  Why do you think it is important separating them?
Visit annotations in context

Annotators

guillefix

URL

forum.effectivealtruism.org/posts/63stBTw3WAW6k45dY/paul-christiano-current-work-in-ai-alignment
Local file Local file

Untitled document

2
1. guillefix 07 Aug 2022
  
  in Public
  
  ou could stay on a diet, spend the optimal amountof time learning which activities will achieve your goals, and then follow through on anoptimal plan, no matter how tedious it was to execute.
  
  WHY do we equate rationality with goals not changing
2. guillefix 07 Aug 2022
  
  in Public
  
  biological cognitiveenhancement (Sandberg 2011)
  
  cognitive enhancement may not be biological
Annotators

guillefix
www.alignmentforum.org www.alignmentforum.org

What failure looks like - AI Alignment Forum

6
1. guillefix 07 Aug 2022
  
  in Public
  
  then “slightly increase their degree of influence-seeking behavior” would be just as good a modification as “slightly improve their conception of the goal.”
  
  but he former would take more computation to connect to the goal? but it could also be simpler hmm? hmm
2. guillefix 07 Aug 2022
  
  in Public
  
  in the broader landscape of “possible cognitive policies.”
  
  with a metric related to goals of those policies?
3. guillefix 07 Aug 2022
  
  in Public
  
  Law enforcement will drive down complaints and increase reported sense of security. Eventually this will be driven by creating a false sense of security, hiding information about law enforcement failures, suppressing complaints, and coercing and manipulating citizens.
  
  WHYYYY
4. guillefix 07 Aug 2022
  
  in Public
  
  Investors will “own” shares of increasingly profitable corporations, and will sometimes try to use their profits to affect the world. Eventually instead of actually having an impact they will be surrounded by advisors who manipulate them into thinking they’ve had an impact.
  
  WHY??
5. guillefix 07 Aug 2022
  
  in Public
  
  We will try to harness this power by constructing proxies for what we care about, but over time those proxies will come apart:
  
  but why cant the proxies change?
6. guillefix 07 Aug 2022
  
  in Public
  
  -that can’t be done by trial and error
  
  hmm why not? if we have a relatively clear picture of what we want, or at least can tell stuff we want from stuff we dont, why cant we just try stuff until we find whether we want something?
Visit annotations in context

Annotators

guillefix

URL

alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like
Jul 2022
ai-alignment.com ai-alignment.com

Clarifying “AI alignment”

1
1. guillefix 31 Jul 2022
  
  in Public
  
  In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right.
  
  what is the difference really?
Visit annotations in context

Annotators

guillefix

URL

ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6
www.alignmentforum.org www.alignmentforum.org

Risks from Learned Optimization: Introduction - AI Alignment Forum

4
1. guillefix 30 Jul 2022
  
  in Public
  
  As an example to illustrate the base/mesa distinction in a different domain, and the possibility of misalignment between the base and mesa- objectives, consider biological evolution.
  
  wouldnt say that evolution is an optimizer under their definition. It displays behavior that can be approximated by an optimization process, but it's not really one according to their definition (search + selecting candidates with best objective) or even if u could describe evolution as such a process for a fixed environment, once u take into account that the environment is changed by the evolutionary process, i dont think it can be cleanly described as an optimization process any more (again under their definition of one)
2. guillefix 30 Jul 2022
  
  in Public
  
  Arguably, any possible system has a behavioral objective—including bricks and bottle caps. However, for non-optimizers, the appropriate behavioral objective might just be “1 if the actions taken are those that are in fact taken by the system and 0 otherwise,”[6] and it is thus neither interesting nor useful to know that the system is acting to optimize this objective. For example, the behavioral objective “optimized” by a bottle cap is the objective of behaving like a bottle cap.[7] However, if the system is an optimizer, then it is more likely that it will have a meaningful behavioral objective. That is, to the degree that a mesa-optimizer’s output is systematically selected to optimize its mesa-objective, its behavior may look more like coherent attempts to move the world in a particular direction.[8]
  
  right but things which arent optimizers may be equally undesirable in terms of their behaviour?
3. guillefix 30 Jul 2022
  
  in Public
  
  When a base optimizer generates a mesa-optimizer, safety properties of the base optimizer's objective may not transfer to the mesa-optimizer.
  
  thus making the base optimizer itself not safe?
  
  But yeah some properties may not transfer
4. guillefix 30 Jul 2022
  
  in Public
  
  Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer. For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.(1) Rather, bottle caps have been optimized to keep water in place. The optimizer in this situation is the human that designed the bottle cap by searching through the space of possible tools for one to successfully hold water in a bottle. Similarly, image-classifying neural networks are optimized to achieve low error in their classifications, but are not, in general, themselves performing optimization.
  
  Hmm, I mean one could be pedantic, and say that a bottle cap is searching over atomic configurations using thermal fluctuations and finding a low energy configuration that gives it its shape..
Visit annotations in context

Annotators

guillefix

URL

alignmentforum.org/posts/FkgsxrGf3QxhfLWHG/risks-from-learned-optimization-introduction
forum.effectivealtruism.org forum.effectivealtruism.org

We should undertake community-building in Social VR platforms right now (not later) - EA Forum

8
1. guillefix 24 Jul 2022
  
  in Public
  
  The VR devices of 2025 (or potentially even next year) will be qualitatively different from what we have today:
  
  Other things that are needed: handtracking and matbe periferal tracking (or mixed reality/ optical tracking )
2. guillefix 24 Jul 2022
  
  in Public
  
  Some body language.
  
  can go beyond. some people already sharing biometrics
3. guillefix 24 Jul 2022
  
  in Public
  
  Why is it so much better?
  
  i think fa e tracking for subtle microexpressions could be necessary
4. guillefix 24 Jul 2022
  
  in Public
  
  ,
  
  .
5. guillefix 24 Jul 2022
  
  in Public
  
  The sense of being transported to another place, being bodily present with others, is important for committing with your whole self to the people you're with and the thing you're doing together.
  
  makes sure ur attention is focused
6. guillefix 24 Jul 2022
  
  in Public
  
  ,
  
  .
7. guillefix 24 Jul 2022
  
  in Public
  
  Most people don't realize this (and that's a large part of the problem), but when you have an audio call without audio isolation (without wearing headphones), the system silences you when other people are talking in order to prevent audio feedback. This disrupts natural turntaking negotiation and makes very lively conversation impossible.
  
  this seems not vr specific but yeah more common i vr awo
8. guillefix 24 Jul 2022
  
  in Public
  
  Costs about 100$
  
  more
Visit annotations in context

Annotators

guillefix

URL

forum.effectivealtruism.org/posts/9WymNxuDKNogkKpe5/we-should-undertake-community-building-in-social-vr
arbital.com arbital.com

General intelligence

1
1. guillefix 23 Jul 2022
  
  in Public
  
  "human-level AI
  
  well, i mean by human-level AI, an AI that is human-level in the sense of its generality. Why should we assume humans are the most general?
Visit annotations in context

Annotators

guillefix

URL

arbital.com/p/general_intelligence/
www.adept.ai www.adept.ai

Adept AI Labs

1
1. guillefix 22 Jul 2022
  
  in Public
  
  useful
  
  and fun!
Visit annotations in context

Annotators

guillefix

URL

adept.ai/post/introducing-adept
arxiv.org arxiv.org

Prioritized training on points that are learnable, worth learning, and not yet learnt

1
1. guillefix 22 Jul 2022
  
  in Public
  
  For a model using a point estimate of θ (such as anMLE or MAP), rather than a distribution over θ
  
  why is this necessary?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2206.07137v1.pdf
proceedings.mlr.press proceedings.mlr.press

andreas17a.pdf

1
1. guillefix 21 Jul 2022
  
  in Public
  
  Specifically, is itnecessary to explicitly ground high-level actions into therepresentation of the environment? Or is it sufficient tosimply inform the learner about the abstract structure ofpolicies, without ever specifying how high-level behaviorsshould make use of primitive percepts or actions?
  
  do we need grounding?
Visit annotations in context

Annotators

guillefix

URL

proceedings.mlr.press/v70/andreas17a/andreas17a.pdf
arxiv.org arxiv.org

1711.00482.pdf

1
1. guillefix 21 Jul 2022
  
  in Public
  
  For the example inFigure 2, f might be an image rating model (Socheret al., 2014) that outputs a scalar judgment y of howwell an image x matches a caption w
  
  like CLIP!
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1711.00482.pdf
arxiv.org arxiv.org

2005.08100.pdf

1
1. guillefix 20 Jul 2022
  
  in Public
  
  While Transformers are good at modelinglong-range global context, they are less capable to extract fine-grained local feature patterns.
  
  why is this though? hmm
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2005.08100
www.semanticscholar.org www.semanticscholar.org

2203.15556.pdf

1
1. guillefix 20 Jul 2022
  
  in Public
  
  The 10th and 90th percentiles are estimated viabootstrapping data (80% of the dataset is sampled 100 times)
  
  what does "80% of the dataset is sample 100 times"? as in what dataset are they refering to? the dataset of (N,D,L) values?
Visit annotations in context

Annotators

guillefix

URL

semanticscholar.org/reader/8342b592fe238f3d230e4959b06fd10153c45db1
arxiv.org arxiv.org

2112.11446.pdf

1
1. guillefix 19 Jul 2022
  
  in Public
  
  The relation between data compression (via prediction)and intelligence
  
  i would say there's a connection bertween compression and generalization, and between performance and intelligence. And therefore because for a given amount of data, better generalization means better preformance, better compression tends to lead to higher intelligence
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2112.11446
arxiv.org arxiv.org

2108.07732.pdf

4
1. guillefix 19 Jul 2022
  
  in Public
  
  This evaluation is perhaps slightly unfair, as we have not performed the obvious step of training the model on a much largerdataset of executions. This is an interesting direction for future work.
  
  It would be interesting to see if doing this would improve program synthesis
2. guillefix 19 Jul 2022
  
  in Public
  
  100% 12%
  
  hmm if the 12% is due to variability in sampling how come all of the ones that are solved in one but not the other, are in the edited dataset, given that these samples are the same??
3. guillefix 19 Jul 2022
  
  in Public
  
  , indicating general variability in model performance
  
  do the two models have the same parameters and prompt? i guess it cant be, unless this variability if purely from the sampling?
4. guillefix 19 Jul 2022
  
  in Public
  
  This method of checking correctness forced us to filter the MathQA dataset to keeponly those problems for which the code evaluates to the declared numerical answer, resulting in us removing 45% ofproblems.
  
  what? so almost half of MathQA has wrong answers??
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2108.07732.pdf
www.cold-takes.com www.cold-takes.com

Forecasting transformative AI: the "biological anchors" method in a nutshell

2
1. guillefix 16 Jul 2022
  
  in Public
  
  How to characterize "task type," estimating how "difficult" and expensive a task is to “try” or “watch” once.
  
  this can be interpreted as asking about how much data is needed to be processed basically
2. guillefix 16 Jul 2022
  
  in Public
  
  Trained on a task where each "try" took days, weeks, or months of intensive "thinking."
  
  hmm and how many tries are needed?
Visit annotations in context

Annotators

guillefix

URL

cold-takes.com/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell/
www.lesswrong.com www.lesswrong.com

Introduction to Pragmatic AI Safety [Pragmatic AI Safety #1] - LessWrong

1
1. guillefix 15 Jul 2022
  
  in Public
  
  and that safety research that advances capabilities is not safely scalable to a broader research community,
  
  what?
Visit annotations in context

Annotators

guillefix

URL

lesswrong.com/posts/bffA9WC9nEJhtagQi/introduction-to-pragmatic-ai-safety-pragmatic-ai-safety-1
Local file Local file

-1.4cmAGI Safety From First Principles

2
1. guillefix 15 Jul 2022
  
  in Public
  
  In the generalisation-based approach, the way to create superhuman CEOsis to use other data-rich tasks (which may be very different from the taskswe actually want an AI CEO to do) to train AIs to develop a range of usefulcognitive skills.
  
  This is just unsupervised pretraining
2. guillefix 15 Jul 2022
  
  in Public
  
  reinforcementlearning (which often requires billions of training steps even for much simplertasks).
  
  reinforcement learning isnt the best type of algorithm for talking about this. We should be taking about supervised learning. And then the figures look very different
Annotators

guillefix
forum.effectivealtruism.org forum.effectivealtruism.org

A New X-Risk Factor: Brain-Computer Interfaces - EA Forum

7
1. guillefix 12 Jul 2022
  
  in Public
  
  if this is possible
  
  if either is possible
2. guillefix 12 Jul 2022
  
  in Public
  
  BCI as Existential Security Factor
  
  Also could BCI be used to secure democratic states from devolving into totalitarism by allowing people to more strongly uphold and protect democratic principles?
3. guillefix 12 Jul 2022
  
  in Public
  
  Supposing our estimates are out by an order of magnitude
  
  in the sense of the final estimate not the individual factors right?
4. guillefix 12 Jul 2022
  
  in Public
  
  Having a tiny fraction of humanity free would not have powerful longterm positive effects for humanity overall, as it would be vanishingly unlikely that small, unthreatening populations could liberate humanity and recover a more positive future.
  
  hmm maybe by escaping into space? hmm
5. guillefix 11 Jul 2022
  
  in Public
  
  Further evidence can be seen in the steady expansion in government surveillance by even democratic governments.
  
  again more of what you were talking about that democratic states seem not stable typically? hmm
6. guillefix 11 Jul 2022
  
  in Public
  
  If there were only a single actor controlling the development of all BCIs, it would likely be an easier situation to control and regulate. Having multiple state actors, each of whom can decide the level of coercion they wish their BCI to be built with, is a much more complex scenario
  
  lol so if the world was authortarian, it would be easier to control, wouldnt it?
7. guillefix 11 Jul 2022
  
  in Public
  
  But BCI based surveillance would have no such flaw.
  
  Well BCI tech will have loopholes that defectors could probably utilize
Visit annotations in context

Annotators

guillefix

URL

forum.effectivealtruism.org/posts/qfDeCGxBTFhJANAWm/a-new-x-risk-factor-brain-computer-interfaces-1
May 2022
Local file Local file

Untitled document

5
1. guillefix 26 May 2022
  
  in Public
  
  This allows the system to interactively perform actions and probecausal relationships
  
  this distinction proposed here amounts to me more like the distinction between onpolicy and offpolicy learning.hmm. Because if u can do offpolicy learning, it doesnt matter much where the data came from
2. guillefix 26 May 2022
  
  in Public
  
  The semantic layer is not contained in the data
  
  Why?
3. guillefix 26 May 2022
  
  in Public
  
  And the animal-level part might not be themost crucial part in this equation.
  
  That is indeed an interesting hypothesis
4. guillefix 26 May 2022
  
  in Public
  
  Another example of what can be done in terms of hybridization is how we could use agenetic algorithm to evolve a formal grammar for a particular language, using a large lan-guage model like GPT-3 as an oracle to compute a fitness by comparing the kind of sentencesthat the current grammar candidates are producing and assessing how statistically realisticthey sound.
  
  like distilling DNNs to a simpler program hmm
5. guillefix 26 May 2022
  
  in Public
  
  its focus on handling only solutionspaces that happen to allow for a gradient computation.
  
  what about the techniques used in dVAEs/MoEs to deal with non-differentiable components?
Annotators

guillefix
arxiv.org arxiv.org

1810.08647.pdf

1
1. guillefix 12 May 2022
  
  in Public
  
  However, to avoid circular dependencies in the graph, it re-quires that agent k choose its action before j, and thereforek can influence j but j cannot influence k.
  
  One thing i dont get is why they didnt try a basic influence computation based on the effect of the action at the previous time step on the current action. That way it doesnt have problems with circular dependencies like they have in their basic set up
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1810.08647.pdf
arxiv.org arxiv.org

2111.10364.pdf

2
1. guillefix 12 May 2022
  
  in Public
  
  As thesame as the reward case, DT takes the summation of the state-feature over a trajectory as an input
  
  hmm yeah this seems inherently less powerful by design no?
2. guillefix 06 May 2022
  
  in Public
  
  n this case, CDT is unnecessary because if Φ learns sufficient featuresof s, matching their means, i.e. moments, through DT is enough to match any distribution to anarbitrary precision (Wainwright & Jordan, 2008; Li et al., 2015)
  
  i.e. we can just matcht the means of Phi which could be any number of moments of s (thus matching any distributions of s)
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2111.10364.pdf
Apr 2022
arxiv.org arxiv.org

2204.05862.pdf

2
1. guillefix 27 Apr 2022
  
  in Public
  
  We split our static dataset 50:50, andtrained separate PMs on each half, which we refer to as train PMs and test PMs
  
  the test PM is not necessarily an unbiased estimate of the true reward tho? Tho I guess we can assume it's close to one~?
2. guillefix 27 Apr 2022
  
  in Public
  
  This means that our helpfulness dataset goes ‘up’ in desirability during the conversation, while our harmlessnessdataset goes ‘down’ in desirability. We chose the latter to thoroughly explore bad behavior, but it is likely not idealfor teaching good behavior. We believe this difference in our data distributions creates subtle problems for RLHF, andsuggest that others who want to use RLHF to train safer models consider the analysis in Section 4.4.
  
  " We chose the latter to thoroughly explore bad behavior, but it is likely not ideal for teaching good behavior." hmm what do they mean? of course the bad behaviour dataset is not good for teaching good behaviour? or do they mean that even negatively-rewarding that data is not an ideal way to teach good behavior? hmm?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2204.05862
arxiv.org arxiv.org

2204.06252.pdf

1
1. guillefix 21 Apr 2022
  
  in Public
  
  We hypothesize that this might be to a domain gapbetween the natural images that CLIP has been trained on andthe simulated images from CALVIN.
  
  or that images and related texts are not actually mapped to same spaces, people seem to find? hmm
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2204.06252.pdf
Mar 2022
calinon.ch calinon.ch

Untitled document

1
1. guillefix 24 Mar 2022
  
  in Public
  
  The task parameters referto the variables that can be collected by the system andthat describe a situation, such as positions of objects inthe environment. The task parameters can be fixed duringan execution trial or they can vary while the motion isexecuted
  
  So they are basically what would be called "observations" in RL
Visit annotations in context

Annotators

guillefix

URL

calinon.ch/papers/Calinon-JIST2015.pdf
Feb 2022
openreview.net openreview.net

should_i_run_offline_reinforce.pdf

3
1. guillefix 17 Feb 2022
  
  in Public
  
  C∗ ∈[1,1+ ̃O(1/N)]
  
  why would C* (or the bounds on it) depend on the number of data samples N?
2. guillefix 16 Feb 2022
  
  in Public
  
  For analysis purposes, we consider a BC algorithm that matches the empiricalbehavior policy on states in the offline dataset, and takes uniform random actions outside the supportof the dataset.
  
  Seriously? So no assumption of generalization? Which is precisely the strength of BC....
3. guillefix 16 Feb 2022
  
  in Public
  
  This condition holds in sparse-reward tasks
  
  doesnt it hold even if there's a reward at every time step?
Visit annotations in context

Annotators

guillefix

URL

openreview.net/pdf
Dec 2021
openreview.net openreview.net

learning_to_reach_goals_via_it.pdf

1
1. guillefix 22 Dec 2021
  
  in Public
  
  Although ingeneral, the optimal policy does vary with the horizon, in environments where it is possible to stay atthe goal, a Markovian policy that reaches, then stays at the goal can be near-optimal
  
  I guess the optimal action if you know you have a lot of time left, vs if you know the episode is finishing soon, is different
Visit annotations in context

Annotators

guillefix

URL

openreview.net/pdf
static-wordpress.akamaized.net static-wordpress.akamaized.net

SuperTrack: Motion Tracking for Physically Simulated Characters using Supervised Learning

5
1. guillefix 13 Dec 2021
  
  in Public
  
  it should be possible to extend it toachieve any goal which does not change due to the simulated char-acter’s behaviour
  
  A goal parametrizes a reward. What does a reward that doesn't change due to the simulated character's behaviour mean? Rewards are functions of the simulated character behaviour aren't they?
2. guillefix 13 Dec 2021
  
  in Public
  
  The reward structure is similarto that of Bergamin et al. [2019]
  
  why use exponential here? It is not intuitive that this would provide better reward gradients. For example, if two of the losses were quite high, the reward would stay close to 0 even if improvement was made on one of the two?
3. guillefix 13 Dec 2021
  
  in Public
  
  Again,PPO’s reward definition being a multiplication of terms likely lessensthe effect of any one loss component having more influence on themotion imitation than another.
  
  why?
4. guillefix 13 Dec 2021
  
  in Public
  
  before being converted toquaternions
  
  via the exponential map. Why are outputs ok to be axis angles, but not inputs? hmm
5. guillefix 13 Dec 2021
  
  in Public
  
  Second, that the transition function of these two individ-ual character states is partially separable; generally the kinematiccharacter’s behaviour does not change because of the simulated
  
  what do you mean here? Isnt the kinematic character behaviour determined by the simulation?? or does the kinematic character refer to the target pose??
Visit annotations in context

Annotators

guillefix

URL

static-wordpress.akamaized.net/montreal.ubisoft.com/wp-content/uploads/2021/11/24183638/SuperTrack.pdf
Nov 2021
openreview.net openreview.net

error_aware_imitation_learning.pdf

1
1. guillefix 17 Nov 2021
  
  in Public
  
  benefits from models leveraging temporal abstraction [58, 66]. Inspired by these works, weextend the RNN model from BC-RNN into a multi-layered variant (TieredRNN). This new variantintegrates multiple layers operating at varying timesteps to better streamline information flow fromtimesteps early on in a given sequence to timesteps much further downstream, which can be usefulfor our long-horizon MM tasks.
  
  why not use Transformers?
Visit annotations in context

Annotators

guillefix

URL

openreview.net/pdf
arxiv.org arxiv.org

2108.06038.pdf

2
1. guillefix 10 Nov 2021
  
  in Public
  
  (c)
  
  z hat encodes the probability of different robot strategies, given the human's action
2. guillefix 10 Nov 2021
  
  in Public
  
  t ∼ ψ(zt|st−K:t,aHt−1)
  
  wouldnt it make sense to get z from a the past history of actions. Tho I guess here they are modelling z as being independent at each time step..
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2108.06038.pdf
arxiv.org arxiv.org

2109.00137.pdf

1
1. guillefix 05 Nov 2021
  
  in Public
  
  compute energies
  
  if N_samples is large, then this could be very memory or compute intensive right?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2109.00137.pdf
Oct 2021
arxiv.org arxiv.org

2109.00137.pdf

1
1. guillefix 22 Oct 2021
  
  in Public
  
  we can define this simple environment to bein N dimensions
  
  dimension of the observations no?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2109.00137.pdf
Sep 2021
Local file Local file

Untitled document

3
1. guillefix 23 Sep 2021
  
  in Public
  
  he observation suggests that the semantic structure of wordand object categories share the common representation in human cognitive processing.
  
  is this about that, or about saliency? hmm
2. guillefix 23 Sep 2021
  
  in Public
  
  The Stroop effect known in human cognitive scienceliterature is one of such biases. When human participants observe a mismatched word, where theword “red” with the “blue” color, and are asked to answer the “color” of the word, their responseto the word can be slower than when they see the word “red” with the “red” color (Stroop, 1935).
  
  This is very Magritte
3. guillefix 23 Sep 2021
  
  in Public
  
  ir recognitio
  
  recognition of what?
Annotators

guillefix
www.jaronlanier.com www.jaronlanier.com

agents of alienation

2
1. guillefix 22 Sep 2021
  
  in Public
  
  They are somehow unable to grasp that someone could categorically attack ALL agents on the basis that they do not exist, and that it is potentially harmful to believe that they do.
  
  but there are agents, they are people no? That's the danger of this idea I believe
2. guillefix 22 Sep 2021
  
  in Public
  
  But remember, although agent programs tend to share a set of deficiencies, it is your psychology that really makes a program into an agent; a very similar program with identical capabilities would not be an agent if you take responsibility for understanding and editing what it does. An agent is a way of using a program, in which you have ceded your autonomy. Agents only exist in your imagination. I am talking about YOU, not the computer.
  
  Agents are a social construct xP xd
Visit annotations in context

Annotators

guillefix

URL

jaronlanier.com/agentalien.html
arxiv.org arxiv.org

Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration

4
1. guillefix 20 Sep 2021
  
  in Public
  
  we leverage the learned reward function to scan goal candidates
  
  oh so rather than a captioner hmm
2. guillefix 20 Sep 2021
  
  in Public
  
  Bahdanau et al. [4] and Fu et al. [29] also learn a reward function butrequire extensive expert knowledge (expert dataset and known environment dynamics respectively),whereas our agent uses experience generated by its own exploration.
  
  and the social partner!
3. guillefix 20 Sep 2021
  
  in Public
  
  Language can thus be used to generate out-of-distributions goals by leveragingcompositionality to imagine new goals from known ones
  
  Rather than compositionality, I'd say that language has the property of being lower Kolmogorov complexity, which perhaps is related to its property that different possible generated sentences are more likely to be interesting than randomly generated images, because language is a better world model!
4. guillefix 20 Sep 2021
  
  in Public
  
  Children do so by leveraging the compositionality of language as atool to imagine descriptions of outcomes they never experienced before, targetingthem as goals during play.
  
  What is compositionality? Isn't it just modelling the distribution of langauge?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2002.09253.pdf
arxiv.org arxiv.org

1912.01734.pdf

2
1. guillefix 14 Sep 2021
  
  in Public
  
  Existing beam-search [17, 48, 53] and backtracking solu-tions [24, 29] are infeasible due to the larger action and statespaces, long horizon, and inability to undo certain actions.
  
  How does their expert planner work then? Hmm
2. guillefix 14 Sep 2021
  
  in Public
  
  Thisinference is more realistic than simple object class pre-diction, where localization is treated as a solved problem
  
  Although this has been solved for ALFRED now it seems, as everyone is using a FasterRCNN for the mask prediction, and just predicting labels.
  
  Modularizing the problem~~
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1912.01734
openlab-flowers.inria.fr openlab-flowers.inria.fr

Transformer IMGEP on ALFRED

2
1. guillefix 09 Sep 2021
  
  in Public
  
  This time the model got the “turn right” thing all right. We still have the disambiguation problem, there are two tables involved here. I also disagree with “turning around” as the approapriate description for the navigation part after picking up the box (more like “make a right and then make a left”). The captioner may be picking up correlations in the language directly with bad conditoning on the frames, as picking up an object oftern is followed by turning around, as seen in the previous examples.
  
  Tbh I find it hard to pick up the directions in which it turns sometimes, due to the discrete nature of the actions.
2. guillefix 02 Sep 2021
  
  in Public
  
  That is, use the same framework and algorithm, but replace the sequence of words corresponding to goal sentences with goals as sequences of final state-action tuples. (This would be the time-extended version of simple goal-conditioned RL algos with goals as final states. In this framework, no relabelling is needed).
  
  One case where this type of goal-conditioning would fail, while the language one wouldn't, would be, for goals that characterize the whole trajectory.
Visit annotations in context

Annotators

guillefix

URL

openlab-flowers.inria.fr/t/transformer-imgep-on-alfred/1110
Jul 2021
arxiv.org arxiv.org

2105.06453.pdf

5
1. guillefix 05 Jul 2021
  
  in Public
  
  We conclude that the synthetic data is especiallyimportant for generalization to unseen environments.
  
  well you are basically providing new data
2. guillefix 05 Jul 2021
  
  in Public
  
  We conjecture thatdomain-specific language pretraining is important for theALFRED benchmark.
  
  I conjecture this shows that the ALFRED benchmark is too narrow
3. guillefix 05 Jul 2021
  
  in Public
  
  hen evaluated on human instruc-tions
  
  so is this a model that takes either visual embeddings or language embeddings as instructions?
4. guillefix 05 Jul 2021
  
  in Public
  
  We note that performance of the modelwith16input frames is close to the performance of the fullepisode memory agent, which can be explained the averagetask length of50timesteps
  
  huh? Isn't 16 significantly less than 50?
5. guillefix 05 Jul 2021
  
  in Public
  
  it is desirable for the agents tobe able to handle raw sensory inputs, such as images orvideos.
  
  you should explain why
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2105.06453.pdf
Jun 2021
arxiv.org arxiv.org

2106.01345.pdf

1
1. guillefix 23 Jun 2021
  
  in Public
  
  his setting is harder as it removes theability for agents to explore the environment and collect additional feedback.
  
  but it's easier in that you are not tasked to do the exploration well
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2106.01345
www.semanticscholar.org www.semanticscholar.org

2106.02039.pdf

7
1. guillefix 15 Jun 2021
  
  in Public
  
  evaluate the likelihood under our discretized model, we treat each bin as a uniform distribution overits specified range; by construction, the model assigns zero probability outside of this range.
  
  Oh I see. Why didn't they use the log likelihood of the discretized sequence? Here they are ignoring the softmax probability and just calculating the log likelihood corresponding to a particular discrete sequence sample?? That doesnt seem likelihy, as as long as one token is wrong, then this would give a 0 probability. So I guess they are still weighting by softmax hm. Yeah that seems to be what it is, looking at Appendix B. I just think they didn't write it very clearly
2. guillefix 15 Jun 2021
  
  in Public
  
  To better isolate the source of the Transformer’s improved accuracy over standard single-step models,we also evaluate a Markovian variant of our same architecture. This ablation has a truncated contextwindow that prevents it from attending to more than one timestep in the past. We find that this modelperforms similarly to the trajectory Transformer on fully-observed environments, suggesting thatarchitecture differences and increased expressivity from the autoregressive state discretization play alarge role in the trajectory Transformer’s long-horizon accuracy.
  
  Aha! So it's more about expressing probability distributions over outputs well!! I've also been noticing that with transflower!^^
3. guillefix 15 Jun 2021
  
  in Public
  
  log likelihood
  
  why is their log likelihood positive, if they are using a discrete model, so log likelihoods should be negative??
4. guillefix 15 Jun 2021
  
  in Public
  
  , the conditional probabilities of the past given the future are stillwell-defined, allowing us to condition samples not only on the preceding states, actions, and rewardsthat have already been observed, but also any future context that we wish to occur. If the futurecontext is a state at the end of a trajectory, we decode trajectories with probabilities of the form
  
  This is the same approach that Lynch et al follow
5. guillefix 15 Jun 2021
  
  in Public
  
  While this choice may seem inefficient, it allows us tomodel the distribution over trajectories with more expressivity, without simplifying assumptions suchas Gaussian transitions.
  
  But again there are alternatives like modelling using normalizing flows, VDVAEs, dVAEs, DDPMs, etc
6. guillefix 15 Jun 2021
  
  in Public
  
  modeling considerations areconcerned less with architecture design and more with how to represent trajectory data – consistingof continuous states and actions – for processing by a discrete-token architecture
  
  but there are ways around this, like MoGlow or Transflower
7. guillefix 15 Jun 2021
  
  in Public
  
  nd we no longer require ensembles orother epistemic uncertainty estimators, as is common in prior work on model-basedRL
  
  Not sure what they are referring to here?
  
  Are these the probabilistic aspects used for exploration? But here they don't solve the exploration problem as they rely on high-reward demonstrations from some other exploration algorithm no?
Visit annotations in context

Annotators

guillefix

URL

semanticscholar.org/reader/f864d4d2267abba15eb43db54f58286aef78292b
May 2021
Local file Local file

aggregated_paper (1).pdf

2
1. guillefix 25 May 2021
  
  in Public
  
  relaxing the conditioning onX
  
  again i don't think you are relaxing the conditioning on X; you are finding a high probability limit as X grows to be infinite in size, I think
2. guillefix 25 May 2021
  
  in Public
  
  Relaxing the conditioning onXgives us the limiting spectral distributionργMPρψMP, becauseμXis almost surelyρψMP, which terminates the proof.
  
  is it relaxing the conditioning on X, or is it just the latter observation?
  
  I wouldn't call this "relaxing the conditioning on X"
Annotators

guillefix
arxiv.org arxiv.org

2103.01045.pdf

3
1. guillefix 24 May 2021
  
  in Public
  
  intra-class correlations and strong inter-class
  
  I thought we were talking about binary classification?
2. guillefix 24 May 2021
  
  in Public
  
  Observe that if the average test error over theversion spaceε(VS)is low, thenmostmembers of the version space musthave low test error (either intuitively, or by Markov’s inequality).
  
  We prove a much stronger high-probability inequality in our "Generalization bounds for deep learning" work
3. guillefix 24 May 2021
  
  in Public
  
  In words, typical solutions may bedescribed in a number of bits close to the typical information content
  
  if the measure P is sufficiently biased, and simple, etc. See the AIT arguments in Dingle et al and Valle-Perez et al
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2103.01045.pdf
mathai-iclr.github.io mathai-iclr.github.io

MATHAI_29_paper.pdf

1
1. guillefix 24 May 2021
  
  in Public
  
  While the numberof steps until validation accuracy>99% grows quickly as dataset size decreases, the number of stepsuntil the train accuracy first reaches 99% generally trends down as dataset size decreases and stays inthe range of103-104optimization steps.
  
  But what if you keep running the training even longer? Are they just doing very inefficient model selection with the validation data, thus effectively using the validation data as extra training data?? I.e. just keep training, until the model happens to reach a good validation accuracy, and then stop? That sounds like training on the validation data?
Visit annotations in context

Annotators

guillefix

URL

mathai-iclr.github.io/papers/papers/MATHAI_29_paper.pdf
centaur.reading.ac.uk centaur.reading.ac.uk

Cognitive structures in dance

1
1. guillefix 19 May 2021
  
  in Public
  
  Articulatory suppression and interference have been used as methods to investigate the contribution of verbal, spatial, and/or motor codes in encoding, rehearsing, and recalling series of danceitems from working memory (e.g., Jean et al. 2001; Rossi-Arnaud, Cortese, & Cestari,2004). Suppression tasks involve performing a task at the same time as observing the to-be-remembered material, and are used to disrupt encoding of the to-be-remembered (TBR) material.
  
  like the experiments Feynman did for time perception:)
Visit annotations in context

Annotators

guillefix

URL

centaur.reading.ac.uk/42652/2/NeurocognitiveDance_ACTA Psychologica 2012.pdf
arxiv.org arxiv.org

1911.02001.pdf

2
1. guillefix 10 May 2021
  
  in Public
  
  Leveraging the extracted kinematic beats, we define the dance unit in this work. As illustrated inFigure 2(b), a dance unit is a temporally standardized short snippet, consisting of a fixed numberof poses, whose kinematic beats are normalized to several specified beat times with a constant beatinterval. A
  
  Wait so do they do this under the assumption that every music beat will have a matching kinematic beat? I don't understand how they "normalize" the dance in a way that every kinematic beat aligns with a music one:P
2. guillefix 10 May 2021
  
  in Public
  
  We thusadopt an auxiliary reconstruction task on the latent movement sequences to facilitate training:Lmrecon=E[∥∥{ˆzimov}−{zimov}∥∥1].
  
  So they are loosing some of the distributional modelling properties of the GAN...
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1911.02001.pdf
arxiv.org arxiv.org

2103.10206.pdf

13
1. guillefix 10 May 2021
  
  in Public
  
  While the generation diversity is ensured by the adversariallearning schema
  
  doubt. You don't add noise to the model do you? Or if you do, you don't say it
2. guillefix 10 May 2021
  
  in Public
  
  So our proposed BC, whichfinds matched kinematic beat for each music beat, is a bet-ter metric
  
  but it has the flaw that if there are very frequent kinematic beats this could get a good value, even though the dance isnt really capturing the rhythm
3. guillefix 10 May 2021
  
  in Public
  
  Position Variance (PVar)evaluates the diversity of thegenerated dance
  
  this is quite a crude way to evaluate diversity
4. guillefix 10 May 2021
  
  in Public
  
  For each node, 3 individual fully-connected layersembed the joint node features into Q, K and V vectors andthen we perform 24 individual LLA modules on Q, K and Vof the nodes.
  
  so its basically independent attention for each node/joint?
  
  With feature sharing in the KCN layer?
5. guillefix 10 May 2021
  
  in Public
  
  N ∼(ti,σ2)
  
  what, the mean increases with time?, or what is $t_i$?
6. guillefix 10 May 2021
  
  in Public
  
  Attφ(qi,K,V) =ψ(qi ̃φ(KT))φ(V)
  
  isnt this somewhat similar to T5-style relative positional encoding
7. guillefix 10 May 2021
  
  in Public
  
  we adopt an adversarial trainingarchitecture
  
  However, their architecture is not probabilistic, in the sense that it won't produce different dances for the same song. However, given that no song is likely to be the same, even if they are very similar, that could be a way to effectively inject some noise.
8. guillefix 10 May 2021
  
  in Public
  
  During training thestart pose is simply selected as the first frame of the dancesequence and the right shifted outputs are just taken fromground-truth with mask for future time steps.
  
  So they use a causal decoder.
  
  I guess they could run this indefinitely, in an autoregressive way, by just restricitng the encoder/decoder context window to some max lenght of the past
9. guillefix 10 May 2021
  
  in Public
  
  In implementation we use a 4-knot CHS to fit a mo-tion curve between two key poses, as an optimal trade-off between fitting precision and representation simplicity
  
  "optimal"? In which sense
10. guillefix 10 May 2021
  
  in Public
  
  position androtation
  
  but position can be determined from rotation no?
11. guillefix 09 May 2021
  
  in Public
  
  The mo-tion capture methods are more accurate but still suffer fromnoise and misalignment to music.
  
  do they? They are usually aligned to music, and the noise for a good mocap system isnt much, and can be cleaned up relatively well as far as I know
12. guillefix 09 May 2021
  
  in Public
  
  Ow-ing to the LLA module, the training is more stable and thedegradation to averaging actions is avoided
  
  Hmm I would have thought that more context helps in avoding degeneration into mean pose for deterministic models
13. guillefix 09 May 2021
  
  in Public
  
  Different from sentences in NLP that have long-term cor-relations, human poses and motions only have local cor-relations in a short time interval.
  
  That is very debatable
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2103.10206.pdf
arxiv.org arxiv.org

2101.08779.pdf

2
1. guillefix 07 May 2021
  
  in Public
  
  similarity between them.
  
  we need a discriminator, or CLIP-like model for music/audio-motion!
2. guillefix 07 May 2021
  
  in Public
  
  Thisposes a unique challenge for cross-modal learning betweenmusic and motion
  
  Probably better handled by probabilistic models
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2101.08779.pdf
arxiv.org arxiv.org

1908.08351.pdf

10
1. guillefix 06 May 2021
  
  in Public
  
  f a model instead develops amore general representation, it should be able to apply learned functions also to longer input strings.Its performance on longer strings may drop for other, practical, reasons, but this drop should bemore smooth than for a model that has not learned a general-purpose representation at all.
  
  What kind of "practical reaons" are you referring to here?
2. guillefix 06 May 2021
  
  in Public
  
  Note that, in ourcase, this ability does not require productivity in generating output strings, since the correct outputsequences are not distributionally different from those in the training data (in some cases, theymay even be exactly the same)
  
  but it requires productivity w.r.t. the inputs no?
3. guillefix 06 May 2021
  
  in Public
  
  When the percentage of exceptions becomes too low, on the other hand, all models have difficultiesmemorising them at all:
  
  what about with larger models, which may be better at memorizing?
4. guillefix 06 May 2021
  
  in Public
  
  One potential explanation for this score discrepancy is that, due to the slightly different distri-bution of examples in the systematicity data set, the models learn a different solution than before.
  
  Ah, they train on a different training set for each composionality task?
5. guillefix 05 May 2021
  
  in Public
  
  As observed by many others before us, insight in the compositional skills of neural networks is noteasily acquired by studying models trained on natural language directly. While it is generally agreedupon that compositional skills are required to appropriately model natural language, successfullymodelling natural data requires far more than understanding compositional structures. As a conse-quence, a negative result may stem not from a model’s incapability to model compositionality, butrather from the lack of signal in the data that should induce compositional behaviour. A positiveresult, on the other hand, cannot necessarily be explained as successful compositional learning, sinceit is difficult to establish that a good performance cannot be reached through heuristics and memo-risation.
  
  if we cannot even measure/evaluate compositionality in natural language, how can we be so sure it is necessary for modeling it?
6. guillefix 05 May 2021
  
  in Public
  
  f an expression is altered byreplacing one of its constituents with another constituent with the same meaning (a synonym),this does not affect the meaning of the expression (Pagin, 2003
  
  How is "meaning" defined?
7. guillefix 05 May 2021
  
  in Public
  
  As a consequence, the interpretation of theprinciple of compositionality depends on the type of constraints that are put on the semantic andsyntactic theories involved.
  
  so that should really be the content of a theory of generalization, not the formally vacuous principle
8. guillefix 05 May 2021
  
  in Public
  
  Fodor and Pylyshyn (1988) contrast systematicity with storing all sentences in an atomic way,in a dictionary-like mapping from sentences to meanings. Someone who entertains such a dictionarywould not be able to understand new sentences, even if these sentences were similar to the onesoccurring in their dictionary. Since humans are able to infer meanings for sentences they have neverheard before, they must use some systematic process to construct these meanings from the onesthey internalised before
  
  So here 'systematicity' simply seems to mean "uses a simple rule to describe the data", which indeed is the key to generalization
9. guillefix 05 May 2021
  
  in Public
  
  This ability concerns the recombination of known parts and rules: anyone who understands a numberof complex expressions also understands other complex expressions that can be built up from theconstituents and syntactical rules employed in the familiar expressions. To use a classic examplefrom Szab ́o (2012): someone who understands ‘brown dog’ and ‘black cat’ also understands ‘browncat’.
  
  this is a bit tautological, depending on how you operationally define 'understand' I think
10. guillefix 05 May 2021
  
  in Public
  
  In the substitutivity test, we evaluate how robust models aretowards the introduction of synonyms, and, more specifically, in which cases words are consideredsynonymous.
  
  This feels like also a case of (a) no?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1908.08351.pdf
arxiv.org arxiv.org

HEMVIP: Human Evaluation of Multiple Videos in Parallel

1
1. guillefix 05 May 2021
  
  in Public
  
  he binary nature of many preference tests means that responsesare relatively information-poor and makes it harder to verify that two conditions are statistically different.
  
  can you not do a Likert-type pairwise comparison?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2101.11898.pdf
Apr 2021
arxiv.org arxiv.org

2103.04922.pdf

13
1. guillefix 20 Apr 2021
  
  in Public
  
  uTWv
  
  uWu
2. guillefix 20 Apr 2021
  
  in Public
  
  pθ(z)
  
  this should read $-\ln{p_\theta ( \mathbf{z})}$
  
  typo
3. guillefix 20 Apr 2021
  
  in Public
  
  whilereverse KL-divergence is not a viable objective function,
  
  I guess in the above derivation, we dont really have access to p_d(x) even if we had the trained discriminator. We only have the softmaxes which give us the two ratios in expression (23) which are conditional probabilities (p(g|x) and p(d|x)) rather than the pd and pg separately.
4. guillefix 20 Apr 2021
  
  in Public
  
  dditionally, it has been suggested thatreverse KL-divergence,DKL(pg||pd), is a better measurefor training generative models than normal KL-divergence,DKL(pd||pg), since it minimisesEx∼pg[lnpd(x)][92]
  
  I think the intuition here is that minimizing D(pg|pd) has the property of ensuring that pg is contained inside the support of pd. So that, we are more sure that samples will be realistic, though may suffer from mode collapse, like GANs, but that may be a less serious problem.
5. guillefix 20 Apr 2021
  
  in Public
  
  2-Stage VAEsBy interpreting VAEs as regularised autoencoders, it is nat-ural to apply less strict regularisation to the latent spaceduring training then subsequently train a density estima-tor on this space, thus obtaining a more complex prior[53]. Vector Quantized-Variational Autoencoders (VQ-VAE)[170], [215] achieve this by training an autoencoder witha discrete latent space, then training a powerful autore-gressive model (see Section 5) on the latent space.
  
  So I think what VQ-VAEs are trying to achieve is a VAE with a very flexible learned prior. If we look at the VAE objective (eq 19), we see that if p(z) equals q(z), then the KL divergence (averaged over q(x), which I'm interpreting to be the data distribution), becomes -I(z;x), i.e. minus the mutual information between z and x. Interesting!
  
  However, we can think of VQ-VAE basically ignoring this explicit regularization term, during its first stage (or maybe its implicitly approximated via its codebook training objectives! hmm). In the second stage, we just fit the prior to approximate q(z) which is what we were assuming to be true in the analysis.
  
  It seems that some of the other works cited in the previous paragraph, try to use the expression in eq 19 (and thus the -I(z;x) regularization) directly. Though because it is intractable to compute, they approximate it in different ways.
6. guillefix 20 Apr 2021
  
  in Public
  
  2 stage VAEs that firstmap data to latents of dimensionrthen use a second VAE tocorrect the learned density can better capture the data [35]
  
  because then the second VAE does recover the data distribution on the latent, according to that paper. Interesting!
7. guillefix 18 Apr 2021
  
  in Public
  
  (14)
  
  i think here q(x) is supposed to be q(x|z) and the e^E(x) should be e^E(z)? (well e^E(z) is equivalent to e^E(x) if x is a function fo z..) But yeah, I think the q part, should be q(x|z). For example, if using normalizing flows, it would be jacobian of the inverse flow
8. guillefix 18 Apr 2021
  
  in Public
  
  implicit energy models
  
  what are implicit energy models?? I thought they said previously that EBM are not implicit generative models?
9. guillefix 18 Apr 2021
  
  in Public
  
  This is made worse by thefinite nature of the sampling process, meaning that samplescan be arbitrarily far away from the model’s distribution[57].
  
  does he mean because of the finite step size? Is that a big problem? hmm not sure I guess this sentence.
10. guillefix 18 Apr 2021
  
  in Public
  
  the gradient of the negative log-likelihoodlossL(θ) =Ex∼pd[−lnpθ(x)]has been shown to approxi-mately demonstrate the following property [21], [193]
  
  this contrastive divergence result is very cool!
11. guillefix 18 Apr 2021
  
  in Public
  
  TABLE 1
  
  what do the stars in training speed, sample speed and param efficiency correspond to, quantiatively?
  
  Also, it would be nice to know robsutness to hyperparameters, as that is often a big part of "training time"
12. guillefix 18 Apr 2021
  
  in Public
  
  training speed is assessed based on reportedtotal training times
  
  hmm ideally we would know if they had trained until convergence, or if they had gone over convergence.
13. guillefix 18 Apr 2021
  
  in Public
  
  Choosing what to optimise for has implica-tions for sample quality, with direct likelihood optimisationoften leading to worse sample quality than alternatives.
  
  is this in part because of noise in the data, which the likelihood based models also fit?
Visit annotations in context

Tags

typo

Annotators

guillefix

URL

arxiv.org/pdf/2103.04922.pdf
arxiv.org arxiv.org

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

1
1. guillefix 17 Apr 2021
  
  in Public
  
  constant mem-ory
  
  They say constant memory but below the memory is said to be $O(N)$ which one is true? As far as I can tell, the latter is a typo?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2006.16236
arxiv.org arxiv.org

Grounding Language in Play

2
1. guillefix 01 Apr 2021
  
  in Public
  
  especially in the most difficult long horizon setting.
  
  actually the graph shows a larger difference for 1 task than for more tasks?
2. guillefix 01 Apr 2021
  
  in Public
  
  This process is scalable because pairing happens after-the-fact, making it straightforward to parallelize via crowdsourc-ing.
  
  could you not crowdsource insruction following too?
  
  Maybe this adds extra diversity though
  
  Probably combining both would be best
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2005.07648.pdf
Mar 2021
Local file Local file

Vigil_2021___SocialAI_paper.pdf

3
1. guillefix 15 Mar 2021
  
  in Public
  
  Training an agent for145such social interactions most likely requires drasti-146cally different methods – e.g. different architectural147biases – than classical object-manipulation training
  
  Or a lot of pre-training data, which given current empirical findings, tends to work better.
2. guillefix 15 Mar 2021
  
  in Public
  
  To117enable the design and study of complex social sce-118narios in reasonable computational time
  
  alternatively you could consider more complex environments but with more offline algorithms like bootstraping from supervised learning
3. guillefix 15 Mar 2021
  
  in Public
  
  rather than lan-059guage based social interactions
  
  some important recent counter example is IIL.
Annotators

guillefix
openreview.net openreview.net

flow_improving_flow_based_generative_models_with_variational_dequantization_and_architecture_design.pdf

2
1. guillefix 15 Mar 2021
  
  in Public
  
  (8)
  
  This whole variational calculation is basically like combining Monte Carlo integration (with importance sampling), and Jensen inequality (to bring the expectation outside the log). The cool thing is that optimizing over q, makes the approximation exact if we our model for q is sufficiently expressive
2. guillefix 15 Mar 2021
  
  in Public
  
  Consequently, maximizing the log-likelihood of the continuous model on uniformly dequantizeddata cannot lead to the continuous model degenerately collapsing onto the discrete data, because itsobjective is bounded above by the log-likelihood of a discrete model.
  
  I don't see how this argument works
Visit annotations in context

Annotators

guillefix

URL

openreview.net/pdf
arxiv.org arxiv.org

1903.01973.pdf

3
1. guillefix 14 Mar 2021
  
  in Public
  
  In our datasets Fig. 2c, we find empirically thatfor the same amount of collection time, play indeed covers 4.2 times more regions of the availableinteraction space than 18 tasks worth of expert demonstration data, and 14.4 times more regions thanrandom exploration.
  
  That is indeed a cool finding
2. guillefix 14 Mar 2021
  
  in Public
  
  lay data ischeap: Unlike expert demonstrations (Fig. 5), playrequires no task segmenting, labeling, or resetting to an initial state, meaning it can be collectedquickly in large quantities.
  
  Well, you still need to have enough people playing for enough time, which may not be cheap. For example, in Imitating Interactive Intelligence they had to spent about 200K pounds to pay people to play with their environment
3. guillefix 14 Mar 2021
  
  in Public
  
  We additionally find that play-supervised models,unlike their expert-trained counterparts, are more robust to perturbations and ex-hibit retrying-till-success behaviors.
  
  I guess because in the play data there was examples of reaching the goal even from suboptimal trajectories
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1903.01973.pdf
arxiv.org arxiv.org

2006.06874.pdf

1
1. guillefix 14 Mar 2021
  
  in Public
  
  Intuitively, this is equivalentto taking the average of demonstrated actions at each specificstate.
  
  unless you model the ditribution, e.g. using Normalizing flows, which should be better?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2006.06874.pdf
www.semanticscholar.org www.semanticscholar.org

2010.03768.pdf

1
1. guillefix 11 Mar 2021
  
  in Public
  
  TextWorldaddresses this discrepancy by providing programmatic and aligned linguistic signals during agentexploration.
  
  but isnt this just substituting the human language-instructor, with a rule-based one that is bound to be of lower quality?
Visit annotations in context

Annotators

guillefix

URL

semanticscholar.org/reader/398a0625e8707a0b41ac58eaec51e8feb87dd7cb
arxiv.org arxiv.org

1812.07035.pdf

1
1. guillefix 08 Mar 2021
  
  in Public
  
  Informally, all else being equal, discontinuousrepresentations should in many cases be “harder” to approx-imate by neural networks than continuous ones. Theoreti-cal results suggest that functions that are smoother [34] orhave stronger continuity properties such as in the modulusof continuity [33, 10] have lower approximation error for agiven number of neurons.
  
  and they probably generalize better, as there are several works showing that DNNs are implicitly biased towards "smooth" functions
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/1812.07035.pdf
flowersteam.github.io flowersteam.github.io

Language as a Cognitive Tool: Dall-E, Humans and Vygotskian RL Agents

3
1. guillefix 07 Mar 2021
  
  in Public
  
  n the skill learning phase, LGB relies on an innate semantic representation that characterizes spatial relations between objects in the scene using predicates known to be used by pre-verbal infants [Mandler, 2012].
  
  so this is feature-engineered no?
2. guillefix 07 Mar 2021
  
  in Public
  
  Although the policy over-generalizes, the reward function can still identify whether plants have grown or haven’t.
  
  how has the reward function learned the association between "feed" and "the object grows"? I guess that was taught from the language descriptions? It should be able to learn the reward function correctly then
3. guillefix 07 Mar 2021
  
  in Public
  
  more aligned data autonomously.
  
  i think this is similar to the idea of self-training
Visit annotations in context

Annotators

guillefix

URL

flowersteam.github.io/language_as_cognitive_tool_vygotskian_rl
www.frontiersin.org www.frontiersin.org

Narrative Constructions for the Organization of Self Experience: Proof of Concept via Embodied Robotics

1
1. guillefix 04 Mar 2021
  
  in Public
  
  This would require a means for representing meaning from experience—a situation model—and a mechanism that allows information to be extracted from sentences and mapped onto the situation model that has been derived from experience, thus enriching that representation
  
  This basically means adding extra information that is inferred, to what is just directly observed no?
Visit annotations in context

Annotators

guillefix

URL

frontiersin.org/articles/10.3389/fpsyg.2017.01331/full
www.cc.gatech.edu www.cc.gatech.edu

An Interaction Framework for Studying Co-Creative AI

2
1. guillefix 03 Mar 2021
  
  in Public
  
  What can and should the user be doing while the AI agent is taking its turn to increase engagement?
  
  Maybe the agent's actions themselves should be engaging enough? We should aim for that I think
2. guillefix 03 Mar 2021
  
  in Public
  
  We employ a turn-based frameworkbecause it is a common way of organizing co-creative interactions [3,12,13] and because it suitsevolutionary and reinforcement-learning approaches that require discrete steps [2, 7, 8, 14].
  
  I think thats a significant limitation. More fluid interactions can only take place in continuous-time settings
Visit annotations in context

Annotators

guillefix

URL

cc.gatech.edu/~riedl/pubs/guzdial-chi-hcml19.pdf
arxiv.org arxiv.org

2102.07227.pdf

1
1. guillefix 03 Mar 2021
  
  in Public
  
  As can be seen in Figure 3 (left), the trainingperformance was sensitive to the weight scaleσ, despitethe fact that a weight normalisation scheme was beingused.
  
  It would be interesting to explore whether this pitfall can actually have an effect in some scenario where one isnt using an abnormally high initialization
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2102.07227.pdf
Feb 2021
research.fb.com research.fb.com

Talking-With-Hands-16.2M-A-Large-Scale-Dataset-of-Synchronized-Body-Finger-Motion-and-Audio-for-Conversational-Motion-Analysis-and-Synthesis-v2.pdf

3
1. guillefix 14 Feb 2021
  
  in Public
  
  natural motions
  
  more natural than the baseline*
2. guillefix 13 Feb 2021
  
  in Public
  
  For quantitative evaluation, we computed the meansquared error between the generated motion and motioncapture on a left-out test set, for fingertip positions and jointangles
  
  this is problematic because there could be many motions which are good but quite different, and thus having big MSE
3. guillefix 12 Feb 2021
  
  in Public
  
  In total, we used approximately120 minutes of data
  
  what? why didnt you use more data? ... We need to do scaling experiments with this
Visit annotations in context

Annotators

guillefix

URL

research.fb.com/wp-content/uploads/2019/10/Talking-With-Hands-16.2M-A-Large-Scale-Dataset-of-Synchronized-Body-Finger-Motion-and-Audio-for-Conversational-Motion-Analysis-and-Synthesis-v2.pdf
arxiv.org arxiv.org

Gesticulator: A framework for semantically-aware speech-driven gesture generation

6
1. guillefix 12 Feb 2021
  
  in Public
  
  2)autoregression reduces the amount of fast moments, making thevelocity histogram more similar to the ground truth
  
  huh? I see autoregression increasing the amount of fast movements no?
2. guillefix 12 Feb 2021
  
  in Public
  
  “In which video...”: (Q1) “...are the character’s movements mosthuman-like?” (Q2) “...do the character’s movements most reflectwhat the character says?” (Q3) “...do the character’s movementsmost help to understand what the character says?” (Q4) “...are thecharacter’s voice and movement more in sync?”
  
  It would also be good to do observational studies where users are simply asked to interact with different characters. And we measure how engaged they are.
3. guillefix 12 Feb 2021
  
  in Public
  
  Hence, after five epochs of training with autoregression,our model has full teacher forcing: it always receives the ground-truth poses for autoregression. This procedure greatly helps withlearning a model that properly integrates non-autoregressive input.
  
  Interesting, I would have guessed that doing it the other way (starting with teacher forcing and decrease this to fully autoregressive training) would have been the natural curriculum.
  
  What was the idea for doing this? Is the idea basically to extend to gradually make the information in the autoregressive part of the input more and more predictive, so that the network can anneal from using features in the speech part, to using features in both speech and autoregressive motion?
4. guillefix 12 Feb 2021
  
  in Public
  
  This pretraining helps thenetwork learn to extract useful features from the speech input, anability which is not lost during further training.
  
  I wonder if self-attention like in transformers would be better at learning which features to pick on
5. guillefix 12 Feb 2021
  
  in Public
  
  we pass a sliding windowspanning 0.5 s (10 frames) of past speech and 1 s (20 frames) offuture speech features over the encoded feature vectors.
  
  so cant generate gestures from audio/text in real time with this
6. guillefix 12 Feb 2021
  
  in Public
  
  feature vector𝑉𝑠was made distinct from all other encodings, bysetting all elements equal to−15
  
  it may be a good idea to learn these embeddings no?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2001.09326.pdf
arxiv.org arxiv.org

2011.08115.pdf

2
1. guillefix 10 Feb 2021
  
  in Public
  
  three different domains: U.S. presidents,dog breeds, and U.S. national parks. We use mul-tiple domains to include diversity in our tasks,choosing domains that have a multitude of entitiesto which a single question could be applied
  
  three domains wow much diversity
2. guillefix 10 Feb 2021
  
  in Public
  
  We assume each task has an associ-ated metricμj(Dj,fθ)∈R, which is used tocompute the model performance for taskτjonDjfor the model represented byfθ.
  
  So this assumes that the reward can be defineable. In some tasks, it may not be so easy right? We may need to learn rewards
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2011.08115.pdf
arxiv.org arxiv.org

2005.02181.pdf

1
1. guillefix 09 Feb 2021
  
  in Public
  
  ecological pre-training
  
  whats ecological pretraining
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2005.02181.pdf
arxiv.org arxiv.org

2005.11401.pdf

1
1. guillefix 09 Feb 2021
  
  in Public
  
  BART waspre-trained using a denoising objective and a variety of different noising functions. It has obtainedstate-of-the-art results on a diverse set of generation tasks and outperforms comparably-sized T5models [32].
  
  wait so it was just trained on reconstruction? hmm interesting.
  
  i guess the fine-tuning then really changes the output in this case, even tho it still reuses knowledge in the model?
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2005.11401
arxiv.org arxiv.org

Grounding Language in Play

1
1. guillefix 08 Feb 2021
  
  in Public
  
  We believe these properties provide good motivationfor continuing to scale larger end-to-end imitation archi-tectures over larger play datasets as a practical strategy fortask-agnostic control.
  
  MORE DATA
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2005.07648.pdf
arxiv.org arxiv.org

2012.05672.pdf

5
1. guillefix 08 Feb 2021
  
  in Public
  
  Multipleavenues, including understanding more deeply the mechanisms of creative, knowledge-rich thought, or transferring knowledge from large, real world datasets, may offer a wayforward.
  
  ALSO INTERESTING FUTURE DIRECTIONS
2. guillefix 08 Feb 2021
  
  in Public
  
  To go beyond competence within somewhat stereo-typed scenarios toward interactive agents that can actively acquire and creatively recombineknowledge to cope with new challenges may require as yet unknown methods for knowl-edge representation and credit assignment, or, failing this, larger scales of data.
  
  Probably most reliable approach: Larger scales of data
3. guillefix 08 Feb 2021
  
  in Public
  
  To record sufficientlydiverse behaviour, we have “gamified” human-human interaction via the instrument of lan-guage games.
  
  GAMIFICATION DATA GATHERING THROUGH GAMES
4. guillefix 08 Feb 2021
  
  in Public
  
  Winograd envisioned computers that are not “tyrants,” but rather ma-chines that understand and assist us interactively, and it is this view that ultimately led himto advocate convergence between artificial intelligence and human-computer interaction(Winograd, 2006)
  
  And VR is a big part in the next step in human-computer interaction
5. guillefix 08 Feb 2021
  
  in Public
  
  Generally, these results give us con-fidence that we could continue to improve the performance of the agents straightforwardlyby increasing the dataset size.
  
  yeah if you have lots of money to pay people..
  
  but that is not that scalable
Visit annotations in context

Annotators

guillefix

URL

arxiv.org/pdf/2012.05672.pdf

guillefix

Annotations: 683

Joined: February 23, 2015

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

Annotators

URL

Annotators

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

Annotators

URL