- Last 7 days
arxiv.org arxiv.org
The decoder D reconstructs an image x̂0 = D(z0), from which the policy π predicts the action a0.
Why isn't the latent embedding z used as the input to the policy?
setting a new state of the art for methods without lookahead search
Isn't the world model used to do search?
arxiv.org arxiv.org
Somewhat surprisingly, the lowest scores in Blocksworld are associated with BlockAmbiguity and KStacksColor; these two problems require the LLM to associate objects based on their color and we had a priori expected the LLM to be capable of such associations and perform well on this task.
This kind of makes sense, because the colors are not explicitly modelled; they are part of the name of the object, e.g. "red_block_1" rather than "red(block_1)". The latter would be a more natural way to express colors, since color is a property of an object.
- Nov 2024
fan-in of the layer.
What is "fan-in"?
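As a rough illustration of the term: fan-in is commonly taken to be the number of input values feeding each output unit of a layer. The sketch below assumes weights laid out as (out, in) for dense layers and (out_channels, in_channels, kernel dims...) for convolutions. The function name `fan_in` and that layout are assumptions made for illustration, not something stated in the paper.

```python
from math import prod

def fan_in(weight_shape):
    """Number of inputs feeding one output unit, for an assumed weight layout.

    Assumed layouts (hypothetical, for illustration only):
      dense: (out_features, in_features)
      conv:  (out_channels, in_channels, *kernel_size)
    """
    if len(weight_shape) < 2:
        raise ValueError("expected at least a 2-D weight shape")
    in_units = weight_shape[1]
    receptive_field = prod(weight_shape[2:])  # empty product is 1 for dense layers
    return in_units * receptive_field

dense_fan_in = fan_in((256, 128))    # 128 inputs per output unit
conv_fan_in = fan_in((64, 3, 3, 3))  # 3 channels * 3 * 3 kernel = 27
```

Initialization schemes that mention fan-in typically scale the weight variance by its inverse, so that wider inputs do not inflate the activations.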
proceedings.mlr.press proceedings.mlr.press
qst, qat are positive over S and A respectively
What does this mean?
- Jul 2024
arxiv.org arxiv.org
What are the dimensions of this?
3 × 3 blocks
What does "3 x 3 blocks" mean?
Our proposal distribution q(z = k|x) is deterministic, and by defining a simple uniform prior over z we obtain a KL divergence constant and equal to log K.
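The log K claim can be checked numerically. With a one-hot q and a uniform prior p(z) = 1/K, the only surviving term of KL(q || p) is 1 · log(1 / (1/K)) = log K. The sketch below is only a sanity check; the codebook size K = 512 and the index k = 7 are arbitrary values chosen for illustration.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions, using the convention 0 * log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

K = 512  # hypothetical number of discrete codes
k = 7    # hypothetical index selected by the deterministic encoder

q = [1.0 if i == k else 0.0 for i in range(K)]  # one-hot proposal q(z | x)
p = [1.0 / K] * K                               # uniform prior over z

kl = kl_divergence(q, p)  # the same value for every choice of k
```

Because the value does not depend on x or on the encoder parameters, the term contributes nothing to the gradient.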
rll.berkeley.edu rll.berkeley.edu dsae.pdf
forces the autoencoder to focus on object positions
This is still unclear to me
- Jun 2024
arxiv.org arxiv.org
This is not directly feasible with conventional policy gradient formulations
Why not?
arxiv.org arxiv.org
Ât is an estimator of the advantage function at timestep t
How is this calculated?
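One common answer is generalized advantage estimation (GAE), which the PPO paper uses in a truncated form. The sketch below assumes a single trajectory segment with no episode termination inside it, and a value estimate for the state after the last step. The function name and default hyperparameters are illustrative, not taken from the paper.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory segment.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. one extra bootstrap value at the end
    Assumes no episode boundary inside the segment.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: how much better the step was than predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam=0 this reduces to the one-step TD residual, and with lam=1 to the discounted return minus the value baseline.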
- May 2024
arxiv.org arxiv.org
How is the quality measured?
www.semanticscholar.org www.semanticscholar.org
The difference is that we only record objects that are either action arguments or in contact with them.
How do you know, when the model is not learned yet?
www.semanticscholar.org www.semanticscholar.org
A “world” frame serves as a default frame of reference for every object in the environment
Is this what the dataset consists of? Sequences of world frames?
However, these approaches assume high-level actions to be provided as input.
No, they don't. At least not Asai et al. 2022.
A is an uncountably infinite set of primitive deterministic actions.
So actions are continuous and not discrete?
- Feb 2024
www.semanticscholar.org www.semanticscholar.org
One can view the noise vector z in such a GAN as a feature vector, containing some representation of the transition to o′ from o.
How can it contain a representation of the transition if it is just noise?
- Jan 2024
www.semanticscholar.org www.semanticscholar.org
We define action predicates PA = {left(1), left(2), right(1), right(2), jump(1), idle(1), ...} and state predicates PS = {type, closeby, ...}
How did they come up with these?
openreview.net openreview.net
This dataset will contain a set of tuples, (s, a, s′), of states, actions, and next states
What is a state?
www.semanticscholar.org www.semanticscholar.org
More recently, Chen et al. (2022) explored a variant of DreamerV2 where a Transformer replaces the recurrent network in the RSSM
Then what is the novelty in this paper?
openreview.net openreview.net
straight-through estimator
www.semanticscholar.org www.semanticscholar.org
object state vectors
Where do the object state vectors come from?
openreview.net openreview.net
DeepMind Lab dataset
How is this dataset structured? There is no "fixed" dataset in the DeepMind Lab repo
showing good variability over the irrelevant factors
Not really. For the "white suitcase" scene it only differs in wall colors and floor colors, but the "black and white" representation of the scene is the same. Essentially there could be a way larger range of scenes where a white suitcase appears.
blue wall
Is "blue wall" a compositional concept or an atomic one?
small, round, red
Are these "features" hand-crafted?
few example images of an apple paired with the symbol “apple”
This is not unsupervised data
www.semanticscholar.org www.semanticscholar.org
unlabeled set of image pairs
It's kind of labelled because they know that an action has taken place between the images, just not what action it is.
www.semanticscholar.org www.semanticscholar.org
vψ (sτ )
What is the difference between this and \(V_\lambda\)?
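In the usual Dreamer-style formulation, vψ is the learned critic, while Vλ is the target it is regressed toward: a mix of multi-step returns that falls back on the critic's own prediction at the end of the imagined rollout. The sketch below assumes that formulation with a constant discount. The function name and signature are hypothetical.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Lambda-return targets over an imagined rollout of length H.

    rewards: r_0 .. r_{H-1}
    values:  critic predictions v(s_0) .. v(s_H), one extra for the bootstrap
    Assumes a constant discount gamma (no predicted termination).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    targets = [0.0] * len(rewards)
    next_target = values[-1]  # at the last step, rely on the critic alone
    for t in reversed(range(len(rewards))):
        # Blend the critic's one-step estimate with the longer-horizon target.
        blended = (1.0 - lam) * values[t + 1] + lam * next_target
        next_target = rewards[t] + gamma * blended
        targets[t] = next_target
    return targets
```

With lam=0 every target is the one-step bootstrap r_t + gamma · v(s_{t+1}); with lam=1 it is the discounted reward sum plus the critic's value at the final state.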
dataset of past experience
Where does this data come from? Random exploration?
finite imagination horizon
What's the alternative, infinite imagination horizon? Seems impossible
www.semanticscholar.org www.semanticscholar.org
blocks1-5 (arm, 5 blocks)
Why only up to 5 blocks?
in many cases optimally
What does it mean to solve them optimally?
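In classical planning, and assuming unit action costs, a plan is usually called optimal when no shorter plan reaches the goal. Breadth-first search over the state graph returns such a plan. The sketch below uses a toy graph whose states and actions are made up for illustration; it is not the method of the paper.

```python
from collections import deque

def shortest_plan(graph, start, goal):
    """Breadth-first search; returns a minimum-length action list, or None.

    graph maps each state to a list of (action, next_state) pairs.
    """
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action taken)
    while frontier:
        state = frontier.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                state, action = parent[state]
                plan.append(action)
            return plan[::-1]
        for action, nxt in graph.get(state, []):
            if nxt not in parent:  # first visit is via a shortest path
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None

# Hypothetical state graph: s3 is reachable via a-c (2 steps) or b-d-c (3 steps).
toy_graph = {
    "s0": [("a", "s1"), ("b", "s2")],
    "s1": [("c", "s3")],
    "s2": [("d", "s1")],
    "s3": [],
}
plan = shortest_plan(toy_graph, "s0", "s3")
```

Here both a-c and b-d-c solve the task, but only a-c is optimal under unit costs.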
In one case, the input data corresponds to one or more state graphs Gi assumed to originate from hidden planning instances Pi = 〈D, Ii〉 that need to be uncovered
Isn't the domain needed in order to generate the state graph?
www.semanticscholar.org www.semanticscholar.org
The latter approaches are less likely to generate crisp representations due to the dependence on images
www.semanticscholar.org www.semanticscholar.org
a latent policy via behavior cloning
How is this done?
How is this value known?
before and after the action of interest is taken
Does every next observation depend on an action, or can the environment change "by itself"?
which predicts which action at was taken by the agent between consecutive observations ot and ot+1
How is this trained when the action is not known?
people.csail.mit.edu people.csail.mit.edu
prior distribution over programs likely to solve tasks in the domain
What does this prior distribution mean? The probability that the program solves any task in the domain? Are there even any programs that would solve multiple tasks?
www.semanticscholar.org www.semanticscholar.org
positive probability
What does it mean that a program has a "positive probability" of solving a task?
search for programs
Search for programs where? How are these programs created?
find best program
How is best defined?
- Dec 2023
www.semanticscholar.org www.semanticscholar.org
We selected the tasks on which Tassa et al. (2018) report non-zero performance from image inputs
www.semanticscholar.org www.semanticscholar.org
Finally, we call a PDDL Planner as the deterministic solver to obtain A, a plan to accomplish the goal CSL under the predefined scenario.
With what PDDL domain?
www.semanticscholar.org www.semanticscholar.org
Task descriptions are constructed using PDDL and symbolic plans are generated using the FAST-DOWNWARD planner
To generate a symbolic plan, an initial state (problem file) needs to be given. What does this look like? Are there only three problem files (one for each problem) representing some "general" state? Shouldn't the initial plan depend on the initial state?
www.semanticscholar.org www.semanticscholar.org
Our set-up automatically parses LLM-generated language into a program using our synthetic grammar
Also, how do they handle cases where the parser generates incorrect PDDL? Wouldn't that give the LLM-as-planner a worse score than it actually should have?
The P+S model outputs executable PDDL actions
How do you make sure of this?
www.semanticscholar.org www.semanticscholar.org
Even with high Exec, some task GCR are low, because some tasks have multiple appropriate goal states, but we only evaluate against a single “true” goal
This seems like an unfair way to evaluate the model
SR is the fraction of executions that achieved all task-relevant goal-conditions
How are the goal-conditions specified and where do they come from?
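To make the two metrics concrete, here is a minimal sketch. SR follows the quoted definition. GCR is assumed here to be the fraction of goal conditions that hold in the final state, which may differ in detail from the paper's definition. The condition strings and episodes are invented for illustration.

```python
def goal_condition_recall(goal, final_state):
    """Fraction of goal conditions present in the final state (assumed definition)."""
    if not goal:
        return 1.0
    return len(goal & final_state) / len(goal)

def success_rate(episodes):
    """Fraction of (goal, final_state) pairs in which every goal condition holds."""
    return sum(goal <= final for goal, final in episodes) / len(episodes)

# Hypothetical goal and two hypothetical execution outcomes.
goal = {"inside(salmon, microwave)", "closed(microwave)"}
final_a = {"inside(salmon, microwave)", "closed(microwave)", "on(plate, table)"}
final_b = {"inside(salmon, microwave)", "open(microwave)"}

episodes = [(goal, final_a), (goal, final_b)]
sr = success_rate(episodes)                    # only the first run meets every condition
gcr_b = goal_condition_recall(goal, final_b)   # one of two conditions met
```

Under this reading, a run that ends in a different but equally acceptable goal state still scores low, because only the single reference set `goal` is consulted.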
We provide the available objects in the environment as a list of strings
How are these objects retrieved? Automatically or manually?
- Nov 2023
www.semanticscholar.org www.semanticscholar.org
“The bowl can also be a container to fill water”, will be added to the task planner.
Where does this come from? The LLM? Template?
www.semanticscholar.org www.semanticscholar.org
perfectly match gold visual semantic plans using only the text directives as input
Where do they say how they provide the state representation to the model?
Generated strings from all models are post-processed for common errors in sequence-to-sequence models, including token doubling, completing missing bigrams (e.g. “pick <arg1>” → “pick up <arg1>”), and heuristics for adding missing argument tags
Probably won't generalize well to new domains
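Two of the heuristics named in the passage (token doubling and missing bigrams) might look roughly like the sketch below. This is a guess at their general shape, not the authors' code; the bigram table and the function name are hypothetical.

```python
# Hypothetical table: command word -> second word that is often dropped.
MISSING_BIGRAMS = {"pick": "up"}

def postprocess(command):
    """Collapse immediately repeated tokens, then restore known bigrams."""
    deduped = []
    for tok in command.split():
        if not deduped or tok != deduped[-1]:  # drop immediate repeats
            deduped.append(tok)
    fixed = []
    for i, tok in enumerate(deduped):
        fixed.append(tok)
        expected = MISSING_BIGRAMS.get(tok)
        following = deduped[i + 1] if i + 1 < len(deduped) else None
        if expected is not None and following != expected:
            fixed.append(expected)  # e.g. "pick" -> "pick up"
    return " ".join(fixed)
```

For instance, `postprocess("pick <arg1>")` gives `"pick up <arg1>"`. The table is tied to one command vocabulary, and the deduplication would also collapse legitimately repeated tokens, which illustrates why such rules may not transfer to new domains.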
The ALFRED dataset contains 6,574 gold command sequences
Didn't the "Understanding Language in Context" paper mention that it was around 8k data samples?
www.semanticscholar.org www.semanticscholar.org
Lastly, the goal predicates for each problem were generated from the "PDDL parameters" field of every data sample.
What is this field?
Thus, we have created a PDDL domain file using our knowledge of the objects and actions in the ALFRED world and a PDDL problem file for each sample
I assume that the domain file is created manually, but are the problem files also created by hand? If so, that seems like a lot of work: the dataset has 8,055 visual samples, so the same number of problem files would need to be hand-coded.
Since in our task we ignore the vision part of the data, we might encounter some duplicates between our datasets
How do they get the scene representation from the visual data? Is this included in the ALFRED dataset?
- Oct 2023
www.semanticscholar.org www.semanticscholar.org
ALFWorld uses PDDL - Planning Domain Definition Language (McDermott et al., 1998) to describe each scene from ALFRED and to construct an equivalent text game using the TextWorld engine.
How is the PDDL created?