61 Matching Annotations
  1. Last 7 days
    1. The decoder D reconstructs an image x̂0 = D(z0), from which the policy π predicts the action a0.

      Why isn't the latent embedding z used as the input to the policy?

    2. setting a new state of the art for methods without lookahead search

      Isn't the world model used to do search?

    1. Somewhat surprisingly, the lowest scores in Blocksworld are associated with BlockAmbiguity and KStacksColor; these two problems require the LLM to associate objects based on their color and we had a priori expected the LLM to be capable of such associations and perform well on this task.

      This kind of makes sense, because the colors are not explicitly modelled; they are part of the object's name, e.g. "red_block_1" rather than "red(block_1)". The latter would be a more natural way to express colors, since color is a property of an object
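
      To make the note concrete, here is a minimal sketch of the two encodings (hypothetical state representations, not the paper's actual format):

```python
# Hypothetical encodings of a symbolic state with two red blocks.
# Atomic: color baked into the object name, opaque to the model.
atomic_state = {"on(red_block_1, table)", "on(red_block_2, table)"}

# Relational: color as a unary predicate over plain objects.
relational_state = {("on", "block_1", "table"), ("on", "block_2", "table"),
                    ("red", "block_1"), ("red", "block_2")}

# With the relational encoding, associating objects by color is a simple filter;
# with the atomic one, the model would have to parse it out of the name.
red_objects = {args[0] for pred, *args in relational_state if pred == "red"}
print(red_objects)  # → {'block_1', 'block_2'}
```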

  2. Nov 2024
  3. Jul 2024
    1. ze(x)

      What are the dimensions of this?

    2. 3 × 3 blocks

      What does "3 x 3 blocks" mean?

    3. Our proposal distribution q(z = k|x) is deterministic, and by defining a simple uniform prior over z we obtain a KL divergence constant and equal to log K.

      What?
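
      For reference, the claim follows directly from the KL definition: with a deterministic (one-hot) q and a uniform prior p(k) = 1/K, KL(q‖p) = Σ_k q(k) log(q(k)/p(k)) = log K, regardless of which code is selected. A quick numeric check (K chosen arbitrarily):

```python
import math

K = 512                   # codebook size, arbitrary for illustration
q = [0.0] * K
q[7] = 1.0                # deterministic proposal: all mass on one code
p = [1.0 / K] * K         # uniform prior over the K codes

# KL(q || p) = sum_k q_k * log(q_k / p_k), with 0 * log(0) taken as 0
kl = sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)
print(abs(kl - math.log(K)) < 1e-12)  # → True: the KL is log K, independent of x
```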

    1. forces the autoencoder to focus on object positions

      This is still unclear to me

  4. Jun 2024
    1. This is not directly feasible with conventional policy gradient formulations

      Why not?

    1. Ât is an estimator of the advantage function at timestep t

      How is this calculated?
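
      A common choice in policy-gradient methods is generalized advantage estimation (GAE, Schulman et al. 2016); whether this paper uses it is an assumption, but a minimal sketch looks like:

```python
# Sketch of generalized advantage estimation (GAE), a common estimator for A_t.
# rewards, values: per-timestep lists; last_value bootstraps the final state.
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    advantages, running = [], 0.0
    values = values + [last_value]
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running                 # discounted sum of deltas
        advantages.append(running)
    return advantages[::-1]

print(gae([1.0, 1.0], [0.5, 0.5], 0.0, gamma=1.0, lam=1.0))  # → [1.5, 0.5]
```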

  5. May 2024
    1. The difference is that we only record objects that are either action arguments or in contact with them.

      How is this known when the model has not been learned yet?

    1. A “world” frame serves as a default frame of reference for every object in the environment

      Is this what the dataset consists of? Sequences of world frames?

    2. However, these approaches assume high-level actions to be provided as input.

      No, they don't. At least not Asai et al. (2022)

    3. A is an uncountably infinite set of primitive deterministic actions.

      So actions are continuous and not discrete?

  6. Feb 2024
    1. One can view the noise vector z in such a GAN as a feature vector, containing some representation of the transition to o′ from o.

      How can it contain a representation of the transition if it is just noise?

  7. Jan 2024
    1. We define action predicates PA = {left(1), left(2), right(1), right(2), jump(1), idle(1), ...} and state predicates PS = {type, closeby, ...}

      How did they come up with these?

    1. This dataset will contain a set of tuples, (s, a, s′), of states, actions, and next states

      What is a state?

    1. More recently, Chen et al. (2022) explored a variant of DreamerV2 where a Transformer replaces the recurrent network in the RSSM

      Then what is the novelty in this paper?

    1. DeepMind Lab dataset

      How is this dataset structured? There is no "fixed" dataset in the DeepMind Lab repo

    2. showing good variability over the irrelevant factors

      Not really. For the "white suitcase" scene, only the wall and floor colors differ, while the "black and white" structure of the scene stays the same. There could be a far larger range of scenes in which a white suitcase appears.

    3. blue wall

      Is "blue wall" a compositional concept or an atomic one?

    4. small, round, red

      Are these "features" hand-crafted?

    5. few example images of an apple paired with the symbol “apple”

      This is not unsupervised data

    1. unlabeled set of image pairs

      It's kind of labelled because they know that an action has taken place between the images, just not what action it is.

    1. vψ (sτ )

      What is the difference between this and \(V_\lambda\)?

    2. dataset of past experience

      Where does this data come from? Random exploration?

    3. finite imagination horizon

      What's the alternative, infinite imagination horizon? Seems impossible

    1. blocks1-5 (arm, 5 blocks)

      Why only up to 5 blocks?

    2. in many cases optimally

      What does it mean to solve them optimally?

    3. In one case, the input data corresponds to one or more state graphs Gi assumed to originate from hidden planning instances Pi = 〈D, Ii〉 that need to be uncovered

      Isn't the domain needed in order to generate the state graph?

    1. a latent policy via behavior cloning

      How is this done?

    2. π(ot)

      How is this value known?

    3. Moveover

      Moreover?

    4. is

      Remove

    5. before and after the action of interest is taken

      Does every next observation depend on an action, or can the environment change "by itself"?

    6. which predicts which action at was taken by the agent between consecutive observations ot and ot+1

      How is this trained when the action is not known?

    1. prior distribution over programs likely to solve tasks in the domain

      What does this prior distribution mean? The probability of the program solving any task in the domain? Are there even any programs that would solve multiple tasks?

  8. Dec 2023
    1. We selected the tasks on which Tassa et al. (2018) report non-zero performance from image inputs

      Why?

    1. Finally, we call a PDDL Planner as the deterministic solver to obtain A, a plan to accomplish the goal CSL under the predefined scenario.

      With what PDDL domain?

    1. Task descriptions are constructed using PDDL and symbolic plans are generated using the FAST-DOWNWARD planner

      To generate a symbolic plan, an initial state (problem file) needs to be given. What does this look like? Are there only three problem files (one for each problem) representing some "general" state? Shouldn't the initial plan depend on the initial state?

    1. Our set-up automatically parses LLM-generated language into a program using our synthetic grammar

      How?

      Also, how do they handle cases where the parser generates incorrect PDDL? Wouldn't that give the LLM-as-planner a worse score than it actually should have?

    2. The P+S model outputs executable PDDL actions

      How do you make sure of this?

    1. Even with high Exec, some task GCR are low, because some tasks have multiple appropriate goal states, but we only evaluate against a single “true” goal

      This seems like an unfair way to evaluate the model

    2. SR is the fraction of executions that achieved all task-relevant goal-conditions

      How are the goal-conditions specified and where do they come from?

    3. We provide the available objects in the environment as a list of strings

      How are these objects retrieved? Automatically or manually?

  9. Nov 2023
    1. “The bowl can also be a container to fill water”, will be added to the task planner.

      Where does this come from? The LLM? Template?

    1. perfectly match gold visual semantic plans using only the text directives as input

      Where do they say how they provide the state representation to the model?

    2. Generated strings from all models are post-processed for common errors in sequence-to-sequence models, including token doubling, completing missing bigrams (e.g. “pick <arg1>” → “pick up <arg1>”), and heuristics for adding missing argument tags

      Probably won't generalize well to new domains

    3. The ALFRED dataset contains 6,574 gold command sequences

      Didn't the "Understanding Language in Context" paper mention that it was around 8k data samples?

    1. Lastly, the goal predicates for each problem were generated from the "PDDL parameters" field of every data sample.

      What is this field?

    2. Thus, we have created a PDDL domain file using our knowledge of the objects and actions in the ALFRED world and a PDDL problem file for each sample

      I assume that the domain file is created manually, but are the problem files also created by hand? If so, that seems like a lot of work: the dataset has 8,055 visual samples, so the same number of problem files would need to be hand-coded.

    3. Since in our task we ignore the vision part of the data, we might encounter some duplicates between our datasets

      How do they get the scene representation from the visual data? Is this included in the ALFRED dataset?

  10. Oct 2023
    1. ALFWorld uses PDDL - Planning Domain Definition Language (McDermott et al., 1998) to describe each scene from ALFRED and to construct an equivalent text game using the TextWorld engine.

      How is the PDDL created?