8 Matching Annotations
  1. Mar 2024
    1. and an input trajectory u(t) defined over a finite interval,

      I've always been confused about this. Could you expand a little on how we know the input sequence u(t)? Where does this input come from? Is it some random sequence of controls? Do we need to solve another control problem to obtain a trajectory before we can obtain an optimal control u*(t) for it? Isn't the idea of optimal control to find the inputs (optimized for some cost) that take the system from an initial state to a desired state? In that sense, all we know is the start and the end, and we then find u(t). A planning method might give us the state trajectory, but we still need to somehow know the u(t) that makes that planned trajectory feasible. Thanks!!
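
      One possible reading, sketched here as a reader's note rather than the author's answer: u(t) is not known in advance. It is the decision variable, and the quoted sentence only says that any candidate input trajectory can be scored by simulating the system from the initial condition. The double integrator model, the Euler step, the quadratic running cost, and the names `rollout_cost`, `do_nothing`, and `push_then_brake` below are all assumptions made for illustration.

```python
def rollout_cost(x0, u_traj, dt=0.1):
    """Simulate q'' = u with forward Euler and accumulate a quadratic running cost.

    x0 is (q, v); u_traj is a list of inputs, one per time step.
    Returns (total_cost, final_state).
    """
    q, v = x0
    cost = 0.0
    for u in u_traj:
        cost += dt * (q * q + v * v + u * u)  # assumed running cost
        q, v = q + dt * v, v + dt * u         # forward Euler step
    return cost, (q, v)


x0 = (1.0, 0.0)

# Two hand-written candidates over the same finite interval (50 steps of 0.1 s).
do_nothing = [0.0] * 50
push_then_brake = [-1.0] * 10 + [1.0] * 10 + [0.0] * 30

cost_a, final_a = rollout_cost(x0, do_nothing)
cost_b, final_b = rollout_cost(x0, push_then_brake)
print("do nothing:     ", cost_a, final_a)
print("push then brake:", cost_b, final_b)
```

      Under these assumptions, an optimizer would search over the entries of the input list to lower the returned cost; no separate control problem has to be solved first to supply a candidate.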

    1. then people will eventually start worrying again about efficiency (a bit like we think about efficiency for automobiles).

      Or batteries and power electronics will allow longer flights despite the inefficient, constant motor action. I believe the primary case for quadrotors rests on vertical takeoff and landing, small size, cost effectiveness, and the fact that a common need is to hover or move very slowly in some direction. I do understand that this goes against the ethos of exploiting the dynamics (instead of overriding them through brute force), but I guess sometimes you need to waste energy to achieve the task?

    1. For rotational coordinates, instead of passing in θ directly, we typically pass in both sin θ and

      I am wondering whether this statement is supposed to be here. It felt a little disconnected from the rest of the paragraph. Perhaps some more explanation of why it follows from the comment on the chosen neural network architecture would help in understanding the idea here?
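
      One common motivation for this encoding, offered as a reader's sketch and not necessarily the reasoning the author had in mind: a raw angle jumps at the wraparound point, so two nearly identical poses can look far apart to a function approximator, while the (sin θ, cos θ) pair varies smoothly around the circle. The helper names `encode_angle` and `feature_distance` are hypothetical.

```python
import math


def encode_angle(theta):
    """Map an angle to a point on the unit circle: (sin(theta), cos(theta))."""
    return (math.sin(theta), math.cos(theta))


def feature_distance(theta_a, theta_b):
    """Euclidean distance between the encoded features of two angles."""
    sa, ca = encode_angle(theta_a)
    sb, cb = encode_angle(theta_b)
    return math.hypot(sa - sb, ca - cb)


# Two poses just on either side of the wraparound at +/- pi.
theta_a = math.pi - 0.01
theta_b = -math.pi + 0.01

raw_gap = abs(theta_a - theta_b)                   # large: the raw angle jumps
encoded_gap = feature_distance(theta_a, theta_b)   # small: the poses are close

# The angle can still be recovered (up to a multiple of 2*pi) with atan2.
recovered = math.atan2(*encode_angle(0.3))

print(raw_gap, encoded_gap, recovered)
```

      A network with continuous activations can only represent continuous functions of its inputs, which is one way the encoding could connect to the choice of architecture.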

    2. we can apply this same algorithm to any number of dynamical systems virtually without modification

      I am a little confused as to how DP is able to get the optimal policy for the double integrator and the pendulum problems. Is it trivial to code value iteration from scratch for these two problems? Grid world is easy because the mapping from actions to next states is easy. But for the double integrator and the pendulum, since the input is discretized, how do we know which state is the next state if we use inputs in the range linspace(-1, 1, 9) (double integrator problem)? I looked into the actual FittedValueIteration function, and it seems there is some barycentric interpolation taking place. Is that the only way to do this? I was thinking of implementing it from scratch but got a little confused about how to achieve the action-state mapping needed to access the correct value for the next state. Am I missing something very basic? I apologize if the question is too long; please let me know if it is not possible to answer it here, or where I could look for an answer. Thank you for the lectures!!
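
      A from-scratch sketch of one way the action-to-next-state mapping could work, under stated assumptions: the next state rarely lands on a grid node, so its value is read off by interpolating between the surrounding nodes. This is not the FittedValueIteration code. It swaps in plain bilinear interpolation on a rectangular grid for the barycentric scheme mentioned above, uses a forward Euler step and a quadratic running cost, and every name, grid size, and constant below is an assumption made for illustration.

```python
def linspace(lo, hi, n):
    h = (hi - lo) / (n - 1)
    return [lo + i * h for i in range(n)]


Q_GRID = linspace(-2.0, 2.0, 17)   # positions
V_GRID = linspace(-2.0, 2.0, 17)   # velocities
U_GRID = linspace(-1.0, 1.0, 9)    # discretized inputs
DT = 0.1


def bracket(grid, x):
    """Clamp x to the grid; return (i, w) with x between grid[i] and grid[i+1],
    where w is the interpolation weight on grid[i+1]."""
    x = min(max(x, grid[0]), grid[-1])
    h = grid[1] - grid[0]
    i = min(int((x - grid[0]) / h), len(grid) - 2)
    return i, (x - grid[i]) / h


def successors(q, v, u):
    """Euler step of q'' = u, then spread the off-grid next state over the
    four surrounding nodes with bilinear weights that sum to one."""
    i, a = bracket(Q_GRID, q + DT * v)
    j, b = bracket(V_GRID, v + DT * u)
    return [((i, j), (1 - a) * (1 - b)), ((i + 1, j), a * (1 - b)),
            ((i, j + 1), (1 - a) * b), ((i + 1, j + 1), a * b)]


def value_iteration(num_sweeps=150):
    nq, nv = len(Q_GRID), len(V_GRID)
    # Precompute the one-step cost and interpolated successors once.
    table = {}
    for i, q in enumerate(Q_GRID):
        for j, v in enumerate(V_GRID):
            for k, u in enumerate(U_GRID):
                cost = DT * (q * q + v * v + u * u)
                table[i, j, k] = (cost, successors(q, v, u))
    J = [[0.0] * nv for _ in range(nq)]
    policy = [[0.0] * nv for _ in range(nq)]
    for _ in range(num_sweeps):
        J_new = [[0.0] * nv for _ in range(nq)]
        for i in range(nq):
            for j in range(nv):
                best = float("inf")
                for k, u in enumerate(U_GRID):
                    cost, succ = table[i, j, k]
                    total = cost + sum(w * J[a][b] for (a, b), w in succ)
                    if total < best:
                        best, policy[i][j] = total, u
                J_new[i][j] = best
        J = J_new
    return J, policy


J, policy = value_iteration()
print("cost-to-go at the origin:", J[8][8])
print("input chosen at q=1, v=0:", policy[12][8])
```

      With the weights precomputed, each sweep is the same backup as in grid world, except that the "next state" is a weighted mix of up to four nodes instead of a single cell. Swapping in the pendulum would, under these assumptions, only change the dynamics inside `successors` and the grids.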

  2. Oct 2023