1
When do we need learning?
Lecture 19-1 When do we need learning?
2
When do we need learning?
Today: policy-gradient methods and course wrap-up. The theory covered so far: DP, LQR, SOS, trajectory optimization + feedback, randomized planning, stochastic + robust control. All of these have real strengths, but they also share at least two big flaws / assumptions:
- they need a model of the system (ODE) + its uncertainty: very dependent on system ID
- they all use full-state feedback: very dependent on state estimation
If a pendulum were sitting in front of you right now, could you make it balance upright? Can you really model the friction, or get an accurate state estimator?
3
When do we need learning?
The problem of designing the estimator + feedback controller simultaneously (more generally, feedback from sensors to actuators). This is “output feedback”.
[Block diagram: Plant → output y → State Estimation → estimate x̂ → Full-state Feedback → control u → back to the Plant]
4
When do we need learning?
Even for linear systems, an open problem.
- LQG (Linear Quadratic Gaussian) control = Kalman filter + LQR: Gaussian process noise and sensor noise are added to the system, and the result is one special case of output feedback. But even for a linear system this controller is complicated.
- u = -Ky: “static output feedback”, the truly simple version, is still open even for linear systems.
[Block diagram: measurement y → Kalman Filter → estimate x̂ → LQR → control u]
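To make the LQG structure concrete, here is a minimal sketch (my own illustration, not from the lecture) for a double-integrator plant using scipy's Riccati solver; the system matrices, cost weights, and noise covariances are made-up assumptions.

```python
# A minimal sketch (not from the lecture) of the LQG structure: a Kalman filter
# feeding an LQR gain.  The double-integrator plant, cost weights, and noise
# covariances below are illustrative assumptions, not the pendulum or foil model.
import numpy as np
from scipy.linalg import solve_continuous_are

# Plant: xdot = A x + B u + w,   y = C x + v   (w, v Gaussian white noise)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])          # only position is measured

# LQR: full-state feedback u = -K xhat
Q = np.diag([10.0, 1.0])            # state cost (assumed)
R = np.array([[1.0]])               # input cost (assumed)
S = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ S)

# Kalman filter: xhat_dot = A xhat + B u + L (y - C xhat)
W = np.diag([0.1, 0.1])             # process-noise covariance (assumed)
V = np.array([[0.01]])              # sensor-noise covariance (assumed)
P = solve_continuous_are(A.T, C.T, W, V)
L = P @ C.T @ np.linalg.inv(V)

print("LQR gain K =", K)
print("Kalman gain L =", L.ravel())
```

Even in this clean linear setting the controller is a dynamical system in its own right (the filter), which is part of why output feedback is hard in general.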
5
When do we need learning?
We will look at the design of output-feedback controllers. A state estimate + full-state feedback is not always necessary, and if the model is poor the control built on it will be wrong anyway. Learning control: not a model-based approach, but searching directly over the controller. Favorite example = fluid dynamics (next part).
6
Example : the Heaving Foil
Lecture 19-2 Example : the Heaving Foil
7
Example : the Heaving Foil
The Heaving Foil (Jun Zhang, NYU). A simple model for flapping flight: the heaving foil, a symmetric flat plate that is driven vertically (along a prescribed position trajectory) and is free to translate horizontally.
8
Example : the Heaving Foil
The Heaving Foil (Jun Zhang, NYU). As the plate is driven up and down, it starts to rotate (translate horizontally), and vortices are shed as it does. The fluid motion is visualized with particles + a laser.
9
Example : the Heaving Foil
The Heaving Foil (Jun Zhang, NYU). There are 2 stable fixed points (rotating to either side?) + 1 unstable fixed point (sitting at the center). Control in fluid dynamics is very hard: the model, the Navier-Stokes equations, is not a useful model here because it is far too complex. Jun has a good CFD (Computational Fluid Dynamics) code, but it is very slow: simulating 30 seconds takes more than a day. On top of being slow, it is very sensitive to the model, and also to the geometry and boundary conditions. Learning control (model-free) is one way to get around these problems.
10
Lecture 19-3 Model-free control
11
Model-free control. If we define a cost function over the controller parameters α,
J_α(x_0) = ∫_0^T g(x, u) dt,
we can do “black-box” optimization. Example: gradient descent by finite differences,
∂J/∂α_i ≈ [J_{α+ε_i}(x_0) − J_α(x_0)] / ε_i,
where ε_i is a small perturbation in coordinate i. However, what if evaluating J is expensive? Then we want to minimize the number of function evaluations.
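As a concrete illustration of the finite-difference approach, here is a minimal sketch; the cost J below is a made-up placeholder standing in for an expensive closed-loop evaluation, not the foil (or any other) model.

```python
# A minimal sketch of "black-box" gradient descent by finite differences, assuming
# J(alpha) is any callable returning the scalar cost of running the system from x0
# with controller parameters alpha.
import numpy as np

def J(alpha):
    return float(np.sum(alpha ** 2))          # placeholder cost (assumed)

def finite_difference_gradient(J, alpha, eps=1e-4):
    grad = np.zeros_like(alpha)
    J0 = J(alpha)                             # 1 evaluation at the nominal parameters
    for i in range(alpha.size):               # + n more evaluations -> n + 1 total
        e = np.zeros_like(alpha)
        e[i] = eps                            # perturbation in coordinate i
        grad[i] = (J(alpha + e) - J0) / eps
    return grad

alpha = np.array([1.0, -2.0, 0.5])
eta = 0.1                                     # step size
for _ in range(200):
    alpha = alpha - eta * finite_difference_gradient(J, alpha)
print(alpha)                                  # -> close to the minimizer (0 here)
```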
12
Stochastic gradient descent: concept
To get the gradient of J by finite differences, we have to perturb along every coordinate direction:
∂J/∂α_i ≈ [J_{α+ε_i}(x_0) − J_α(x_0)] / ε_i   ⇒   # of function evaluations = n + 1 to estimate dJ/dα for n parameters.
How about picking a random direction instead, and taking it only if it goes downhill? This idea appears as policy-gradient reinforcement learning, extremum-seeking control, and iterative learning control.
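A minimal sketch of the "pick a random direction, keep it only if it goes downhill" idea follows: one or two cost evaluations per step instead of n + 1, regardless of the number of parameters. J is again a made-up placeholder cost.

```python
import numpy as np

def J(alpha):
    return float(np.sum(alpha ** 2))                       # placeholder cost (assumed)

def random_direction_descent(J, alpha, sigma=0.05, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    J_alpha = J(alpha)                                     # cost at the current parameters
    for _ in range(iters):
        beta = sigma * rng.standard_normal(alpha.shape)    # random perturbation direction
        J_new = J(alpha + beta)
        if J_new < J_alpha:                                # accept only if downhill
            alpha, J_alpha = alpha + beta, J_new
    return alpha

print(random_direction_descent(J, np.array([1.0, -2.0, 0.5])))
```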
13
Stochastic gradient descent: weight perturbation
Assume J is smooth. To first order,
J_{α+β}(x_0) ≈ J_α(x_0) + (∂J_α/∂α) β,
where β is a small random vector with covariance E[ββ^T] = σ² I. The weight-perturbation update is
Δα = −η [J(α+β) − J(α)] β ≈ −η β β^T (∂J/∂α)^T,
so in expectation
E[Δα] = −η E[ββ^T] (∂J/∂α)^T = −η σ² (∂J/∂α)^T,
a (scaled) unbiased estimate of the gradient; moreover, each individual update is always within 90 degrees of the true gradient-descent direction.
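Here is a minimal sketch of the weight-perturbation update above, again with a placeholder cost J standing in for the real closed-loop evaluation.

```python
# Weight perturbation:
#   dalpha = -eta * (J(alpha + beta) - J(alpha)) * beta,   beta ~ N(0, sigma^2 I),
# so that E[dalpha] = -eta * sigma^2 * (dJ/dalpha)^T.
import numpy as np

def J(alpha):
    return float(np.sum(alpha ** 2))                         # placeholder cost (assumed)

def weight_perturbation(J, alpha, eta=0.5, sigma=0.05, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        beta = sigma * rng.standard_normal(alpha.shape)      # small random vector
        dalpha = -eta * (J(alpha + beta) - J(alpha)) * beta  # 2 evaluations per step
        alpha = alpha + dalpha
    return alpha

print(weight_perturbation(J, np.array([1.0, -2.0, 0.5])))
```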
14
Baseline
Replace J(α) in the update with a “baseline estimate” b:
Δα = −η [J(α+β) − b] β.
For any baseline b that does not depend on β we still get E[Δα] = −η σ² (∂J/∂α)^T, but we lose the Lyapunov (guaranteed-descent) property of the two-evaluation update. Signal-to-noise ratio: the signal is the component of Δα along the gradient, the noise is the component orthogonal to it, and the ratio depends on (degrades with) the number of parameters.
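The sketch below repeats the weight-perturbation loop with a running average of J as the baseline b; the placeholder cost and the particular averaging rate gamma are assumptions for illustration.

```python
# Weight perturbation with a running "baseline estimate" b in place of J(alpha):
#   dalpha = -eta * (J(alpha + beta) - b) * beta
# Only one cost evaluation per step; E[dalpha] is unchanged for any b independent
# of beta, but individual steps are no longer guaranteed to descend.
import numpy as np

def J(alpha):
    return float(np.sum(alpha ** 2))                       # placeholder cost (assumed)

def weight_perturbation_baseline(J, alpha, eta=0.5, sigma=0.05, gamma=0.95,
                                 iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    b = J(alpha)                                           # initialize the baseline
    for _ in range(iters):
        beta = sigma * rng.standard_normal(alpha.shape)
        J_new = J(alpha + beta)                            # the only evaluation this step
        alpha = alpha - eta * (J_new - b) * beta
        b = gamma * b + (1.0 - gamma) * J_new              # running average as baseline
    return alpha

print(weight_perturbation_baseline(J, np.array([1.0, -2.0, 0.5])))
```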
15
More examples of learning
A Hydrodynamic Cart-Pole. A passive walking robot.
16
Big Challenges going forward
- Robust output feedback (even for simple robots) w/ rich sensor/noise models; perception/vision/etc. as a sensor
- Robust online planning; contact/collisions
- Efficiently propagate uncertainty: robust/stochastic…, walking/grasping/manipulation…
- Principled/rigorous control for fluids/soft robots
- Exploiting structure (e.g. Lagrangian) even in existing algorithms
- More beautiful, simple experiments