Class 01 - Q&A! #25

andrewrosemberg · 2025-08-22T22:37:53Z

andrewrosemberg
Aug 22, 2025
Maintainer

👋 Welcome!

We’re using Discussions as a place to:

- Ask questions about the contents of the class.
- Share ideas.
- Engage with other students.
- Welcome others and be open-minded. Remember that this is a community we build together 💪.

andrewrosemberg · 2025-08-22T23:51:32Z

andrewrosemberg
Aug 22, 2025
Maintainer Author

Two great videos that give a bit of motivation for the physics-as-optimization topic:

1 reply

andrewrosemberg Sep 4, 2025
Maintainer Author

Great post on Implicit and Explicit ODE solvers: https://www.linkedin.com/posts/chrisrackauckas_implicit-ode-solvers-are-not-universally-activity-7369269826608971776-Nn9n

andrewrosemberg · 2025-08-25T19:03:11Z

andrewrosemberg
Aug 25, 2025
Maintainer Author

We had a great question after the first class from someone with an (NN-Based) RL background:

When or why should I frame my multi-period (robotics) problem as an optimization problem with non-trivial feasibility constraints? Can't I penalize constraints in the objective function and simulate my environment enough times so that I learn to be feasible and optimal?

My understanding of their question:

I presented the multi-stage problem as an expanding tree in which you must both keep your decisions feasible and account for their impact on future (uncertain) stages:

In the (deep) RL literature, the constraints $h(\mathbb{x}_t, \mathbb{u}_t) \geq 0$ are usually trivial box constraints. Most of the time there are no constraints on the state, only bounds on the action, which can easily be enforced by clamping the output of the parametric policy model:

$$\mathbb{u}_t = \text{clamp}(\pi(\mathbb{x}_{t-1}), (\underline{ \mathbb{u}}, \bar{ \mathbb{u}}))$$
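As a minimal sketch of what this clamping looks like in code (the linear `policy`, its weights `W`, and the bounds `u_lo`, `u_hi` are hypothetical stand-ins chosen for illustration, not anything from the lecture):

```julia
# Hypothetical linear policy standing in for a neural network π(x).
W = [1.0 -2.0; 0.5 3.0]
policy(x) = W * x

# Action bounds (box constraints), chosen only for illustration.
u_lo = [-1.0, -1.0]
u_hi = [ 1.0,  1.0]

# Project the raw policy output onto the box, element by element.
clamped_action(x) = clamp.(policy(x), u_lo, u_hi)

x     = [1.0, 1.0]
u_raw = policy(x)          # unconstrained policy output
u     = clamped_action(x)  # always satisfies u_lo .<= u .<= u_hi
println((; u_raw, u))
```

This works because a box is trivial to project onto. A constraint that couples the state and the action, such as a general $h(\mathbb{x}_t, \mathbb{u}_t) \geq 0$, has no such one-line projection.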

Constrained RL methods deal with state constraints using modified versions of the penalty method we will mention in the 2nd lecture.
However, without access to an underlying model of the feasible region (or at least a close approximation) that can be exploited by the methods we will go over (Interior Point Methods, Augmented Lagrangian Methods, and similar), I haven't found (constrained) model-free RL methods that perform well with constraints even slightly more complex than a polytope with very few sides.

I am sure there must be clever (yet complex) methods that can converge for a few special cases, but my interpretation of their question is that there is this overall idea among some people that you can just use slightly modified versions of classical RL methods to solve any constrained multi-stage problem.

It would be great to show when penalty methods fail, or even when they take far longer to converge than the alternatives.
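As a starting point, here is a one-dimensional toy problem (my own illustrative choice, not an example from the lecture) where a pure quadratic penalty never returns a feasible point for any finite penalty weight $\rho$:

$$\min_{x}\; x \quad \text{s.t.} \quad h(x) = x - 1 \geq 0, \qquad x^\star = 1.$$

Moving the constraint into the objective gives

$$P_\rho(x) = x + \frac{\rho}{2}\,\max\bigl(0,\,-h(x)\bigr)^2, \qquad P_\rho'(x) = 1 - \rho\,(1 - x) = 0 \;\Rightarrow\; x_\rho = 1 - \frac{1}{\rho},$$

so the minimizer violates the constraint by exactly $1/\rho$, and driving the violation to zero requires $\rho \to \infty$, which makes the penalized problem increasingly ill-conditioned. An augmented Lagrangian step with multiplier estimate $\lambda$ instead yields

$$x_{\rho,\lambda} = 1 - \frac{1 - \lambda}{\rho}, \qquad \lambda^{+} = \max\bigl(0,\; \lambda - \rho\, h(x_{\rho,\lambda})\bigr),$$

so starting from $\lambda = 0$ the first update gives $\lambda^{+} = 1$ and the next minimizer is exactly $x = 1$ at finite $\rho$. This only illustrates the bias of a pure penalty on a known model; it does not by itself say anything about a particular RL algorithm.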

2 replies

klamike Sep 3, 2025
Maintainer

Not really an answer to the question, but I do think it's important to note that there is some value in just writing down the "full"/"real" constrained multi-stage problem, even if you never plan on directly solving it. It lets you be explicit about what you relax/simplify in order to be able to solve it within time limits. Especially since there may be several options for each relaxation/simplification with different tradeoffs. For example, instead of moving all constraints to the objective all the time, as Andrew said, you may want to keep some simple constraints that are often binding as exact, since they are easy to project onto. On the other hand, there may be some complicated constraints that are almost never binding, and so you may want to just use a simple penalty for those.

andrewrosemberg Sep 4, 2025
Maintainer Author

Another crucial reason (as highlighted in CMU 16-745 – Lecture 7) to model or state the optimization problem explicitly—i.e., by identifying constraints, dynamics, and uncertainty—is the following:

Many reinforcement learning (RL) works tend to bundle modeling error and uncertainty together, sometimes to justify using model-free RL instead of optimal control or classical optimization methods. The reasoning of the RL advocates goes as follows: while the latter are very effective at solving such problems, they rely on having an accurate model of reality, which is often unavailable. Model-free RL is then presented as a better alternative because it avoids bias from potentially incorrect models and could therefore yield better results.
However, as pointed out in the lecture, RL algorithms typically assume certain regularity conditions about the stochastic process governing the uncertain data. These assumptions generally do not hold when the uncertainty arises from modeling errors in system dynamics. As a result, mis-modeling errors cannot simply be treated as “uncertainty,” and ignoring this distinction can undermine the validity of the RL approach.

andrewrosemberg · 2025-08-29T13:13:57Z

andrewrosemberg
Aug 29, 2025
Maintainer Author

Another question we had after the first class came from someone with an optimization background:

For mechanical systems in which the Lagrangian is given by

$$L = \underbrace{\frac{1}{2} v^{\top}M(q)v}_{\text{Kinetic Energy}} - \underbrace{U(q)}_{\text{Potential Energy}}$$

where $v = \dot q = \frac{dq}{dt}$, and Lagrangian mechanics is trying to minimize (more precisely, make stationary) the action

$$\mathcal{S}[q(\cdot)] \;=\; \int_{t_0}^{t_f} L\!\bigl(q(t),\; \dot q(t)\bigr)\,dt,$$

the student asked:

1) It is not immediately obvious that this functional is convex, and 2) while the kinetic energy (which it seems we are trying to minimize) is bounded below by 0, the potential energy (the term we are maximizing) is unbounded. Why do things not keep going "up" (increasing their potential energy)?

3 replies

andrewrosemberg Aug 29, 2025
Maintainer Author

Before proving convexity, one thing I did was simply compare two possible trajectories for a ball starting at height 0 with zero velocity:

1. It stays still at the origin;
2. It moves upward with a small constant velocity.

```julia
"""
    discrete_action(qs, vs; Δt, m=1.0, g=9.81)

Compute the discrete action using the **midpoint rule**:
S ≈ Σ_k L( (q_k + q_{k+1})/2, (q_{k+1} - q_k)/Δt ) Δt
for 1D vertical motion with z-up convention (U = m*g*q).

Notes:
- `vs` is ignored for the midpoint action (kept only for compatibility).
- Returns per-step arrays evaluated at midpoints (length N-1) and total S.
"""
function discrete_action(qs::AbstractVector, vs::AbstractVector; Δt, m::Real=1.0, g::Real=9.81)
    N = length(qs)
    @assert length(vs) == N "qs and vs must have the same length."

    # Midpoint positions and finite-difference velocities
    qmid = @views 0.5 .* (qs[1:end-1] .+ qs[2:end])
    vmid = @views (qs[2:end] .- qs[1:end-1]) ./ Δt

    # Energies at midpoints
    T = 0.5 .* m .* (vmid .^ 2)
    U = m .* g .* qmid
    L = T .- U

    # Discrete action
    S = Δt * sum(L)

    # Return the total action together with the per-step midpoint quantities
    return (S=S, L=L, T=T, U=U, qmid=qmid, vmid=vmid)
end
```

But this simulation does not yet show what I expected: the ball going up has less action. Perhaps I am still missing something!

andrewrosemberg Aug 29, 2025
Maintainer Author

Found this https://physics.stackexchange.com/questions/144077/least-action-principle-numerical-simulation-strangeness

PedroGatech Sep 8, 2025
Collaborator

Building on the Stack Exchange answer that Andrew just linked (that the variations of position and velocity are linked through $\delta v = \dfrac{d}{dt} \delta x$): that constraint implies that the total energy is conserved. That is, the kinetic energy (K) can only increase if the potential energy (U) decreases, and vice versa: $\delta K = - \delta U$.
So the only way to gain U is to lose K, which is why it doesn't make sense for a block that starts at rest to gain infinite U: to do that, it would need to lose K.
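A complementary numerical check (a sketch, not taken from the lecture or the linked answer): the principle of least action compares trajectories that share the same start and end points, whereas "stay still" and "drift upward" end at different heights. The snippet below compares three paths with $q(0) = q(T) = 0$. `midpoint_action` is a hypothetical helper that mirrors the midpoint rule used earlier in this thread, and the values of `m`, `g`, `T`, `N` and the sinusoidal perturbation are illustrative assumptions.

```julia
# Hypothetical helper: midpoint-rule action for 1D vertical motion, U = m*g*q.
function midpoint_action(qs::AbstractVector; Δt, m=1.0, g=9.81)
    qmid = 0.5 .* (qs[1:end-1] .+ qs[2:end])   # midpoint positions
    vmid = (qs[2:end] .- qs[1:end-1]) ./ Δt    # finite-difference velocities
    return Δt * sum(0.5 .* m .* vmid .^ 2 .- m .* g .* qmid)
end

m, g, T, N = 1.0, 9.81, 1.0, 201
ts = range(0.0, T; length=N)
Δt = step(ts)

# All three paths share the endpoints q(0) = 0 and q(T) = 0.
q_rest     = zeros(N)                               # stays at the origin
q_parabola = @. (g * T / 2) * ts - g * ts^2 / 2     # solves q̈ = -g
q_bumped   = @. q_parabola + 0.1 * sin(π * ts / T)  # perturbation that vanishes at both ends

S_rest     = midpoint_action(q_rest; Δt, m, g)
S_parabola = midpoint_action(q_parabola; Δt, m, g)
S_bumped   = midpoint_action(q_bumped; Δt, m, g)
S_exact    = -m * g^2 * T^3 / 24                    # closed-form action of the parabola

println((; S_rest, S_parabola, S_bumped, S_exact))
```

Under these assumptions the thrown-and-returning parabola has a lower action than both staying at rest and the perturbed path, and its discrete action is close to the closed-form value $-m g^2 T^3/24$. For this particular potential (linear in $q$) the discrete action is a convex quadratic in the interior nodes, so the parabola is a genuine minimizer among paths with these endpoints; that need not hold for a general $U(q)$.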

ivanightingale · 2025-09-17T19:53:07Z

ivanightingale
Sep 17, 2025
Collaborator

Can you describe how to derive the coordinate transformation in the unicycle example?

1 reply

andrewrosemberg Sep 17, 2025
Maintainer Author

Great question, I was hoping someone would try to do it as I had to do :)

Unicycle model

$$\dot p_x = v\cos\theta,\qquad \dot p_y = v\sin\theta,\qquad \dot\theta = \omega,$$

and the rotated coordinates

$$x= \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} p_x\cos\theta + (p_y-1)\sin\theta\\[2pt] -\,p_x\sin\theta + (p_y-1)\cos\theta\\[2pt] \theta \end{bmatrix},\qquad y=\begin{bmatrix}x_1\\x_2\end{bmatrix}.$$

In these coordinates the dynamics are

$$\dot x = \begin{bmatrix} v+\omega x_2\\[2pt] -\omega x_1\\[2pt] \omega \end{bmatrix}.$$

See if you can start from the rotation matrix and get to these.
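For anyone who wants to check their attempt, here is one possible route (a sketch; the symbols $R(\theta)$, $J$, and $p_0$ are my own notation, with $p_0 = (0, 1)$ read off from the $(p_y - 1)$ terms above). The first two coordinates are the position error expressed in the body frame:

$$y = R(\theta)^{\top}(p - p_0), \qquad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}, \qquad p = \begin{bmatrix} p_x\\ p_y \end{bmatrix}, \qquad p_0 = \begin{bmatrix} 0\\ 1 \end{bmatrix}.$$

Two facts are needed, both following from $\dot\theta = \omega$ and the unicycle model:

$$\frac{d}{dt} R(\theta)^{\top} = -\,\omega\, J\, R(\theta)^{\top}, \quad J = \begin{bmatrix} 0 & -1\\ 1 & 0 \end{bmatrix}, \qquad R(\theta)^{\top}\dot p = v\,R(\theta)^{\top}\begin{bmatrix} \cos\theta\\ \sin\theta \end{bmatrix} = \begin{bmatrix} v\\ 0 \end{bmatrix}.$$

Applying the product rule then gives

$$\dot y = \frac{d}{dt}R(\theta)^{\top}\,(p - p_0) + R(\theta)^{\top}\dot p = -\,\omega\, J\, y + \begin{bmatrix} v\\ 0 \end{bmatrix} = \begin{bmatrix} v + \omega x_2\\ -\,\omega x_1 \end{bmatrix},$$

and $\dot x_3 = \dot\theta = \omega$ completes the third row.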
