The Best Things in Life Are Model Free

Submitted by
Style Pass
2024-11-07 00:00:07

This is the tenth part of “An Outsider’s Tour of Reinforcement Learning.” Part 11 is here. Part 9 is here. Part 1 is here.

Though I’ve spent the last few posts casting shade at model-free methods for reinforcement learning, I am not blindly against the model-free paradigm. In fact, the most popular methods in core control systems are model free! The most ubiquitous control scheme out there is PID control, and PID has only three parameters. I’d like to use this post to briefly describe PID control, explain how it is closely connected to many of the most popular methods in machine learning, and then turn to explain what PID brings to the table over the model-free methods that drive contemporary RL research.

PID stands for “proportional integral derivative” control. The idea behind PID control is pretty simple: suppose you have some dynamical system with a single input that produces a single output. In controls, we call the system we’d like to control the plant, a term that comes from chemical process engineering. Let’s say you’d like the output of your plant to read some constant value $y_t = v$. For instance, you’d like to keep the water temperature in your espresso machine at precisely 203 degrees Fahrenheit, but you don’t have a precise differential equation modeling your entire kitchen. PID control works by creating a control signal based on the error $e_t=v-y_t$. As the name implies, the control signal is a combination of error, its derivative, and its integral:

$$
u_t = k_P e_t + k_I \int_0^t e_s \, ds + k_D \dot{e}_t
$$

Here $u_t$ is the control signal sent to the plant, and $k_P$, $k_I$, and $k_D$ are the three tunable parameters.

I’ve heard differing accounts, but somewhere in the neighborhood of 95 percent of all control systems are PID. And some suggest that the number of people using the “D” term is negligible. In other words, something like 95 percent of the myriad control processes that keep our modern society running are configured by setting just two parameters: the proportional and integral gains. This includes those third-wave espresso machines that fuel so much great research.
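
To make this concrete, here is a minimal sketch of a discrete-time PID loop in Python. Everything in it is an assumption for illustration: the `PID` class, the `simulate` helper, the gains, and the toy first-order plant are hypothetical and not tuned for any real machine. The integral is approximated by a running sum and the derivative by a finite difference. Note that the controller only ever sees the error, never the plant's equations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete-time PID controller (illustrative sketch; names are assumptions)."""
    kp: float                 # proportional gain
    ki: float                 # integral gain
    kd: float = 0.0           # derivative gain; kd = 0 gives a two-parameter PI controller
    integral: float = 0.0     # running sum approximating the integral of the error
    prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement              # e_t = v - y_t
        self.integral += error * dt                 # rectangle-rule integral
        if self.prev_error is None:
            derivative = 0.0                        # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller: PID, setpoint: float = 203.0, y0: float = 70.0,
             ambient: float = 70.0, steps: int = 2000, dt: float = 0.1) -> float:
    """Run the controller against a hypothetical first-order plant.

    The plant is dy/dt = -a * (y - ambient) + b * u, stepped with forward Euler.
    The constants a and b are made up; the controller never reads them.
    """
    a, b = 0.1, 0.5
    y = y0
    for _ in range(steps):
        u = controller.step(setpoint, y, dt)
        y += dt * (-a * (y - ambient) + b * u)
    return y
```

Calling `simulate(PID(kp=2.0, ki=1.0))` runs the two-parameter version with the “D” term switched off, and on this toy plant the output settles at the 203 degree setpoint. Swapping in different values of `a` and `b` changes how quickly it settles, but the controller code stays the same, which is the sense in which the scheme is model free.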
