Bayesian Optimization, Part 1: Key Ideas, Gaussian Processes

My own usage has been tamer, but also varied: I’ve used it to (a) learn distributions (Ghose & Ravindran, 2020; Ghose & Ravindran, 2019), and (b) tune SHAP model explanations to identify informative training instances (Nguyen & Ghose, 2023).

Why is BayesOpt everywhere? Because the problems it solves are everywhere. It provides tools to optimize a noisy and expensive black-box function: given an input, you can only query your function for an output, and you don’t have any other information such as its gradients. Sometimes you might not even have a mathematical form for it. Since it’s expensive, you can’t query it a lot. Plus, function evaluations might be noisy, i.e., the same input might produce slightly different outputs. This lack of function information is in contrast to the plethora of Gradient Descent variants that seem to be fueling the Deep Learning revolution. But if you think about it, this is a very practical problem. For example, say you want to optimize various small and large processes in a factory to increase throughput - what does a gradient even mean here?
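To make that setup concrete, here is a minimal sketch (not from the original post) of what such a problem looks like from the optimizer’s point of view. The function `expensive_black_box` is a hypothetical stand-in for a costly experiment or simulation: its true form is hidden, each call returns a noisy value, and we assume we can only afford a handful of queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_black_box(x):
    """Stand-in for an expensive, noisy black-box objective.

    In practice this could be a factory process, a lab experiment, or a
    model-training run: we only observe a (noisy) output for a given input,
    with no gradients and no closed-form expression available to us.
    """
    true_value = -(x - 2.0) ** 2 + 3.0   # hidden from the optimizer
    noise = rng.normal(scale=0.1)        # same input -> slightly different outputs
    return true_value + noise

# A tiny evaluation budget: each call is assumed to be costly,
# so we can only afford a few queries in total.
budget = 5
observations = [(x, expensive_black_box(x)) for x in np.linspace(0.0, 4.0, budget)]
for x, y in observations:
    print(f"f({x:.2f}) ~ {y:.3f}")
```

BayesOpt’s job is to decide, given observations like these, where to spend the next (expensive) query.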

I encountered the area in 2018, when, as part of my Ph.D., I’d run out of ways to maximize a certain function. And the area has only grown since then. If you’re interested, I have an introductory write-up of BayesOpt in my dissertation (Section 2.1) too, and Section 2.3 there talks about the general problem I wanted to solve. Much of the material here, though, comes from a talk I gave a while ago (slides).
