Camera calibration is a deep topic that can be pretty hard to get into. While the geometry of how a single camera system works and the physics describing lens behaviour are well-known, camera calibration continues to mystify many engineers in the field.
In fairness to these engineers, this is largely due to the wide breadth of jargon, vocabulary, and notation, which varies across many different fields. Computer vision *seems* like a distinct field unto itself, but in reality it was born from the coalescence of computer science, photogrammetry, physics, and artificial intelligence. As a result, the language we use has been adopted from several different disciplines, and there's often a lot of cross-talk.
Today, we want to explore one of the fundamental aspects of the calibration problem: choosing a model. More specifically, we're going to look at how we model cameras mathematically, with a focus on existing models in use today and the different approaches they take. For this first essay, I want to look specifically at modeling the focal length, and at why the common practice of modeling \\(f_x\\) and \\(f_y\\) separately is suboptimal when calibrating cameras.
We've covered [camera calibration](https://www.tangramvision.com/blog/calibration-from-scratch-using-rust-part-1-of-3) in the past, as well as [3D coordinate transforms](https://www.tangramvision.com/blog/rotate-scale-translate-coordinate-frames-for-multi-sensor-systems-part-2). We'll need a basic understanding of both to define what we call the [collinearity function](https://en.wikipedia.org/wiki/Collinearity_equation). Beyond these, some familiarity with least squares or optimization will be helpful, but not necessary.