We will phrase the problem in terms of a measurement of error. For example, let’s try to achieve a given square error in a regression problem. That is, how hard is it to estimate a number given a sample from a larger population?
A simplified version of our modeling problem is the following. There is an unobserved population of real values v_j, where j is an integer ranging from 1 to p. We have access to a sample y_i, where i is an integer ranging from 1 to m. Each y_i was generated as a copy of v_j for an independent uniform random draw of j from the integers 1 through p, with re-use (replacement) allowed. We want to estimate the average value of the v_j from our y_i. This is the usual formulation of working out an estimate on training data, and asking how well the estimate will work on future data drawn from the same population.
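To make the sampling process concrete, here is a minimal sketch in Python. The population values, the sizes p = 1000 and m = 50, the seed, and the helper name draw_sample are all illustrative assumptions, not part of the original setup.

```python
import random


def draw_sample(v, m, rng):
    """Draw m values from the population v, uniformly and with replacement."""
    p = len(v)
    # Each y_i copies v_j for an independent uniform index j.
    # Python indexes from 0, so j ranges over 0..p-1 here rather than 1..p.
    return [v[rng.randrange(p)] for _ in range(m)]


rng = random.Random(2024)  # fixed seed, only so the sketch is reproducible

# Hypothetical population: in the real problem these values are unobserved.
v = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# The visible sample; repeats are possible because draws are with replacement.
y = draw_sample(v, m=50, rng=rng)
```

Because the draws allow replacement, the same population value can appear more than once in the sample, and m is not required to be smaller than p.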
We are interested in how far off the visible sample estimate sample_estimate := (1/m) sum_{i=1...m} y_i is from the unobserved true population mean true_value := (1/p) sum_{j=1...p} v_j. We will quantify this as “square error” or (true_value - sample_estimate)^2. Square error is an example of a loss or criticism: smaller is better, and in this case zero represents perfection.
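The two means and the square error can be computed directly, as in the following sketch. It assumes a simulated setting where the population is available for checking, which is not the case in practice. The population values, the sizes, and the helper names mean and square_error are illustrative assumptions.

```python
import random


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def square_error(v, y):
    """Square error of the sample mean of y as an estimate of the mean of v."""
    true_value = mean(v)       # (1/p) * sum of the v_j
    sample_estimate = mean(y)  # (1/m) * sum of the y_i
    return (true_value - sample_estimate) ** 2


rng = random.Random(7)  # fixed seed, only so the sketch is reproducible

# Hypothetical population (p = 500) and a with-replacement sample (m = 40).
v = [rng.uniform(0.0, 10.0) for _ in range(500)]
y = rng.choices(v, k=40)  # random.choices samples with replacement

err = square_error(v, y)
print(err)  # non-negative; zero only if the two means coincide
```

Re-running with different seeds gives different values of err, which is why the quantity of interest is usually the typical or expected square error rather than the value for any one sample.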