Data, unlike some wines, do not improve with age. The contrary view, that data are immortal, may underlie the often-observed tendency to recycle old examples in texts and presentations. That view is illustrated here with three classical examples and rebutted by further examination. Some general lessons for data science are noted, as well as some history of statistical worries about the effect of data selection on induction and related themes in recent histories of science.
We commonly encounter repeated use of the same data sets in statistical exposition: in textbooks, in lectures, and as examples in theoretical papers. These data sets may be termed classical, even though they may be of recent origin. They can supply a link to reality without the need for a lengthy explanation. But how strong is that link? I will argue that data have a limited shelf life. To see what data look like after too long a time on the shelf, see Figure 1.
Figure 1. The Hyrtl Skull Collection. Reproduced with permission from The Mütter Museum of the College of Physicians of Philadelphia.