I started reading empirical software engineering research because I was working with scientists and became embarrassed by how many of the things I did and taught were based on personal experience and anecdotes. Was agile development really better than designing things up front? Were some programming languages actually better than others? And what exactly does "better" mean in sentences like that? My friends and colleagues in physics, ecology, and public health could cite evidence to back up their professional opinions; all I could do was change the subject.
My interests have shifted over the years, but software engineering and scientific computing have always been near the center, which makes this set of papers a double pleasure to review. The first, Hatton1994, is now a quarter of a century old, but its conclusions are still fresh. The authors fed the same data into nine commercial geophysical software packages and compared the results; they found that, "numerical disagreement grows at around the rate of 1% in average absolute difference per 4000 lines of implemented code, and, even worse, the nature of the disagreement is nonrandom" (i.e., the authors of different packages make similar mistakes).
Hatton1997 revisited this result while also reporting on a concurrent experiment that analyzed large scientific applications written in C and Fortran. This study found that, "…C and Fortran are riddled with statically detectable inconsistencies independent of the application area. For example, interface inconsistencies occur at the rate of one in every 7 interfaces on average in Fortran, and one in every 37 interfaces in C. They also show that…roughly 30% of the Fortran population and 10% of the C…would be deemed untestable by any standards."