SREs are a rare breed in the software community, but there’s little denying that Site Reliability Engineering is the future of software operations.
Ask any developer what they’re working on and you’ll see a tiny sliver of the whole codebase. That makes sense given how narrowly focused most coding work is.
Because their scope spans an entire software system, SREs can end up working on many types of problems. Some problems are well defined, such as spinning up infrastructure to meet known demand.
Other problems are more open-ended, such as working out how to cost-effectively autoscale a service that has inconsistent usage patterns and needs high performance.
Most developers work within an agile framework such as Scrum or XP, and some SREs do the same for planned software build work. Doing so essentially timeboxes their efforts. Timeboxing suits problems that can be estimated up front, but it does not always suit production work.
Can an SRE stop working on a problem because it does not fit the mould of a sprint? That could spell disaster for production software. Daniel Wilhite answers the question “Can scrum be used effectively by SRE teams?” very well.