Alert Fast – Software the Hard way

submitted by

Style Pass

2021-06-07 16:00:10

A dichotomy I often see across teams and projects is the choice between “failing fast” and “failing gracefully.” Here’s a simplified example of something I see far too often, and it gets to the core of the dilemma.
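The pattern looks roughly like the sketch below. It is only an illustration: `LegacyUser`, `display_name`, and the `"Guest"` fallback are hypothetical names, and `get_name()` stands in for the `getName()` call discussed next.

```python
from typing import Optional


class LegacyUser:
    """Hypothetical stand-in for a legacy object whose behavior is unclear."""

    def __init__(self, name: Optional[str], broken: bool = False) -> None:
        self._name = name
        self._broken = broken

    def get_name(self) -> Optional[str]:
        # May raise or return None; the caller cannot tell which in advance.
        if self._broken:
            raise RuntimeError("legacy lookup failed")
        return self._name


def display_name(user: LegacyUser) -> str:
    """'Fail gracefully': swallow every problem and substitute a fallback."""
    try:
        name = user.get_name()
    except Exception:
        name = None  # the root cause is discarded here
    if name is None:
        return "Guest"  # fallback value, assumed for illustration
    return name
```

Whether the lookup works, returns nothing, or blows up, the caller always gets a string back and never sees an error.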

On the face of it, this looks extremely reasonable. When working with complex legacy systems, it’s not always clear whether getName() may produce an exception or return a null value. Every good engineer should aim to please their users, and users certainly hate seeing errors or “nulls”. “So of course we should check for errors/nulls and add fallback logic to handle them.”

There are a couple of problems with this approach. First, it makes your system brittle. Instead of debugging and fixing the root cause of your exceptions and nulls, you’re allowing them to proliferate. Instead of having one definitive way of implementing a given piece of functionality, you’ve got multiple implementations that run only under specific fallback conditions, and it’s unclear when, or even if, they are being exercised. Fast forward 5 years, and you’ve got a legacy system that is bloated in size, with twice the cyclomatic complexity.
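To make the "multiple implementations" point concrete, here is a small sketch under assumed names: `load_name`, the `"name"` and `"login"` keys, and the `"Unknown"` default are all hypothetical. One piece of functionality ends up with three code paths, and nothing in the output tells you which one ran.

```python
def load_name(record: dict) -> str:
    """Resolve a display name, falling back twice if the data is incomplete."""
    try:
        name = record["name"]        # primary path
    except KeyError:
        name = record.get("login")   # fallback 1: hides the missing field
    if not name:
        name = "Unknown"             # fallback 2: hides everything else
    return name


records = [
    {"name": "Ada"},      # healthy record
    {"login": "ada42"},   # upstream bug: "name" is missing
    {},                   # upstream bug: both fields are missing
]

# Every call succeeds, so the two upstream bugs stay invisible to the caller.
names = [load_name(r) for r in records]
```

Each extra branch is a separate implementation to test and maintain, which is where the growth in cyclomatic complexity comes from.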

“Tough luck, that’s your job,” you might say. But there’s a second problem as well, one that customers and product owners do care about. Every piece of fallback logic is, by definition, not as good as the real thing. It worsens the user experience in some way – maybe through added latency, inaccurate data, or disabled features. By using fallbacks everywhere, you might prevent short-term outages, but you are slowly choking the long-term user experience.