Things usually work until they don’t. A Sidekiq background job process can crash, quietly shut down, or get stuck for a variety of reasons: random network errors, misconfigured email clients, or a shortage of RAM or disk space on Redis, to name a few. Adding proper monitoring can save you a lot of headaches and angry calls from customers. In this blog post, I’ll describe a simple way to monitor the uptime and responsiveness of Sidekiq processes in Rails apps.
You can run those jobs periodically by using Sidekiq Cron. I usually prefer it to the Whenever or Clockwork gems. One huge advantage is that it does not require an additional scheduler process. Config is simple and only requires adding a single file.
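As a rough sketch of what such a schedule could look like, the snippet below defines both jobs as a Ruby hash and hands it to Sidekiq Cron via `Sidekiq::Cron::Job.load_from_hash`. The file path, the entry names, and both cron intervals are assumptions chosen for illustration, so adjust them to your own setup.

```ruby
# config/initializers/sidekiq_cron.rb (hypothetical path)

# Schedule for the two monitoring jobs.
# The entry names and cron intervals below are assumed values.
SIDEKIQ_PING_SCHEDULE = {
  "sidekiq_ping" => {
    "cron"  => "* * * * *",      # every minute
    "class" => "SidekiqPingJob"
  },
  "sidekiq_ping_check" => {
    "cron"  => "*/5 * * * *",    # every five minutes
    "class" => "SidekiqPingCheckJob"
  }
}

# Register the schedule only inside a Sidekiq server process,
# and only when the sidekiq-cron gem is actually loaded.
if defined?(Sidekiq::Cron::Job) && Sidekiq.server?
  Sidekiq::Cron::Job.load_from_hash(SIDEKIQ_PING_SCHEDULE)
end
```

The same entries can also live in a YAML file that is loaded and passed to `load_from_hash`; the hash form is used here only to keep the example in one place.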
SidekiqPingJob periodically updates a Redis entry with the current time, and SidekiqPingCheckJob raises an exception if the entry has not been updated for too long.
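The logic of the two jobs can be sketched in plain Ruby. This is a minimal illustration, not the exact job code: a Hash stands in for Redis so the example runs on its own, and the `SidekiqHeartbeat` class, the key name, and the five-minute threshold are all hypothetical.

```ruby
# Sketch of the heartbeat logic; a Hash stands in for Redis here.
class SidekiqHeartbeat
  class StaleError < StandardError; end

  KEY = "sidekiq_ping"       # hypothetical Redis key name
  MAX_AGE_SECONDS = 5 * 60   # assumed staleness threshold

  # store: any object responding to [] and []= (a Hash in this sketch)
  # clock: a callable returning the current Time, injectable for testing
  def initialize(store:, clock: -> { Time.now })
    @store = store
    @clock = clock
  end

  # What SidekiqPingJob#perform would do: record the current time.
  def ping
    @store[KEY] = @clock.call.to_i.to_s
  end

  # What SidekiqPingCheckJob#perform would do: raise if the entry is
  # missing or older than the threshold, otherwise return its age.
  def check!
    last_ping = @store[KEY]
    raise StaleError, "no Sidekiq ping recorded" if last_ping.nil?

    age = @clock.call.to_i - last_ping.to_i
    raise StaleError, "last Sidekiq ping was #{age}s ago" if age > MAX_AGE_SECONDS

    age
  end
end

heartbeat = SidekiqHeartbeat.new(store: {})
heartbeat.ping
heartbeat.check! # does not raise right after a ping
```

In a real app, `ping` and `check!` would be the bodies of the two jobs' `perform` methods, and the store would be a Redis client using `set` and `get` instead of a Hash. The raised exception is what your error tracker would pick up and report.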
Can you spot the flaw in this setup? We’ve created a paradox: you’ll be notified that Sidekiq is unresponsive only if it is still responsive. If, for some reason, SidekiqPingCheckJob is not executed, you’ll never be notified about the downtime. You could try to configure a more_urgent queue that, in theory, will be more responsive than the default queue. But in the end, you’re always constrained by the fact that it’s not possible to reliably monitor infrastructure from the inside.