A Deep Dive into DNS Debugging

submitted by
Style Pass
2023-03-24 11:30:08

Welcome, dear reader! It's time for a deep dive into the world of DNS. As you would expect for a SaaS startup, everything starts with the customer. On a bright sunny day, we received a support request asking us to explain unusual 5-second DNS resolution times that had triggered check degradations.

For some reason, DNS resolution for that check occasionally took more than 5 seconds. That was definitely too slow, so something was obviously wrong. We got in touch with NS1, the DNS provider responsible for the domain's authoritative servers, but they told us everything was fine according to their monitoring. It was clear that something between NS1 and Checkly was not working as expected.

Before rushing into solution mode, let’s take a step back and see how big the problem is, to understand what priority it should have. I used our data lake to pull 5 months' worth of API check run information and find out how many >5s DNS resolution times we had seen since September of last year.
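To make that sizing step concrete, here is a minimal sketch of the kind of aggregation involved. It is not the actual data lake query: the `CheckRun` shape, the field names, the `slowDnsRunsByMonth` helper, and the sample records are all hypothetical and made up for illustration.

```typescript
// Hypothetical shape of one API check run as exported from a data lake.
// Field names are assumptions, not Checkly's real schema.
interface CheckRun {
  checkId: string;
  startedAt: string; // ISO 8601 timestamp, e.g. "2022-09-14T08:00:00Z"
  dnsMs: number;     // DNS resolution time in milliseconds
}

// Count runs whose DNS resolution time exceeds the threshold,
// grouped by calendar month ("YYYY-MM").
function slowDnsRunsByMonth(
  runs: CheckRun[],
  thresholdMs: number = 5000
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of runs) {
    if (run.dnsMs > thresholdMs) {
      const month = run.startedAt.slice(0, 7); // "YYYY-MM" prefix of the timestamp
      counts[month] = (counts[month] || 0) + 1;
    }
  }
  return counts;
}

// Made-up sample records, only to show how the helper is called.
const sampleRuns: CheckRun[] = [
  { checkId: "a", startedAt: "2022-09-14T08:00:00Z", dnsMs: 12 },
  { checkId: "a", startedAt: "2022-09-14T08:10:00Z", dnsMs: 5012 },
  { checkId: "b", startedAt: "2022-11-02T19:30:00Z", dnsMs: 5034 },
  { checkId: "b", startedAt: "2022-11-03T19:30:00Z", dnsMs: 48 },
];

const slowByMonth = slowDnsRunsByMonth(sampleRuns);
```

Grouping by month rather than returning a single total makes it easier to see whether the slow resolutions are a steady trickle or cluster around particular dates.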

My default tool for debugging production issues would usually be Checkly, but our API checks were never designed to provide deep insights into DNS resolution. They only give you the basic timing information that we get from the NodeJS request library. This is good for pointing out when there is a DNS problem for synthetic monitoring, but it does not really help you debug it. The first step of this investigation was to refresh my memory about how NodeJS and request actually do DNS resolution and measurements.
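As a rough illustration of why this timing information is so coarse: when called with `time: true`, the request library reports a set of timestamps (`socket`, `lookup`, `connect`, `response`, `end`) and derives per-phase durations from them. The sketch below mirrors that derivation as I understand it from the library's documentation. The `toPhases` helper and the example numbers are hypothetical, not Checkly's code or real measurements.

```typescript
// Timestamps in milliseconds relative to the start of the request,
// in the style of the `timings` object reported by request with `time: true`.
interface Timings {
  socket: number;   // socket assigned to the request
  lookup: number;   // DNS lookup finished
  connect: number;  // TCP connection established
  response: number; // first byte of the response received
  end: number;      // last byte of the response received
}

// Durations of each phase, in the style of request's `timingPhases`.
interface TimingPhases {
  wait: number;
  dns: number;
  tcp: number;
  firstByte: number;
  download: number;
  total: number;
}

// Each phase is the difference between two consecutive timestamps.
function toPhases(t: Timings): TimingPhases {
  return {
    wait: t.socket,
    dns: t.lookup - t.socket,
    tcp: t.connect - t.lookup,
    firstByte: t.response - t.connect,
    download: t.end - t.response,
    total: t.end,
  };
}

// Made-up timestamps resembling a run with a slow DNS lookup.
const examplePhases = toPhases({
  socket: 2,
  lookup: 5010,
  connect: 5030,
  response: 5110,
  end: 5120,
});
// examplePhases.dns is 5008: a single number, with no detail about what
// happened during the lookup.
```

The DNS phase is just the gap between two events on the socket. It can show that a lookup was slow, but not which resolver was asked, whether a query was retried, or where the time went.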
