On September 30th 2021, Slack had an outage that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of

The Case of the Recursive Resolvers

submited by
Style Pass
2021-11-29 12:00:11

On September 30th 2021, Slack had an outage that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of our attempt to enable DNSSEC — an extension intended to secure the DNS protocol, required for FedRAMP Moderate — but which ultimately led to a series of unfortunate events.

The internet relies very heavily on the Domain Name System (DNS) protocol. DNS is like a phone book for the entire internet. Web sites are accessed through domain names, but web browsers interact using IP addresses. DNS translates domain names to IP addresses, so that browsers can load the sites you need. Refer to ‘What is DNS?’ by Cloudflare to read more about how DNS works and all the necessary steps to do a domain name lookup.

DNS as a protocol is insecure by default, and anyone in transit between the client and the authoritative DNS name server for a given domain name can tamper with the response, directing the client elsewhere. DNS has a security extension commonly referred to as DNSSEC, which prevents tampering with responses between the authoritative DNS server of the domain name (i.e. slack.com.) and the client’s DNS recursive resolver of choice. DNSSEC will not protect the last mile of DNS — which is the communication between the client and their DNS recursor — from a MiTM attack. While we are aware of the debate around the utility of DNSSEC among the DNS community, we are still committed to securing Slack for our customers.

Leave a Comment