Blue-Green Upgrades of Istio Control Plane

Submitted by
Style Pass
2021-06-23 23:30:04

The Cloud Platform team at Snowflake runs more than 80 Kubernetes clusters on AWS, Azure, and GCP. On each cluster, we run most workloads in an Istio service mesh so that traffic and security are managed consistently across clouds. Upgrading Istio is challenging because upgrades are frequent, the blast radius is wide, and our fleet of clusters is large. This article presents the blue-green upgrade approach we devised and the lessons we learned.

Istio is an open-source service mesh that offers a uniform control plane to manage microservices in hybrid-cloud and multi-cloud environments. With sidecar-proxy injection, Istio requires no code changes in the application and yet provides mTLS, rate limiting, service discovery, telemetry, RBAC, traffic shifting, and more.

Istio must be upgraded often, both to patch new vulnerabilities and because each release reaches end-of-life quickly. Because Istio is critical infrastructure for traffic, policy, and observability, misconfigured or unhealthy Istio components can cause cluster-wide outages, which makes upgrades risky. Implementing such upgrades on all our clusters across different clouds requires scalable tooling and automation. We want to validate the new version before shifting workloads over. If an upgrade fails, we must be able to roll back quickly rather than be stuck between versions.
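To make these requirements concrete, here is a minimal Python sketch of a blue-green flow: install the new ("green") control plane next to the old ("blue") one, validate it, shift namespaces one at a time, and return everything to blue on the first failure. It is a toy model, not the tooling described in this article. The `Cluster` class, the `blue_green_upgrade` function, and the `validate` and `healthy` callbacks are all hypothetical names introduced for illustration, and no real Istio or Kubernetes API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cluster:
    """Toy model of one cluster (hypothetical, for illustration only)."""
    name: str
    namespaces: Dict[str, str]  # namespace -> control-plane version in use
    installed: List[str] = field(default_factory=list)  # installed versions


def blue_green_upgrade(
    cluster: Cluster,
    blue: str,
    green: str,
    validate: Callable[[Cluster, str], bool],
    healthy: Callable[[Cluster, str], bool],
) -> bool:
    """Return True if all namespaces end on `green`, False after rolling back to `blue`."""
    # 1. Install green alongside blue; no workload uses it yet.
    cluster.installed.append(green)

    # 2. Validate green before shifting any workload.
    if not validate(cluster, green):
        cluster.installed.remove(green)
        return False

    # 3. Shift namespaces one at a time, checking health after each shift.
    shifted: List[str] = []
    for ns in sorted(cluster.namespaces):
        cluster.namespaces[ns] = green
        shifted.append(ns)
        if not healthy(cluster, ns):
            # 4. Roll back: every shifted namespace returns to blue,
            #    so the cluster is never left between versions.
            for done in shifted:
                cluster.namespaces[done] = blue
            cluster.installed.remove(green)
            return False

    # 5. Retire blue only once everything runs on green.
    cluster.installed.remove(blue)
    return True
```

Because blue stays installed until the final step, a rollback in this sketch only has to point namespaces back at it. That is the property the requirements above ask for.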
