Kubernetes High Availability (Stacked etcd) On-Premises Deployment
Last updated on April 5, 2025
Project Overview
This project involved deploying and managing a highly available Kubernetes cluster with a stacked etcd topology on-premises for a cloud distributor company. The cluster was deployed with kubeadm on seven virtual machines running on LXD and was configured for network security, persistent storage, and monitoring. The setup integrated:
- Cilium for networking
- Istio for service mesh
- ArgoCD for GitOps
- NFS for storage
- Velero for backup and migration
- Prometheus and Grafana for monitoring and alerting
The Challenge
The cloud distributor company was a relatively new business, operating for about a year. Their DevOps team was small, consisting of only two engineers: one responsible for infrastructure and the other for development. They required an infrastructure that was:
- Highly available
- Scalable
- Easy to manage
- Low on operational overhead
They needed an environment where workloads could be efficiently orchestrated, deployed, and managed without excessive manual intervention. Kubernetes became the natural choice for container orchestration, providing flexibility, scalability, and reliability.
Project Objectives
The primary goal of the project was to deploy a highly available Kubernetes cluster across multiple nodes. The key objectives included:
- Deploy a resilient and scalable Kubernetes cluster using kubeadm
- Ensure high availability by distributing workloads across multiple virtual machines
- Implement secure networking using Cilium
- Implement service mesh using Istio
- Implement GitOps using ArgoCD
- Enable persistent storage using an NFS-backed storage solution
- Implement a robust backup and disaster recovery solution with Velero
- Provide comprehensive monitoring and alerting with Prometheus and Grafana
- Provide secure external access to services using the Cilium Gateway API and Istio
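The high-availability objective rests on etcd quorum: in a stacked topology each control-plane node also runs an etcd member, and the cluster accepts writes only while a majority of members are healthy. The Python sketch below shows that arithmetic. How the seven virtual machines were split between control-plane and worker roles is not stated here, so the member counts in the loop are illustrative assumptions, and the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while the cluster still has quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    # Hypothetical control-plane sizes, not taken from the project.
    for n in (1, 2, 3, 4, 5):
        print(
            f"{n} member(s): quorum={quorum(n)}, "
            f"tolerates {fault_tolerance(n)} failure(s)"
        )
```

An even member count adds no extra failure tolerance over the odd count below it, which is why control planes are usually sized at three or five nodes.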
Result
The high-availability Kubernetes cluster improved the company’s infrastructure in several ways:
- Improved reliability: Workloads can continue running even if a node fails.
- Scalability: Applications can be scaled across multiple nodes without downtime.
- Cost efficiency: Running on their own infrastructure reduces reliance on expensive cloud solutions.
- Portability: Kubernetes’ API enables seamless application migration across namespaces and environments.
- Faster deployments: Setting up development and staging environments in separate namespaces is quicker and more efficient.
- Enhanced security and observability: Istio and Cilium provide deep network insights and secure service communication.
- Automated deployments: ArgoCD ensures that applications remain in their desired state with minimal manual intervention.
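The "desired state" idea behind the GitOps workflow can be illustrated with a small Python sketch. It is a conceptual model only, not ArgoCD's actual implementation: the function `plan_sync`, the resource names, and the dictionary layout are all hypothetical. It compares what is declared in Git with what is running and reports what would need to be created, updated, or pruned.

```python
def plan_sync(desired: dict[str, dict], live: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired (Git) and live (cluster) resources, keyed by name."""
    # Declared in Git but missing from the cluster.
    create = sorted(name for name in desired if name not in live)
    # Present in both, but the live spec has drifted from the declared one.
    update = sorted(
        name for name in desired if name in live and desired[name] != live[name]
    )
    # Running in the cluster but no longer declared in Git.
    prune = sorted(name for name in live if name not in desired)
    return {"create": create, "update": update, "prune": prune}


if __name__ == "__main__":
    # Hypothetical resources used only for illustration.
    desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
    live = {"web": {"replicas": 1}, "old-job": {"replicas": 1}}
    print(plan_sync(desired, live))
```

When the plan is empty in all three categories, the live state matches Git and no manual intervention is needed.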