Kubernetes High Availability (Stacked etcd) On-Premises Deployment

Last updated on April 5, 2025

Morphius HA Cluster Topology

Project Overview

This project involved deploying and managing a high-availability Kubernetes cluster with a stacked etcd topology, on-premises, for a cloud distributor company. The cluster was deployed with kubeadm on seven virtual machines running on LXD and was configured for network security, persistent storage, and monitoring. The setup integrated:

Cilium for networking
Istio for service mesh
ArgoCD for GitOps
NFS for storage
Velero for backup and migration
Prometheus and Grafana for monitoring and alerting
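
In a stacked etcd topology, each control-plane node also runs an etcd member, so the number of control-plane nodes determines how many failures the cluster can absorb. The short Python sketch below illustrates the majority (quorum) rule etcd relies on. The function names are hypothetical, and the member counts are illustrative only: this write-up does not say how the seven virtual machines were divided between control-plane and worker roles.

```python
def etcd_quorum(members: int) -> int:
    """Smallest number of etcd members that must agree (a strict majority)."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """Number of members that can fail while the cluster keeps quorum."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    # Illustrative member counts, not the actual layout of this deployment.
    for members in (1, 3, 4, 5, 7):
        print(
            f"members={members} "
            f"quorum={etcd_quorum(members)} "
            f"tolerated_failures={etcd_fault_tolerance(members)}"
        )
```

Under this rule an even member count adds no extra fault tolerance over the next lower odd count, which is why odd numbers of control-plane nodes are the usual choice for stacked etcd.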

The Challenge

The cloud distributor company was a relatively new business, operating for about a year. Their DevOps team was small, consisting of only two engineers: one responsible for infrastructure and the other for development. They required an infrastructure that was:

Highly available
Scalable
Easy to manage
Low on operational overhead

They needed an environment where workloads could be efficiently orchestrated, deployed, and managed without excessive manual intervention. Kubernetes became the natural choice for container orchestration, providing flexibility, scalability, and reliability.

Project Objectives

The primary goal of the project was to deploy a highly available Kubernetes cluster across multiple nodes. The key objectives included:

Deploy a resilient and scalable Kubernetes cluster using kubeadm
Ensure high availability by distributing workloads across multiple virtual machines
Implement secure networking using Cilium
Implement service mesh using Istio
Implement GitOps using ArgoCD
Enable persistent storage using an NFS-backed storage solution
Implement a robust backup and disaster recovery solution with Velero
Provide comprehensive monitoring and alerting with Prometheus and Grafana
Facilitate external access to services securely using Cilium Gateway API and Istio

Result

The implementation of the high-availability Kubernetes cluster provided significant improvements for the company’s infrastructure:

Improved reliability: Workloads can continue running even if a node fails.
Scalability: Applications can be scaled across multiple nodes without downtime.
Cost efficiency: Running on their own infrastructure reduces reliance on expensive cloud solutions.
Portability: Kubernetes’ API enables seamless application migration across namespaces and environments.
Faster deployments: Setting up development and staging environments in separate namespaces is quicker and more efficient.
Enhanced security and observability: Istio and Cilium provide deep network insights and secure service communication.
Automated deployments: ArgoCD ensures that applications remain in their desired state with minimal manual intervention.
