As discussed in Part 1 of this blog series (Designing Kubernetes on Google Cloud), container images package the application code, its runtime, and all required library dependencies in a pre-defined format, which we use to create and provision one or more containers. So far we have focused on running containers on a single host, which may be acceptable in quality assurance environments for application testing and development. In production, however, we deal with clusters of hosts, containers, and applications, and we aim to ensure that they are fault-tolerant, use resources optimally, scale elastically, and can be updated or rolled back without downtime, while remaining easy to manage and control through a single controller/management unit once multiple nodes are connected in a cluster, much as VMs are managed and controlled from a single interface. This controller or management system is generally referred to as a container orchestrator.
Just as an orchestra needs a conductor, containers at scale need an orchestrator: something capable of provisioning, monitoring, managing, and controlling a production containerized solution effectively and reliably.
This blog discusses why we should use a container orchestrator and the available implementation options, with a focus on the Kubernetes architecture.
Today, several container orchestrator options are available, such as:
- Docker Swarm: a container orchestrator provided by Docker, Inc., built into Docker Engine.
- Marathon: one of the frameworks for running containers at scale on Apache Mesos.
- Amazon ECS: Amazon Elastic Container Service (ECS) is a hosted service provided by AWS to run Docker containers at scale on its infrastructure.
- Kubernetes: started by Google, and now a Cloud Native Computing Foundation (CNCF) project.
This blog series will focus on Kubernetes, specifically Kubernetes on Google Cloud Platform (GKE). Because Kubernetes is supported and deployable on-premises as well as on different public cloud platforms such as AWS and GCP, it is becoming the most widely deployed container orchestrator.
According to the Cloud Native Computing Foundation: “Kubernetes is the world’s most popular container-orchestration platform and the first CNCF project. Kubernetes helps users build, scale and manage modern applications and their dynamic lifecycles. First developed at Google, Kubernetes now counts more than 2,300 contributors and is used by some of the world’s most-innovative companies, across a wide range of industries.”
In addition, Kubernetes today offers the flexibility of choosing among different deployment environments: on-premises on bare metal or VMs, or in the cloud, whether public, private, hybrid, or multi-cloud. Moreover, like any successful open-source project, Kubernetes has a large and growing worldwide community, with meet-up groups in different cities and Special Interest Groups (SIGs) that focus on particular areas such as scaling, bare metal, and networking. Together with the simplicity, flexibility, and modularity of its technical architecture, these aspects are key to making Kubernetes the most popular container-orchestration platform, as stated by the CNCF.
From a 10,000-foot view, the Kubernetes architecture is a client-server model, or master-worker in Kubernetes terms. A worker node in Kubernetes was previously known as a minion. If you are a fan of the “Despicable Me” animated movies, you can think of the master as the cluster orchestrator and the minions as the worker nodes that take orders and perform whatever the master asks of them.
In general, a worker node can be a VM or a physical machine, depending on the cluster environment. Because we are focusing on Kubernetes on GCP, it will always be a VM.
Let’s zoom in, from the 10,000-foot view to a 1,000-foot view, to look at the Kubernetes architecture in more detail.
From a 1,000-foot view, the Kubernetes architecture consists of the following main components and sub-components:
- One or more master nodes: the master node is the gateway for all administrative tasks used to build and manage container clusters. It provides the cluster’s control-plane functions, including scheduling, scaling deployments, and responding to the various cluster events. The following are the main components of a master node in a Kubernetes cluster:
- API server: this is simply the interface the Kubernetes master control plane uses to receive administrative REST commands and to interact with the cluster’s worker nodes. This means that when system admins interact with a Kubernetes cluster through the CLI, they are technically interacting with the master’s API server component. Following any change, the new state of the cluster is stored in the cluster’s etcd.
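To make the CLI-to-API-server relationship concrete, here is a rough sketch of how a command maps onto the REST paths the API server exposes for core-group resources. The helper `api_path` is hypothetical (a real client library would do this for you); the resulting paths are the real ones.

```python
# Hypothetical helper (not a real client library): build the REST path
# the API server exposes for a core-group resource.
def api_path(resource, namespace=None, name=None):
    parts = ["/api/v1"]
    if namespace:
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name:
        parts.append(name)
    return "/".join(parts)

# `kubectl get pods -n default` ultimately becomes a GET against this path:
print(api_path("pods", namespace="default"))  # /api/v1/namespaces/default/pods
# Cluster-scoped resources have no namespace segment:
print(api_path("nodes"))                      # /api/v1/nodes
```

Every client, including the other control-plane components, talks to the cluster through paths like these; nothing mutates cluster state except by going through the API server.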
- Scheduler: this component handles pod-to-node allocation for newly created pods that are not yet assigned to any node, selecting a desirable node for them to run on. “Desirable” here means that the scheduler looks at the attributes the pod(s) require before selecting a suitable node, such as hardware resource requirements, data locality, affinity, anti-affinity, etc.
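The scheduler’s behavior can be thought of as a filter-then-score cycle. The toy sketch below (illustrative names only, not the real kube-scheduler API) filters out nodes that lack the resources the pod requires, then prefers the least-loaded feasible node:

```python
# Toy filter/score sketch of scheduling, assuming made-up node/pod dicts.
def schedule(pod, nodes):
    # Filter: keep only nodes with enough free CPU and memory for the pod.
    feasible = [n for n in nodes
                if n["free_cpu"] >= pod["cpu"] and n["free_mem"] >= pod["mem"]]
    if not feasible:
        return None  # no fit: the pod stays Pending
    # Score: prefer the node with the most free CPU left after placement.
    best = max(feasible, key=lambda n: n["free_cpu"] - pod["cpu"])
    return best["name"]

nodes = [
    {"name": "node-a", "free_cpu": 2, "free_mem": 4},
    {"name": "node-b", "free_cpu": 6, "free_mem": 8},
]
print(schedule({"cpu": 1, "mem": 2}, nodes))   # node-b
print(schedule({"cpu": 10, "mem": 2}, nodes))  # None
```

The real scheduler applies many more filters and scoring functions (affinity, taints, data locality, and so on), but the two-phase shape is the same.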
- Controller manager: the controller manager consists of multiple separate processes (controllers), each aware of the state of the specific objects it manages and watching their current state through the API server. Examples include the Node Controller, which monitors and responds when nodes go down, and the Replication Controller, which maintains the desired number of pods for every replication controller object in the system.
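The control-loop idea behind something like the Replication Controller can be sketched in a few lines (toy code, not the real controller-manager): compare the observed pods to the desired count and create or delete pods to close the gap.

```python
# Toy reconcile step for a replication-controller-like loop.
def reconcile(desired, running):
    pods = list(running)
    while len(pods) < desired:        # too few: create replacements
        pods.append(f"pod-{len(pods)}")
    while len(pods) > desired:        # too many: delete the surplus
        pods.pop()
    return pods

print(reconcile(3, ["pod-0"]))                    # ['pod-0', 'pod-1', 'pod-2']
print(reconcile(1, ["pod-0", "pod-1", "pod-2"]))  # ['pod-0']
```

Each real controller runs a loop like this continuously, observing current state via the API server rather than receiving it as an argument.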
- Etcd: etcd is a distributed key-value store, originally developed at CoreOS, used by Kubernetes as its backend to store the cluster’s state and configuration. Etcd is considered the single source of truth, as all information about the cluster state resides there. Although it is covered here as part of the Kubernetes master, it is possible to deploy it externally, in which case the master node(s) need to connect to it.
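As a mental model of what “single source of truth” means, the sketch below uses an in-memory dict as a stand-in for etcd (illustrative only, not the real etcd client): every object’s state is stored under a hierarchical key, so reading the store back answers “what is the current cluster state?”.

```python
# In-memory stand-in for etcd. Real keys do look like /registry/<kind>/<ns>/<name>.
store = {}

def put(key, value):
    store[key] = value

def get_prefix(prefix):
    # Range reads by key prefix are how components list related objects.
    return {k: v for k, v in store.items() if k.startswith(prefix)}

put("/registry/pods/default/web-0", {"phase": "Running"})
put("/registry/pods/default/web-1", {"phase": "Pending"})
print(get_prefix("/registry/pods/default/"))
```

The real etcd adds replication via the Raft consensus algorithm and watch notifications on key changes, which is what lets multiple masters share one consistent view of the cluster.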
- One or more worker nodes: a worker node is typically a machine, whether a VM, bare metal, etc., that hosts and runs the containerized applications using Pods, while the actual provisioning and control of these Pods on the worker nodes is done by the master node. When a Pod is scheduled to run on a worker node, the worker node provides the necessary environment for that Pod to run and communicate, as defined and instructed by the master node.
The following are the main components of a worker node in a Kubernetes cluster:
- Kubelet: the kubelet is an agent that runs on each worker node and communicates with the master node. It receives Pod definitions via various means (primarily through the API server that resides on the master control plane, as a set of ‘PodSpecs’), and runs the Pod along with its associated container(s). It also makes sure that the containers that are part of the Pods are healthy at all times.
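A crude sketch of that “keep the Pod’s containers healthy” behavior (hypothetical names, not the real kubelet): given a PodSpec assigned to this node, start any container that is not currently running.

```python
# Toy kubelet sync step: compare the PodSpec's containers to what is
# actually running and (re)start anything missing.
def sync_pod(podspec, running):
    started = []
    for c in podspec["containers"]:
        if c["name"] not in running:
            running.add(c["name"])   # stand-in for asking the runtime to start it
            started.append(c["name"])
    return started

spec = {"name": "web", "containers": [{"name": "app"}, {"name": "sidecar"}]}
running = {"app"}                    # 'sidecar' has died and must be restarted
print(sync_pod(spec, running))       # ['sidecar']
```

The real kubelet runs this kind of sync loop continuously, additionally probing container health (liveness/readiness probes) and reporting Pod status back to the API server.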
- Container Runtime: this is the engine the worker node uses to run containers and manage their lifecycle, such as Docker Engine’s core container runtime or containerd (containerd is based on Docker Engine’s core container runtime to benefit from its maturity and existing contributors).
Engineers from Google, Docker, IBM, ZTE, and ZJU have worked to implement CRI for containerd. The project is called cri-containerd. With cri-containerd, users can run Kubernetes clusters using containerd as the underlying runtime without Docker installed. cri-containerd eliminates an extra hop in the stack, making the stack more stable and efficient. With the Container Runtime Interface (CRI), the kubelet can connect to any container runtime that implements CRI, which Kubernetes can then use to manage Pods, containers, etc., as illustrated below.
- kube-proxy: at a high level, kube-proxy is the worker node’s gateway for interacting with external networks and other nodes. It enables Kubernetes to achieve service abstraction by creating and maintaining network rules on the host and performing connection forwarding, which ultimately allows communication to be directed at the service abstraction (a logical construct called a Service) rather than connecting directly to a Pod to access an application. At a very high level, a Service groups Pods based on certain pre-defined criteria (typically label selectors), and when connections arrive at the Service, it load-balances them across these Pods. kube-proxy keeps listening to the API server so it can update its state whenever a service endpoint is added or an existing one is deleted. Services and connectivity details will be discussed in more detail in a separate part of this blog series.
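The Service abstraction that kube-proxy maintains boils down to two steps, sketched below with toy code (illustrative, not kube-proxy’s actual mechanism, which uses iptables/IPVS rules): select backend Pods by label, then spread incoming connections across them.

```python
import itertools

# Step 1: a Service's selector picks its backend Pods by matching labels.
def select_endpoints(pods, selector):
    return [p["ip"] for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

pods = [
    {"ip": "10.0.0.1", "labels": {"app": "web"}},
    {"ip": "10.0.0.2", "labels": {"app": "web"}},
    {"ip": "10.0.0.3", "labels": {"app": "db"}},
]

# Step 2: connections to the Service are spread across those endpoints.
backends = itertools.cycle(select_endpoints(pods, {"app": "web"}))
print([next(backends) for _ in range(3)])  # ['10.0.0.1', '10.0.0.2', '10.0.0.1']
```

Clients only ever see the Service’s stable virtual address; as Pods come and go, the endpoint list changes underneath without the client noticing.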
- Pod: a Pod is the atomic unit of deployment and execution in the Kubernetes architecture. A Pod is tightly coupled to its worker node, and technically a Pod can run one or more containers (is it recommended to run more than one container in a Pod? That will be discussed in the next part of the blog series). Kubernetes in fact interacts with and controls Pods rather than containers; in other words, with Kubernetes you cannot run a container without a Pod (a Pod is like a sandbox that reserves resources for one or more containers). A Pod in Kubernetes is described as “atomic” for two reasons. First, container deployment in a Pod is all or nothing: either all the containers in the Pod are running or none are. Second, like a VM in a virtualized environment, a Pod can only run on a single node at a given time, even if it is hosting multiple containers.
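The first, “all or nothing” aspect can be illustrated with a toy sketch (hypothetical code, not the actual kubelet/runtime logic): if any container in the Pod fails to start, none of the Pod’s containers is left running.

```python
# Toy illustration of all-or-nothing Pod startup. `will_fail` simulates
# containers that fail to start.
def start_pod(containers, will_fail):
    started = []
    for name in containers:
        if name in will_fail:
            started.clear()   # roll back: stop everything already started
            return []         # none of the Pod's containers end up running
        started.append(name)
    return started            # all of them run

print(start_pod(["app", "sidecar"], will_fail=set()))        # ['app', 'sidecar']
print(start_pod(["app", "sidecar"], will_fail={"sidecar"}))  # []
```

In a real cluster the Pod as a whole would then be reported as not ready, and the control loops described above would retry or reschedule it.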
The main advantage of this Kubernetes architecture is the declarative operational model: the system admin defines the ‘intent’ (describing what, not how — the desired target state of the cluster) and loads it into the control plane’s master node (the API server) using manifest file(s), without specifying the exact CLI commands or how each Pod should be provisioned or updated. The master node then deploys it, and constantly checks that the actual current state of the cluster matches the defined desired state; if not, it makes sure to fix it. What to fix and how will be discussed in the subsequent blogs.
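The declarative model above can be reduced to one operation: diff the desired state (from the manifest) against the observed state, and act only on the difference. A minimal sketch, with made-up workload names:

```python
# Desired state, as a manifest would declare it, vs. what is actually running.
desired = {"web": 3, "cache": 1}
observed = {"web": 2, "cache": 1, "old": 1}

def diff(desired, observed):
    actions = {}
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have != want:
            actions[name] = want - have      # +n means create n, -n means delete n
    for name in observed:
        if name not in desired:
            actions[name] = -observed[name]  # not declared anymore: remove it
    return actions

print(diff(desired, observed))  # {'web': 1, 'old': -1}
```

Note that the admin never says “create one web pod, delete old”; those actions fall out of the diff. That is what makes the model self-healing: the same loop that applies a new manifest also repairs drift after a failure.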