Product Strategy Lead
Sep 7, 2023 | 3 min read
Containerization has rapidly become a fundamental tenet of cloud-native computing, transforming how applications are developed, deployed, scaled, and managed.
The containerization ecosystem also needs software that orchestrates, deploys, scales, and maintains the containers that make up an application, as well as hardware infrastructure capable of running and scaling those containerized workloads with ease.
As far back as June 25, 2020, Gartner published a press release forecasting “strong revenue growth for global container management software and services through 2024.” In the same release, Gartner projected that global revenue from container management software would reach $944 million in 2024, up from $465.8 million in 2020.
Clearly, container orchestration software is here to stay. Several orchestration platforms, however, compete for a place in the cloud-native architecture or stack.
Therefore, the question is: which container orchestration platform is best of breed?
The brief answer to this question is Kubernetes.
Kubernetes was initially developed by Google, open-sourced, and donated to the Cloud Native Computing Foundation (CNCF) in 2015. Fast forward to the CNCF Cloud Native Survey 2020, which reported that 83% of the organizations surveyed run Kubernetes in production. Consequently, it is considered the de facto container orchestration platform.
While Kubernetes is a complex and sophisticated system with many moving parts, it is specifically designed to automate the deployment, scaling, and management of containerized applications. In lay terms, Kubernetes is very good at its core function: managing cloud-native, containerized applications.
A research paper published in the Journal of Cloud Computing describes Kubernetes as follows:
“Overall, Kubernetes offers a powerful and flexible solution for managing containerized applications in production environments.”
As highlighted several times in this text, one of Kubernetes' core functions is to orchestrate and automate the scaling of the containers running inside Kubernetes pods.
Kubernetes offers three types of scaling: horizontal, vertical, and cluster scaling. Vertical and cluster scaling are beyond the scope of this article, although they play equally important roles in keeping a cloud-native application highly available. This discussion focuses on horizontal scaling and its benefits and challenges.
Horizontal scaling is what makes an application elastic. The research paper titled “A Fine-Grained Horizontal Scaling Method for Container-Based Clouds” notes that horizontal (or elastic) scaling gives an application the ability to adjust dynamically to the workload at any given moment by automatically changing the “number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization.”
How?
The official Kubernetes documentation provides the answer as follows:
Kubernetes contains an API resource and controller known as the HorizontalPodAutoscaler (HPA). Based on the application’s workload, the HPA automatically deploys additional pods when demand for resources rises and removes pods when demand falls.
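Under the hood, the HPA controller works from a simple rule documented in the Kubernetes reference: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). As a worked example, if four pods are averaging 90% CPU utilization against a 60% target, the controller scales the workload out to ceil(4 × 90 / 60) = 6 pods; when utilization falls back, the same calculation scales it in again.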
The best way to describe how the HPA horizontally scales a containerized application in and out is to consider the following use case:
Imagine a tech startup offering a SaaS chatbot that provides medical and psychological support, along the lines described in the research paper titled “Using Chatbots to Support Medical and Psychological Treatment Procedures.”
This bot is designed to provide support before and after medical procedures such as colonoscopies and hip replacements, coach patients diagnosed with chronic conditions like diabetes and Crohn’s disease, and provide essential psychological support for people diagnosed with autism and severe depression.
The chatbot is powered by a Large Language Model (LLM) paired with a vector database. The company has added Retrieval-Augmented Generation (RAG) to reduce the LLM’s hallucinations, improve its accuracy, and allow it to answer questions in very close to real time.
Additionally, the chatbot is built on a containerized microservices architecture, with the containers deployed, scaled, and maintained by Kubernetes. Each container runs in a Kubernetes pod, and the set of pods is managed by a Kubernetes Deployment. The vector database stores the supporting medical knowledge as vector embeddings that, combined with RAG, allow the chatbot to answer patient questions quickly and efficiently.
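To make the setup concrete, a Deployment for the chatbot microservice might look like the sketch below. All names, the image, and the resource figures are hypothetical; the detail that matters for what follows is the CPU request, because the HPA calculates CPU utilization as a percentage of each pod’s requested CPU.

```yaml
# Hypothetical Deployment for the chatbot microservice (illustrative only).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-service
spec:
  replicas: 2                 # starting count; the HPA adjusts this at runtime
  selector:
    matchLabels:
      app: chatbot-service
  template:
    metadata:
      labels:
        app: chatbot-service
    spec:
      containers:
        - name: chatbot
          image: registry.example.com/chatbot:1.0   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m       # the baseline the HPA measures utilization against
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```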
Note: The company expects variable workloads throughout a 24-hour period with peak traffic in the evenings and over weekends.
Figure 1: The Kubernetes Workflow
To manage these variable workloads, the Kubernetes administrator has configured the HPA to automatically scale the number of chatbot-service pods out and in based on observed versus target CPU utilization. The goal is to always run enough chatbot pods to handle the current workload (incoming queries) while avoiding over-provisioning.
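Here is a minimal sketch of that HPA configuration, using the stable autoscaling/v2 API and targeting the hypothetical chatbot-service Deployment above; the 70% CPU target and the 2–10 replica range are illustrative values, not recommendations.

```yaml
# Hypothetical HPA: hold average CPU utilization near 70% by adding or
# removing chatbot pods within a 2-10 replica range.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chatbot-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chatbot-service
  minReplicas: 2              # floor for quiet overnight periods
  maxReplicas: 10             # ceiling for evening and weekend peaks
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above this, in below it
```

For a quick experiment, kubectl can create an equivalent autoscaler in one line: kubectl autoscale deployment chatbot-service --cpu-percent=70 --min=2 --max=10.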
Efficient resource utilization is the most significant benefit of using Kubernetes’ HPA to automate scaling the containerized microservices out and in. By dynamically adjusting the number of running pods, the HPA keeps the application highly available irrespective of the workload, which in turn delivers cost savings, scalability, and an optimal user experience.
Circling back to our opening statement: cloud-native technologies, including containerization, lead the way in building scalable, highly available applications that can target 99.99% uptime, helping ensure an excellent user experience at every touchpoint.
The quality of the user experience is ultimately the end goal. A good user experience drives adoption, which in turn fuels sustainable organizational growth over time and, in our scenario, allows the startup to evolve into a mature organization.