Scaling Microservices Architectures in the Cloud

With the velocity of data growing at the rate of 50% per year, the issue of scaling a Microservices architectures is critical in todays’ demanding enterprise environments. Just creating the Microervices is not sufficient. Scaling a microservices architecture requires careful choices with respect to the underling infrastructure and as well as the strategy on how to orchestrate the Microservices after deployment.

Choosing the right Infrastructure topology

While designing an application composed of multiple Microservices, the architect has multiple deployment topology options with increasing levels of sophistication as discussed below:

1. Deployment on a single machine within the enterprise or cloud

Most legacy systems, and many existing systems today, are deployed using this simplest of topologies. A single, typically fairly powerful server with a multi-core/processor is chosen as the hardware platform and the user relies on symmetric multiprocessing on the hardware to execute as many operations concurrently as possible, while the Microservice client applications themselves may be hosted on different machines possibly hosted across multiple clouds. While this approach has worked for the first generation of emerging cloud applications, it will clearly not scale to meet increasing enterprise processing demands since the single server becomes a processing and latency bottle neck.

2. Deployment across a cluster of machines in a single enterprise or cloud environment

A natural extension of the initial approach is to deploy the underlying infrastructure that hosts the Microservices across a cluster of machines within an enterprise or private cloud.  This organization provides greater scalability, since machines can be added to the cluster to pick up additional load as required.  However, it suffers from the drawback that if the Microservice client applications are themselves distributed across multiple cloud systems, then the single cluster becomes a latency bottleneck since all communication must flow through this cluster. Even though network bandwidth is abundant and cheap, the latency of communication can lead to both scaling and performance problems as the velocity of data increases.

3. Deployment across multiple machines across the enterprise, private and public clouds

The communications latency problem of the ‘single cluster in a cloud’ approach described above is overcome by deploying the software infrastructure on multiple machines/clusters distributed across the enterprise and public/private clouds as required. Such an organization is shown in the figure below. This architecture ensures linear scalability because local Microservices in a single cloud/enterprise environment can communicate efficiently via the local infrastructure (typically a messaging engine for efficient asynchronous communication or, if the requirement is simple orchestration, then a request/reply REST processing engine). When a Microservice needs to send data to another Microservice in a different cloud, the transfer is achieved via communication between the “peers” of the underlying infrastructure platform. This leads to the most general-purpose architecture for scaling Microservices in the cloud, since it minimizes latency and exploits all of the available parallelism within the overall computation.


Cloud Diagram


Orchestration and Choreography: Synchronous vs. Asynchronous

In addition to the infrastructure architecture, the method of Orchestration/Choreography has significant affects on the overall performance of the Microservices application. If the Microservices are orchestrated using a classic synchronous mechanism (blocking calls, each waiting for downstream calls to return), potential performance problems can occur as the call-chain increases in size. A more efficient mechanism is to use an asynchronous protocol, such as JMS or any other enterprise-messaging protocol/tool (IBM MQ, MSMQ, etc.) to choreograph the Microservices. This approach ensures that there are no bottlenecks in the final application-system since most of the communication is via non-blocking asynchronous calls, with blocking, synchronous calls limited to things like user-interactions. A simple rule of thumb is to avoid as many blocking calls as one can.