
Service Mesh - Part 2

26.09.2024

In our previous post, our Xperts explored the challenges of microservices architectures, focusing on how a service mesh can help address them. We explained the fundamentals of this infrastructure layer and introduced the most common implementations. Now, we would like to take a closer look at the features and challenges of this technology.

The Features of a Service Mesh

Monitoring
A service mesh enables comprehensive monitoring of the communication between microservices, without modifications to the application code or to the configuration of individual services. To provide a clear overview of system performance, it captures and analyses key metrics such as error rates, latency, and requests per second.

Example: With a service mesh like Istio, network metrics are captured by the sidecar proxies attached to each microservice instance. These proxies record data such as HTTP status codes, response times, and the rate of incoming and outgoing requests, and expose it as metrics that a Prometheus instance scrapes. Kiali's dashboard builds on these metrics to visualise the performance of the entire microservices system. If a service exhibits an unusually high error rate, this is immediately visible on the dashboard, allowing the development team to take targeted action to resolve the issue.
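To illustrate how figures such as error rate, latency, and requests per second are derived from raw request observations, here is a minimal Python sketch. In Istio, the proxies compute and expose these metrics themselves; `RequestRecord` and `summarise` are hypothetical names, and counting only 5xx responses as errors is an assumption of this sketch.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    """Hypothetical record of one request, as a sidecar proxy might observe it."""
    status_code: int     # HTTP status code returned by the upstream service
    latency_ms: float    # response time in milliseconds
    started_at_s: float  # start time of the request in seconds


def summarise(records: List[RequestRecord]) -> Dict[str, float]:
    """Derive error rate, latency, and request rate from raw observations."""
    if not records:
        raise ValueError("no requests observed")
    # Assumption for this sketch: only 5xx responses count as errors.
    errors = sum(1 for r in records if r.status_code >= 500)
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank 99th percentile: the smallest observed value that
    # at least 99% of all observations do not exceed.
    p99 = latencies[math.ceil(0.99 * len(latencies)) - 1]
    starts = [r.started_at_s for r in records]
    window_s = max(max(starts) - min(starts), 1.0)  # avoid a zero-length window
    return {
        "error_rate": errors / len(records),
        "mean_latency_ms": sum(latencies) / len(latencies),
        "p99_latency_ms": p99,
        "requests_per_second": len(records) / window_s,
    }
```

For four requests with one 503 response spread over a four-second window, the function reports an error rate of 0.25 and one request per second.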

Resilience
Resilience in a service mesh refers to the system's ability to recover from failures and maintain service availability even when problems occur. Mechanisms that improve resilience include circuit breaking, retries, and timeouts.
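In a mesh, the sidecar proxy applies timeouts and retries on the service's behalf. The following Python sketch spells out that logic as an ordinary function: each attempt gets its own timeout, and failed attempts are retried with exponential backoff. `call_with_retries` and its default values are hypothetical and chosen for illustration, not taken from Istio.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[float], T],
    attempts: int = 3,
    per_try_timeout_s: float = 0.5,
    base_backoff_s: float = 0.025,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call(timeout)` up to `attempts` times with exponential backoff.

    `call` receives the per-try timeout and is expected to raise
    TimeoutError or ConnectionError when an attempt fails.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return call(per_try_timeout_s)
        except (TimeoutError, ConnectionError) as err:
            last_error = err
            if attempt < attempts - 1:
                # With the default base: wait 25 ms, 50 ms, 100 ms, ... between attempts.
                sleep(base_backoff_s * 2 ** attempt)
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error
```

Passing `sleep` as a parameter keeps the backoff observable and testable without actually waiting.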

Example: A microservice becomes temporarily unavailable because an external service it relies on has failed. A service mesh can apply circuit breakers that stop requests from being forwarded to the faulty service: the circuit breaker detects the repeated failures and temporarily cuts off traffic, so that callers fail fast instead of continuing to send requests that are bound to fail. At the same time, retries can be configured to automatically resend failed requests, which bridges short-lived disruptions once the service is reachable again. This increases the overall stability of the system and minimises the impact of temporary disruptions.
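The circuit-breaking behaviour described above can be sketched as a small state machine with three states: closed (requests pass), open (requests are rejected immediately), and half-open (a trial request is allowed after a cool-down). This is a minimal, single-threaded Python sketch; `CircuitBreaker`, `CircuitOpenError`, and the threshold values are hypothetical. In Istio itself, comparable behaviour is configured declaratively, for example via outlier detection in a DestinationRule, rather than written as code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout_s = reset_timeout_s      # cool-down before a trial call
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: fail fast instead of sending a request that is likely to fail.
                raise CircuitOpenError("circuit open, request not forwarded")
            # Half-open: the cool-down has elapsed, so let a trial call through.
        try:
            result = fn()
        except Exception:
            self.consecutive_failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.consecutive_failures = 0
        self.opened_at = None
        return result
```

Injecting `clock` makes the cool-down testable without real waiting.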

Routing
Routing in a service mesh enables fine-grained control of traffic between microservices. It supports techniques such as load balancing, canary releases, and A/B testing, so that traffic is distributed efficiently and deliberately.
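As one example of a load-balancing strategy a proxy can apply, the following Python sketch implements "power of two choices": sample two endpoints at random and send the request to the one with fewer in-flight requests. Envoy, the proxy used by Istio, documents a similar approach for its least-request load balancer; the function name and endpoint addresses here are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_least_request(active_requests: Dict[str, int],
                       rng: Optional[random.Random] = None) -> str:
    """Pick an endpoint using 'power of two choices'.

    Two endpoints are sampled at random and the one with fewer in-flight
    requests wins, which steers traffic away from busy instances without
    scanning every endpoint.
    """
    if not active_requests:
        raise ValueError("no endpoints available")
    endpoints = list(active_requests)
    if len(endpoints) == 1:
        return endpoints[0]
    rng = rng or random.Random()
    a, b = rng.sample(endpoints, 2)
    return a if active_requests[a] <= active_requests[b] else b
```

Comparing only two random candidates keeps each decision cheap while still avoiding the most heavily loaded instances most of the time.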

Example: A company introduces a new version of a microservice and wants to roll it out gradually to a small group of users to detect potential errors early. With the service mesh, routing can be configured so that only a certain percentage of the total traffic is directed to the new version. This can be done using Istio's traffic management features, where routing rules direct traffic based on user attributes or request types. If the new version runs stably, its share of the traffic can be increased step by step until eventually 100% of the traffic uses the new version.
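The routing decision behind such a canary release can be sketched in Python: attribute-based rules are checked first, and all remaining traffic is split by weight. Istio's VirtualService expresses both header matches and weighted destinations declaratively. The header name `x-canary-tester`, the version names, and all function names below are hypothetical. Hashing the user ID so that each user consistently sees the same version is a design choice of this sketch; a mesh typically splits traffic per request instead.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeaderMatch:
    header: str   # request header to inspect
    value: str    # value that triggers this rule
    subset: str   # version to route to, e.g. "v2"


@dataclass
class WeightedRoute:
    subset: str   # version name, e.g. "v1"
    weight: int   # share of traffic in percent; all weights must add up to 100


def choose_subset(headers: Dict[str, str], matches: List[HeaderMatch],
                  routes: List[WeightedRoute], user_id: str) -> str:
    # 1. Attribute-based rules take precedence, e.g. internal testers always get v2.
    for rule in matches:
        if headers.get(rule.header) == rule.value:
            return rule.subset
    # 2. Otherwise split by weight. Hashing the user ID yields a stable bucket
    #    from 0 to 99, so a given user keeps seeing the same version.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for route in routes:
        cumulative += route.weight
        if bucket < cumulative:
            return route.subset
    raise ValueError("route weights must add up to 100")
```

Raising the canary's share then only means changing the weights, for example from 90/10 to 50/50 and finally to 0/100.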


Routing Configuration for Rolling Out New Software Versions (Source: Istio)


Security
Security is a crucial feature of a service mesh. It includes various functions such as encryption, authentication, and authorisation to secure communication between microservices and minimise potential security risks.
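What makes TLS "mutual" can be shown at the TLS layer itself. In a mesh, the sidecar proxies perform this handshake; the following Python sketch uses the standard `ssl` module to build a server context that, beyond encrypting the connection, demands and verifies a client certificate. The function name and the certificate file paths are hypothetical.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS server context that also requires a verified client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED on the server side is what makes the TLS mutual: the
    # handshake fails unless the client presents a certificate from a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA that signs the workload certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # This service's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With plain TLS, only the server proves its identity; requiring the client certificate as well gives both sides a verified identity that authorisation decisions can build on.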

Example: In a security-sensitive environment, a company uses the Istio service mesh to encrypt communication between its microservices. Istio encrypts traffic using mTLS (mutual TLS), in which both sides of a connection authenticate each other, so that services only communicate with peers whose identity has been verified. Each service receives a digital certificate, and all data exchanged between the services is encrypted. Additionally, Istio can enforce authentication and authorisation policies to ensure that only authorised users or services can access certain resources. This protects sensitive data and helps prevent security incidents.
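Once mTLS has established each caller's identity, access can be decided per request from that identity and request attributes such as method and path. The following Python sketch assumes a default-deny stance and identities of the form `cluster.local/ns/<namespace>/sa/<service-account>`, which is how Istio names workload principals. `AllowRule`, `is_allowed`, and the namespace and account names are hypothetical; Istio's AuthorizationPolicy resources express such rules declaratively and offer more options.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowRule:
    # Peer identities, e.g. "cluster.local/ns/shop/sa/frontend" (hypothetical names).
    principals: List[str]
    methods: List[str] = field(default_factory=lambda: ["GET"])
    path_prefix: str = "/"


def is_allowed(principal: str, method: str, path: str, rules: List[AllowRule]) -> bool:
    """Default deny: a request passes only if one rule matches all of its attributes."""
    return any(
        principal in rule.principals
        and method in rule.methods
        and path.startswith(rule.path_prefix)
        for rule in rules
    )
```

A frontend allowed to read orders could then be described as `AllowRule(principals=["cluster.local/ns/shop/sa/frontend"], methods=["GET"], path_prefix="/orders")`; any other caller, method, or path is rejected.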

Disadvantages of a Service Mesh

Amid the enthusiasm for innovations like the service mesh, the associated costs are often overlooked, yet they should not be underestimated. The main challenges are the learning curve, increased latency, and additional resource consumption.

Steep Learning Curve

A service mesh represents a significant intervention in the microservices architecture. Although most implementations try to keep this intervention as invisible as possible, developers need to thoroughly understand how the service proxies work and interact. The learning effort depends on the implementation, in particular on the quality of its API, documentation, and tooling.

Latency & Resources

Running additional components for the control plane and data plane increases resource consumption, since a sidecar-based mesh adds a proxy to every service instance. Each request also passes through these proxies; although the added latency is typically only a few milliseconds, it can add up noticeably in long chains of services. It is therefore advisable to run benchmarks in your own environment to measure the impact on latency and resource consumption and to compare different service mesh implementations.
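Simple arithmetic shows how the overhead accumulates. The sketch assumes a sidecar model in which every call between two services passes through two proxies (the caller's outbound and the callee's inbound proxy) and a fixed per-proxy overhead; the function name and the overhead value are purely illustrative, and real figures must come from your own benchmarks.

```python
def added_mesh_latency_ms(services_in_chain: int, per_proxy_overhead_ms: float) -> float:
    """Extra latency for one request that traverses a linear chain of services.

    Assumes a sidecar model: each call between two services passes through
    the caller's outbound proxy and the callee's inbound proxy.
    """
    if services_in_chain < 1:
        raise ValueError("a chain needs at least one service")
    calls = services_in_chain - 1  # A -> B -> C contains two calls
    return calls * 2 * per_proxy_overhead_ms
```

With an assumed 1.5 ms per proxy, a chain of ten services adds 9 * 2 * 1.5 = 27 ms to each request, even though a single hop costs only 3 ms.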

Conclusion

The hype surrounding the service mesh is understandable given its extensive features: monitoring, routing, resilience, and security. A service mesh enhances network communication without unnecessarily complicating the microservices architecture or making it harder to maintain. For companies adopting microservices, a service mesh can significantly reduce the effort of implementing these functions in every individual service. Existing microservice systems can also benefit from a service mesh, provided the team can handle the additional complexity and has sufficient technical resources available.

The future of the service mesh will be driven by the growing demand for microservices architectures and the need for better management of these distributed applications. Although service mesh technologies are still evolving, their maturity and performance are expected to improve steadily in the coming years.


Sources

www.istio.io
www.smi-spec.io
www.redhat.com