GitHub - ECE750-Group-5/Proactive-Circuit-Breaking-For-Istio

Proactive Circuit Breaking For Istio

Inspired by TCP Reno and the Adaptive Concurrency Limit feature of the Envoy Proxy
Explore our presentation »

Table of Contents

About The Project
- Built With
Getting Started
License
Acknowledgments
Limitations

About The Project

Motivations

We got the idea from two sources. First, Mendonça et al., in their survey paper on building self-adaptive microservice systems, describe how self-adaptive methods could make today's cloud-native applications more resilient. In that paper, they identify self-adaptive circuit breakers as an open research topic.

Second, the Netflix Technology Blog has a well-known article, Performance under Load, which explains how circuit breakers can keep downstream services from being overwhelmed and mitigate cascading failures.

Problem Statement

Because of runtime uncertainty and frequent code changes, it is hard to set the right circuit-breaking thresholds. Traditional circuit-breaking methods do not adapt to runtime changes.

The Envoy Proxy has a feature called Adaptive Concurrency Limit, which is a real-time adaptive circuit-breaking mechanism. Inspired by TCP Vegas, a latency-based TCP congestion control algorithm, it uses latency as the feedback signal to adjust the concurrency limit. However, this feature is not available in Istio (Istio Issue 25991), which is a popular service mesh platform.

The Envoy implementation also has to recalibrate its latency baseline in every measurement window by dropping the concurrency limit to 1, which introduces extra parameters to tune and causes artificial unavailability.
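To make the latency-feedback idea concrete, here is a minimal sketch of one gradient-style limit update, loosely following the formula described in Envoy's adaptive concurrency documentation. It is a simplification for illustration, not Envoy's actual code: the function name, the `buffer_pct` value, and the clamping bounds are assumptions, and jitter and sampling are omitted. The `min_rtt_ms` argument is the latency baseline that has to be re-measured periodically.

```python
import math


def gradient_limit_update(limit, min_rtt_ms, sample_rtt_ms,
                          buffer_pct=0.25, max_limit=1000):
    """Return the next concurrency limit from one latency sample.

    limit:         current concurrency limit
    min_rtt_ms:    latency baseline measured under minimal load
    sample_rtt_ms: latency observed in the current window
    buffer_pct:    tolerated latency growth over the baseline (assumed value)
    """
    # A gradient below 1 means latency grew beyond the tolerated buffer.
    gradient = (min_rtt_ms * (1.0 + buffer_pct)) / sample_rtt_ms
    # Headroom lets the limit grow even when the gradient is close to 1.
    headroom = math.sqrt(limit)
    new_limit = gradient * limit + headroom
    # Keep the result within [1, max_limit].
    return max(1, min(max_limit, int(new_limit)))
```

When the sampled latency doubles relative to the baseline, the limit roughly halves (plus headroom); when latency stays at the baseline, the limit grows by the headroom term.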

Solution

Our solution addresses this problem by mimicking the TCP Reno congestion control algorithm and using CPU utilization, a more immediate signal of saturation, as the feedback.

High-Level Design

State Machine Algorithm

Multiplicative Decrease Multiplicative Increase
Random Probing
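The two elements listed above can be sketched as a small controller driven by CPU utilization. This is only an illustration under stated assumptions: the state names, thresholds, factors, and the interpretation of random probing (occasionally raising the limit while CPU sits between the two thresholds) are hypothetical and are not taken from the project's code.

```python
import random
from dataclasses import dataclass


@dataclass
class LimitController:
    """CPU-feedback concurrency limit controller (illustrative only)."""
    limit: float = 10.0
    min_limit: float = 1.0
    max_limit: float = 1000.0
    cpu_high: float = 0.9           # assumed saturation threshold
    cpu_low: float = 0.6            # assumed under-utilization threshold
    decrease_factor: float = 0.5    # assumed multiplicative decrease
    increase_factor: float = 1.5    # assumed multiplicative increase
    probe_probability: float = 0.1  # assumed chance of probing per step
    state: str = "STEADY"

    def step(self, cpu_utilization, rng=random):
        """Update the limit from one CPU sample (0.0 to 1.0)."""
        if cpu_utilization >= self.cpu_high:
            # Saturated: back off multiplicatively.
            self.state = "DECREASE"
            self.limit *= self.decrease_factor
        elif cpu_utilization <= self.cpu_low:
            # Spare capacity: grow multiplicatively.
            self.state = "INCREASE"
            self.limit *= self.increase_factor
        elif rng.random() < self.probe_probability:
            # In between: occasionally probe for extra capacity.
            self.state = "PROBE"
            self.limit *= self.increase_factor
        else:
            self.state = "STEADY"
        # Clamp to the configured bounds.
        self.limit = min(self.max_limit, max(self.min_limit, self.limit))
        return int(self.limit)
```

Passing `rng` as a parameter keeps the probing step deterministic in tests; in normal use the default `random` module is used.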

Experiment

Experiment Setup

We have three experiment groups: Group A with proactive circuit breaking (timeline 21:50 to 22:00), Group B without any circuit breaking (timeline 22:00 to 22:20), and Group C with static circuit breaking using a predefined concurrency limit of 10 (timeline 22:25 to 22:35). For each group, we used Fortio to generate a constant load of 140 QPS against an httpbin container, our target service, which had a fixed resource allocation of 20m CPU and 78 Mi memory. We used Prometheus and Grafana to monitor the CPU utilization and the QPS of the target service.
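For reference, a static limit like the one in Group C is typically expressed in Istio as a DestinationRule with connection pool settings. The sketch below builds such a manifest as JSON. Which connection pool fields the experiment actually set is not stated here, so the use of `tcp.maxConnections` and `http.http1MaxPendingRequests`, the resource name, and the helper function are all assumptions.

```python
import json


def static_circuit_breaker(host, limit, pending=1, name=None):
    """Build an Istio DestinationRule dict with a fixed connection limit.

    host:    the service the rule applies to (e.g. "httpbin")
    limit:   maximum number of connections allowed to the host
    pending: maximum queued HTTP/1.1 requests (assumed value)
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name or f"{host}-circuit-breaker"},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "connectionPool": {
                    "tcp": {"maxConnections": limit},
                    "http": {"http1MaxPendingRequests": pending},
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest so it can be piped to `kubectl apply -f -`.
    print(json.dumps(static_circuit_breaker("httpbin", 10), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be applied directly without a YAML library.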

Results

Both CPU utilization and latency improve. However, latency did not improve as much as we expected, and it shows high variance. We need to investigate the root cause of that variance further.

Built With

Getting Started

Prerequisites

You need to have minikube installed. If you don't have it, you can install it by following the instructions here.

Installation

Install Prometheus Operator, Prometheus, cAdvisor, Fortio, and Grafana. In the root directory, run the following commands:

chmod +x set-up.sh
./set-up.sh

(Optional) Configure the receiver for Prometheus Alertmanager. This is an example for Slack; you can use other receivers as well.

kubectl apply -f monitoring/alert

Deploy Httpbin

kubectl apply -f httpbin/httpbin.yaml

Start the proactive circuit breaker MAPE loop

python3 analyzing_planning_executing/main.py

Start Fortio load test

kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 140 -qps 140 -n 60000 -loglevel Warning http://httpbin:8000/get
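The MAPE loop started in the steps above follows the usual monitor, analyze, plan, execute cycle. The sketch below shows one way such an iteration could be structured; it is not the contents of `analyzing_planning_executing/main.py`. The Prometheus address, the PromQL expression, the thresholds, and all function names are assumptions. Only the 20m CPU allocation and the use of Prometheus with cAdvisor come from this README.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumed address (e.g. a port-forward)
# Assumed PromQL: CPU cores used by the httpbin pod, from cAdvisor metrics.
CPU_QUERY = 'sum(rate(container_cpu_usage_seconds_total{pod=~"httpbin.*"}[1m]))'


def parse_instant_vector(payload):
    """Return the first sample value of a /api/v1/query response, or None."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each sample is [timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def monitor(query=CPU_QUERY, base_url=PROMETHEUS_URL):
    """Monitor: fetch current CPU usage (in cores) from Prometheus."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_instant_vector(json.load(resp))


def analyze(cpu_cores, cpu_allocation_cores=0.02):
    """Analyze: convert usage to utilization of the 20m CPU allocation."""
    return cpu_cores / cpu_allocation_cores


def plan(limit, utilization, high=0.9, low=0.6):
    """Plan: choose the next limit (thresholds and factors are assumed)."""
    if utilization >= high:
        return max(1, int(limit * 0.5))
    if utilization <= low:
        return int(limit * 1.5)
    return limit


def mape_iteration(limit, monitor_fn, execute_fn):
    """Run one monitor-analyze-plan-execute cycle and return the new limit."""
    cpu_cores = monitor_fn()
    if cpu_cores is None:
        return limit  # no data yet: keep the current limit
    new_limit = plan(limit, analyze(cpu_cores))
    if new_limit != limit:
        execute_fn(new_limit)  # e.g. patch the Istio DestinationRule
    return new_limit
```

Passing `monitor_fn` and `execute_fn` as arguments keeps the loop testable without a cluster; a real loop would call `mape_iteration(limit, monitor, apply_limit)` on a timer, where `apply_limit` is a hypothetical function that updates the mesh configuration.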

Limitations

The current implementation is a proof of concept and is not production-ready.
We have not tested how the system behaves under service degradation or scale-out events.
We could adopt the cubic increase function from TCP CUBIC for more efficient adaptation.
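As a rough illustration of the last point, the CUBIC window function from RFC 8312 could be reused to grow a concurrency limit after a decrease. The sketch below applies that formula directly; treating the concurrency limit as the window and choosing the time unit for `t` are assumptions, and the project does not implement this today. The constants `c=0.4` and `beta=0.7` are the RFC's defaults.

```python
def cubic_k(w_max, c=0.4, beta=0.7):
    """Time needed to climb back to w_max after a decrease to beta * w_max."""
    return (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)


def cubic_limit(t, w_max, c=0.4, beta=0.7):
    """CUBIC growth curve: limit at time t since the last decrease.

    t:     elapsed time since the last decrease (unit is an assumption)
    w_max: the limit just before the last decrease
    """
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max
```

The curve starts at `beta * w_max`, flattens as it approaches the previous maximum, and then accelerates again beyond it, which is what makes it attractive for probing capacity cautiously near a known saturation point.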

License

Distributed under the MIT License. See LICENSE.txt for more information.

Acknowledgments

This project was developed as part of the ECE750 course at the University of Waterloo. We would like to thank our instructors, Prof. Landan and the TAs, for their guidance and support throughout the term.

We used DALL·E to generate our project logo and Copilot to generate documentation.

References

“Circuit Breaking.” n.d. Istio. Accessed December 1, 2023. https://istio.io/latest/docs/tasks/traffic-management/circuit-breaking/.
Mendonça, Nabor C., Pooyan Jamshidi, David Garlan, and Claus Pahl. "Developing self-adaptive microservice systems: Challenges and directions." IEEE Software 38, no. 2 (2019): 70-79.
Yanacek, David. n.d. “Using Load Shedding to Avoid Overload.” Amazon Builders' Library. AWS. Accessed December 1, 2023. https://aws.amazon.com/builders-library/using-load-shedding-to-avoid-overload/.
Landau, Eran, William Thurston, and Tim Bozarth. 2018. “Performance under Load.” Medium. Netflix Technology Blog. March 23, 2018. https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581.
Netflix Opensource Software. 2023. “Concurrency Limit.” GitHub. November 29, 2023. https://github.com/Netflix/concurrency-limits/tree/master.
Allen, Tony. 2020. “Envoy, Take the Wheel: Real-Time Adaptive Circuit Breaking.” YouTube. September 4, 2020. https://www.youtube.com/watch?v=CQvmSXlnyeQ.
Allen, Tony. 2019. “Envoy GitHub Issue #7789: Adaptive Concurrency Control L7 Filter.” GitHub. July 31, 2019. envoyproxy/envoy#7789.
