Elasti is an open-source solution designed to optimize Kubernetes resource usage by enabling services to scale down to zero during idle periods and scale back up on demand. Built around two components, a Kubernetes controller and a request resolver, Elasti manages service availability while minimizing costs. This post is a technical walkthrough of its architecture, installation, and operational flows, so you can integrate and extend Elasti effectively in your Kubernetes environments.
💡This feature is included within Truefoundry’s autoscaling suite. For additional details, please refer to the documentation.
The Scale-to-Zero Landscape
While Kubernetes offers robust scaling capabilities through HPA and solutions like KEDA, scaling to zero replicas remains challenging. Existing approaches typically fall into two categories:
- Native KEDA Scaling - While KEDA can scale deployments to zero using event metrics, this creates a window where incoming requests may be lost during the scale-up cold start period.
- Full-Service Meshes (e.g., Knative) - Provide comprehensive scale-to-zero but require significant architectural changes and carry high operational overhead.
- HTTP Proxies (e.g., KEDA HTTP Add-on) - Maintain persistent proxy layers that introduce latency even after services scale up.
Why Elasti?
Elasti was created to address these limitations with three key design goals:
- Lightweight Integration - Work with existing deployments/services without code changes.
- Zero Proxy Overhead - Get out of the request path once services scale up.
- Cost-Optimized Scaling - True zero-scale without maintaining always-on components.
How it works
Elasti comprises two core components that work in tandem to manage service scaling:
Controller (Operator):
- Monitors ElastiService resources in your Kubernetes cluster.
- Dynamically scales services between 0 and 1 based on real-time traffic metrics.
Resolver:
- Acts as a proxy to intercept incoming requests when the service is scaled down.
- Queues these requests and notifies the controller to scale the service back up, ensuring that no request is lost.

Steady state flow of requests to services
In this mode, all requests are handled directly by the service pods; the Elasti resolver is not in the request path. The Elasti controller keeps polling Prometheus with the configured query and compares the result against the threshold to decide whether the service can be scaled down.
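Conceptually, each poll is just the configured PromQL query run against the Prometheus HTTP API, with the result compared to the threshold. A rough sketch of what the controller effectively does (using the query and server address configured later in this post; Elasti's internal client may differ):
# Run the configured query against Prometheus; a result below the
# threshold marks the service as a candidate for scale-down.
curl -s http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(nginx_ingress_controller_nginx_process_requests_total[1m])) or vector(0)'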

Scale down to 0 when there are no requests
If the Prometheus query returns a value below the threshold, Elasti scales the service down to 0. Before doing so, it redirects incoming requests to the Elasti resolver and then modifies the Rollout/Deployment to have 0 replicas. It also pauses KEDA (if KEDA is being used) to prevent it from scaling the service back up, since KEDA is configured with minReplicas as 1.
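For intuition, the scale-down is roughly equivalent to the manual steps below. This is an illustrative sketch only: Elasti performs the traffic switch internally by re-pointing the Kubernetes Service at the resolver, and the ScaledObject name here is hypothetical.
# Scale the workload itself down to zero
kubectl scale deployment httpbin -n elasti-demo --replicas=0
# Pause KEDA so it doesn't scale the workload back up
# (autoscaling.keda.sh/paused-replicas is a real KEDA annotation;
# "httpbin-scaledobject" is a hypothetical name)
kubectl annotate scaledobject httpbin-scaledobject -n elasti-demo \
  autoscaling.keda.sh/paused-replicas="0" --overwrite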

Scale up from 0 when the first request arrives
Since the service is scaled down to 0, all requests hit the Elasti resolver. When the first request arrives, Elasti scales the service up to the configured minTargetReplicas. It then resumes KEDA so autoscaling can handle any sudden burst of requests, and switches the service back to point at the actual service pods once a pod is up. Requests that reached the resolver are retried for up to 6 minutes, and the response is sent back to the client; if the pod takes longer than 6 minutes to come up, the request is dropped.
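The reverse direction, again as a rough manual equivalent (same caveats as the scale-down sketch above):
# Bring the workload up to minTargetReplicas
kubectl scale deployment httpbin -n elasti-demo --replicas=1
# Remove the pause annotation so KEDA resumes autoscaling
kubectl annotate scaledobject httpbin-scaledobject -n elasti-demo \
  autoscaling.keda.sh/paused-replicas-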


Getting Started
Deploying a simple application with Elasti
1. Creating a local cluster
minikube start
or
kind create cluster --name elasti-demo
or
Create a local cluster with Docker Desktop
2. Setting up Prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set alertmanager.enabled=false \
  --set grafana.enabled=false \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
Install and set up Prometheus in the monitoring namespace
Prometheus will scrape metrics from the nginx ingress controller; Elasti will query those metrics to decide when to scale a service to and from zero.
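Before moving on, you can verify Prometheus is up by port-forwarding to it (the service name is the same one used in the ElastiService configuration later in this post):
kubectl get pods -n monitoring
kubectl port-forward svc/kube-prometheus-stack-prometheus -n monitoring 9090:9090
Then open http://localhost:9090 in a browser; the nginx ingress metrics will only show up once the ingress controller from the next step is installed and serving traffic.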
3. Setting up nginx ingress
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.metrics.enabled=true \
  --set controller.metrics.serviceMonitor.enabled=true \
  --create-namespace
Deploy an nginx ingress controller in the ingress-nginx namespace
The controller will be used to route traffic to our demo httpbin service.
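Since controller.metrics.serviceMonitor.enabled=true creates a ServiceMonitor, and the Prometheus stack was installed with serviceMonitorSelectorNilUsesHelmValues=false, Prometheus should discover the ingress controller's metrics automatically. A quick check:
kubectl get pods -n ingress-nginx
kubectl get servicemonitors -n ingress-nginx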
4. Setting up Elasti
helm install elasti oci://tfy.jfrog.io/tfy-helm/elasti \
  --namespace elasti --create-namespace
Install Elasti with Helm in the elasti namespace
Once Elasti is installed, you should see its two key components running (you can verify with the command below):
- Controller/Operator: Manages the traffic switching and scaling by monitoring metrics.
- Resolver: Intercepts, queues, and proxies incoming requests.
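A quick way to confirm both components are up:
kubectl get pods -n elasti
You should see pods for both the operator/controller and the resolver.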
For more advanced configuration, check out values.yaml for the full set of options in the Helm chart.
5. Deploying a demo application
kubectl create namespace elasti-demo
kubectl apply -n elasti-demo -f \
  https://raw.githubusercontent.com/truefoundry/elasti/refs/heads/main/playground/config/demo-application.yaml
Deploy an httpbin service in the elasti-demo namespace
This httpbin service will be used to demonstrate how to configure a service to handle traffic via Elasti.
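The demo manifest also wires httpbin to the ingress controller. If you prefer to write that piece yourself, an Ingress along these lines would do the job (a minimal sketch; the names, path, and port here are illustrative, so check the manifest in the repo for the exact resource):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin
  namespace: elasti-demo
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /httpbin
            pathType: Prefix
            backend:
              service:
                name: httpbin
                port:
                  number: 80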
6. Creating an ElastiService resource
Create a YAML file with the following ElastiService configuration.
apiVersion: elasti.truefoundry.com/v1alpha1
kind: ElastiService
metadata:
  name: httpbin-elasti
  namespace: elasti-demo
spec:
  minTargetReplicas: 1
  service: httpbin
  cooldownPeriod: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: deployments
    name: httpbin
  triggers:
    - type: prometheus
      metadata:
        query: sum(rate(nginx_ingress_controller_nginx_process_requests_total[1m])) or vector(0)
        serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
        threshold: "0.5"
demo-elasti-service.yaml
Once the file is created, apply it. Alternatively, you can apply the same sample directly from the repo:
kubectl apply -f https://raw.githubusercontent.com/truefoundry/elasti/refs/heads/main/playground/config/demo-elastiService.yaml
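You can confirm the resource was created with:
kubectl get elastiservices -n elasti-demo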
A few key fields in the CRD spec are:
- minTargetReplicas: Minimum replicas to bring up when the first request arrives.
- cooldownPeriod: Minimum time (in seconds) to wait after scaling up before considering scale down.
- triggers: List of conditions that determine when to scale down (currently only Prometheus metrics are supported).
- scaleTargetRef: Reference to the scale target, similar to the one used in HorizontalPodAutoscaler.
For more details on configuring an ElastiService for your use case, please refer to this doc.
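As mentioned in the scale-down flow above, Elasti can target Argo Rollouts as well as Deployments. Only the scaleTargetRef changes; a sketch, assuming the kind follows the same lowercase plural convention used for deployments above (verify against the Elasti docs):
scaleTargetRef:
  apiVersion: argoproj.io/v1alpha1
  kind: rollouts
  name: httpbin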
Testing the setup
With these steps, you now have:
- Ingress nginx running as your Ingress controller.
- Prometheus set up to scrape metrics (including those from Ingress NGINX).
- httpbin deployed and accessible via an Ingress route.
This configuration helps you test real-world routing scenarios and monitor the performance and metrics of your ingress traffic.
To test this setup, you can send requests to the nginx load balancer and monitor the pods of our demo service.
kubectl port-forward svc/ingress-nginx-controller \
  -n ingress-nginx 8080:80
Port forward to the nginx controller
kubectl get pods -n elasti-demo -w
Start a watch on the httpbin pods
Now you can send a request to http://localhost:8080/httpbin and watch Elasti scale the service up to 1 replica.
curl -v http://localhost:8080/httpbin
Send a request to the httpbin service
The service will be scaled down again after no activity for the cooldownPeriod specified in the ElastiService (5 seconds in this case).
Uninstalling Elasti
To uninstall Elasti, first remove all installed ElastiServices. Then uninstall the Helm release and delete the namespace.
kubectl delete elastiservices --all
helm uninstall elasti -n elasti
kubectl delete namespace elasti
Comparisons
Feature Comparison Table:

| Approach | Scale to zero | Request loss during scale-up | Overhead |
|---|---|---|---|
| Native KEDA | Yes | Possible during cold start | Low |
| Knative | Yes | No | Significant architectural changes, high operational overhead |
| KEDA HTTP Add-on | Yes | No | Persistent proxy adds latency even after scale-up |
| Elasti | Yes | No | Proxy only while at zero; out of the request path once scaled up |
When to Choose Elasti
Elasti is the best choice when you:
- Need to add scale-to-zero capability to existing HTTP services
- Want to ensure zero request loss during scaling operations
- Prefer a lightweight solution with minimal configuration
- Need integration with existing autoscalers (HPA/KEDA)
Final Words
Elasti was developed to address a specific challenge in Kubernetes: implementing true scale-to-zero without sacrificing request integrity or imposing excessive overhead. It works alongside native autoscaling with HPA and KEDA, so existing service configurations remain unchanged while achieving efficient resource utilization.
By open-sourcing this tool, we aim to provide a robust solution for environments that require genuine scale-to-zero, zero request loss, and a minimal operational footprint.
We welcome contributions and feedback from the community—explore the development doc for more details.