TrueFoundry enables a cost-effective approach for deploying machine learning workloads by leveraging Kubernetes on EKS, offering 40–60% cost reductions compared to running similar workloads on SageMaker.
TLDR: 40–60% Cost Savings for AI/ML Workloads
1) With No Markup on Compute Costs, save 15–30% on EC2 instances compared to SageMaker
2) Configure spot instances easily for both training and inference, unlike SageMaker (training only), saving up to 70%
3) Simplifies resource scaling to zero for notebooks, CPUs, and GPUs, reducing dev costs by 30–70%
4) Replaces S3 read/write operations with EFS caching, cutting data costs by 20%
5) Uses advanced serving frameworks like vLLM and SGLang, lowering latency and boosting throughput by 40%
6) Dynamically adjusts resources with infra on autopilot, slashing production costs by 40–50%
7) Supports fractional CPUs and GPUs, saving 20–50% on compute costs
No markup on Compute Costs
TrueFoundry adds no markup on compute: you pay the raw EC2 instance price, which alone delivers 15–30% savings on instance expenses compared to SageMaker and is a key factor in the total savings.

Seamless support for using Spot Instances
TrueFoundry makes it very easy to configure spot instances and seamlessly switches the instance underneath if a spot instance is reclaimed, delivering savings of up to 70%.

TrueFoundry uses Karpenter, configured optimally, whose advanced algorithms pick the right instance type based on availability and cost using AWS APIs. SageMaker, on the other hand, supports Spot Instances for training jobs but does not extend this capability to inference services.
Comparing Inference costs on a few popular inference instances -
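As a rough back-of-envelope sketch of how these two savings compound, the snippet below compares an on-demand instance with a SageMaker-style markup against the same instance bought as spot capacity on EKS. The 25% markup and 70% spot discount are illustrative assumptions drawn from the ranges above, not published prices:

```python
def effective_hourly_cost(ec2_on_demand: float,
                          sagemaker_markup: float = 0.25,
                          spot_discount: float = 0.70):
    """Compare the hourly cost of an inference instance on SageMaker
    (on-demand price plus markup) vs. the same instance as EKS spot
    capacity. All rates here are illustrative assumptions."""
    sagemaker = ec2_on_demand * (1 + sagemaker_markup)
    eks_spot = ec2_on_demand * (1 - spot_discount)
    savings = 1 - eks_spot / sagemaker
    return sagemaker, eks_spot, savings

# Example with a hypothetical $1.00/hr on-demand rate
sm, spot, pct = effective_hourly_cost(1.00)
print(f"SageMaker: ${sm:.2f}/hr, EKS spot: ${spot:.2f}/hr, savings: {pct:.0%}")
```

With those assumed rates, the markup and the spot discount stack multiplicatively, which is why the combined saving can exceed either number on its own.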

Auto-shutdown and Scale to 0
TrueFoundry provides an auto-shutdown feature for CPU, GPU, notebook, and SSH instances, yielding 30–70% savings in development.
Machines are shut down automatically when developers are not using the compute, for example Jupyter notebooks or hosted demos such as Streamlit and Gradio apps.

In SageMaker, it's possible to configure auto-shutdown for Jupyter notebooks, but it is cumbersome enough that data scientists often don't configure it at all, and there is no auto-shutdown for GPU instances.
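The core of such an auto-shutdown policy is a simple idle check. A minimal sketch, assuming a hypothetical 30-minute idle threshold (the actual timeout in TrueFoundry is configurable):

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # hypothetical threshold, normally configurable

def should_shut_down(last_activity: datetime, now: datetime) -> bool:
    """Auto-shutdown rule: stop a dev machine (notebook, demo app, SSH box)
    once it has been idle longer than the timeout."""
    return now - last_activity > IDLE_TIMEOUT

now = datetime(2024, 1, 1, 12, 0)
print(should_shut_down(datetime(2024, 1, 1, 11, 0), now))   # idle 60 min
print(should_shut_down(datetime(2024, 1, 1, 11, 45), now))  # idle 15 min
```

The savings come from the fact that dev machines typically sit idle most of the day; shutting them down outside active hours is where the 30–70% range comes from.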
Native support for volumes
SageMaker recommends reading and writing data to S3 during training iterations. This results in massive read and write costs on S3, especially if multiple data scientists are training models on the same data. TrueFoundry supports caching the data in volumes, which have much lower read/write costs compared to S3, for a ~20% reduction in S3 read/write costs. This approach is widely used by companies like Salesforce and Netflix to reduce read and write costs.
Amazon S3 becomes costly due to per-request pricing for high-frequency reads.

Lower latency and higher throughput of models
TrueFoundry natively supports advanced serving frameworks like SGLang and vLLM, which provide higher throughput with lower latency.
TrueFoundry takes this a step further by automatically recommending the optimal model server based on the model architecture and use case, eliminating guesswork for data scientists and yielding a ~40% reduction for LLMs and Triton-supported models.
In contrast, SageMaker's default choice often involves large images that may not be optimized for specific workloads. This requires data scientists to manually select and test optimal configurations, leading to inefficiencies.
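Conceptually, such a recommendation boils down to mapping model characteristics to a serving stack. The rules and model families below are a toy sketch, not TrueFoundry's actual recommendation logic:

```python
def recommend_server(architecture: str) -> str:
    """Toy rule-based model-server recommender. The architecture names
    and routing rules here are hypothetical illustrations only."""
    arch = architecture.lower()
    llm_families = {"llama", "mistral", "qwen"}     # assumed LLM architectures
    triton_families = {"bert", "resnet"}            # assumed Triton-friendly models
    if arch in llm_families:
        return "vLLM"      # continuous-batching LLM serving
    if arch in triton_families:
        return "Triton"    # optimized batched CPU/GPU inference
    return "FastAPI"       # generic fallback for arbitrary Python models
```

The point is that encoding this mapping once in the platform removes the per-team trial-and-error that SageMaker's generic defaults otherwise force on data scientists.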
Autopilot Feature to automatically reduce cost
TrueFoundry automatically analyzes running workloads and suggests possible cost optimizations based on requested resources versus actual usage, incoming traffic, and so on. This has shown cost optimization of up to 40–50% in some cases. SageMaker doesn't have any comparable autopilot feature.
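The basic rightsizing idea is to compare requested resources with observed usage and suggest a smaller request with some safety headroom. A minimal sketch, where the 20% headroom fraction is an illustrative assumption:

```python
def rightsize(requested_cpu: float, p95_usage_cpu: float, headroom: float = 0.2):
    """Suggest a new CPU request from observed p95 usage plus headroom,
    and report the fraction of the current request that would be freed.
    The headroom value is an assumption, not a fixed platform default."""
    suggestion = p95_usage_cpu * (1 + headroom)
    savings = max(0.0, 1 - suggestion / requested_cpu)
    return round(suggestion, 2), round(savings, 2)

# A workload requesting 4 vCPUs but peaking at 1.5 vCPUs (p95)
print(rightsize(4.0, 1.5))
```

Over-requesting is common precisely because developers size for worst-case load once and never revisit it, which is why continuous analysis surfaces large savings.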

Fractional CPU and Memory
TrueFoundry provides support for fractional CPU compute and memory, which allows multiple workloads to run on one machine. This bin-packing provides 20% or more savings on CPU workloads, and is the same reason Kubernetes can utilize resources better than running workloads directly on VMs. In SageMaker, the minimum CPU/memory unit is the VM-specific configuration provided by AWS.
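The effect is easy to see with a first-fit bin-packing sketch: fractional requests let many small services share one node instead of each occupying a whole VM. The node size and request sizes below are illustrative:

```python
def first_fit(machine_cpu: float, requests: list[float]) -> int:
    """First-fit-decreasing bin packing: count the machines needed when
    workloads can request fractional CPUs and share a node."""
    machines = []  # remaining free capacity per machine
    for r in sorted(requests, reverse=True):
        for i, free in enumerate(machines):
            if free >= r:
                machines[i] = free - r  # place workload on an existing node
                break
        else:
            machines.append(machine_cpu - r)  # provision a new node
    return len(machines)

# Eight 0.5-vCPU services fit on one 4-vCPU node
print(first_fit(4.0, [0.5] * 8))
```

On a one-workload-per-VM model, those eight services would need eight machines; with fractional requests and bin-packing they need one.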
Fractional GPU
TrueFoundry supports both time-slicing and MIG-based GPU partitioning, leading to ~40–50% savings on GPU compute. This allows developers to run multiple workloads on a single GPU machine and scale out seamlessly, which is crucial since GPU resources are very expensive and sharing them can lead to massive cost reduction. SageMaker doesn't provide fractional GPU support.
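The arithmetic behind GPU sharing is straightforward: partitioning one GPU across several isolated workloads divides its hourly cost among them. The $3/hr rate and 7-way split below are illustrative (7 is the maximum MIG instance count on some NVIDIA datacenter GPUs, e.g. the A100):

```python
def gpu_cost_per_workload(gpu_hourly: float, partitions: int) -> float:
    """Effective hourly cost per workload when one GPU is shared via
    MIG or time-slicing into `partitions` slices. Rates are illustrative."""
    if partitions < 1:
        raise ValueError("need at least one partition")
    return gpu_hourly / partitions

# Hypothetical $3/hr GPU split into 7 MIG slices
print(f"${gpu_cost_per_workload(3.0, 7):.2f}/hr per workload")
```

For small models that cannot saturate a full GPU, this per-slice cost is what makes GPU inference economical at low traffic.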

Case Study
A prominent gaming platform faced a monthly bill of $40,000 for running its machine learning workloads on SageMaker. By transitioning to TrueFoundry's cost-optimized platform, it was able to dramatically reduce expenses to just $6,000 per month. This 85% cost saving was achieved without compromising on scalability, performance, or ease of use.
External case studies have also highlighted significant cost reductions when transitioning from SageMaker to EKS. For instance, LeBonCoin reported 30–40% savings after migrating its machine learning workloads from SageMaker to Kubernetes on EKS. Read more: https://medium.com/leboncoin-tech-blog/migrating-our-machine-learning-platform-from-aws-sagemaker-to-kubernetes-kubeflow-166c56f40e5c