It's taking us quite long to get our models into production and derive impact from them. Is there a way we can empower data scientists to take charge of this process?
The biggest reason we have found for delayed timelines is dependency between teams and gaps in skillsets across different personas. TrueFoundry makes it easy for data scientists to train and deploy on Kubernetes using Python, while allowing infra teams to set up security constraints and cost budgets. TrueFoundry also configures the cluster for GPU auto-provisioning and shutdown, which keeps costs in check and avoids human error.
ML engineers are heavily reliant on DevOps/platform teams for the infra they need to train or deploy models
TrueFoundry integrates with your existing infrastructure, including cloud infrastructure, Kubernetes clusters, Docker registries, Git repositories and secret managers. It provides an abstraction layer over infra that is easy for data scientists and ML engineers to understand while remaining fully configurable by the infra team.
We want to use our standard Kubernetes infrastructure for ML training and deployments
TrueFoundry is Kubernetes native and works on EKS, AKS and GKE (both standard and Autopilot clusters). ML needs a few things that standard software infrastructure does not, like dynamic node provisioning, GPU support, volumes for faster access, cost budgeting and developer autonomy. We take care of all the nitty-gritty details across clusters so that you can focus on building the best applications on top of state-of-the-art infrastructure.
Data scientists don’t want to deal with infra or YAML
We provide Python APIs, so you never need to interact with YAML. We also provide YAML support if you want to use it in your CI/CD pipelines.
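As an illustration, a Python-first deployment could look roughly like the sketch below. The class names, parameters and the workspace identifier are indicative of the SDK's style rather than its exact API, so treat them as assumptions.

```python
# Illustrative sketch only - class names, parameters and identifiers are assumptions,
# not the exact TrueFoundry SDK API.
from servicefoundry import Build, Port, PythonBuild, Resources, Service

service = Service(
    name="churn-model",  # hypothetical service name
    image=Build(
        build_spec=PythonBuild(
            command="uvicorn app:app --host 0.0.0.0 --port 8000",
            requirements_path="requirements.txt",
        )
    ),
    ports=[Port(port=8000)],
    resources=Resources(cpu_request=0.5, cpu_limit=1, memory_request=512, memory_limit=1024),
)

# A single call replaces the Deployment/Service/Ingress YAML a data scientist
# would otherwise have to write and maintain.
service.deploy(workspace_fqn="my-cluster:my-workspace")
```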
We want our data to stay inside our own cloud or on-prem
TrueFoundry gets deployed completely on your own Kubernetes cluster. The data stays in your own VPC, Docker images are saved in your own Docker registry, and all the models stay in your own blob storage system.
Models are deployed with autoscaling configured using HPA, but scaling out is very slow because of model download time.
We mount models on a volume shared across the deployment's pods, which reduces model loading time and makes autoscaling much faster. We also configure autoscaling based on request count instead of CPU/memory, which allows for much faster scale-out.
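As a rough illustration, the shared model volume and request-based autoscaling could be expressed along these lines. The class and field names below are assumptions about what such a spec might look like, not the exact SDK schema.

```python
# Illustrative sketch only - class and field names are assumptions, not the exact SDK schema.
from servicefoundry import Autoscaling, Image, RPSMetric, Service, VolumeMount

service = Service(
    name="recommender",  # hypothetical service name
    image=Image(image_uri="myregistry.example.com/recommender:1.0"),  # placeholder image
    replicas=Autoscaling(
        min_replicas=2,
        max_replicas=10,
        # Scale on requests per second instead of CPU/memory, so new pods are added
        # as soon as traffic rises rather than after CPU finally saturates.
        metrics=RPSMetric(target_rps_per_replica=50),
    ),
    mounts=[
        # Model artifacts sit on a volume shared across pods, so a freshly scheduled
        # pod skips the multi-GB download and starts serving almost immediately.
        VolumeMount(mount_path="/models", volume_fqn="my-cluster:model-cache"),
    ],
)
service.deploy(workspace_fqn="my-cluster:ml-prod")
```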
We want to host Jupyter notebooks and make them self-serve with the flexibility to provision resources, while putting some constraints on cost and security.
We have put in a lot of effort to run Jupyter notebooks on Kubernetes in a seamless way. Data scientists can configure the resources they need and the period of inactivity after which their notebooks are automatically stopped. A notebook can be resumed in a single click with all its state persisted. This allows data scientists to work independently with their own set of dependencies while also saving cost.
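For example, a self-serve notebook request with an inactivity timeout might look roughly like this; the class and parameter names are illustrative assumptions rather than the exact SDK API.

```python
# Illustrative sketch only - names and parameters are assumptions, not the exact SDK API.
from servicefoundry import Notebook, Resources

notebook = Notebook(
    name="feature-exploration",  # hypothetical notebook name
    resources=Resources(cpu_limit=4, memory_limit=16000, gpu_count=1),
    # The notebook is stopped automatically after an hour of inactivity; its state is
    # persisted on a volume and it can be resumed later in a single click.
    stop_after_inactivity_minutes=60,
)
notebook.deploy(workspace_fqn="my-cluster:ds-workspace")
```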
How do we keep track of all models inside the company in one place and figure out which ones are deployed in which environment?
TrueFoundry provides a model registry that tracks which stage each model is in, along with the schema and API of every model in the registry.
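For instance, logging a model version and checking which stage each version is in might look roughly like the following; the client calls and arguments are indicative assumptions rather than the exact registry API.

```python
# Illustrative sketch only - function and argument names are assumptions,
# not the exact model registry client API.
from truefoundry.ml import get_client

client = get_client()

# Register a trained model version along with its framework and artifacts.
client.log_model(
    ml_repo="fraud-detection",  # hypothetical repo name
    name="xgboost-fraud",
    model_file_or_folder="artifacts/model.joblib",
    framework="sklearn",
)

# Later, anyone in the company can list versions and see which stage each one is in.
for version in client.list_model_versions(ml_repo="fraud-detection", name="xgboost-fraud"):
    print(version.version, version.stage)
```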
How do I mirror or split traffic to my new version of the model so that we can test it on online traffic before rolling it out completely?
We have built efficient traffic mirroring and splitting for models, which allows data scientists to test new versions on live traffic without rolling them out to production completely.
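As a sketch, a weighted split between a stable and a candidate model version could be expressed along these lines; the schema below is an assumption used for illustration, not the exact API.

```python
# Illustrative sketch only - the traffic-split schema is an assumption, not the exact API.
from servicefoundry import TrafficSplit

# Route 90% of live traffic to the stable version and 10% to the candidate, so the
# new model is evaluated on real requests before a full rollout; mirroring can be
# used instead when the candidate's responses should not reach users at all.
split = TrafficSplit(
    routes=[
        {"service": "ranker-v1", "weight": 90},
        {"service": "ranker-v2", "weight": 10},
    ],
)
split.apply(workspace_fqn="my-cluster:ml-prod")
```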
We want to use hardware and compute across clouds (AWS, GCP, Azure) and on-prem. How do we connect them so that developers don't need to worry about the underlying compute and can seamlessly move workloads from one environment to another?
We have put in a lot of effort to handle the nitty-gritty differences between Kubernetes clusters across clouds. Developers can write the same code and deploy it in any environment without worrying about the underlying infrastructure. We check whether the required Kubernetes components are installed and automatically adjust ingress and resource configurations.
We want to use the power of LLMs for our business but we cannot let the data out of our environment. Is there any way to utilise the power of LLMs without sending our data to OpenAI?
TrueFoundry allows you to deploy and fine-tune open-source LLMs on your own infrastructure. We have already figured out the best serving settings for the most common open-source models, so you don't need to do the hard work.
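As a rough sketch, deploying a catalogued open-source LLM might look like this; the class names, parameters and model identifier are assumptions used for illustration.

```python
# Illustrative sketch only - class names and parameters are assumptions, not the exact SDK API.
from servicefoundry import LLMDeployment, Resources

llm = LLMDeployment(
    name="llama-2-13b-chat",  # hypothetical deployment name
    # Hugging Face model identifier; weights are pulled into your own infrastructure.
    model_source="huggingface://meta-llama/Llama-2-13b-chat-hf",
    # Serving settings (GPU type, quantisation, batching) come pre-tuned for common
    # open-source models, so teams do not need to benchmark them from scratch.
    resources=Resources(gpu_count=1),
)
llm.deploy(workspace_fqn="my-cluster:llm-workspace")
```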
How do I allow all my developers to quickly try out different LLMs and see what results they can get out of them?
We host an internal LLM playground where you can decide which LLMs to whitelist for the company's developers, including internally hosted ones, and different developers can experiment with internal data.
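Hosted models are typically exposed behind an OpenAI-compatible endpoint, so developers can also experiment from code rather than only through the playground UI. The gateway URL, API key variable and model name below are placeholders, and the OpenAI-compatible endpoint is an assumption; only whitelisted, internally hosted models would be reachable this way.

```python
# The gateway URL, API key environment variable and model name are placeholders for your
# own deployment; this assumes an OpenAI-compatible internal endpoint.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.internal.example.com/v1",  # internal gateway, not api.openai.com
    api_key=os.environ["INTERNAL_LLM_GATEWAY_KEY"],
)

response = client.chat.completions.create(
    model="llama-2-13b-chat",  # an internally hosted, whitelisted model
    messages=[{"role": "user", "content": "Summarise last quarter's support tickets."}],
)
print(response.choices[0].message.content)
```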
We are incurring a lot of cost on our ML infra and it's becoming difficult to track and reduce it.
We expose the cost of services to developers and provide insights to reduce it. All our current customers have seen at least a 30% cost reduction after adopting TrueFoundry.