

examroom.ai is an AI-first assessment platform for universities, certification bodies, and credentialing organizations. It runs the full examination lifecycle — from authoring questions (which the team calls “items”) to registering candidates, proctoring online and offline exams, and returning certified results the moment a candidate finishes. Some exams run for as long as 28 hours. Behind a deliberately seamless candidate experience sits a large estate of AI: more than 60 distinct AI solutions spanning computer vision for bubble-sheet (OMR) scoring, large language models for support and evaluation, proctoring, forecasting, and item selection.

As examroom.ai moved these solutions from experiments into production for hundreds of thousands of users, the hard problems stopped being about model accuracy and started being about operating AI at scale: observability, governance, cost control, and guardrails. This case study covers how examroom.ai standardized that operating layer on TrueFoundry — using the platform to ship services in hours instead of days, and adopting the TrueFoundry AI Gateway as the control plane for observability and governance across its AI estate.

“AI increases the complications when you’re not using it in the right way. But it solves the problem if you use it in the right way.” — Deepak M K, VP, examroom.ai

The challenge: production is a different beast

examroom.ai was built production-first. “examroom.ai has a mindset of production-first,” Deepak explains. “We architect the solution not only for experimentation or POC purposes; it has to be for the production grid. Because at production, scaling and solving the problem is a different beast.”

The team learned that lesson early. In 2021, an object-detection model (built on YOLO and an RCNN-based pipeline) for exam-integrity monitoring ran flawlessly on a local server for six or seven hours of testing. In production it fell apart: the model could take up to five minutes to trigger, and it failed to detect the tiny objects that matter most for exam integrity. “We got anxious with the early success on the local machine,” Deepak recalls. “But in production, your nerves completely go down.”

Diagnosing it, the team found four compounding problems: an architecture that didn’t surface errors its developers could act on quickly; no scalable serving system; hard-coded compute with no autoscaling; and capacity assumptions that collapsed under real load. As usage climbed, memory and compute spiked unpredictably.

Operationally, deployment was its own beast. Even on managed Kubernetes with autoscaling bolted on, shipping a service took five to six days, and the team kept three engineers working full-time on ML Ops alone. “Cost in production can erode if you’re not using the right services in the right way,” Deepak notes — and at this scale, governance and observability were no longer optional.

Scaling the foundation: from 6 days to under 2 hours

The first shift was moving deployment onto TrueFoundry. Instead of hand-managing clusters, the team now pushes containers and the platform handles the rest.

“Now we don’t need to work on that, because we shifted to TrueFoundry. You just need to push the containers, and it automatically takes care of spinning up all the servers with minimum clicks. You can manage compute, you can manage cost, you can see all of it visually. That shifted from 6 days to less than 2 hours for us.”

That change freed the three dedicated ML Ops engineers to work on the product instead of the plumbing. Today examroom.ai runs 94 services on TrueFoundry across production, staging, UAT, QA, and development environments — including GPU-backed workloads for OCR, item generation, and detection — with autoscaling on spot capacity and automatic fallback to on-demand. Each service carries its own cost breakdown, request-volume and latency tracking (P50/P90/P99), pod health, and centralized logs, all in one place.
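To make the P50/P90/P99 latency figures concrete, here is a minimal sketch of how such percentiles can be derived from raw request timings. It uses the nearest-rank method, which is only one of several common definitions; the source does not say how the platform computes them, and the function names and sample data are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.
    pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the three percentiles named in the text for one service."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 99)}


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 14, 13, 250, 16, 18, 17, 19, 20]
summary = latency_summary(latencies_ms)
print(summary)
```

The outlier barely moves P50 but dominates P99, which is why tail percentiles are tracked alongside the median.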

The control plane: observability and governance through the AI Gateway

With deployment solved, the harder, more strategic problem came into focus: how do you see and govern 60+ AI solutions serving 500,000 users without slowing anyone down? This is where examroom.ai is standardizing on the TrueFoundry AI Gateway as the single control plane in front of its models, agents, and tools.

“Lately, TrueFoundry has introduced the gateway as well, where we can solve the observability and the governance part.”

1. Observability that points to the fix

For examroom.ai, observability isn’t dashboards for their own sake — it’s the difference between a candidate waiting hours for a result and an engineer resolving the root cause in minutes. When a result fails to generate, the system has to tell the team why: was it a pattern-recognition issue, or did the model simply fail to read the bubbles? That distinction routes the right notification to the right admin, who reaches the right engineer.

“Earlier, understanding the problem itself was the problem. Now you just go to the logs, apply the filter, and you get to know exactly what the problem is — thanks to TrueFoundry, which gives you the action you have to take on top of it. You don’t need to debug or print out all the logs. Everything is right in front of you. You just read it and take action.”

Key practice: Treat observability as actionable, not just visible — the goal is a recommended action, not a wall of logs.
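One way to picture "actionable" observability is a lookup from a classified failure to the person to notify and the step to take. The sketch below is illustrative only: the failure labels, recipient names, and action text are hypothetical and are not taken from examroom.ai's system or from any TrueFoundry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    notify: str   # who receives the notification (hypothetical role names)
    action: str   # the recommended next step shown with the alert


# Hypothetical mapping for the two failure modes named in the text.
ROUTES = {
    "pattern_recognition": Route(
        "scoring-admin", "Review the answer-pattern rules for this exam form."
    ),
    "unreadable_bubbles": Route(
        "scanning-admin", "Rescan the sheet or check the image quality."
    ),
}

# Anything unclassified still gets an owner instead of being dropped.
DEFAULT_ROUTE = Route("on-call-engineer", "Inspect the filtered logs for this result.")


def route_failure(event):
    """Pick the recipient and recommended action for a failed result."""
    return ROUTES.get(event.get("failure_type"), DEFAULT_ROUTE)


event = {"result_id": "r-001", "failure_type": "unreadable_bubbles"}
route = route_failure(event)
print(f"notify {route.notify}: {route.action}")
```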

2. Governance through guardrails, not committees

Governance is often treated as a tax on AI teams. Deepak’s view is the opposite: build the guardrails correctly up front, and governance becomes automatic.

“A lot of people say AI is incapable of handling governance. But use it the right way and governance automatically comes into the picture. Create the guardrails the way you need, and you don’t have to manage the governance at all.”

A concrete example: examroom.ai runs a support assistant for candidates who hit issues during registration or payment. A candidate who paid but didn’t get a confirmation should get help with exactly that — and nothing else. “What if he starts asking about politics, or his favorite foods? That’s where the guardrail comes in,” Deepak says. “You provide a response only to the related questions — the problem the candidate faced while registering or making the payment.” The AI Gateway is where those guardrails are defined and enforced consistently across services, alongside the encryption examroom.ai uses to protect exam items, whose leak could compromise the integrity of an entire university’s assessment.

Key practice: Encode governance as enforced guardrails at the gateway, so every model and agent inherits the same policy automatically.
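The support-assistant example can be sketched as a topic check that runs before any model is called. This is a deliberately simple keyword allowlist, assumed here for illustration; it is not how the AI Gateway implements guardrails, and the keyword sets, refusal text, and function names are hypothetical. A production guardrail would more likely use a classifier or a policy engine.

```python
import re

# Hypothetical allowlist: only registration and payment questions are in scope.
ALLOWED_TOPICS = {
    "registration": {"register", "registration", "signup", "enroll", "enrollment"},
    "payment": {"pay", "paid", "payment", "refund", "receipt", "confirmation", "charged"},
}

REFUSAL = "I can only help with registration or payment issues."


def matched_topic(message):
    """Return the first allowed topic whose keywords appear in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for topic, keywords in ALLOWED_TOPICS.items():
        if words & keywords:
            return topic
    return None


def guarded_reply(message, answer_fn):
    """Call answer_fn only for in-scope messages; refuse everything else."""
    topic = matched_topic(message)
    if topic is None:
        return REFUSAL
    return answer_fn(topic, message)


def stub_answer(topic, message):
    # Stand-in for the real assistant call.
    return f"handling {topic} issue"


print(guarded_reply("I paid but got no confirmation", stub_answer))
print(guarded_reply("What is your favorite food?", stub_answer))
```

Placing the check in front of the model call means an off-topic question never reaches the model at all, which is the property the gateway-level approach is after.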

3. Resilience and intelligent fallbacks

With 500,000 users on the platform, a single user can trigger dozens of the 60+ AI solutions at once. The gateway and a microservice architecture let examroom.ai degrade gracefully, for example by running two solutions instead of three when one misbehaves, without taking the whole system down. “Even though one service is down, you can take action on that, but your overall services are not going to go down,” Deepak explains. The same control plane is where the team manages model routing, virtual models, and the MCP gateway that connects its agents to tools.

Key practice: Design for fallback at the gateway, so traffic can be re-routed across models and services without altering the production system.
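The fallback idea reduces to trying an ordered list of backends and returning the first successful response. The sketch below assumes plain Python callables as stand-ins for models or services; the names are hypothetical and it does not reflect the gateway's actual routing configuration.

```python
def call_with_fallback(backends, request):
    """Try each (name, handler) pair in order and return the first success.

    Returns a (backend_name, response) tuple. Raises RuntimeError with the
    collected error messages only if every backend fails.
    """
    errors = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def primary(request):
    # Simulates a misbehaving service.
    raise TimeoutError("primary model timed out")


def secondary(request):
    return f"answer to {request!r}"


used, reply = call_with_fallback(
    [("primary", primary), ("secondary", secondary)], "status?"
)
print(used, reply)
```

Because the ordering lives in one list, traffic can be re-routed by changing that list rather than the services that call it.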

4. Evaluation built for scale

examroom.ai standardizes on a hybrid of RAG and fine-tuning, with automated evaluation rather than human review as the default. “If you look at the tokens, it’s not a few hundred — it’s a few thousand. It’s impossible to track manually,” Deepak says. “So you build a system that can evaluate by itself, and if it fails, of course you are there.” RAG keeps memory and guardrails easy to define on top of the knowledge base; fine-tuning handles domain-specific terminology and standardized outputs.
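The "evaluate by itself, and if it fails, you are there" pattern can be sketched as automated checks with a human-review queue for failures. The checks below are hypothetical placeholders; the source does not describe which metrics examroom.ai actually evaluates.

```python
def evaluate(outputs, checks):
    """Run every check on every output.

    Returns (passed, needs_review), where needs_review pairs each failing
    output with the names of the checks it failed, so a human reviewer
    sees only the exceptions.
    """
    passed, needs_review = [], []
    for item in outputs:
        failed = [name for name, check in checks.items() if not check(item)]
        if failed:
            needs_review.append((item, failed))
        else:
            passed.append(item)
    return passed, needs_review


# Hypothetical automated checks on generated text.
checks = {
    "non_empty": lambda text: bool(text.strip()),
    "within_length": lambda text: len(text) <= 200,
}

passed, needs_review = evaluate(["Payment confirmed.", "   "], checks)
print(passed)
print(needs_review)
```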

Results

  • Deployment time fell from 6 days to under 2 hours — a step change in release velocity.
  • Three dedicated ML Ops engineers were freed from cluster management to focus on product.
  • Scaled from 5 test users to 500,000 production users across the assessment lifecycle.
  • 94 services run on TrueFoundry across prod, staging, UAT, QA, and dev — with GPU autoscaling on spot + on-demand fallback, per-service cost tracking, and centralized logs and latency metrics.
  • Incident resolution shifted from log-spelunking to read-and-act, with filtered logs and recommended actions.
  • Observability and governance are consolidating on the AI Gateway as the single control plane for examroom.ai’s 60+ AI solutions.

Practical lessons for teams putting AI into production

Asked what separates teams that succeed with AI from those that struggle, Deepak’s advice is concrete:

  • Tie every solution to a business KPI — is it saving time or saving cost? “That’s the business impact necessary to run the organization.”
  • Standardize the delivery pattern. For a given industry, 60–70% of the problem can be solved with reusable architecture — accelerators, orchestration templates, automated evaluation pipelines, and a governance system. “On a six-month project, that’s already reducing four months.”
  • Don’t productionize anything without observability, CI/CD, and a governance system.
  • Don’t get fascinated by early success on a local machine. Production is where it counts — “keep monitoring.”

Conclusion

examroom.ai’s journey shows that the hardest part of production AI isn’t building the model; it’s operating a large estate of models safely, observably, and affordably at scale. By standardizing deployment on TrueFoundry and consolidating observability and governance on the AI Gateway, examroom.ai turned a deployment process that took up to six days and kept three engineers on ML Ops full-time into a push that takes under two hours, and built the control plane it needs to govern 60+ AI solutions for half a million users. If you’re scaling AI in production and need observability and governance that keep up, talk to TrueFoundry.
