F5 and NVIDIA advance AI factory economics with new capabilities for accelerated AI inference

F5 BIG-IP Next for Kubernetes accelerated with BlueField DPUs improves token throughput, reduces cost per token, and enables secure multi-tenant AI infrastructure, transforming AI factories for the agentic era

F5 (NASDAQ: FFIV), the global leader in delivering and securing every app and API, today announced expanded capabilities in its ongoing collaboration with NVIDIA to accelerate and optimize AI inference infrastructure.

The expanded integration combines F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs, creating an intelligent, telemetry-aware infrastructure layer that increases token throughput with better GPU utilization, reduces latency, and enables secure multi-tenant AI platforms at scale.

In AI systems, tokens represent the measurable unit of AI output—the words, symbols, or data fragments generated and processed during inference. The volume and velocity of token production ultimately determine user experience, infrastructure efficiency, and revenue per accelerator.

As enterprises and GPU-as-a-service (GPUaaS) providers move from AI experimentation to revenue-generating services, infrastructure efficiency has become a defining metric. Success is increasingly measured not simply by deployed GPU capacity, but by token economics: sustained token throughput, time to first token (TTFT), cost per token, and revenue per GPU accelerator. The F5 and NVIDIA joint solution is designed to address these metrics directly.
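As a rough illustration of how these metrics relate to one another, the Python sketch below derives throughput, cost per million tokens, and revenue per GPU-hour from a single measurement window. The class name, field names, and all dollar figures are hypothetical and are not taken from F5 or NVIDIA. TTFT is left out because it requires per-request timestamps rather than window totals.

```python
from dataclasses import dataclass


@dataclass
class InferenceWindow:
    """Totals for one GPU over one observation window (hypothetical fields)."""
    tokens_generated: int            # output tokens produced in the window
    window_seconds: float            # length of the window in seconds
    gpu_cost_per_hour: float         # assumed fully loaded GPU cost, in dollars
    price_per_million_tokens: float  # assumed price charged to users, in dollars


def tokens_per_second(w: InferenceWindow) -> float:
    """Sustained token throughput over the window."""
    return w.tokens_generated / w.window_seconds


def cost_per_million_tokens(w: InferenceWindow) -> float:
    """Infrastructure cost attributed to each million tokens produced."""
    if w.tokens_generated == 0:
        raise ValueError("no tokens were generated in this window")
    window_cost = w.gpu_cost_per_hour * w.window_seconds / 3600
    return window_cost / w.tokens_generated * 1_000_000


def revenue_per_gpu_hour(w: InferenceWindow) -> float:
    """Revenue earned by one GPU, normalized to an hour."""
    revenue = w.tokens_generated / 1_000_000 * w.price_per_million_tokens
    return revenue / (w.window_seconds / 3600)
```

Under these assumptions, a GPU that costs the same per hour but produces more tokens in the window lowers cost per million tokens and raises revenue per GPU-hour at the same time, which is why throughput is treated as the lever for both.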

Optimizing tokenomics through intelligent AI infrastructure

The shift from application-centric inference to agent-driven AI workflows demands new architectural approaches to optimize token throughput and reduce costs. BIG-IP Next for Kubernetes now leverages NVIDIA NIM statistics, Dynamo runtime signals, and GPU telemetry to make inference-aware routing decisions before execution. By matching workloads to the most appropriate accelerators in real time, the solution increases sustained utilization while reducing latency and re-compute.
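The general idea of telemetry-aware routing can be sketched in a few lines of Python. This is an illustrative toy, not F5's routing logic: the `Backend` fields, the scoring weights, and the notion of a per-session cache set are all assumptions made for the example. It prefers backends with shorter queues and lower GPU utilization, and gives a bonus to a backend that already holds the session's context, since reusing that context avoids re-compute.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Snapshot of one inference backend (hypothetical telemetry fields)."""
    name: str
    queue_depth: int          # requests waiting on this backend
    gpu_utilization: float    # 0.0 (idle) to 1.0 (saturated)
    cached_sessions: set = field(default_factory=set)  # sessions with warm context


def score(backend: Backend, session_id: str) -> float:
    """Lower is better. The weights are arbitrary illustrative values."""
    s = backend.queue_depth + 10.0 * backend.gpu_utilization
    if session_id in backend.cached_sessions:
        s -= 5.0  # bonus: warm context means less work to recompute
    return s


def pick_backend(backends: list, session_id: str) -> Backend:
    """Choose the backend with the lowest score before the request executes."""
    if not backends:
        raise ValueError("no backends available")
    return min(backends, key=lambda b: score(b, session_id))
```

In this sketch a slightly busier backend can still win the request if it already holds the session's context, which is the trade-off between load and re-compute that inference-aware routing is meant to weigh.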

“AI infrastructure is no longer just about access to GPU or scaling their deployments. It has evolved into maximizing economic output per accelerator,” said Kunal Anand, Chief Product Officer, F5. “Together with NVIDIA, we are enabling AI factories to treat token production as a measurable business metric. BIG-IP Next for Kubernetes provides the intelligence and governance required to increase GPU yield, reduce cost per token, and scale shared AI platforms confidently.”

Validated infrastructure efficiency: A structural uplift

In testing validated by The Tolly Group, BIG-IP Next for Kubernetes, accelerated by NVIDIA BlueField-3 DPUs, delivered up to a 40% increase in token throughput, up to 61% faster time to first token (TTFT), and up to a 34% reduction in overall request latency.

These are not incremental gains. By offloading networking, TLS/encryption, AI-aware load balancing, and traffic management to NVIDIA BlueField-3 DPUs, BIG-IP Next for Kubernetes preserves host CPU capacity and frees GPUs to do what they were built for: sustained, high-throughput inference at scale. The result is improved GPU utilization, reduced queuing delays, and increased token yield—enabling lower cost per token within a fixed infrastructure footprint. Critically, no model modifications were required, making these gains immediately deployable across existing AI factory infrastructure. For enterprises and NeoCloud providers competing on token economics, this is the difference between infrastructure that constrains AI output and infrastructure that accelerates it.
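The link between a throughput uplift and cost per token within a fixed footprint is simple arithmetic, shown here as a short Python sketch. It assumes that infrastructure cost stays constant and that the full uplift is realized; the function name is hypothetical, and the resulting figure is a derived illustration, not a number reported in the testing.

```python
def cost_per_token_change(throughput_uplift: float) -> float:
    """Fractional change in cost per token when cost is fixed.

    cost_per_token = fixed_cost / tokens, so multiplying tokens by
    (1 + uplift) multiplies cost per token by 1 / (1 + uplift).
    """
    if throughput_uplift <= -1.0:
        raise ValueError("uplift must be greater than -100%")
    return 1.0 / (1.0 + throughput_uplift) - 1.0


# A 40% throughput increase at fixed cost: 1 / 1.4 - 1, about -28.6%.
change = cost_per_token_change(0.40)
print(f"{change:.1%}")
```

Under those assumptions, a 40% throughput gain corresponds to roughly a 28.6% lower cost per token, not 40%, because the same cost is spread over 1.4 times as many tokens.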

“NVIDIA’s accelerated computing infrastructure coupled with F5’s AI-aware Application Delivery and Security Platform unlocks superior AI factory tokenomics—delivering scalable and cost-effective inference without making any changes to the models,” said Kevin Deierling, SVP, Networking, NVIDIA. “Together, F5 and NVIDIA are empowering enterprises to scale AI factory inference efficiently and economically.”

Built for agent-driven AI and multi-tenant AI platforms

Modern AI workloads are increasingly agent-driven, persistent, and context-aware. They demand intelligent traffic control that traditional load balancing cannot provide. The enhanced BIG-IP Next for Kubernetes solution can now support:

  • Inference-aware routing for agentic AI workflows
  • Integration with NVIDIA DOCA Platform Framework (DPF) to simplify NVIDIA BlueField DPU deployment and lifecycle management
  • EVPN-VXLAN with dynamic VRFs for secure network-level multi-tenancy
  • Integrated security, token governance, and observability within Kubernetes AI environments

These capabilities enable enterprises and NeoCloud providers to securely share GPU infrastructure across business units or external customers while preserving performance isolation and predictable service levels.
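One way to picture token governance on shared infrastructure is a per-tenant token budget, sketched below in Python. The announcement does not describe how governance is implemented, so the class, its method names, and the simple fixed-limit policy are assumptions for illustration only.

```python
class TenantTokenBudget:
    """Tracks token consumption per tenant against a fixed limit (illustrative)."""

    def __init__(self, limits: dict):
        # limits maps a tenant name to its token allowance for the period
        self._limits = dict(limits)
        self._used = {tenant: 0 for tenant in limits}

    def try_consume(self, tenant: str, tokens: int) -> bool:
        """Record usage and return True, or return False if it would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tenant not in self._limits:
            return False  # unknown tenants are rejected outright
        if self._used[tenant] + tokens > self._limits[tenant]:
            return False  # request would push the tenant over budget
        self._used[tenant] += tokens
        return True

    def remaining(self, tenant: str) -> int:
        """Tokens the tenant may still consume in this period."""
        return self._limits[tenant] - self._used[tenant]
```

Because each tenant's usage is tracked separately, one tenant exhausting its allowance does not reduce what another tenant can consume, which mirrors the isolation goal described above.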

A control plane for AI factory economics

F5 and NVIDIA provide enterprises with validated tools and best practices to optimize inference architecture. With these advancements, BIG-IP Next for Kubernetes is positioned to become a strategic control plane for AI factory economics, governing token consumption, optimizing traffic flows, and maximizing infrastructure return on investment.

Rather than overprovisioning to compensate for inefficiencies, organizations can now extract greater economic value from every GPU already in production. The result is improved revenue per GPU, lower operational overhead, and scalable AI services built for sustained growth. By combining NVIDIA’s infrastructure telemetry and DPU acceleration with F5’s traffic intelligence and security capabilities, the companies are helping enterprises transform AI factories into efficient, monetizable platforms ready for the agentic era.
