Red Hat Developer · 0 comments · 5 min read · 3 hours ago

Implement GPU-as-a-Service with Kueue and NVIDIA MIG

GPU-as-a-Service (GPUaaS) addresses the common problem of expensive, underutilized GPUs in organizations by enabling self-service reservation of GPU slices. The approach uses Red Hat OpenShift with Kueue (a Kubernetes queueing and quota system) and NVIDIA Multi-Instance GPU (MIG) technology. MIG allows a single physical GPU to be partitioned into isolated slices of varying sizes, while Kueue manages resource pools, fair sharing, and quota enforcement via ClusterQueues. A custom OpenShift web console plug-in lets developers book GPU time slots through a calendar UI without writing YAML, generating native Kueue resources under the hood. Once a reservation is made, developers can deploy models from the OpenShift AI model catalog using preconfigured hardware profiles that tie deployments to their reserved MIG slice. This enables long-running inference workloads and batch jobs like fine-tuning to share GPU resources elastically under defined access policies.
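To make the "native Kueue resources" mentioned above concrete, here is a minimal sketch of what a MIG-backed quota setup could look like, built as plain Python dictionaries and printed as JSON. It is not the plug-in's actual output. The following are assumptions, not details from the article: the `kueue.x-k8s.io/v1beta1` API version, the MIG resource names `nvidia.com/mig-1g.5gb` and `nvidia.com/mig-2g.10gb` (typical of an A100 40GB with the mixed MIG strategy), the node label, the quota numbers, and every object name.

```python
import json

# Assumption: the cluster serves Kueue's v1beta1 API; adjust if yours differs.
KUEUE_API = "kueue.x-k8s.io/v1beta1"


def resource_flavor(name, node_labels):
    """ResourceFlavor: maps a quota flavor to nodes carrying the given labels."""
    return {
        "apiVersion": KUEUE_API,
        "kind": "ResourceFlavor",
        "metadata": {"name": name},
        "spec": {"nodeLabels": dict(node_labels)},
    }


def cluster_queue(name, flavor, quotas):
    """ClusterQueue: cluster-wide pool with a nominal quota per MIG resource."""
    resources = [
        {"name": res, "nominalQuota": count}
        for res, count in sorted(quotas.items())
    ]
    return {
        "apiVersion": KUEUE_API,
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "namespaceSelector": {},  # empty selector: all namespaces may use it
            "resourceGroups": [
                {
                    "coveredResources": [r["name"] for r in resources],
                    "flavors": [{"name": flavor, "resources": resources}],
                }
            ],
        },
    }


def local_queue(name, namespace, cluster_queue_name):
    """LocalQueue: namespaced entry point that forwards to a ClusterQueue."""
    return {
        "apiVersion": KUEUE_API,
        "kind": "LocalQueue",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"clusterQueue": cluster_queue_name},
    }


def build_manifests():
    """All names, labels, and quota values below are hypothetical."""
    flavor = resource_flavor("mig-a100", {"nvidia.com/mig.config": "all-balanced"})
    cq = cluster_queue(
        "gpu-slices",
        flavor["metadata"]["name"],
        {"nvidia.com/mig-1g.5gb": 4, "nvidia.com/mig-2g.10gb": 2},
    )
    lq = local_queue("team-a-gpu", "team-a", cq["metadata"]["name"])
    return [flavor, cq, lq]


if __name__ == "__main__":
    # A v1 List in JSON form can be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": build_manifests()}, indent=2))
```

A workload would then opt in to the quota by carrying the `kueue.x-k8s.io/queue-name` label set to the LocalQueue name (here the hypothetical `team-a-gpu`) and requesting one of the covered MIG resources.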

Read the original article

#kubernetes #gpu #openshift

Source: https://developers.redhat.com/articles/2026/06/29/implement-gpu-as-a-service-kueue. 8sync News only summarizes and links to articles; copyright in the content belongs to the author and the original source.

Recommended for you

ITNEXT · 154 min · 2 hours ago

GitOps for 15,000+ Clusters: What Large-Scale Testing with vCluster Taught Us

A detailed experience report from 31 iterations of large-scale GitOps fleet management testing using Argo CD, vCluster, Sveltos, and the open-source kubara framework on STACKIT Kubernetes Engine. Key findings: Argo CD's application controller hits OOM kills around 15k–20k cached objects per hub regardless of tuning (DRY vs WET manifests, sharding algorithms, processor counts). The root cause is that object count — not cluster or application count — drives memory usage non-linearly due to per-cluster caches, diffs, and live state. Sveltos addon controller handled the same workload at roughly 2 GB RAM vs 21 GB for Argo CD, and deployed 1,000 applications across 250 vClusters in 35 minutes with sharding (17 minutes in WET/pull mode). Centralized agent mode (Mode 2) was fastest at 13–16 minutes for 1,000 apps. The main architectural lesson: at very large scale (1,000+ clusters, 5,000+ real-world applications), a single Argo CD hub is not the right tool — architecture choices matter more than tuning.

#kubernetes
