NVIDIA Developer · 8 min read · 2 days ago

Maximize AI Factory Energy Efficiency Through Full-Stack Inference and Training Optimizations

Power accounts for up to 40% of AI factory operating expenses, which makes performance per watt a critical metric. NVIDIA outlines a full-stack approach to maximizing energy efficiency across inference and training workloads. For inference, the key strategies are MoE model architectures (which activate only a subset of parameters per token), narrow-precision formats such as NVFP4, and tools such as TensorRT-LLM and NVIDIA Dynamo that raise throughput. For training, collaboration with the ML.ENERGY Initiative at the University of Michigan has produced energy-aware GPU scheduling in Megatron-LM that reduces idle GPU time without slowing overall training. The NVIDIA DSX platform ties these pieces together with real-time telemetry, dynamic power allocation, 45°C liquid cooling support, and grid-aware orchestration (DSX Flex), so operators can maximize tokens per watt within a fixed site power budget.
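The two quantities the summary leans on, the share of parameters an MoE model activates per token and tokens per watt under a fixed power budget, can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and every number in the usage example are hypothetical and do not come from NVIDIA's article. It treats "tokens per watt" as tokens per second divided by watts, which is the same as tokens per joule.

```python
def moe_active_fraction(num_experts: int, experts_per_token: int,
                        expert_params: float, shared_params: float) -> float:
    """Fraction of all parameters used for one token in a simple MoE model.

    Assumes every expert has the same size and that shared_params
    (attention, embeddings, router) are always active.
    """
    total = shared_params + num_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return active / total


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Throughput divided by power draw: (tokens/s) / (J/s) = tokens per joule."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def site_throughput(power_budget_watts: float, efficiency: float) -> float:
    """Sustained tokens/s achievable when the whole power budget is used
    at a given efficiency (tokens per joule)."""
    return power_budget_watts * efficiency


if __name__ == "__main__":
    # Hypothetical model: 8 experts, 2 routed per token.
    print(moe_active_fraction(8, 2, expert_params=10.0, shared_params=20.0))
    # Hypothetical rack: 1000 tokens/s at 500 W.
    eff = tokens_per_joule(1000.0, 500.0)
    # Hypothetical site with a fixed 1 MW budget.
    print(site_throughput(1_000_000.0, eff))
```

With the power budget fixed, throughput scales directly with efficiency, which is why the summary frames every optimization in terms of tokens per watt rather than raw speed.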


#nvidia #ai-inference #mixture-of-experts

Source: https://developer.nvidia.com/blog/maximize-ai-factory-energy-efficiency-through-full-stack-inference-and-training-optimizations. 8sync News only summarizes and links to articles; copyright in the content belongs to the authors and the original source.

Recommended for you

XDA Developers · 14 min read · 22 hours ago · AI

My 7-year-old GPU runs local AI perfectly, and I don't need my cloud subscriptions anymore

MoE models and quantization make it possible to run AI locally on an older GPU with 8GB of VRAM, such as the RTX 2070 Super, and to replace cloud subscriptions with models like Qwen3-Coder 8B or Gemma 4 E4B. Tools such as Ollama (command line) and LM Studio (GUI) make deployment easy, but token generation speed, context window size, and tool-calling support still need attention.

If you want to cut costs and improve performance for everyday AI applications while keeping quality high, this article shows how to optimize AI models on an older GPU with MoE and quantization so you can work effectively without depending on the cloud.
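A back-of-the-envelope calculation shows why quantization matters for an 8GB card. The sketch below estimates the memory taken by model weights alone at a given bit width. It is a simplification under stated assumptions: it ignores the KV cache, activations, and runtime overhead, uses 1 GB = 10^9 bytes, and the function name is hypothetical rather than part of Ollama or LM Studio.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and any per-block quantization metadata.
    """
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 8e9  # an 8-billion-parameter model, as an example size
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, 16-bit weights for an 8-billion-parameter model need about 16 GB and cannot fit in 8GB of VRAM, while 4-bit weights need about 4 GB and leave room for context.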
