NVIDIA · 4 min read · 3 hours ago

How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost

NVIDIA's full-stack inference software, co-designed with its GPU and networking hardware, is helping companies such as Baseten, Cognition, Deep Infra, Together AI, and Cursor sharply reduce cost per token in production AI workloads. On the Blackwell platform, the stack has cut token costs for DeepSeek V4 by up to 5x within a month. The stack spans three layers: production orchestration, application acceleration, and infrastructure access. Combined, techniques such as disaggregated serving, NVFP4 precision, and multi-token prediction compound to deliver throughput gains of up to 20x. The open source ecosystem, particularly PyTorch and CUDA-native frameworks such as vLLM and SGLang, extends these gains by enabling day-zero deployment of new frontier models on Blackwell hardware.


#nvidia #ai-infrastructure #cuda #ai-inference

Source: https://blogs.nvidia.com/blog/inference-software-lowest-token-cost. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.
