NVIDIA Developer · 13 min read · 1 day ago

Accelerating BEV Pooling on NVIDIA GPUs for Physical AI Applications

BEVPoolV3 is a new CUDA kernel optimization for bird's-eye-view (BEV) pooling used in autonomous vehicles and robotics. The post walks through a practical GPU optimization workflow: classify whether the working set fits in L2 cache, remove redundant scatter traffic via a five-array INT32 scatter map, implement interval-owned scatter-reduce to avoid atomics, and validate with NVIDIA Nsight Compute. On RTX PRO 6000 Blackwell Max-Q (large L2), BEVPoolV3 FP8 achieves up to 42x speedup over the V2 baseline. On RTX A6000 (small L2, DRAM-bound), the adapted FP16 path reaches 19x speedup. The post also explains why FP8 outperforms NVFP4 for L2-resident scatter-reduce workloads, and how the same methodology applies to sparse embeddings, voxelization, and other irregular memory-bound kernels.
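The first step in the workflow is deciding whether the kernel's working set fits in L2, and that decision comes down to byte arithmetic. The sketch below is a minimal illustration under assumptions the summary does not state. The names `BevPoolProblem`, `working_set_bytes`, and `classify` are hypothetical. The five INT32 arrays are assumed to be three per-point rank arrays plus two per-interval arrays, which is the layout used by BEVDet's open-source BEVPoolV2. The 0.75 headroom factor is an arbitrary placeholder for L2 space shared with other traffic. On real hardware, the L2 capacity can be read from the `l2CacheSize` field of the `cudaDeviceProp` struct filled by `cudaGetDeviceProperties`.

```python
"""Hypothetical L2-residency check for a BEV pooling working set.

The field names, byte model, and headroom factor are illustrative assumptions.
On real hardware, read L2 capacity from cudaDeviceProp::l2CacheSize.
"""
from dataclasses import dataclass

INT32_BYTES = 4


@dataclass(frozen=True)
class BevPoolProblem:
    n_points: int       # frustum points that land inside the BEV grid
    n_intervals: int    # distinct BEV cells hit (one interval per cell)
    n_depth: int        # elements in the depth-probability tensor
    n_feat_pixels: int  # camera feature pixels
    n_bev_cells: int    # cells in the output BEV grid
    channels: int       # feature channels C
    elem_bytes: int     # input precision: 1 = FP8, 2 = FP16, 4 = FP32
    out_bytes: int = 4  # assumed FP32 output/accumulation


def working_set_bytes(p: BevPoolProblem) -> int:
    # Assumed scatter map: three per-point rank arrays + two per-interval arrays, all INT32
    scatter_map = INT32_BYTES * (3 * p.n_points + 2 * p.n_intervals)
    depth = p.n_depth * p.elem_bytes
    feats = p.n_feat_pixels * p.channels * p.elem_bytes
    bev_out = p.n_bev_cells * p.channels * p.out_bytes
    return scatter_map + depth + feats + bev_out


def classify(p: BevPoolProblem, l2_bytes: int, headroom: float = 0.75) -> str:
    """'l2_resident' if the working set fits in headroom * L2, otherwise 'dram_bound'."""
    return "l2_resident" if working_set_bytes(p) <= headroom * l2_bytes else "dram_bound"


if __name__ == "__main__":
    toy = BevPoolProblem(n_points=1000, n_intervals=100, n_depth=1000,
                         n_feat_pixels=100, n_bev_cells=50, channels=8, elem_bytes=2)
    print(working_set_bytes(toy))           # 18000 bytes
    print(classify(toy, l2_bytes=32_000))   # l2_resident
    print(classify(toy, l2_bytes=16_000))   # dram_bound
```

The toy sizes are made up to keep the arithmetic checkable; the classification only picks an optimization regime and is not a performance prediction.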
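Interval-owned scatter-reduce avoids atomics by sorting points by destination BEV cell, so that every cell's contributions form one contiguous interval. Each interval (and, on a GPU, each channel) then gets a single owner that accumulates privately and stores once. The pure-Python reference below is a sketch, not NVIDIA's BEVPoolV3 kernel. It assumes the five INT32 arrays are `ranks_depth`, `ranks_feat`, `ranks_bev`, `interval_starts`, and `interval_lengths`, as in BEVDet's BEVPoolV2, and the helper names `build_scatter_map`, `interval_owned_pool`, and `naive_scatter` are hypothetical.

```python
"""Pure-Python reference for interval-owned BEV scatter-reduce.

Hypothetical sketch: array names follow BEVDet's BEVPoolV2; this is not
NVIDIA's BEVPoolV3 kernel. Each point i adds depth[ranks_depth[i]] times
feat[ranks_feat[i]] into the BEV cell ranks_bev[i].
"""


def build_scatter_map(ranks_depth, ranks_feat, ranks_bev):
    """Sort points by destination cell and split them into per-cell intervals.

    Returns the five index arrays (INT32 on a GPU; Python ints here).
    """
    order = sorted(range(len(ranks_bev)), key=lambda i: ranks_bev[i])  # stable sort
    rd = [ranks_depth[i] for i in order]
    rf = [ranks_feat[i] for i in order]
    rb = [ranks_bev[i] for i in order]
    starts, lengths = [], []
    for i, cell in enumerate(rb):
        if i == 0 or cell != rb[i - 1]:
            starts.append(i)   # a new destination cell opens a new interval
            lengths.append(1)
        else:
            lengths[-1] += 1   # same cell: extend the current interval
    return rd, rf, rb, starts, lengths


def interval_owned_pool(depth, feat, rd, rf, rb, starts, lengths, n_cells, channels):
    """Each (interval, channel) pair has exactly one owner, so no atomics are needed."""
    bev = [[0.0] * channels for _ in range(n_cells)]
    for start, length in zip(starts, lengths):  # on a GPU: one thread per (interval, channel)
        cell = rb[start]
        for c in range(channels):
            acc = 0.0  # private accumulator; no other owner writes this cell
            for i in range(start, start + length):
                acc += depth[rd[i]] * feat[rf[i]][c]
            bev[cell][c] = acc  # single plain store instead of repeated atomicAdd
    return bev


def naive_scatter(depth, feat, ranks_depth, ranks_feat, ranks_bev, n_cells, channels):
    """Per-point scatter; on a GPU each += would need an atomicAdd."""
    bev = [[0.0] * channels for _ in range(n_cells)]
    for d, f, b in zip(ranks_depth, ranks_feat, ranks_bev):
        for c in range(channels):
            bev[b][c] += depth[d] * feat[f][c]
    return bev


if __name__ == "__main__":
    depth = [0.5, 1.0, 2.0]
    feat = [[1.0, 2.0], [3.0, 4.0]]
    ranks_depth, ranks_feat, ranks_bev = [2, 0, 1, 0], [0, 1, 1, 0], [3, 1, 3, 1]
    smap = build_scatter_map(ranks_depth, ranks_feat, ranks_bev)
    out = interval_owned_pool(depth, feat, *smap, n_cells=4, channels=2)
    assert out == naive_scatter(depth, feat, ranks_depth, ranks_feat, ranks_bev, 4, 2)
    print(out)  # [[0.0, 0.0], [2.0, 3.0], [0.0, 0.0], [5.0, 8.0]]
```

Sorting by `ranks_bev` is what makes ownership possible: once a cell's contributions are contiguous, one owner can accumulate them privately, replacing the per-point atomics of `naive_scatter` with a sort performed when the scatter map is built.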

#gpu #computer-vision #cuda

Source: https://developer.nvidia.com/blog/accelerating-bev-pooling-on-nvidia-gpus-for-physical-ai-applications. 8sync News only summarizes and links to the original; copyright belongs to the author and the original source.
