TechCrunch | 3 min read | 2 hours ago

Arena, the AI leaderboard everyone uses, is now a $100M business

Arena, the AI model leaderboard that originated as a UC Berkeley research project in 2023, has reached $100 million in annualized run-rate revenue just eight months after launching its commercial service in September 2025. The platform is best known for its crowdsourced leaderboard built from over 10 million user evaluations, where users compare outputs from two anonymous models. Its commercial offering, AI Evaluations, provides model labs and enterprises with deep-dive performance analytics. Arena grew from $30M ARR in January 2026 to $100M ARR by June, reflecting surging demand for post-training refinement services. The company competes for budget with human labeling firms like Scale AI, Mercor, and Surge. Arena has raised $250M total, including a $150M Series A at a $1.7B valuation.

#llm

Source: https://techcrunch.com/2026/06/29/arena-the-ai-leaderboard-everyone-uses-is-now-a-100m-business. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.

Recommended for you

Gusto Engineering | 16 min read | 3 hours ago | AI

From Prompt to Classifier: A Production Case Study

Gusto's engineering team built an AI-to-human handoff classifier for its customer support system by starting with an LLM prompt, using production data to build a dataset of 3,500 conversation turns, and then fine-tuning a lightweight BERT model that reaches 94% precision and 93% recall. This LLM-first, specialize-later approach suits stable, high-volume decisions such as intent classification, but it works poorly for open-ended text generation or for rules that change often.

Developers should read this piece to understand how to move from calling an LLM directly to building an efficient specialized system, especially for narrow classification decisions such as customer support routing, where doing so helps optimize cost and deployment speed.

#machine-learning

Inside Target’s LLM-Based System for Semantic Matching in Marketing Forecast Pipelines

The many journeys of learning Rust

Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows

AI inference is obviously profitable

Anthropic’s Mythos found flaws in classified US systems during a government test

The Exhaustion of Talking to a Tool

EP219: 12 Open-source LLMs