GitHub Blog · 8 min read · 2 hours ago

Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks

GitHub shares benchmark results comparing the GitHub Copilot agentic harness against model-vendor harnesses (Claude Code and Codex CLI) across five benchmarks: SWE-bench Verified, SWE-bench Pro, SkillsBench, TerminalBench, and an internal Win-Hill benchmark. Using four models (Claude Sonnet 4.6, Claude Opus 4.7, GPT-5.4, GPT-5.5), the Copilot harness achieves task resolution rates on par with vendor harnesses while consuming fewer tokens across most configurations. A key differentiator is multi-model flexibility — supporting 20+ frontier models — enabling users to trade off cost vs. peak quality per task. The post also details methodology, including five independent runs per configuration and controlled normalization of context windows and reasoning effort.
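The "five independent runs per configuration" methodology can be pictured with a minimal sketch of how per-run results might be averaged into a resolution rate and a token figure. The function name, the tuple layout, and every number below are hypothetical placeholders chosen for illustration; none of them come from the GitHub post.

```python
from statistics import mean, stdev

def summarize_runs(runs):
    """Aggregate independent runs of one (harness, model, benchmark) configuration.

    `runs` is a list of (resolved_tasks, total_tasks, tokens_used) tuples,
    one tuple per run. This layout is an assumption made for this sketch.
    """
    rates = [resolved / total for resolved, total, _ in runs]
    tokens = [used for _, _, used in runs]
    return {
        "mean_resolution_rate": mean(rates),
        # Spread across runs; stdev needs at least two data points.
        "stdev_resolution_rate": stdev(rates) if len(rates) > 1 else 0.0,
        "mean_tokens": mean(tokens),
    }

# Invented placeholder data for five runs, NOT results from the post.
example_runs = [
    (50, 100, 1_000_000),
    (52, 100, 1_100_000),
    (48, 100, 900_000),
    (50, 100, 1_000_000),
    (50, 100, 1_000_000),
]

summary = summarize_runs(example_runs)
print(summary)
```

Comparing two harnesses on the same model would then amount to calling `summarize_runs` once per harness and reading the mean resolution rate next to the mean token count; reporting the spread alongside the mean helps show whether a gap is larger than run-to-run noise.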


#github #ai-agents

Source: https://github.blog/ai-and-ml/github-copilot/evaluating-performance-and-efficiency-of-the-github-copilot-agentic-harness-across-models-and-tasks. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.

