Armin Ronacher · 11 min read · 2 hours ago

Better Models: Worse Tools

Armin Ronacher investigates a regression in newer Claude models (Opus 4.8, Sonnet 5): they emit extra, invented keys in tool call arguments that are not in the tool's schema, which causes Pi's edit tool to reject the calls. The failure is context-dependent and appears mainly in long agentic sessions. His hypothesis is that post-training on Claude Code's forgiving harness, which silently filters unknown keys and applies parameter aliases, has weakened the training signal against schema violations, so newer models adhere less reliably to tool schemas outside Claude Code. Enabling Anthropic's strict mode eliminates the issue, which suggests that server-side grammar-constrained sampling is the fix. The broader concern is that as post-training becomes increasingly tied to one closed-source harness, alternative tool schemas may become implicitly off-distribution, forcing third-party harnesses to either mimic Claude Code's quirks or rely on strict mode.
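To make the difference between the two harness behaviors concrete, here is a minimal Python sketch. It is not code from Pi or Claude Code: the schema keys (`path`, `old_text`, `new_text`), the alias table, and both function names are hypothetical, chosen only to illustrate how a strict validator and a forgiving normalizer might treat the same tool call.

```python
# Hypothetical edit-tool schema; the key names are illustrative only,
# not Pi's or Claude Code's actual parameters.
ALLOWED_KEYS = {"path", "old_text", "new_text"}
ALIASES = {"file_path": "path"}  # assumed alias table for the forgiving harness


def validate_strict(args: dict) -> dict:
    """Reject any call whose arguments do not match the schema exactly."""
    unknown = sorted(set(args) - ALLOWED_KEYS)
    missing = sorted(ALLOWED_KEYS - set(args))
    if unknown or missing:
        raise ValueError(f"schema violation: unknown={unknown} missing={missing}")
    return dict(args)


def normalize_forgiving(args: dict) -> dict:
    """Rename aliased keys, then silently drop anything still unknown."""
    renamed = {ALIASES.get(key, key): value for key, value in args.items()}
    return {key: value for key, value in renamed.items() if key in ALLOWED_KEYS}


# A call with one aliased key and one invented key.
call = {"file_path": "a.py", "old_text": "x", "new_text": "y", "reason": "tidy up"}

print(normalize_forgiving(call))  # the invented key disappears without an error
try:
    validate_strict(call)
except ValueError as err:
    print(err)  # the same call is rejected outright
```

In this sketch the forgiving path never surfaces an error, so nothing tells the caller that `reason` was invented. That is the mechanism the hypothesis points at: a model trained against such a harness would see no penalty for the extra key, while a strict harness fails on the first mismatch.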


#ai-agents #claude #reinforcement-learning

Source: https://lucumr.pocoo.org/2026/7/4/better-models-worse-tools. 8sync News only summarizes and links to it; copyright in the content belongs to the author and the original source.
