Snyk · 0 comments · 16 min read · 3 hours ago

Snyk VulnBench JS 1.0: LLM Bug Repeatability

Snyk ran 300 repeated vulnerability-finding scans across 10 JavaScript fixtures to measure how repeatable LLM-based security reviews are compared with deterministic SAST. Key findings: LLM findings that matched the reference set were highly stable (85% appeared consistently across all 5 runs), but extra LLM-only reports were highly inconsistent, with nearly 50% appearing in only 1 of 5 identical runs. The best LLM configuration (Claude Opus 4.6 Medium) reached 75.4% F1 against Snyk Code's reference set, leaving a 24.6-point gap. A more expensive configuration (Claude Opus 4.7 Max) cost 5.7x more but scored lower. LLMs excelled at high-signal exploit shapes (command injection, SQLi, SSRF) but missed systematic patterns such as repeated path traversal sinks and resource-limit findings. The data supports combining LLM review with SAST rather than replacing one with the other.
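The two measurements in this summary, how often a finding recurs across identical runs and the F1 score against a reference set, can be illustrated with a minimal Python sketch. This is not Snyk's methodology: the findings, file names, and the rule of matching on an exact `(file, line, rule)` tuple are all hypothetical assumptions chosen to keep the example small.

```python
from collections import Counter

def stability_histogram(runs):
    """Map k -> share of distinct findings that appear in exactly k runs."""
    counts = Counter()
    for findings in runs:
        counts.update(set(findings))  # count each finding once per run
    total = len(counts)
    if total == 0:
        return {}
    hist = Counter(counts.values())
    return {k: hist.get(k, 0) / total for k in range(1, len(runs) + 1)}

def f1_score(reported, reference):
    """F1 of one run's findings against a reference set (exact-match assumption)."""
    true_positives = len(reported & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(reported)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: 5 identical scans of one fixture, findings as (file, line, rule).
runs = [
    {("app.js", 10, "sqli"), ("app.js", 42, "cmdi"), ("util.js", 7, "xss")},
    {("app.js", 10, "sqli"), ("app.js", 42, "cmdi")},
    {("app.js", 10, "sqli"), ("app.js", 42, "cmdi"), ("db.js", 3, "ssrf")},
    {("app.js", 10, "sqli"), ("app.js", 42, "cmdi")},
    {("app.js", 10, "sqli"), ("app.js", 42, "cmdi")},
]
# Hypothetical reference set, standing in for a deterministic SAST result.
reference = {
    ("app.js", 10, "sqli"),
    ("app.js", 42, "cmdi"),
    ("fs.js", 5, "path-traversal"),
}

print(stability_histogram(runs))
print([round(f1_score(run, reference), 3) for run in runs])
```

In this toy data the two reference-matched findings recur in every run, while the two extra reports each show up once, which mirrors the shape of the result described above. A real comparison would likely need fuzzier matching than exact tuples, since line numbers and rule names differ between tools.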

#security #llm #appsec

Source: https://snyk.io/blog/snyk-vulnbench-js-1-0-llm-security-review-repeatability. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.

Recommended for you

daniel.haxx.se · 17 min read · 7 hours ago · AI

Do excellent vulnerability reports

This article, written by the maintainer of the curl project, explains how to write high-quality security vulnerability reports for open source projects. The main recommendations are: write a clear introduction, provide a standalone reproducer, include a patch if possible, state which versions are affected, use the project's preferred reporting channel, and stay available to collaborate throughout the process. The article also covers how to write security advisories and stresses respect for the limited time of volunteer maintainers.

Developers should read it to improve the quality of the vulnerability reports they send to open source projects, avoid creating extra work for maintainers, and raise the chance of a quick resolution.

#security

The many journeys of learning Rust

Why Every Agent Vulnerability is a Trust Boundary Failure

Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows

AI inference is obviously profitable

Okta is the first to bring AI agent governance inside FedRAMP boundaries

FFmpeg fixes PixelSmash flaw in widely used video decoder

Anthropic’s Mythos found flaws in classified US systems during a government test