Medium · 0 Hot · 0 comments · 7 min read · 3 hours ago

5. High Reliability: How Great Software Recovers Gracefully (Part 2)

Part 2 of a series on software reliability covers the SLA/SLO/SLI framework for measuring and committing to service quality, then walks through core reliability patterns: redundancy, failover, health checks, load balancing, and monitoring. Each concept is explained with relatable analogies (restaurants, hospitals, cashier queues) and grounded in real tools like NGINX, HAProxy, Prometheus, Grafana, and Datadog. The post emphasizes that reliability is about preparing for inevitable failures, not preventing them entirely.
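To make the interplay of redundancy, health checks, failover, and load balancing concrete, here is a minimal sketch in Python. It is not taken from the original post and is not how NGINX or HAProxy implement these features; the names `Backend`, `RoundRobinBalancer`, and the `is_healthy` callback are hypothetical and chosen only for illustration. It assumes a health check can be modeled as a function that returns `True` or `False` for a backend.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Backend:
    """A redundant copy of the service (hypothetical model)."""
    name: str


class RoundRobinBalancer:
    """Round-robin load balancing that skips unhealthy backends.

    `is_healthy` stands in for a real health check (for example an
    HTTP probe); here it is just a callable supplied by the caller.
    """

    def __init__(self, backends: Iterable[Backend],
                 is_healthy: Callable[[Backend], bool]) -> None:
        self._backends: List[Backend] = list(backends)
        self._is_healthy = is_healthy
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> Optional[Backend]:
        """Return the next healthy backend, or None if every backend is down."""
        n = len(self._backends)
        for offset in range(n):
            idx = (self._next + offset) % n
            candidate = self._backends[idx]
            if self._is_healthy(candidate):
                # Failover is implicit: unhealthy backends are skipped.
                self._next = (idx + 1) % n
                return candidate
        return None


# Usage: three redundant backends, one of which is failing its health check.
down = {"b"}
balancer = RoundRobinBalancer(
    [Backend("a"), Backend("b"), Backend("c")],
    is_healthy=lambda backend: backend.name not in down,
)
picks = [balancer.pick().name for _ in range(4)]
```

In this sketch traffic alternates between the healthy backends while `b` is down, and `pick` returns `None` only when no backend passes its check, which is the point at which monitoring and alerting would need to take over.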


#monitoring #distributed-systems

Source: https://medium.com/kanak-club/5-part-2-reliability-why-great-software-isnt-the-one-that-never-fails-it-s-the-one-that-recovers-0296fbb4a4f7. 8sync News only summarizes and links to the article; copyright in the content belongs to the author and the original source.
