Redpanda · 0 comments · 7 min read · 2 days ago

Bridge Queries in Redpanda SQL

Redpanda SQL introduces bridge queries, a feature that lets you query live Redpanda topic data and historical Iceberg table data together through a single virtual SQL table. This removes the classic streaming-to-lakehouse tradeoff between data freshness and Parquet file quality. Previously, low-latency analytics required frequent flushes to Iceberg, which produced thousands of tiny Parquet files, poor compression, high S3 costs, and constant compaction overhead. With bridge queries, the topic itself covers the freshness gap at query time, so Iceberg flushes can run on a longer cadence (hours instead of seconds or minutes) and produce larger, analytics-optimized Parquet files. Configuration involves raising the Iceberg lag target and, optionally, the flush size threshold. The result is lower query latency, reduced S3 costs, and no need for a compaction service.
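To make the file-count argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the original post: the function names, the 12-partition topic, and the 0.125 MB/s per-partition ingest rate are all hypothetical, and it assumes, as a simplification, that each flush writes exactly one Parquet file per partition.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def parquet_files_per_day(flush_interval_s: int, partitions: int = 1) -> int:
    """Files written per day, ASSUMING one Parquet file per partition per flush."""
    flushes_per_day = SECONDS_PER_DAY // flush_interval_s
    return flushes_per_day * partitions


def approx_file_size_mb(ingest_mb_per_s: float, flush_interval_s: int) -> float:
    """Approximate uncompressed data per file for a single partition.

    ASSUMES a steady per-partition ingest rate and ignores any flush size cap.
    """
    return ingest_mb_per_s * flush_interval_s


if __name__ == "__main__":
    partitions = 12          # hypothetical topic layout
    ingest_mb_per_s = 0.125  # hypothetical per-partition ingest rate

    # Compare a one-minute flush cadence with an hourly one.
    for label, interval_s in [("every minute", 60), ("every hour", 3600)]:
        files = parquet_files_per_day(interval_s, partitions)
        size_mb = approx_file_size_mb(ingest_mb_per_s, interval_s)
        print(f"flush {label}: {files} files/day, ~{size_mb} MB per file")
```

Under these assumptions, moving from a one-minute to an hourly flush cadence cuts the daily file count by a factor of 60 and grows each file by the same factor, which is the effect the summary attributes to a higher lag target. Real numbers depend on the workload and on the flush size threshold.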

#big-data #apache-iceberg #olap #redpanda

Source: https://www.redpanda.com/blog/bridge-queries-in-redpanda-sql. 8sync News only summarizes and links; copyright of the content belongs to the author and the original source.
