
Microsoft Fabric's Copy job now includes native SCD Type 2 support as a built-in write method, eliminating the need for custom PySpark notebooks or complex Dataflow Gen2 merge logic. When enabled, it automatically adds Valid_From, Valid_To, and Is_Current columns to the destination table, handles row expiration on updates, and soft-deletes removed records. The post explains when to choose Type 1 vs Type 2 (a business decision, not a tooling one), walks through the configuration steps, and details current limitations: connector support is limited to Azure SQL DB, Azure SQL MI, on-premises SQL Server, and Fabric Lakehouse tables; the destination table must be created by Copy job itself; column mappings cannot be changed after enabling SCD Type 2; and CDC must be enabled on the source. For cases requiring custom change-detection logic, unsupported connectors, or fine-grained surrogate key control, notebooks remain the better option.
Source: https://bartwullems.blogspot.com/2026/07/slowly-changing-dimensions-in-microsoft.html. 8sync News only summarizes and links; copyright in the content belongs to the author and the original source.
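The Type 2 behavior in the Fabric Copy job summary above (expire the old row on update, add a new current row, soft-delete removed records) can be sketched in plain Python. This is an illustrative sketch, not how Copy job is implemented: the column names Valid_From, Valid_To, and Is_Current come from the summary, while the function name `apply_scd2`, the in-memory list-of-dicts table, the use of `None` for an open-ended Valid_To, and the choice to model a soft delete as closing the current row are all assumptions.

```python
from datetime import datetime


def apply_scd2(dim, snapshot, key, now):
    """Apply a full source snapshot to a Type 2 dimension (list of dicts).

    dim      -- existing rows, each with Valid_From, Valid_To, Is_Current
    snapshot -- dict mapping business key -> dict of attribute values
    key      -- name of the business-key column
    now      -- timestamp to stamp on expired and newly created rows
    """
    current = {row[key]: row for row in dim if row["Is_Current"]}

    for business_key, attrs in snapshot.items():
        row = current.get(business_key)
        if row is not None:
            if all(row.get(col) == val for col, val in attrs.items()):
                continue  # nothing changed, keep the current row as is
            # Update: expire the old version instead of overwriting it.
            row["Valid_To"] = now
            row["Is_Current"] = False
        # New key or changed attributes: add a fresh current version.
        dim.append({key: business_key, **attrs,
                    "Valid_From": now, "Valid_To": None, "Is_Current": True})

    # Assumed soft delete: keys missing from the snapshot are closed, not removed.
    for business_key, row in current.items():
        if business_key not in snapshot:
            row["Valid_To"] = now
            row["Is_Current"] = False
    return dim
```

Calling it twice with different snapshots leaves the full history in `dim`: one expired row per superseded version and at most one row per key with Is_Current set.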
Migrating from a monolithic architecture to microservices calls for specific patterns rather than a full rewrite. The four main strategies are: Strangler Fig (gradually shifting traffic through an API gateway), Parallel Run (running both systems side by side to verify behavior), Collaborator (adding new microservices without modifying the core), and Change Data Capture (real-time data synchronization with Debezium/Kafka Connect). These patterns work best when combined in sequence over the course of the migration.
Developers should read this article to understand how to move from a monolith to microservices accurately and with low risk while keeping performance in check: not as an abrupt switch, but as a deliberate, planned process built on proven design patterns.
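The Strangler Fig idea of gradually shifting traffic at the gateway can be sketched as a routing decision. This is a minimal illustration, not a real gateway configuration: the prefixes in `MIGRATED_PREFIXES`, the `route` function, and the backend labels are hypothetical.

```python
# Hypothetical path prefixes whose functionality has already moved to new services.
MIGRATED_PREFIXES = ("/orders", "/invoices")


def route(path, migrated=MIGRATED_PREFIXES):
    """Return which backend should serve the given request path."""
    for prefix in migrated:
        # Match the prefix itself or anything below it, but not "/ordersx".
        if path == prefix or path.startswith(prefix + "/"):
            return "microservice"
    return "monolith"
```

Migration then proceeds by adding one prefix at a time to the migrated set, so everything not yet listed keeps flowing to the monolith.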
A practical introduction to Slowly Changing Dimensions (SCD) for those new to data warehousing. Covers the core fact/dimension table split and explains five SCD types: Type 0 (fixed attributes), Type 1 (overwrite), Type 2 (full history with surrogate keys and effective dates), Type 3 (previous value only), and Type 4 (separate current and history tables). Includes a decision framework for choosing the right type based on business needs rather than technical defaults, and notes that different columns in the same table can use different SCD types.
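The point that columns in one table can follow different SCD types can be illustrated with a small per-column policy. This is a sketch under assumptions: `apply_change`, the `previous_` column prefix, and the example columns are hypothetical, and only Types 0, 1, and 3 are covered because Types 2 and 4 need extra rows or a second table.

```python
def apply_change(row, column, new_value, scd_type):
    """Return a copy of row with new_value applied under the given SCD type."""
    updated = dict(row)
    if scd_type == 0:
        # Type 0: the attribute is fixed, so the change is ignored.
        return updated
    if scd_type == 1:
        # Type 1: overwrite in place, no history kept.
        updated[column] = new_value
        return updated
    if scd_type == 3:
        # Type 3: keep only the immediately previous value in a sibling column.
        updated["previous_" + column] = row.get(column)
        updated[column] = new_value
        return updated
    raise ValueError("this sketch only handles SCD types 0, 1 and 3")


# Hypothetical policy: one table, a different SCD type per column.
COLUMN_POLICY = {"birth_date": 0, "email": 1, "region": 3}
```

With that policy, a changed `email` is simply overwritten, a changed `region` pushes the old value into `previous_region`, and a changed `birth_date` is dropped.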
Agents rely on search as their primary interface to the world, but stale search results are a correctness problem rather than just a UX annoyance. The core challenge is maintaining computed search documents (denormalized entities assembled from joins, aggregates, pricing logic, and embeddings) as underlying source systems change. Batch reprocessing introduces lag; hand-rolled CDC pipelines are brittle and complex. Materialize addresses this by letting teams define computed entities as SQL views that are continuously maintained as inputs change, then propagating entity-level before-and-after diffs downstream. A companion open-source tool, Perfect Embedding, sits as a Kafka Connect SMT between Materialize and the search index, regenerating vector embeddings only when the fields that drive semantic meaning actually change.
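The idea of regenerating an embedding only when meaning-bearing fields change can be sketched as a comparison over an entity's before-and-after diff. This is not the Perfect Embedding API: the function `needs_reembedding`, the field names in `SEMANTIC_FIELDS`, and the dict-shaped diff are assumptions for illustration.

```python
# Hypothetical set of fields that drive the semantic meaning of a document.
SEMANTIC_FIELDS = ("title", "description")


def needs_reembedding(before, after, fields=SEMANTIC_FIELDS):
    """Decide whether a before/after entity diff requires a new embedding.

    before -- previous version of the entity as a dict, or None for an insert
    after  -- new version of the entity as a dict, or None for a delete
    """
    if after is None:
        return False  # deleted entity: nothing left to embed
    if before is None:
        return True   # brand-new entity: no embedding exists yet
    return any(before.get(f) != after.get(f) for f in fields)
```

A price or stock update therefore passes through to the index without a call to the embedding model, while a rewritten description triggers one.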
Long-running transactions are a common but overlooked cause of rising CDC source latency in AWS DMS migrations. When a transaction stays open, DMS must buffer all subsequent changes until it commits, causing latency to climb from seconds to hours. This post explains how DMS processes transactions, how to diagnose latency spikes, and provides ready-to-use bash monitoring scripts for Oracle, PostgreSQL, MySQL, and SQL Server. The scripts detect transactions exceeding a configurable threshold (default 15 minutes), send alerts via Amazon SNS, enforce TLS/SSL connections, retrieve credentials from AWS Secrets Manager, and filter false positives. A step-by-step Oracle demo shows the full workflow from DMS task setup to alert receipt. Scheduling via crontab enables continuous proactive monitoring.
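The threshold check at the heart of such monitoring can be sketched as follows. The original scripts are bash and query each database engine directly; this Python version is only an illustration, and the function `long_running`, the `start_time` key, and the shape of the input rows are assumptions. Only the 15-minute default comes from the summary.

```python
from datetime import datetime, timedelta


def long_running(transactions, now, threshold=timedelta(minutes=15)):
    """Return the transactions that have been open for longer than threshold.

    transactions -- iterable of dicts, each with a 'start_time' datetime
    now          -- the reference time to measure transaction age against
    """
    return [t for t in transactions if now - t["start_time"] > threshold]
```

In a real monitor, the input rows would come from the engine's own transaction view and a non-empty result would trigger the alert.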
A practical walkthrough for migrating from MySQL to PostgreSQL, covering the full process: schema assessment, data type mapping (AUTO_INCREMENT, ENUM, unsigned integers, zero dates), SQL dialect rewrites (backticks, LIMIT syntax, IFNULL, GROUP_CONCAT, ON DUPLICATE KEY), stored procedure porting to PL/pgSQL, validation strategies, and low-downtime cutover using CDC. Includes copy-paste before/after SQL examples for common MySQL-isms and a comparison of migration tools (DBConvert, pgloader, AWS DMS).
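Two of the dialect rewrites listed above (backtick identifiers and IFNULL) are mechanical enough to sketch as text substitutions. This is a deliberately naive illustration, not one of the tools the article compares: the function `rewrite_mysql_to_postgres` is hypothetical, it does not parse SQL, and it would mangle string literals that happen to contain backticks or the word IFNULL.

```python
import re


def rewrite_mysql_to_postgres(sql):
    """Apply two simple MySQL-to-PostgreSQL rewrites to a SQL string."""
    # MySQL quotes identifiers with backticks; PostgreSQL uses double quotes.
    sql = re.sub(r"`([^`]+)`", r'"\1"', sql)
    # Two-argument IFNULL(a, b) behaves like the standard COALESCE(a, b).
    sql = re.sub(r"\bIFNULL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    return sql
```

One caveat worth keeping in mind: double-quoted identifiers are case-sensitive in PostgreSQL, so the quoted names must match the case used when the tables were created.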
Debezium 3.6.0.CR1, the first candidate release for the 3.6 milestone, introduces several enhancements: off-heap memory management for table history and schemas using RocksDB, a 75% reduction in memory allocation and 31–42% CPU overhead reduction in the MySQL connector's poll path, Spanner Omni compatibility, a monitoring REST API and multi-panel dashboard in Debezium Platform, and JDBC sink support in Platform. Oracle connector improvements include better diagnostics for missing archive logs, a unified JDBC driver version, exponential back-off for XStream attach retries, deferred transaction creation until first DML, and filtering of PRIVATE redo thread logs in RAC environments. Bug fixes address a KafkaSchemaHistory recovery race condition, PostgreSQL varbit column handling, Vitess timestamp processing, NanoTimestamp overflow for extreme dates, and CockroachDB temporal type alignment. A total of 44 issues were resolved.
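Exponential back-off, mentioned above for XStream attach retries, means each retry waits a multiple of the previous delay up to a cap. The sketch below shows the general technique only; the function `backoff_delays` and its parameters are hypothetical and do not correspond to Debezium configuration properties or defaults.

```python
def backoff_delays(base_seconds, factor, max_seconds, attempts):
    """Return the wait time before each retry, growing geometrically up to a cap."""
    return [min(base_seconds * factor ** n, max_seconds) for n in range(attempts)]


# Example: start at 1 second, double each time, never wait more than 30 seconds.
delays = backoff_delays(base_seconds=1, factor=2, max_seconds=30, attempts=6)
# delays == [1, 2, 4, 8, 16, 30]
```

The cap matters in practice: without it, a long outage would push the wait between attempts to impractical lengths.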