Snowflake Data Engineering Deep Dive: Adaptive Refresh, Cortex Code & Iceberg CLDs

Snowflake · Data Engineering · Smart Pipelines · June 2026

Snowflake Data Engineering: Smart Pipelines & the 2026 Feature Breakdown

From adaptive refresh to AI-powered pipeline builders and open lakehouse via Iceberg — Snowflake Summit 2026 shipped features that fundamentally change how data engineers build production pipelines. Here's the full technical breakdown.

By Arabinda Mohapatra
Published June 20, 2026
Read time ~16 min
  • 2.8×: Faster Dynamic Table refresh on Gen2 warehouses (June 2026 GA)
  • 5: Refresh modes in Dynamic Tables: INCREMENTAL, FULL, AUTO, ADAPTIVE, CUSTOM_INCREMENTAL
  • GA: Cortex Code generally available in Snowsight Workspaces since March 2026
  • 0: Bytes of data moved with Catalog-Linked Databases (zero-copy cross-cloud interop)

Snowflake's "Smart Pipeline" Architectural Shift

Snowflake Summit 2026 didn't drop incremental improvements. It shipped what Snowflake is calling a transition to "Smart Pipelines" — a paradigm where the platform handles refresh strategy, query optimization, code generation, and cross-cloud data federation without requiring engineers to write orchestration boilerplate, manage metadata sync jobs, or tune warehouse sizes for complex transformations.

This post goes deep on four core feature clusters: Custom Incrementalization & Adaptive Refresh for Dynamic Tables, the engine-level fast refresh upgrades, Cortex Code as an embedded AI pipeline agent, and Apache Iceberg's Cross-Cloud Auto-Fulfillment and Catalog-Linked Database (CLD) capabilities. For each, we look at the technical pain it replaces, what Snowflake is actually doing under the hood, and the reference documentation.

"Smart Pipelines automatically determine the most efficient refresh method for each cycle. No tuning required. Snowflake systematically chooses between incremental and full reinitializations to optimize for cost and prevent failures on complex queries."

— Snowflake Engineering Blog, June 2026 · snowflake.com/blog

Custom Incrementalization & Adaptive Refresh

Before this release, Dynamic Tables supported three refresh modes: INCREMENTAL, FULL, and AUTO. Starting in May 2026, two more entered Public Preview: ADAPTIVE and CUSTOM_INCREMENTAL.

The Old Way (Technical Pain Point)

Legacy Approach
  • Manually orchestrate DAGs in Airflow or Prefect for refresh scheduling
  • Track state with metadata tables (MAX(updated_at) watermarks)
  • Code-branch between incremental and full rebuild logic for late-arriving data
  • Full rebuilds on massive tables maxed out warehouse capacity and caused remote disk spilling
  • Stream-static joins, soft-deletes, and conditional branching couldn't be expressed as a single SELECT
The Snowflake Way
  • ADAPTIVE: Snowflake evaluates change volume heuristics on each refresh and auto-selects incremental vs. reinitialize
  • CUSTOM_INCREMENTAL: Write your own MERGE or INSERT refresh logic while Snowflake handles scheduling, retries, and transactional guarantees
  • Change tracking on underlying micro-partitions is enabled automatically
  • Expensive functions (Cortex AI, UDFs) are never unexpectedly reinitialized
  • State reuse and memoization across refreshes without external orchestration
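
For contrast, the watermark bookkeeping from the legacy list can be sketched as below. This is a minimal illustration, not a reference implementation: the `etl_watermarks` control table and the `orders_clean` target are hypothetical names, and the sketch assumes a watermark row for the pipeline has already been seeded.

```sql
-- Hypothetical control table: one watermark row per pipeline.
CREATE TABLE IF NOT EXISTS etl_watermarks (
    pipeline_name  VARCHAR,
    last_loaded_at TIMESTAMP_NTZ
);

-- 1. Merge only the rows that are newer than the stored watermark.
MERGE INTO orders_clean AS tgt
USING (
    SELECT order_id, customer_id, status, updated_at
    FROM raw_orders
    WHERE updated_at > (
        SELECT COALESCE(MAX(last_loaded_at), '1970-01-01'::TIMESTAMP_NTZ)
        FROM etl_watermarks
        WHERE pipeline_name = 'orders_clean'
    )
    -- Keep one row per key so the MERGE stays deterministic.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY updated_at DESC
    ) = 1
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
    tgt.customer_id = src.customer_id,
    tgt.status      = src.status,
    tgt.updated_at  = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, customer_id, status, updated_at)
    VALUES (src.order_id, src.customer_id, src.status, src.updated_at);

-- 2. Advance the watermark to the newest row now in the target.
UPDATE etl_watermarks
SET last_loaded_at = (SELECT MAX(updated_at) FROM orders_clean)
WHERE pipeline_name = 'orders_clean';
```

Both statements have to be scheduled, retried, and kept consistent by an external orchestrator, which is the boilerplate the new refresh modes aim to absorb.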

Technical Depth: What Snowflake Actually Does

ADAPTIVE mode uses incremental refresh by default. Snowflake's internal cost-based heuristics evaluate whether the volume of upstream metadata mutations makes a full reinitialization cheaper than an incremental merge. If so, it reinitializes and then resumes incremental. This is particularly powerful for workloads that are usually incremental-friendly but occasionally spike — like tables receiving bulk INSERT OVERWRITE operations from an upstream ETL job.
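
One way to see which path each refresh took is the existing `DYNAMIC_TABLE_REFRESH_HISTORY` table function. The sketch below assumes the `dt_orders` table lives in a hypothetical `MY_DB.MY_SCHEMA`, and that ADAPTIVE refreshes report their action in the same `REFRESH_ACTION` column as the older modes. That seems likely but is worth confirming against the preview documentation.

```sql
-- Count refreshes per day and per action (e.g. INCREMENTAL vs. REINITIALIZE)
-- and compare their average duration.
SELECT
    DATE_TRUNC('day', refresh_start_time) AS refresh_day,
    refresh_action,
    COUNT(*) AS refresh_count,
    AVG(DATEDIFF('second', refresh_start_time, refresh_end_time)) AS avg_duration_s
FROM TABLE(
    INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
        NAME => 'MY_DB.MY_SCHEMA.DT_ORDERS'  -- hypothetical fully qualified name
    )
)
WHERE state = 'SUCCEEDED'
GROUP BY 1, 2
ORDER BY 1, 2;
```

A day where reinitializations cluster after an upstream bulk overwrite would be consistent with the spike behavior described above.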

CUSTOM_INCREMENTAL mode is for when even INCREMENTAL can't express your transformation. You provide a REFRESH USING clause containing the exact MERGE or INSERT statement Snowflake should run. Valid write targets are MERGE INTO SELF or INSERT INTO SELF. You read the current table state via FROM SELF AS cur in the USING subquery.

```sql
-- ADAPTIVE refresh: Snowflake picks incremental vs. full at runtime
CREATE DYNAMIC TABLE dt_orders
    TARGET_LAG = '10 minutes'
    WAREHOUSE = transform_wh
    REFRESH_MODE = ADAPTIVE
AS
SELECT order_id, customer_id, status, updated_at
FROM raw_orders;

-- CUSTOM_INCREMENTAL: engineer controls the MERGE logic per refresh
CREATE DYNAMIC TABLE dt_customer_state (
    customer_id VARCHAR,
    last_status VARCHAR,
    updated_at  TIMESTAMP_NTZ
)
    TARGET_LAG = '5 minutes'
    WAREHOUSE = transform_wh
    REFRESH_MODE = CUSTOM_INCREMENTAL
    REFRESH USING (
        -- Upsert newly inserted source rows into the dynamic table itself
        MERGE INTO SELF AS target
        USING (
            SELECT customer_id, status, updated_at
            FROM raw_orders
                CHANGES(INFORMATION => DEFAULT) AT (STREAM => TRUE)
            WHERE METADATA$ACTION = 'INSERT'
        ) AS src
        ON target.customer_id = src.customer_id
        WHEN MATCHED THEN UPDATE SET
            target.last_status = src.status,
            target.updated_at  = src.updated_at
        WHEN NOT MATCHED THEN
            INSERT (customer_id, last_status, updated_at)
            VALUES (src.customer_id, src.status, src.updated_at)
    )
AS
-- raw_orders exposes status, so alias it to match the declared column
SELECT customer_id, status AS last_status, updated_at
FROM raw_orders;
```

📖 Official Reference Docs

Custom Incrementalization: Full syntax, use cases (stream-static joins, soft-deletes, audit trails, stateful aggregation), and constraints. docs.snowflake.com → Custom incrementalization

Dynamic Table Refresh Modes: Side-by-side comparison of all five refresh modes, transition rules, and reinitialization triggers. docs.snowflake.com → Refresh modes

CREATE DYNAMIC TABLE syntax: Full DDL reference including the REFRESH USING clause for CUSTOM_INCREMENTAL. docs.snowflake.com → CREATE DYNAMIC TABLE

Native Fast Refresh Performance Upgrade

Alongside the new refresh modes, Snowflake shipped under-the-hood engine upgrades for Dynamic Tables running on Gen2 warehouses. The headline number: up to 2.8× speedup for complex transformations including aggregate functions, QUALIFY / RANK filters, and Slowly Changing Dimensions (SCD Type-1) — with no SQL changes required.

The Old Way (Technical Pain Point)

Processing heavy analytical operations like window functions (ROW_NUMBER() OVER (PARTITION BY...)), structural deduplication, or complex multi-table JOINs forced the query optimizer into massive data shuffles across compute nodes. This caused high network overhead and forced temporary data to spill to local SSDs — or worse, remote cloud storage. Engineers had to manually split workloads into staggered micro-batches or build staging tables to isolate the compute footprint.

◈ Dynamic Table Incremental Refresh: Engine-Level Optimization Path

[Diagram] Transaction Log (stream of changes; micro-partition delta, read only changed rows) → Vectorized Execution (enhanced in Gen2; QUALIFY / RANK isolation on impacted micro-partitions only: no full scan, no shuffle, no spill) → Micro-partition MERGE (file compaction, partition pruning; automatic, no manual partition management) → Dynamic Table (updated in place; predictable scaling)

Performance gains on Gen2 warehouses:
  • Aggregate functions (GROUP BY): up to 2.4×
  • QUALIFY / RANK / ROW_NUMBER filters: up to 2.8×
  • SCD Type-1 (deduplication / upsert): up to 2.5×
  • Multi-table JOIN pipelines: up to 2.1×

The Snowflake Way: The engine upgrade optimizes dynamic metadata processing at the micro-partition level. It reads only the changed rows from the transaction log (Stream) and evaluates them using enhanced vectorized execution. Instead of scanning entire tables to execute a QUALIFY or RANK clause, the execution engine isolates the exact micro-partitions impacted by the incremental delta. This eliminates network data shuffling and remote disk spilling, cutting execution times and ensuring predictable warehouse scaling.
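
The kind of query this upgrade targets can be sketched as a deduplicating (SCD Type-1 style) Dynamic Table. The example reuses `raw_orders` and `transform_wh` from the earlier DDL; `dt_customer_latest` is a hypothetical name. It only shows the query shape and does not demonstrate the speedup itself.

```sql
-- Keep only the latest row per customer (SCD Type-1 style deduplication).
CREATE OR REPLACE DYNAMIC TABLE dt_customer_latest
    TARGET_LAG = '10 minutes'
    WAREHOUSE = transform_wh
    REFRESH_MODE = INCREMENTAL
AS
SELECT customer_id, status, updated_at
FROM raw_orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY updated_at DESC
) = 1;
```

Because the window is partitioned by `customer_id`, an incremental refresh only needs to re-rank the customers that appear in the changed rows, which is the access pattern the engine description above relies on.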

📖 Official Reference Docs

Optimize queries for incremental refresh: Performance patterns including the QUALIFY RANK() = 1 optimization for SCD Type 1 workloads. docs.snowflake.com → Optimize incremental refresh

What's New: Dynamic Tables Faster Refresh (Snowflake Blog): Full breakdown of engine improvements and when to choose ADAPTIVE vs CUSTOM_INCREMENTAL. snowflake.com/blog → Faster Dynamic Tables

Cortex Code: AI-Powered Pipeline Builder

Cortex Code (also known as Snowflake CoCo — the names are used interchangeably as of June 2026) is an AI-driven autonomous agent embedded directly inside the Snowflake platform. It went generally available in Snowsight Workspaces in March 2026, with CLI availability shipping alongside it.

Unlike generic AI coding assistants that understand only repository-level context, Cortex Code is grounded in your actual Snowflake environment from the first prompt: it knows your Role-Based Access Control (RBAC) model, database schemas, table lineage, data types, pipeline dependencies, and query execution history.

The Old Way (Technical Pain Point)

Migrating complex legacy SQL dialects (Oracle PL/SQL, Teradata BTEQ scripts, or Hive HQL) into modern Snowflake SQL or Snowpark Python was a tedious manual rewrite process. When jobs failed in production, engineers had to dig through fragmented execution logs, trace nested stored procedures, and manually run EXPLAIN or SYSTEM$EXPLAIN_PLAN_JSON to identify execution bottlenecks. Building dbt pipelines, semantic models, or Cortex Analyst integrations required context-switching across multiple tools.
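
The manual debugging loop described above typically looks something like the following sketch. It uses the standard `QUERY_HISTORY` table function and `EXPLAIN`; the `raw_orders` query is only a stand-in for whichever statement is under suspicion.

```sql
-- Most recent failed statements visible to the current role.
SELECT
    query_id,
    start_time,
    error_code,
    error_message,
    LEFT(query_text, 200) AS query_preview
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 1000))
WHERE error_code IS NOT NULL
ORDER BY start_time DESC
LIMIT 20;

-- Inspect the plan of a suspect statement without executing it.
EXPLAIN USING TEXT
SELECT customer_id, COUNT(*) AS order_count
FROM raw_orders
GROUP BY customer_id;
```

Correlating the error message with the plan output is the step Cortex Code is described as automating.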

🧠
Agentic Code Generation
Generate SQL queries, data pipeline logic, and Python scripts via natural language. Preview AI-suggested changes in a diff view before applying, with insertions and deletions highlighted. Fix failed queries with one click. Works directly in Snowsight Workspaces.
🔄
Pipeline Migration & Legacy Code
Translate legacy SQL dialects (Oracle, Teradata, Hive) into optimized Snowflake SQL. Evaluates your real-time schema, constraints, and data types to generate correct, production-ready migration output, not just syntactic translation.
🐛
Runtime Error Tracing
When a pipeline fails, Cortex Code maps the runtime error directly to the query execution graph. It pinpoints the exact node or join condition that failed and recommends a syntactically correct, optimized fix grounded in your actual schema.
🔍
Catalog-Aware Data Discovery
Search database objects, tables, and columns using plain language without knowing exact names. Pulls documentation answers from the Horizon Catalog, including tags, masking policies, lineage context, and the official Snowflake docs.
⚙️
dbt, Airflow & Cross-Stack Pipelines
Identify data across dbt, Apache Airflow, Postgres, Spark, and AWS Glue in natural conversations. Generates fully executable ML pipelines ready to run in Snowflake Notebooks. Creates Cortex Analyst semantic models without leaving Snowsight.
💻
CLI for Local Dev & CI/CD
Cortex Code CLI runs in your terminal and integrates with VS Code and Cursor. Enables end-to-end projects including dbt pipelines, agent deployment, and pipeline orchestration while remaining Snowflake-aware, bridging local dev with cloud governance.

📖 Official Reference Docs

Cortex Code overview: Full capabilities, RBAC integration, and supported workflow types. docs.snowflake.com → Cortex Code

Cortex Code in Snowsight (Workspaces): In-platform agentic coding — code generation, diff views, inline suggestions, schema search, and Marketplace discovery. docs.snowflake.com → Cortex Code in Snowsight

GA announcement (March 9, 2026): docs.snowflake.com → Release Note

Snowflake CoCo product page: snowflake.com → Snowflake CoCo

Iceberg Cross-Cloud Auto-Fulfillment & Catalog-Linked Databases

Snowflake's Iceberg integration crossed several major milestones in the past 12 months: write support for externally managed Iceberg tables and Catalog-Linked Databases (CLD) reached General Availability in October 2025, Cross-Cloud Auto-Fulfillment for individual Iceberg tables reached GA in early 2026, and Google Cloud BigLake Metastore catalog integration went GA on June 2, 2026. Full ADLS Gen2 compatibility and expanded CLD capabilities shipped at Summit 2026.

The Old Way (Technical Pain Point)

Managing a hybrid lakehouse required building and maintaining custom synchronization pipelines to copy physical Parquet files from external storage to Snowflake internal stages — leading to high data duplication and expensive cross-region cloud egress fees. Syncing from external metastores (AWS Glue or Hive) meant running frequent, fragile catalog refresh jobs that opened the door to data drift and metadata corruption when external systems modified the underlying storage.

Catalog-Linked Databases (CLD) — Technical Depth

A CLD is a Snowflake database connected to an external Iceberg REST catalog (AWS Glue, Snowflake Open Catalog, Google BigLake Metastore, Databricks Unity Catalog). Snowflake automatically syncs with the external catalog to detect namespaces and Iceberg tables and registers them, so no manual table creation is required. You get full DML (INSERT, UPDATE, DELETE, MERGE) on externally managed tables, and new tables that appear in your external catalog are discovered and included automatically.

```sql
-- Step 1: Configure the external volume (ADLS example)
CREATE EXTERNAL VOLUME iceberg_adls_vol
    STORAGE_LOCATIONS = ((
        NAME = 'azure-datalake'
        STORAGE_PROVIDER = 'AZURE'
        STORAGE_BASE_URL = 'azure://mydatalake.blob.core.windows.net/iceberg/'
        AZURE_TENANT_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    ));

-- Step 2: Catalog integration with AWS Glue
CREATE CATALOG INTEGRATION glue_irc_int
    CATALOG_SOURCE = GLUE
    CATALOG_NAMESPACE = 'iceberg_db'
    TABLE_FORMAT = ICEBERG
    GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789:role/sf-glue-role'
    GLUE_CATALOG_ID = '123456789'
    GLUE_REGION = 'us-east-1'
    ENABLED = TRUE;

-- Step 3: Create Catalog-Linked Database (auto-discovers all tables)
CREATE DATABASE iceberg_linked_db
    LINKED_CATALOG = (
        CATALOG = 'glue_irc_int'
        NAMESPACE_MODE = FLATTEN_NESTED_NAMESPACE
        NAMESPACE_FLATTEN_DELIMITER = '-'
    );

-- Step 4: Query immediately, with zero data movement
SELECT customer_id, SUM(amount) AS total
FROM iceberg_linked_db."iceberg_db"."transactions"
WHERE event_date >= '2026-01-01'
GROUP BY 1;

-- Time travel on externally managed Iceberg tables
SELECT *
FROM iceberg_linked_db."iceberg_db"."events"
    AT (TIMESTAMP => '2026-06-01 00:00:00'::TIMESTAMP_NTZ);
```
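
Once the database is linked, auto-discovery and write support can be checked with a short follow-up like the sketch below. The column list in the INSERT is an assumption based on the columns referenced in the query above; the real table may have additional required columns, and the literal values are placeholders.

```sql
-- List the namespaces and Iceberg tables the catalog link has registered.
SHOW SCHEMAS IN DATABASE iceberg_linked_db;
SHOW ICEBERG TABLES IN DATABASE iceberg_linked_db;

-- Write to an externally managed table through Snowflake.
-- Columns and values are illustrative only.
INSERT INTO iceberg_linked_db."iceberg_db"."transactions"
    (customer_id, amount, event_date)
VALUES
    ('C-0001', 42.50, '2026-06-20');
```

If the SHOW output lags behind the external catalog, that points at sync interval or permissions rather than at the tables themselves.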

What This Unlocks for Data Engineers

🔗
Zero-Copy Cross-Cloud Interop
Your data stays in your S3 or Azure bucket. Snowflake links to the metadata pointer (.metadata.json) and auto-syncs. When a Spark job adds a new column or a Flink stream appends rows, it shows up in Snowflake automatically, with zero bytes moved.
🌐
Cross-Cloud Auto-Fulfillment
Share your entire CLD as a single object across Snowflake accounts and regions. New tables appearing in the external catalog are auto-included. Governance policies (column masking, row-access, RBAC tags) apply across the board via Snowflake Horizon.
🛡️
Horizon Governance on Open Storage
Column-Level Security, Row-Access Policies, and Role-Based Access Control from Snowflake Horizon apply to external Iceberg tables as if they were native Snowflake objects, without converting a single Parquet file to a proprietary format.

ADLS Gen2 Full Compatibility
Full Azure Data Lake Storage Gen2 support means organizations running their primary lakehouse on Azure can now use Iceberg CLD without workarounds. Default metadata formats and catalog-linked databases span AWS, GCS, and Azure in a single object.
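
As a rough illustration of the governance point, a column masking policy might be attached to a linked Iceberg table as sketched below. Everything here is an assumption to adapt: the `governance_db.policies` schema, the `PII_READER` role, and the VARCHAR type of `customer_id` are hypothetical. Whether a given externally managed table accepts the policy should be verified in the CLD documentation.

```sql
-- Hypothetical policy stored in a regular (non-linked) database.
CREATE MASKING POLICY IF NOT EXISTS governance_db.policies.mask_customer_id
    AS (val VARCHAR) RETURNS VARCHAR ->
    CASE
        WHEN CURRENT_ROLE() IN ('PII_READER') THEN val  -- privileged role sees raw value
        ELSE SHA2(val)                                  -- everyone else sees a hash
    END;

-- Attach the policy to a column of the linked Iceberg table.
ALTER ICEBERG TABLE iceberg_linked_db."iceberg_db"."transactions"
    MODIFY COLUMN customer_id
    SET MASKING POLICY governance_db.policies.mask_customer_id;
```

The Parquet files in external storage are untouched; the policy is evaluated at query time inside Snowflake.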

📖 Official Reference Docs

Catalog-Linked Databases (CLD) user guide: Full setup steps, auto-discovery, DML support, and billing model. docs.snowflake.com → Catalog-linked databases

GA release note — Write support + CLD (Oct 17, 2025): Official GA announcement covering DML on externally managed tables and CLD for REST catalogs. docs.snowflake.com → Oct 2025 release note

BigLake Metastore GA (June 2, 2026): Google Cloud catalog integration with workload identity federation. docs.snowflake.com → June 2026 release note

Build Data Lakes with Snowflake + AWS Glue (Developer Guide): End-to-end tutorial using CLD + Cortex Code CLI. snowflake.com → AWS Glue + Iceberg tutorial

CLD Engineering Blog (June 2026 update): Deep-dive on freshness, speed, and cross-region sharing improvements. snowflake.com → CLD Engineering Blog

Feature Scorecard: Old Way vs. Snowflake Way

| Feature | Release (2026) | Legacy Pain Point | Snowflake Solution | Key Business Value |
|---|---|---|---|---|
| Adaptive Refresh | May 2026 · Preview | Over-allocating compute or failures on massive full-table overwrites | Platform heuristics auto-select incremental vs. full per cycle | Lower compute spend, automated reliability |
| Custom Incrementalization | May 2026 · Preview | SELECT-based DTs couldn't express soft-deletes, stream-static joins, or stateful aggregation | Write own MERGE/INSERT; Snowflake handles scheduling + retries | Eliminate Airflow/Prefect boilerplate for complex transform patterns |
| DT Fast Refresh Performance | June 2026 · GA | QUALIFY/RANK/SCD ops caused full shuffles, remote disk spilling | Vectorized execution on impacted micro-partitions only, up to 2.8× | Reduced warehouse runtime costs, predictable scaling |
| Cortex Code / CoCo | Mar 2026 · GA | Hours writing boilerplate, manual legacy migration, fragmented log debugging | Conversational AI generates, migrates, traces and fixes production pipelines | Faster time-to-market for new data models, reduced L1 engineering overhead |
| Iceberg CLD + Cross-Cloud Auto-Fulfillment | Oct 2025 / Jun 2026 GA | High egress costs, manual sync, fragile catalog refresh for external lakehouses | Link external catalog as single object; live sync, zero data movement | Eliminated vendor lock-in, zero egress fees, instant multi-region sharing |
| ADLS Gen2 / BigLake GA | June 2026 · GA | Azure and GCP lakehouses required manual ingestion into Snowflake stages | Native Iceberg CLD on Azure ADLS Gen2 and Google BigLake Metastore | Full multi-cloud open lakehouse without data movement |

What to Build Next With These Features

01. PoC: Replace an Airflow DAG with ADAPTIVE Dynamic Table
Identify an existing high-compute Airflow or dbt model that alternates between incremental and full rebuilds. Recreate it as a Dynamic Table with REFRESH_MODE = ADAPTIVE and measure cost and latency variance over a two-week period.

02. Migrate Stream + Task to CUSTOM_INCREMENTAL
Any pipeline currently using Snowflake Streams and Tasks for soft-delete or conditional merge logic is a candidate. The CUSTOM_INCREMENTAL mode lets you express the same MERGE INTO logic declaratively, with automatic scheduling and dependency tracking built in.

03. Audit ADLS Storage for Iceberg CLD Migration
Review the organization's ADLS or S3 storage footprint for Parquet datasets currently ingested into Snowflake via COPY INTO. Each one is a candidate to be exposed as a Catalog-Linked Iceberg table, removing the ingestion job entirely and eliminating egress costs.

04. Run Cortex Code on a Legacy SQL Migration
Pick one complex stored procedure or a legacy HQL/Oracle script and run it through Cortex Code in Snowsight Workspaces. Benchmark time-to-working-query vs. manual migration. Use the diff view to validate the AI's output before applying; it previews insertions and deletions explicitly.
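
For the storage audit in step 03, one possible starting point is the `COPY_HISTORY` view in `SNOWFLAKE.ACCOUNT_USAGE`. The sketch below ranks target tables by Parquet bytes loaded via COPY INTO over the last 30 days. The 30-day window and the `.parquet` suffix filter are assumptions to adjust, and querying the view requires access to the shared `SNOWFLAKE` database.

```sql
-- Tables that receive the most Parquet data through COPY INTO:
-- these are the first candidates to expose as catalog-linked Iceberg tables.
SELECT
    table_catalog_name,
    table_schema_name,
    table_name,
    COUNT(*)       AS files_loaded,
    SUM(file_size) AS bytes_loaded
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE last_load_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND file_name ILIKE '%.parquet'
GROUP BY 1, 2, 3
ORDER BY bytes_loaded DESC
LIMIT 20;
```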

Key Dates in the 2025–2026 Rollout

OCT 2025: CLD + Iceberg write support reach GA. Full DML on externally managed Iceberg tables and catalog-linked databases for REST catalogs (AWS Glue, Open Catalog). Zero-copy interop across catalogs now production-ready.

EARLY 2026: Cross-Cloud Auto-Fulfillment (CCAF) for Iceberg reaches GA. Individual externally managed Iceberg tables can be shared across Snowflake accounts and cloud regions without data movement.

MAR 9, 2026: Cortex Code generally available in Snowsight Workspaces. Agentic coding assistant embedded in the platform, GA for all accounts. CLI also ships GA for local terminal and IDE use.

APR 21, 2026: Snowflake Intelligence + Cortex Code expanded. New capabilities announced including Snowflake CoCo branding, expanded dbt and Airflow pipeline integration, and Brave Search API integration for real-time web context in Cortex agents.

MAY 26, 2026: ADAPTIVE and CUSTOM_INCREMENTAL refresh modes enter Public Preview. Two of the most-requested Dynamic Tables features ship simultaneously via Summit 2026. dbt adapter 1.11.5 adds Dynamic Table materialization support.

JUN 2, 2026: Google Cloud BigLake Metastore catalog integration reaches GA. Workload identity federation allows Snowflake to authenticate to GCP without long-lived service account keys for Iceberg CLD queries.

JUN 2026: Dynamic Table fast refresh performance upgrade reaches GA. Engine-level optimization for Gen2 warehouses ships, up to 2.8× faster for QUALIFY/RANK/SCD operations. ADLS Gen2 full compatibility for Iceberg tables confirmed GA at Summit.

What "Smart Pipelines" Actually Means for Data Engineers

The common thread across all four feature clusters is the same: Snowflake is absorbing engineering complexity into the platform layer. Refresh strategy, metadata sync, code generation, query optimization, and cross-cloud governance were all problems that required significant engineering effort to solve manually. These features don't just reduce that effort — in many cases they eliminate the category of problem entirely.

ADAPTIVE refresh means an engineer no longer needs to maintain a metadata table tracking the last successful watermark and branch their DAG between incremental and full rebuild modes. CUSTOM_INCREMENTAL means stream-static join pipelines no longer require Streams + Tasks + stored procedure boilerplate. CLD means the Glue-to-Snowflake sync job can be deleted. Cortex Code means the first draft of that SCD Type-2 query doesn't require three Stack Overflow tabs.

The right response to this release cycle isn't to immediately replatform everything. It's to audit the highest-friction, highest-cost pipelines in the current stack and ask which of these features would eliminate the friction outright — not just reduce it.

❄️ The Smart Pipeline Thesis

Snowflake's 2026 bet is that the best data infrastructure is the kind engineers stop thinking about. ADAPTIVE, CUSTOM_INCREMENTAL, Cortex Code, and Iceberg CLD are four answers to the same question: what if the platform handled that?

Sources & Official Documentation

  1. Snowflake Docs. Custom incrementalization. docs.snowflake.com
  2. Snowflake Docs. Dynamic table refresh modes. docs.snowflake.com
  3. Snowflake Docs. CREATE DYNAMIC TABLE. docs.snowflake.com
  4. Snowflake Docs. Optimize queries for incremental refresh. docs.snowflake.com
  5. Snowflake Blog. (May 2026). What's New with Dynamic Tables — Faster Refresh. snowflake.com
  6. Snowflake Blog. (June 2026). Data Engineering in the AI Era: New Snowflake Tools Built for Smart Pipelines. snowflake.com
  7. Snowflake Docs. Cortex Code. docs.snowflake.com
  8. Snowflake Docs. Cortex Code in Snowsight. docs.snowflake.com
  9. Snowflake Docs. Release Note: Cortex Code in Snowsight — GA (Mar 9, 2026). docs.snowflake.com
  10. Snowflake Product. Snowflake CoCo. snowflake.com
  11. Snowflake Docs. Use a catalog-linked database for Apache Iceberg™ tables. docs.snowflake.com
  12. Snowflake Docs. Release Note: Write support + CLD GA (Oct 17, 2025). docs.snowflake.com
  13. Snowflake Docs. Release Note: BigLake Metastore catalog integration GA (Jun 2, 2026). docs.snowflake.com
  14. Snowflake Engineering Blog. (June 2026). Catalog-Linked Database (CLD) in Snowflake: New Features. snowflake.com
  15. Snowflake Engineering Blog. (March 2026). Stop Moving Data: Automate Your Open Lakehouse with Cortex Code CLI. snowflake.com
