Dynamic Tables In Snowflake

Soumak Das
9 min readJul 20, 2023

Introduction

Dynamic tables are a revolutionary feature introduced during the Snowflake Summit 2023 that serve as the fundamental building blocks for declarative data transformation pipelines. They offer a streamlined and automated approach to data engineering in Snowflake, enabling efficient and cost-effective data transformation for consumption. By utilizing dynamic tables, you can define the desired end state of your data transformation, allowing Snowflake to handle the complex pipeline management on your behalf.

One of the remarkable benefits of dynamic tables is their ability to significantly reduce the amount of code required compared to using a combination of streams and tasks in Snowflake. This reduction in code complexity is a game-changer for data engineers.

How Dynamic Tables Work

To understand how dynamic tables work, let’s delve into their mechanics. When you create a dynamic table, you specify a query that will transform the data sourced from one or multiple base or dynamic tables. Snowflake’s automated refresh process regularly executes this query and updates the dynamic table with any changes made to the base tables.

During the refresh process, Snowflake’s compute resources merge the changes made to the base tables into the dynamic table. This process ensures that the dynamic table remains up to date with the latest changes in the base tables. The compute resources utilized for this task are associated with the warehouse assigned to the dynamic table.

When creating a dynamic table, you have the option to define a target “freshness” for the data, also known as target lag. This target lag represents the maximum allowed time difference between the dynamic table’s content and the updates in the base tables. For instance, if you specify a target lag of 5 minutes, Snowflake will ensure that the data in the dynamic table is no more than 5 minutes behind the updates in the base tables.

Alternatively, you can set a longer target lag if real-time freshness is not required, which can help reduce costs. For instance, if the data in the dynamic table only needs to be at most 1 hour behind the updates in the base tables, you can specify a target lag of 1 hour instead of 5 minutes.

Syntax

CREATE [ OR REPLACE ] DYNAMIC TABLE <name>

TARGET_LAG = { ‘<num> { seconds | minutes | hours | days }’ | DOWNSTREAM }

WAREHOUSE = <warehouse_name>

AS <query>

TARGET_LAG = { num { seconds | minutes | hours | days } | DOWNSTREAM }

Specifies the lag for the dynamic table: ‘num seconds | minutes | hours | days’

Specifies the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.

Example: If the data in the dynamic table should lag by no more than 5 minutes, specify 5 minutes.If the data in the dynamic table should lag by no more than 5 hours, specify 5 hours.

The minimum value is 1 minute. If the dynamic table A depends on another dynamic table B, the minimum lag for A must be greater than or equal to the lag for B.

DOWNSTREAM

Specifies that the dynamic table should be refreshed only when dynamic tables which depend on it are refreshed.

WAREHOUSE = warehouse_name

Specifies the name of the warehouse that provides the compute resources for refreshing the dynamic table.

You must have the USAGE privilege on this warehouse in order to create the dynamic table.

AS query : Specifies the query whose results the dynamic table should contain.

Let’s take an example from the below to understand how so many lines of code can be saved in snowflake by using dynamic table and its refresh mechanism.

You can set up a dynamic table to query other dynamic tables.

Chaining Together Pipelines of Dynamic Tables

Let’s explore an example to illustrate how dynamic tables can save numerous lines of code in Snowflake by utilizing their refresh mechanism. You can set up a dynamic table to query other dynamic tables, forming a chain of interdependent pipelines. This approach is akin to defining a directed acyclic graph (DAG) of tasks. The dynamic table responsible for updating the aggregate sales table, for example, queries the dimension tables, which, in turn, query the staging table. This efficient chaining of dynamic tables eliminates the need for extensive manual code orchestration and error handling.

Dynamic Table Refresh Types

A dynamic table refresh process happens in one of two ways:

1. Incremental refresh — When possible, the automated refresh process performs an incremental refresh. With incremental refresh, the automated refresh process analyzes the query for the dynamic table and computes the changes to the query results (the changes since the dynamic table was last refreshed). The refresh process then merges those changes into the dynamic table.

2. Full refresh — If the automated process is unable to determine how to perform an incremental refresh, the process performs a full refresh. With full refresh, the automated refresh process performs the query for the dynamic table and materializes the results, completely replacing the current materialized results of the dynamic table.

The type of refresh performed depends on the constructs used in the query. While dynamic tables support a wide range of expressions, keywords, and clauses, certain constructs, such as lateral joins, subqueries outside of FROM clauses, and volatile user-defined functions, are not currently supported for incremental refreshes. If the query includes these unsupported constructs, Snowflake resorts to a full refresh.

When one dynamic table depends on another, the automated refresh process ensures data consistency across the tables. By specifying the lag or target lag for each dynamic table, the process determines an optimized refresh schedule that meets the target lag times. Less frequent refreshes align with more frequent refreshes to maintain data consistency.

Understanding Dynamic Table Lag

Dynamic table refresh is triggered based on how out of date data might be, or what is commonly referred to as lag or target lag.

Lag is specified in one of two ways:

· Measure of freshness — Specified using the TARGET_LAG = { num { seconds | ... | days } parameter when altering or originally defining a dynamic table. Specifying TARGET_LAG in this way defines the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.

· DOWNSTREAM — Specifies that the dynamic table should be refreshed on demand when other dynamic tables depending on it need to refresh. Updates are inferred from up-stream database objects. Downstream dynamic tables are only updated when required by upstream consumers.

Consider the following example where dynamic table 2 (DT2) is defined based on dynamic table 1(DT1). Dynamic table 2 must read from dynamic table 1 to materialize its contents. In addition, a report consumes DT2 data via a query.

The following results are possible, depending on how each dynamic table specifies its lag:

Types of Queries That Support Incremental Refreshes

The following table describes the expressions, keywords, and clauses that currently support incremental refresh.

Note

If the query uses expressions that are not supported for incremental refresh, the automated refresh process uses a full refresh instead. To determine which refresh mode is used, see Determining if an Incremental or Full Refresh is Used.

Note the following:

· This list may change over time.

· This list does not include the expressions, keywords, and clauses that are currently not supported in dynamic tables at all. For that list, see Query Constructs Not Currently Supported in Dynamic Tables.

Keyword / Clause

Support for Incremental Refreshes

WITH

Dynamic tables support common table expressions (CTE).

Expressions in SELECT

Dynamic tables support expressions containing built-in immutable and deterministic (IMMUTABLE) user-defined functions.

FROM

Dynamic tables can select data from tables, views, and other dynamic tables.

OVER

Dynamic tables support Window functions.

WHERE

Dynamic tables support filters with the same expressions that are valid in SELECT.

JOIN (and other expressions for joining tables)

Dynamic tables currently only support inner joins, outer joins, and cross joins for incremental refreshes.

Note

Existing dynamic tables that use outer joins might be set up to use full refreshes. To use incremental refreshes, you must recreate these tables.

To determine if you need to recreate the dynamic table, use one of the following procedures to check if the dynamic table uses a full refresh:

· Run the SHOW DYNAMIC TABLES command, and check if the REFRESH_MODE column for the table is FULL.

· Open Snowsight, go to the details page for the dynamic table, and check if Full Refresh is displayed at the top of the page.

You can specify any number of tables in the join.

Updates to all tables in the join are reflected in the results on the query.

UNION ALL

Dynamic tables support UNION ALL.

GROUP BY

Dynamic tables support GROUP BY.

Important

Replacing an immutable UDF while it is in-use by an incremental refresh dynamic table will result in undefined behavior in the dynamic table using the UDF.

Currently, the following constructs and types of queries are not supported with incremental refreshes. If you specify these in the query, the automated refresh process uses a full refresh (which can result in a higher cost) to update the table.

· LATERAL joins

· Subqueries outside of FROM clauses (e.g. WHERE EXISTS)

· VOLATILE user-defined functions

How Data is Refreshed When Dynamic Tables Depend on Other Dynamic Tables

For example, if dynamic table A has a target lag of 2 minutes and queries dynamic table B with a target lag of 1 minute, Snowflake may schedule refreshes every 96 seconds for A and every 48 seconds for B. As a result, the refresh schedule ensures that both tables are refreshed appropriately, ensuring consistent data snapshots.

Specific Point in Time

Dynamic Tables Refreshed

2022–12–01 00:00:00

A, B

2022–12–01 00:00:48

B

2022–12–01 00:01:36

A, B

2022–12–01 00:02:24

B

This means that at any given time, when you query a set of dynamic tables that depend on each other, you are querying the same “snapshot” of the data across these tables.

It’s important to note that the target lag of a dynamic table cannot be shorter than the lag of the dynamic tables it depends on. The process sets up a refresh schedule that adheres to this constraint, refreshing the dependent tables in a manner that guarantees snapshot isolation and consistency.

· A dynamic table A queries the dynamic tables B and C.

· Dynamic table B has a target lag of 5 minutes.

· Dynamic table C has a target lag of 1 minute.

This means that the target lag time for A must not be shorter than 5 minutes (i.e. not shorter than the longer of the lag times for B and C).

If you set the lag for A to 5 minutes, the process sets up a refresh schedule that aims to achieve the following:

· Refresh C often enough to keep its lag below 1 minute.

· Refresh A and B together, often enough to keep their lags below 5 minutes.

· Ensure that the refresh for A and B coincides with a refresh of C to ensure snapshot isolation.

Note: If refreshes take too long, the scheduler may skip refreshes to try to stay up to date. However, snapshot isolation is preserved.

Conclusion

The dynamic tables are a game-changing feature in Snowflake that simplify and automate data transformation pipelines. They eliminate the need for complex manual orchestration, reduce the lines of code required, and ensure data freshness with their efficient refresh mechanisms. By leveraging dynamic tables, data engineers can focus on the desired end state of their data transformations, leaving the intricate pipeline management to Snowflake’s capable hands.

--

--

Soumak Das

Sr. Data Engineer @EY & Snowflake/Airflow/Databricks/AWS writer