Unlocking the Power of Data with Snowflake Schema Detection and Schema Evolution

Soumak Das
4 min readSep 18, 2023

At Snowflake, we understand the importance of simplifying the process of working with data. One of Snowflake’s highly compelling features is its native support for semi-structured data, offering flexibility and efficiency to data professionals. Whether you prefer schema-on-write or schema-on-read, Snowflake gives you the freedom to choose the best approach for your data needs.

In this blog post, we’ll dive into how Snowflake has enhanced its schema-on-write capabilities to make loading files into structured data tables a breeze. We introduced these features during our journey from Summit 2021 to Summit 2023, with the goal of simplifying schematization while maintaining the high performance Snowflake is known for.

Schema Detection: A Glimpse into Your Data

Data files come in various shapes and sizes, generated by different teams. Data professionals often find themselves manually inspecting files to understand their structure, a process that can be time-consuming and error-prone.

Enter Snowflake’s Schema Detection, the first step in understanding your data and building data pipelines. Initially available for binary file formats like Avro, ORC, and Parquet, Schema Detection has now expanded its support to CSV and JSON in public preview.

With Schema Detection, Snowflake automates the process of determining and returning the schema of your staged files. You can review the detected schema, modify it if necessary, and use it to create tables, external tables, or views. Snowflake also provides the option to automatically create tables or external tables based on the detected schema. All of this is made possible by our INFER_SCHEMA function, which retrieves and returns the schema from a set of staged files.

Here’s how it works:

  1. INFER_SCHEMA retrieves the schema from your staged files.
  2. GENERATE_COLUMN_DESCRIPTION returns the column names and data types necessary to create tables, external tables, or views manually.
  3. CREATE TABLE … USING TEMPLATE leverages the detected schema to automatically create a structured table.

Schema Detection in Action

To illustrate the power of Schema Detection, let’s consider a scenario where you have a CSV file, and you want to determine its…

--

--

Soumak Das

Sr. Data Engineer @EY & Snowflake/Airflow/Databricks/AWS writer