Snowflake advanced interview questions — part 2

Soumak Das
8 min read · Aug 26, 2024

--

  1. How can you extract the fourth character from each value in a specific column that contains city names such as Chennai, Kolkata, and Mumbai?
    — Answer: Use the `SUBSTR` function, which takes the string, a 1-based start position, and a length. For example:
    ```sql
    SELECT SUBSTR(col_a, 4, 1) AS fourth_char FROM your_table;
    ```
  2. What is the process to recover accidentally deleted records from a Snowflake table?
    — Answer: You can recover deleted records using Time Travel. For instance, to rebuild the table as it was 5 minutes ago (the offset is specified in seconds):
    ```sql
    CREATE TABLE your_table_restored AS
    SELECT *
    FROM your_table AT (OFFSET => -5 * 60);
    ```
  3. Explain how to create a table in Snowflake and load data from an external file.
    — Answer:
    — Creating the table:
    ```sql
    CREATE TABLE your_table (
        column1 DATA_TYPE,
        column2 DATA_TYPE
    );
    ```
    — Loading data:
    ```sql
    COPY INTO your_table
    FROM @your_stage
    FILE_FORMAT = (FORMAT_NAME = 'your_file_format');
    ```
  4. What are the benefits of using SnowSQL in Snowflake, and when is it most useful?
    — Answer: SnowSQL is a command-line tool ideal for automating tasks, bulk loading/unloading data, and scripting. It’s beneficial for batch processing and managing Snowflake environments programmatically.
  5. What should be considered when loading files into Snowflake using Snowpipe? Is there a file size limit?
    — Answer: There is no hard file size limit, but Snowflake recommends sizing files to roughly 100–250 MB (compressed) for the best load parallelism. Larger files still load, but latency and per-file processing time increase, so split very large files into smaller chunks before staging them. A minimal Snowpipe definition is sketched below.
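    For illustration, a minimal auto-ingest pipe (object names are placeholders; auto-ingest assumes an external stage with event notifications configured):
    ```sql
    -- Pipe that loads any new files landing in the stage into the target table
    CREATE OR REPLACE PIPE my_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO your_table
      FROM @your_stage
      FILE_FORMAT = (FORMAT_NAME = 'your_file_format');
    ```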
  6. How can you manage double quotes in file formats when loading data into Snowflake?
    — Answer: Configure the file format with the `FIELD_OPTIONALLY_ENCLOSED_BY` parameter so Snowflake strips the enclosing double quotes when parsing fields:
    ```sql
    CREATE FILE FORMAT my_format
      TYPE = 'CSV'
      FIELD_OPTIONALLY_ENCLOSED_BY = '"';
    ```
  7. Is it possible to perform DML operations on a clone table in Snowflake, and how does it affect storage?
    — Answer: Yes, a clone is a fully writable table. A zero-copy clone initially shares the original table’s micro-partitions via metadata, so only the data you insert, update, or delete after cloning consumes additional storage.
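    A short sketch (table and column names are placeholders):
    ```sql
    -- Zero-copy clone: created instantly, no storage consumed up front
    CREATE TABLE your_table_clone CLONE your_table;

    -- DML on the clone affects only the clone; the modified
    -- micro-partitions are written as new (billed) storage
    UPDATE your_table_clone
    SET status = 'archived'
    WHERE load_time < '2024-01-01';
    ```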
  8. Describe the process of database cloning in Snowflake and whether it includes staging data.
    — Answer: Database cloning creates a zero-copy clone of a database almost instantly without duplicating the stored data. However, cloning does not carry over data held in external stages or other external objects.
    ```sql
    CREATE DATABASE new_db CLONE existing_db;
    ```
  9. What datatype should be used to handle JSON and XML data in Snowflake?
    — Answer: Use the `VARIANT` datatype for semi-structured data such as JSON and XML. `VARIANT` columns can be queried directly with path notation and expanded with `LATERAL FLATTEN`, as sketched below.
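    A minimal sketch (table, column, and JSON keys are illustrative):
    ```sql
    CREATE TABLE raw_events (payload VARIANT);

    -- PARSE_JSON converts a JSON string into a VARIANT value
    INSERT INTO raw_events
    SELECT PARSE_JSON('{"city": "Chennai", "tags": ["a", "b"]}');

    -- Path notation reads nested fields; LATERAL FLATTEN expands arrays
    SELECT e.payload:city::STRING AS city, t.value::STRING AS tag
    FROM raw_events e,
         LATERAL FLATTEN(input => e.payload:tags) t;
    ```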
  10. What is the maximum size of the VARIANT datatype in Snowflake?
    — Answer: A single `VARIANT` value is limited to 16 MB.
  11. How can you optimize a query that uses a clustering key but is still slow due to large data volumes?
    — Answer: First check how well the table is actually clustered and whether the key matches the query’s filter columns; if not, redefine the clustering key. Other options include scaling the warehouse up, creating a materialized view for the hot access path, and reviewing the query profile to find the slow operator.
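    To inspect clustering health before rewriting anything, something like the following (table and key columns are illustrative):
    ```sql
    -- Reports depth and overlap statistics for the given key
    SELECT SYSTEM$CLUSTERING_INFORMATION('your_table', '(order_date)');

    -- Redefine the key if it does not match the dominant filter pattern
    ALTER TABLE your_table CLUSTER BY (order_date, region);
    ```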
  12. What happens if a file with the same name is reloaded into an S3 bucket linked to Snowflake?
    — Answer: Snowflake keeps load metadata (including the file name and an MD5 checksum of its contents) for each file it has ingested, so the `COPY` command skips a re-uploaded file it recognizes as already loaded, preventing duplicate data.
  13. If a file is deleted and then reloaded to S3, will the same data get reloaded into Snowflake?
    — Answer: Not by default. Deleting the file from S3 does not clear Snowflake’s load metadata, which is retained for 64 days for `COPY` (14 days for Snowpipe), so a re-uploaded file with the same name and unchanged contents is still skipped. To force a reload (and accept possible duplicates), use `FORCE = TRUE`, as shown below.
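    A sketch of forcing a reload (names are placeholders):
    ```sql
    -- FORCE = TRUE ignores load metadata and reloads all matching files
    COPY INTO your_table
    FROM @your_stage
    FILE_FORMAT = (FORMAT_NAME = 'your_file_format')
    FORCE = TRUE;
    ```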
  14. How do you remove duplicates while retaining one unique record in a Snowflake table?
    — Answer: Rank rows with `ROW_NUMBER()` and keep only the first row per key. Note that `DELETE` cannot filter on a window function computed in a CTE, so the simplest pattern rewrites the table, for example with `INSERT OVERWRITE` and `QUALIFY`:
    ```sql
    INSERT OVERWRITE INTO your_table
    SELECT *
    FROM your_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY load_time DESC) = 1;
    ```
  15. What is the cost implication of cloning in Snowflake?
    — Answer: Cloning is cost-effective because it does not duplicate storage for unchanged data. Only micro-partitions added or modified in the clone after it is created incur additional storage costs.
  16. What are some common errors encountered when loading data into Snowflake, and how can they be resolved?
    — Answer: Common errors include file format mismatches (wrong delimiter, encoding, or column count), permission issues on the stage or table, and transient network interruptions. Resolve them by checking the file format definition against a sample file, verifying grants, and retrying the load; the `VALIDATE` function shown below helps inspect rejected rows.
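    A sketch of inspecting errors from the most recent load (table name is a placeholder):
    ```sql
    -- Returns the rows rejected by the last COPY INTO on this table
    SELECT * FROM TABLE(VALIDATE(your_table, JOB_ID => '_last'));
    ```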
  17. How do you monitor long-running queries or steps in Snowflake?
    — Answer: Use the `QUERY_HISTORY` view (or table function) and `WAREHOUSE_LOAD_HISTORY` to spot long-running queries, then drill into the Query Profile for the slow step.
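    For example, a sketch against the Information Schema table function (the account-usage view offers longer retention):
    ```sql
    -- Slowest recent queries; total_elapsed_time is in milliseconds
    SELECT query_id, query_text, total_elapsed_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER BY total_elapsed_time DESC
    LIMIT 10;
    ```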
  18. What are the limitations of using SQL within Snowflake’s JavaScript stored procedures?
    — Answer: SQL runs through the `snowflake.execute()` API rather than natively, result sets must be iterated row by row, external JavaScript libraries cannot be imported, and errors surface as JavaScript exceptions that need explicit try/catch handling.
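    A minimal sketch of the pattern (procedure name and logic are illustrative):
    ```sql
    CREATE OR REPLACE PROCEDURE row_count(table_name STRING)
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    AS
    $$
        // SQL is executed via the snowflake API and fetched row by row;
        // note that procedure arguments are uppercased in JavaScript
        var stmt = snowflake.createStatement({
            sqlText: "SELECT COUNT(*) FROM IDENTIFIER(?)",
            binds: [TABLE_NAME]
        });
        var rs = stmt.execute();
        rs.next();
        return "Rows: " + rs.getColumnValue(1);
    $$;
    ```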
  19. How can you efficiently load large volumes of on-premises data into Snowflake?
    — Answer: Stage the data (for example, via `PUT` to an internal stage or an upload to cloud storage), split it into roughly 100–250 MB chunks so the load parallelizes, and use the bulk `COPY INTO` command for batch loads or Snowpipe for continuous ingestion.
  20. How does Python integration work in Snowflake, and what are its use cases?
    — Answer: Python integration, available via Snowpark and Python UDFs, lets you run Python code inside Snowflake’s engine. It’s ideal for complex data transformations, machine learning, and advanced analytics.
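    For illustration, a minimal Python UDF defined through SQL (function name and logic are placeholders):
    ```sql
    CREATE OR REPLACE FUNCTION normalize_city(name STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    RUNTIME_VERSION = '3.10'
    HANDLER = 'normalize'
    AS
    $$
    def normalize(name):
        # Trim whitespace and title-case the city name
        return name.strip().title() if name else None
    $$;
    ```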
  21. What is the process for changing a Snowflake warehouse size from small to medium?
    — Answer: You can change the warehouse size using the following command:
    ```sql
    ALTER WAREHOUSE your_warehouse_name SET WAREHOUSE_SIZE = 'MEDIUM';
    ```
  22. What happens to compute resources in Snowflake when they are no longer in use?
    — Answer: Snowflake automatically suspends warehouses after a configurable idle period and resumes them on the next query, so you are only billed for active usage.
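    The behavior is controlled per warehouse; a sketch with a placeholder name:
    ```sql
    -- Suspend after 60 seconds idle; resume automatically on the next query
    ALTER WAREHOUSE your_warehouse_name SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
    ```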
  23. What are the restrictions in a Snowflake Reader account?
    — Answer: A Reader account is a read-only account managed by (and billed to) the provider that shares the data. Consumers can query the shared data but cannot load or modify data (no DML or `COPY`), and administrative capabilities are limited.
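    For context, the provider creates a reader account roughly like this (name and credentials are placeholders):
    ```sql
    CREATE MANAGED ACCOUNT reader_acct
      ADMIN_NAME = reader_admin,
      ADMIN_PASSWORD = 'StrongPassword123!',
      TYPE = READER;
    ```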
  24. How do you change the data type of a column in a Snowflake table?
    — Answer: Use `ALTER TABLE` with the `MODIFY COLUMN` (or `ALTER COLUMN`) clause and `SET DATA TYPE`. Snowflake only permits a narrow set of changes, such as increasing a `VARCHAR` length or a `NUMBER` precision; converting between unrelated types (e.g., `VARCHAR` to `NUMBER`) requires adding a new column and copying the data. Here’s an example of widening a `VARCHAR` column:
    ```sql
    ALTER TABLE your_table
      MODIFY COLUMN column_name SET DATA TYPE VARCHAR(255);
    ```
  25. What are Snowflake credits, and how are they calculated for each warehouse size?
    — Answer: Snowflake credits are the unit used to pay for compute. Consumption depends on warehouse size and running time, with each size doubling the rate: an `X-SMALL` warehouse consumes 1 credit per hour, `SMALL` 2, `MEDIUM` 4, `LARGE` 8, and so on, billed per second with a 60-second minimum. For example, a `MEDIUM` warehouse running for 30 minutes consumes 2 credits.
  26. How do Snowflake credits differ from Data Warehouse credits?
    — Answer: Snowflake credits pay for compute only: virtual warehouse usage, serverless features, and cloud services. “Data Warehouse credits” refers specifically to the credits consumed by virtual warehouses while they run queries and other operations. Storage is not billed in credits at all; it is billed separately at a flat rate per terabyte per month.
  27. What are the different types of credits in Snowflake, and what are their purposes?
    — Answer:
    — Virtual warehouse credits: consumed by warehouses when running queries, loading data, and other compute-intensive operations.
    — Serverless credits: consumed by Snowflake-managed compute for features such as Snowpipe, automatic clustering, and materialized view maintenance.
    — Cloud services credits: consumed by the cloud services layer for authentication, metadata operations, and query compilation (billed only for usage above 10% of daily warehouse consumption).
    Storage is billed separately per terabyte, not in credits.
  28. How can you handle errors encountered when loading files into Snowflake tables?
    — Answer: Use the `ON_ERROR` option of the `COPY INTO` command, which can be set to `CONTINUE`, `SKIP_FILE`, `SKIP_FILE_<num>`, or `ABORT_STATEMENT` (the default). Example:
    ```sql
    COPY INTO your_table
    FROM @your_stage
    FILE_FORMAT = (FORMAT_NAME = 'your_file_format')
    ON_ERROR = 'SKIP_FILE';
    ```
  29. How do you eliminate entire row duplicates in a flat file before loading it into Snowflake?
    — Answer: Load the file into a staging table first, then deduplicate before inserting into the target. Since the duplicates span the entire row, `SELECT DISTINCT *` is the simplest approach (`PARTITION BY *` is not valid SQL, so a `ROW_NUMBER()` variant would need every column listed explicitly):
    ```sql
    INSERT INTO your_table
    SELECT DISTINCT *
    FROM staging_table;
    ```
  30. How do you retain one unique record while deleting duplicates from a Snowflake table?
    — Answer: As in question 14, rank rows with `ROW_NUMBER()` and keep the first per key. An equivalent approach rebuilds the table with a `QUALIFY` filter, preserving existing grants:
    ```sql
    CREATE OR REPLACE TABLE your_table COPY GRANTS AS
    SELECT * FROM your_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY load_time DESC) = 1;
    ```
  31. How do you check for long-running steps in SQL within Snowflake?
    — Answer: Beyond the `QUERY_HISTORY` and `WAREHOUSE_LOAD_HISTORY` views (see question 17), open the Query Profile for a specific query to see per-operator timings and pinpoint the slow step.
  32. What are the limitations of using SQL within Snowflake’s JavaScript stored procedures?
    — Answer: As covered in question 18: SQL must go through the JavaScript API, result sets are fetched row by row, external libraries cannot be imported, and error handling relies on JavaScript try/catch rather than SQL semantics, requiring careful coding practices.
  33. How can you integrate and share data between Unix and Snowflake using SnowSQL on Unix?
    — Answer: SnowSQL on Unix executes SQL commands, loads data, and exports data from Snowflake. A typical integration configures a secure connection, then uses the `PUT` command to upload local files to a stage and `GET` to download staged files back to the filesystem.
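    For illustration, commands run inside a SnowSQL session (paths and stage names are placeholders):
    ```sql
    -- Upload a local file to an internal stage (gzip-compressed by default)
    PUT file:///data/export/cities.csv @your_stage;

    -- Download a staged file back to the local filesystem
    GET @your_stage/cities.csv.gz file:///data/import/;
    ```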
  34. What is the best approach to load a large volume of data from on-premises to Snowflake?
    — Answer: Stage the data in cloud storage (e.g., Amazon S3) or an internal stage, then use the bulk `COPY INTO` command for batch loads or Snowpipe for continuous loading. Splitting the data into 100–250 MB chunks lets the warehouse load files in parallel and improves throughput.
  35. How do you load data into an external stage in S3 for Snowflake?
    — Answer: Create the external stage pointing at the bucket, upload the files to S3 using AWS tooling (e.g., the AWS CLI), and then load them into Snowflake with `COPY INTO`. Example:
    ```sql
    CREATE OR REPLACE STAGE my_s3_stage
      URL = 's3://mybucket'
      CREDENTIALS = (AWS_KEY_ID = 'your_aws_key' AWS_SECRET_KEY = 'your_secret_key');

    COPY INTO your_table
    FROM @my_s3_stage
    FILE_FORMAT = (FORMAT_NAME = 'your_file_format');
    ```
  36. How does Python work in Snowflake, and what are its use cases?
    — Answer: As noted in question 20, Python runs inside Snowflake via Snowpark and Python UDFs/stored procedures. Use cases include complex transformations, data engineering pipelines, and embedding machine learning models in Snowflake workflows.
  37. Does Snowflake support Lambda functions, and if so, how are they used?
    — Answer: Not natively, but Snowflake external functions can call AWS Lambda through Amazon API Gateway: Snowflake sends row batches to the gateway endpoint, Lambda processes them, and the results come back as a column value. Lambda can also be triggered externally (e.g., via S3 events or SNS) to orchestrate Snowflake work.
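    A rough sketch of the external-function wiring (role ARN and endpoint URLs are placeholders):
    ```sql
    -- Integration object that authorizes calls to the API Gateway endpoint
    CREATE OR REPLACE API INTEGRATION my_lambda_int
      API_PROVIDER = aws_api_gateway
      API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_external_fn'
      API_ALLOWED_PREFIXES = ('https://abc123.execute-api.us-east-1.amazonaws.com/prod/')
      ENABLED = TRUE;

    -- External function backed by the Lambda behind that endpoint
    CREATE OR REPLACE EXTERNAL FUNCTION enrich_city(name STRING)
      RETURNS VARIANT
      API_INTEGRATION = my_lambda_int
      AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/enrich';
    ```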
  38. What are the benefits of using an ETL tool with Snowflake?
    — Answer: Using an ETL tool with Snowflake helps streamline data ingestion, transformation, and loading processes. ETL tools can optimize data pipelines, support complex transformations, and provide robust scheduling and error handling mechanisms, making them ideal for managing large-scale data workflows in Snowflake.

--

Soumak Das

Sr. Data Engineer @EY & Snowflake/Airflow/Databricks/AWS writer