Snowflake Scenario-Based Interview Questions

Soumak Das
4 min read · Jul 13, 2023


1. You have observed that a stored procedure, executed daily at 7 AM as part of your batch process, is consuming heavy resources: CPU I/O is showing 90%, and the other jobs running at the same time are being impacted. How can you quickly resolve the issue with the stored procedure?

Ans: Assign a separate, dedicated warehouse to the stored procedure so it no longer competes for compute with the other batch jobs.
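A minimal sketch of this fix, assuming the procedure is driven by a Snowflake task (all object names here are hypothetical):

```sql
-- Create a dedicated warehouse for the heavy stored procedure
CREATE WAREHOUSE IF NOT EXISTS batch_sp_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

-- If the 7 AM run is a Snowflake task, repoint it at the new warehouse
ALTER TASK daily_batch_task SUSPEND;
ALTER TASK daily_batch_task SET WAREHOUSE = batch_sp_wh;
ALTER TASK daily_batch_task RESUME;

-- If an external scheduler calls the procedure, switch warehouses in that session instead
USE WAREHOUSE batch_sp_wh;
CALL daily_batch_sp();
```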

2. Some queries are executing on a warehouse and you run an ALTER WAREHOUSE statement to resize it. How does this affect the queries that are already executing?

Ans. It does not affect them. Resizing applies only to queued and newly submitted queries; statements already executing complete on the warehouse's original size.
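For reference, the resize is a single statement (the warehouse name is illustrative):

```sql
-- Only queued and newly submitted queries pick up the new size;
-- queries already running finish on the previous size
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
```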

3. A new business analyst has joined your project. As part of onboarding, you sent him some queries to generate reports. The query took around 5 minutes to execute, while the same query, when executed by other business analysts, returned results immediately. What could be the issue?

Ans. The warehouse size could differ between the roles assigned to the analysts, so the new analyst may be running the same query on a smaller warehouse.
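One way to confirm this is to compare the runs in the query history (the query-text filter below is just illustrative):

```sql
-- Compare which warehouse and size each analyst's run used
SELECT user_name,
       warehouse_name,
       warehouse_size,
       total_elapsed_time / 1000 AS elapsed_seconds
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE '%daily_sales_report%'   -- hypothetical report query
ORDER BY start_time DESC;
```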

4. A firm has an application that generates streaming data, and the data is moved into Snowflake using Snowpipe from ADLS Gen2 containers. Will the order of files loaded into Snowflake always be the same? How is data deduplication handled by Snowpipe?

Ans. The load order is not guaranteed to match the order in which files arrive. For deduplication, Snowflake maintains load metadata for each pipe object (the names of files already loaded), so a file that has already been processed is not loaded again.
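A minimal sketch of such an auto-ingest pipe on an Azure stage (the stage, integration, and table names are placeholders):

```sql
-- Event notifications from ADLS Gen2 (via Event Grid) trigger the loads
CREATE PIPE IF NOT EXISTS events_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'ADLS_NOTIFICATION_INT'   -- hypothetical notification integration
AS
COPY INTO raw_events
FROM @adls_events_stage
FILE_FORMAT = (TYPE = JSON);
```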

5. A developer in a large MNC has set the context of the worksheet while working on a stored procedure. In that session, the developer created a temporary table with the same name as a permanent table used in the SP.

Is it possible to create a temporary table with the same name as the permanent table? If so, which table will the SP refer to when it is called in this session, and why?

Ans. Yes, it is possible. Within that session the temporary table takes precedence over (shadows) the permanent table of the same name, so the SP will refer to the temporary table.
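A quick way to see this shadowing behaviour (table and column names are illustrative):

```sql
CREATE OR REPLACE TABLE orders (id INT, amount NUMBER);            -- permanent table
INSERT INTO orders VALUES (1, 100);

CREATE OR REPLACE TEMPORARY TABLE orders (id INT, amount NUMBER);  -- same name, same schema
INSERT INTO orders VALUES (99, 0);

SELECT * FROM orders;  -- returns (99, 0): the temporary table wins for this session
```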

6. A company is spread across several continents and has opted for Snowflake as its data warehouse solution. What should the company take into consideration for a highly available and scalable data warehouse, and which Snowflake features should be used to achieve this?

Ans: Secure Data Sharing to make data available across accounts, multi-cluster warehouses with auto-scaling for scalability, and database replication across regions for high availability.
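A sketch of the scalability and sharing pieces (all names are placeholders; cross-region consumers would additionally need the replication shown in question 8):

```sql
-- Scalability: a multi-cluster warehouse that scales out automatically under load
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD';

-- Secure Data Sharing: expose a table to another account without copying the data
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.sales_fact TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = myorg.apac_account;
```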

7. A data engineering team has successfully migrated a lot of data into a staging database. The average table size has grown beyond 2 TB. The data analytics team is facing performance issues when running complex BI queries against this database. What could be the possible causes of the poor query performance, and what are the solutions?

Ans: On multi-terabyte tables, poor micro-partition pruning is a common cause. Define clustering keys on medium-cardinality columns that the BI queries frequently filter or join on.
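For example (table and column names are hypothetical):

```sql
-- Cluster the large fact table on columns commonly used in filters and joins
ALTER TABLE sales_fact CLUSTER BY (region, sale_date);

-- Check how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_fact', '(region, sale_date)');
```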

8. A large manufacturing company runs a dozen individual Snowflake accounts across its business divisions. The company wants to increase the level of data sharing to support supply chain optimizations and increase its purchasing leverage with multiple vendors.
The company’s Snowflake Architects need to design a solution that would allow the business divisions to decide what to share, while minimizing the level of effort spent on configuration and management. Most of the company divisions use Snowflake accounts in the same cloud deployments with a few exceptions for European-based divisions.
According to Snowflake recommended best practice, how should these requirements be met?

Ans: Deploy a Private Data Exchange and use replication to allow European data shares in the Exchange.
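Because some European divisions sit in a different region or cloud, the databases behind their shares would typically be replicated so the data can be consumed through the Exchange. A sketch of that replication step (account and database names are placeholders):

```sql
-- In the primary account: allow replication of the shared database to the EU account
ALTER DATABASE supply_chain_db ENABLE REPLICATION TO ACCOUNTS myorg.eu_account;

-- In the EU account: create the secondary database and keep it refreshed
CREATE DATABASE supply_chain_db AS REPLICA OF myorg.primary_account.supply_chain_db;
ALTER DATABASE supply_chain_db REFRESH;
```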

9. A Data Engineer is designing a near real-time ingestion pipeline for a retail company to ingest event logs into Snowflake to derive insights. A Snowflake Architect is asked to define security best practices to configure access control privileges for the data load for auto-ingest to Snowpipe.
What are the MINIMUM object privileges required for the Snowpipe user to execute Snowpipe?

Ans: OWNERSHIP on the named pipe; USAGE and READ on the named stage; USAGE on the target database and schema; and INSERT and SELECT on the target table.
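These map to grants along the following lines, assuming a dedicated role for the Snowpipe user (object names are placeholders):

```sql
GRANT USAGE ON DATABASE raw_db TO ROLE snowpipe_role;
GRANT USAGE ON SCHEMA raw_db.events TO ROLE snowpipe_role;
GRANT USAGE ON STAGE raw_db.events.adls_events_stage TO ROLE snowpipe_role;  -- READ for an internal stage
GRANT INSERT, SELECT ON TABLE raw_db.events.raw_events TO ROLE snowpipe_role;
GRANT OWNERSHIP ON PIPE raw_db.events.events_pipe TO ROLE snowpipe_role;
```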

10. A media company needs a data pipeline that will ingest customer review data into a Snowflake table, and apply some transformations. The company also needs to use Amazon Comprehend to do sentiment analysis and make the de-identified final data set available publicly for advertising companies who use different cloud providers in different regions.
The data pipeline needs to run continuously and efficiently as new records arrive in object storage, leveraging event notifications. Operational complexity, infrastructure maintenance (including platform upgrades and security), and development effort should also be minimal.
Which design will meet these requirements?

Ans. Ingest the data using Snowpipe and use streams and tasks to orchestrate transformations. Create an external function to do model inference with Amazon Comprehend and write the final records to a Snowflake table. Then create a listing in the Snowflake Marketplace to make the data available to other companies.
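A sketch of the stream-and-task orchestration inside Snowflake (all object names, and the external function wrapping Amazon Comprehend, are hypothetical):

```sql
-- Track new rows landed by Snowpipe
CREATE OR REPLACE STREAM reviews_stream ON TABLE raw_reviews;

-- Transform and score new rows whenever the stream has data
CREATE OR REPLACE TASK score_reviews_task
  WAREHOUSE = transform_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('reviews_stream')
AS
  INSERT INTO scored_reviews (review_id, review_text, sentiment)
  SELECT review_id,
         review_text,
         comprehend_sentiment(review_text)   -- external function calling Amazon Comprehend via a proxy such as API Gateway
  FROM reviews_stream;

ALTER TASK score_reviews_task RESUME;
```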
