Snowflake Architecture and Best Practices
Snowflake Unique architecture consist of Three layers:
1. Database Storage
2. Query Processing
3. Cloud Services
Database Storage:
When data is loaded into SF, It reorganizes that data into its internal optimized, compressed, columnar format. SF stores this optimized data in cloud storage.
Query Processing:
Query execution is performed in the processing layer. Snowflake processes queries using VW aka Virtual Warehouse
Cloud Services:
Cloud services layer is a collection of services that coordinate activities across SF.
Services Managed in this layer:
Authentication, Infrastructure management, Metadata management, Query parsing and optimization, Access control.
Cloud Services:
Cloud services layer is a collection of services that coordinate activities across SF.
Services Managed in this layer:
Authentication, Infrastructure management, Metadata management, Query parsing and optimization, Access control.
Data loading recommendations (Files need to be Split on Snowflake):
•Loading a large single file will make only one node at action and other nodes are ignored even if we have larger warehouse.
•Have a separate large warehouse to support ingesting large files, this provides full horsepower for data loading. We can turnoff warehouse after data loading.
•Follow same practice for data unloading as well.Data querying recommendations (Dedicated warehouse for Querying):
• Snowflake automatically caches data in the Virtual Warehouse (local disk cache), so place users querying the same data on the same virtual warehouse. This maximizes the chances that data retrieved to the cache by one user will also be used by others.
• Suspending warehouse will erase this cache.
• Snowflake Query Profile feature helps us to analyze queries being run from BI tools as well.
• Consider scaling out the warehouse which will be using for BI Analytics to cater concurrent users.
Design Recommendations:
•Storing Semi-structured Data in a VARIANT Column:
Load the data set into a VARIANT column in a table. Use the FLATTEN function to extract the objects and keys you plan to query into a separate table.
•Date/Time Data Types for Columns:
When defining columns to contain dates or timestamps, choose a date or timestamp data type rather than a character data type.
SF stores DATE and TIMESTAMP data more efficiently than VARCHAR.
•Set a Clustering Key for larger datasets:
Clustering key is not necessary for most tables, SF performs automatic tuning via the optimization engine and micro-partitioning.
if Query Profile indicates that a significant percentage of the total duration time is spent scanning set a cluster key for larger datasets.