Snowflake Key Concepts
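Snowflake is a cloud-based data warehouse that offers scalable and secure storage and processing of structured and semi-structured data. Its architecture is a hybrid of shared-disk and shared-nothing designs: a central data repository is accessible from all compute nodes, while queries are processed in parallel by MPP compute clusters in which each node stores a portion of the data locally. Snowflake offers native support for SQL and can also be used from other programming languages such as Python, R, and Java.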
Snowflake’s unique architecture consists of three key layers:
- Database Storage
- Query Processing
- Cloud Services
Snowflake decouples storage from compute, so each can be scaled independently.
Virtual Warehouse:
- A compute resource in Snowflake that processes queries and performs data loading and unloading.
- It can be scaled up or down independently, based on demand.
- It can easily be suspended when idle and resumed when needed.
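To make the separation of storage and compute concrete, here is a minimal Python sketch, not Snowflake's implementation. The `SHARED_STORAGE` dictionary stands in for the storage layer, and each hypothetical `VirtualWarehouse` object is an independent compute resource that can be resized, suspended, and resumed without touching the data. The table contents, the class, and its methods are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the storage layer: one copy of the data,
# readable by every warehouse.
SHARED_STORAGE = {"orders": [("o1", 120.0), ("o2", 75.5), ("o3", 42.0)]}

# A subset of standard warehouse sizes, used here only for validation.
SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE")


@dataclass
class VirtualWarehouse:
    """Illustrative compute resource; it holds no table data of its own."""
    name: str
    size: str = "XSMALL"
    running: bool = False
    auto_resume: bool = True

    def resize(self, new_size: str) -> None:
        if new_size not in SIZES:
            raise ValueError(f"unknown size: {new_size}")
        self.size = new_size  # only compute changes; storage is untouched

    def suspend(self) -> None:
        self.running = False

    def resume(self) -> None:
        self.running = True

    def total_amount(self, table: str) -> float:
        """Run a toy SUM(amount) query, resuming first if allowed."""
        if not self.running:
            if not self.auto_resume:
                raise RuntimeError(f"{self.name} is suspended")
            self.resume()
        # Every warehouse reads the same shared storage; nothing is copied.
        return sum(amount for _, amount in SHARED_STORAGE[table])


etl_wh = VirtualWarehouse("ETL_WH", size="LARGE", running=True)
bi_wh = VirtualWarehouse("BI_WH")
etl_wh.suspend()                     # suspending one warehouse...
print(bi_wh.total_amount("orders"))  # ...does not stop another: prints 237.5
```

Because the data lives outside any warehouse, suspending `ETL_WH` has no effect on `BI_WH`, and resizing either one never moves data.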
Micro-Partition:
- A storage unit in Snowflake that contains a subset of the rows in a table.
- Micro-partitions are created automatically, and Snowflake keeps metadata about each one (such as the range of values in each column), so queries can skip micro-partitions that cannot contain matching rows.
- Within each micro-partition, data is stored in a columnar format, which allows better compression and lets a query read only the columns it needs.
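The pruning and columnar ideas above can be illustrated with a simplified Python model. This is only a sketch under stated assumptions: the `MicroPartition` class, `make_partition`, and `scan` are hypothetical, each partition here holds a handful of rows rather than a realistic amount of data, and the only metadata kept is a per-column minimum and maximum.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    columns: dict  # column name -> list of values (columnar layout)
    stats: dict    # column name -> (min, max) metadata


def make_partition(columns: dict) -> MicroPartition:
    """Build a partition and record min/max metadata for every column."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return MicroPartition(columns, stats)


def scan(partitions, filter_col, lo, hi, select_col):
    """Return (select_col values where lo <= filter_col <= hi, partitions read)."""
    results, scanned = [], 0
    for p in partitions:
        pmin, pmax = p.stats[filter_col]
        if pmax < lo or pmin > hi:
            continue  # pruned: the metadata proves no row can match
        scanned += 1
        # Only the filter column and the selected column are touched.
        keys, vals = p.columns[filter_col], p.columns[select_col]
        results.extend(v for k, v in zip(keys, vals) if lo <= k <= hi)
    return results, scanned


parts = [
    make_partition({"day": [1, 2, 3], "sales": [10, 20, 30], "region": ["EU", "US", "EU"]}),
    make_partition({"day": [4, 5, 6], "sales": [40, 50, 60], "region": ["US", "US", "EU"]}),
    make_partition({"day": [7, 8, 9], "sales": [70, 80, 90], "region": ["EU", "EU", "US"]}),
]
print(scan(parts, "day", 5, 6, "sales"))  # ([50, 60], 1): two of three partitions pruned
```

The `region` column is never read by this query, which is the benefit of the columnar layout.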
Time Travel:
- A feature in Snowflake that allows users to query historical data as it existed at a specific point in the past, anywhere within a defined retention period.
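A rough Python sketch of the idea, assuming a table simply keeps one snapshot per commit; Snowflake's actual storage mechanism is not modeled here. The `at` method plays the role of Snowflake's `AT(TIMESTAMP => ...)` clause. The `VersionedTable` class, its methods, and the integer timestamps are hypothetical, and `retention_seconds` defaults to one day to mirror Snowflake's default retention period.

```python
import bisect


class VersionedTable:
    """Toy table that keeps every committed version, keyed by commit time."""

    def __init__(self, retention_seconds=86_400):  # 1 day, Snowflake's default
        self.retention_seconds = retention_seconds
        self._times = []     # commit timestamps, strictly increasing
        self._versions = []  # snapshot of the rows at each commit

    def commit(self, ts, rows):
        if self._times and ts <= self._times[-1]:
            raise ValueError("commits must have increasing timestamps")
        self._times.append(ts)
        self._versions.append(tuple(rows))

    def at(self, ts, now):
        """Rows as they existed at time ts, queried at the current time now."""
        if now - ts > self.retention_seconds:
            raise LookupError("requested time is outside the retention period")
        i = bisect.bisect_right(self._times, ts) - 1  # last commit at or before ts
        if i < 0:
            raise LookupError("the table did not exist at that time")
        return list(self._versions[i])


orders = VersionedTable()
orders.commit(100, ["a"])
orders.commit(200, ["a", "b"])
orders.commit(300, ["b"])       # row "a" was deleted at t=300
print(orders.at(250, now=400))  # ['a', 'b']
print(orders.at(400, now=400))  # ['b']
```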
Data Sharing:
- Snowflake’s Secure Data Sharing feature lets you share objects (such as tables) from a database in your account with another Snowflake account without copying or transferring the data.
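A small Python sketch of the zero-copy idea, under the assumption that a share can be modeled as a reference to the provider's live tables plus an allow-list of table names. The `Share` class and the table data are hypothetical. Real shares are created in SQL (`CREATE SHARE`, then `GRANT ... TO SHARE`) and enforced by Snowflake's privilege system.

```python
class Share:
    """Read-only window onto selected tables in a provider's database.

    Creating the share copies no rows: it only stores a reference to the
    provider's tables, so every read sees the provider's current data.
    """

    def __init__(self, provider_db, table_names):
        self._db = provider_db                 # a reference, not a copy
        self._tables = frozenset(table_names)  # objects granted to the share

    def read(self, table):
        if table not in self._tables:
            raise PermissionError(f"{table} is not included in this share")
        return tuple(self._db[table])  # immutable result for the consumer


provider_db = {"customers": [("c1", "Alice")], "payroll": [("e1", 5000)]}
share = Share(provider_db, ["customers"])
provider_db["customers"].append(("c2", "Bob"))  # the provider updates its table
print(share.read("customers"))  # (('c1', 'Alice'), ('c2', 'Bob'))
```

The consumer sees the provider's update immediately because nothing was copied when the share was created, and `payroll` stays private because it was never granted to the share.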
Restoring:
- Dropped objects can be restored with simple SQL commands such as UNDROP TABLE, provided they are still within the Time Travel retention period.
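The restore behavior can be sketched in Python as follows. This is a simplified model with hypothetical names (`Schema`, `create`, `drop`, `undrop`). Dropped tables are kept aside instead of being deleted, `undrop` restores the most recently dropped version, and it refuses to run when a table with the same name already exists, which matches how `UNDROP TABLE` behaves. Expiry at the end of the retention period is not modeled.

```python
class Schema:
    """Toy schema whose dropped tables can be restored."""

    def __init__(self):
        self.tables = {}    # live tables: name -> rows
        self._dropped = {}  # name -> dropped versions, most recent last

    def create(self, name, rows):
        if name in self.tables:
            raise ValueError(f"table {name} already exists")
        self.tables[name] = list(rows)

    def drop(self, name):
        # Keep the table aside instead of deleting it.
        self._dropped.setdefault(name, []).append(self.tables.pop(name))

    def undrop(self, name):
        if name in self.tables:
            raise ValueError(f"cannot undrop {name}: a table with that name exists")
        versions = self._dropped.get(name)
        if not versions:
            raise LookupError(f"no dropped version of {name} to restore")
        self.tables[name] = versions.pop()  # restore the most recent drop


s = Schema()
s.create("orders", [1, 2, 3])
s.drop("orders")
s.create("orders", [4])    # recreate a table with the same name
s.drop("orders")
s.undrop("orders")
print(s.tables["orders"])  # [4]: the most recently dropped version
```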
Multi-cluster:
- In traditional data warehouses, users and processes compete for the same compute resources, so heavy concurrency leads to queuing and slow queries. In Snowflake, a multi-cluster virtual warehouse can automatically add clusters as the number of concurrent queries grows and remove them as demand falls, which largely removes this contention.
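A rough Python sketch of the scaling decision, under loudly stated assumptions. The functions `clusters_needed` and `queued` are hypothetical, each cluster is assumed to run at most 8 queries at once (loosely mirroring the default of the `MAX_CONCURRENCY_LEVEL` warehouse parameter), and the cluster count is simply clamped between the warehouse's minimum and maximum. Snowflake's real scaling policies (Standard and Economy) use more detailed rules about queued queries and expected load, which this ignores.

```python
import math


def clusters_needed(concurrent_queries, min_clusters, max_clusters, per_cluster=8):
    """Hypothetical rule: enough clusters for the load, clamped to [min, max]."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("require 1 <= min_clusters <= max_clusters")
    wanted = math.ceil(concurrent_queries / per_cluster)
    return max(min_clusters, min(max_clusters, wanted))


def queued(concurrent_queries, clusters, per_cluster=8):
    """Queries that still have to wait once all running clusters are full."""
    return max(0, concurrent_queries - clusters * per_cluster)


for load in (3, 20, 50, 5):  # concurrent queries at different times of day
    n = clusters_needed(load, min_clusters=1, max_clusters=4)
    print(load, n, queued(load, n))
# 3 1 0
# 20 3 0
# 50 4 18
# 5 1 0
```

Queuing only reappears when demand exceeds what the maximum cluster count can serve, which is why the maximum is worth tuning.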
Caching Results:
- To help speed up queries and reduce costs, the Snowflake architecture includes caching at several levels. For example, when a query runs, Snowflake keeps its results for 24 hours. If the same query is run again, by the same user or by another user in the account who has the required privileges, the stored results can be returned immediately, provided the underlying data hasn’t changed. This is especially useful for analysis work, as it avoids rerunning complex queries to access earlier results or to compare the results of complex queries before and after a change.
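The reuse rule described above can be sketched in Python. The `ResultCache` class is a hypothetical model, not Snowflake's implementation. It assumes a result is keyed by the exact query text, is reusable for 24 hours, and is invalidated when a `data_version` number changes (standing in for changes to the underlying tables). It does not model privilege checks or Snowflake's extension of the 24-hour period each time a result is reused.

```python
import time


class ResultCache:
    TTL_SECONDS = 24 * 60 * 60  # results are kept for 24 hours

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock so the example can fake time
        self._entries = {}   # query text -> (result, data_version, stored_at)

    def run(self, query, data_version, execute):
        """Return (result, cache_hit); execute() is called only on a miss."""
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None:
            result, version, stored_at = entry
            if version == data_version and now - stored_at < self.TTL_SECONDS:
                return result, True  # hit: no warehouse compute needed
        result = execute()
        self._entries[query] = (result, data_version, now)
        return result, False


fake_now = [0.0]
cache = ResultCache(clock=lambda: fake_now[0])
executions = []


def expensive_query():
    executions.append(1)  # count how often the query really runs
    return 42


sql = "SELECT COUNT(*) FROM orders"
print(cache.run(sql, data_version=1, execute=expensive_query))  # (42, False)
print(cache.run(sql, data_version=1, execute=expensive_query))  # (42, True)
print(cache.run(sql, data_version=2, execute=expensive_query))  # (42, False): data changed
fake_now[0] = 25 * 60 * 60
print(cache.run(sql, data_version=2, execute=expensive_query))  # (42, False): expired
print(len(executions))  # 3
```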
Multi-table INSERT:
- Snowflake supports multi-table INSERT (INSERT ALL and INSERT FIRST), which inserts rows produced by a single source query into several tables in one statement.
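The routing semantics of a conditional multi-table INSERT can be sketched in Python. `multi_table_insert` is a hypothetical helper. With `mode="ALL"` every row goes to each target whose condition holds (like `INSERT ALL ... WHEN`), and with `mode="FIRST"` a row goes only to the first matching target (like `INSERT FIRST`). The source rows are read once and reused for every target rather than re-queried per table. The `ELSE` branch and the unconditional form are not modeled, and the order data and target names are made up.

```python
def multi_table_insert(rows, branches, mode="ALL"):
    """Route source rows to target tables like INSERT ALL / INSERT FIRST.

    branches: list of (condition, target_name) pairs, checked in order.
    """
    if mode not in ("ALL", "FIRST"):
        raise ValueError("mode must be 'ALL' or 'FIRST'")
    targets = {name: [] for _, name in branches}
    for row in rows:  # the source rows are produced once for all targets
        for condition, name in branches:
            if condition(row):
                targets[name].append(row)
                if mode == "FIRST":
                    break  # INSERT FIRST: stop at the first matching branch
    return targets


def ids(targets):
    """Show only the order ids in each target table."""
    return {name: [r["id"] for r in rows] for name, rows in targets.items()}


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": 5000}]
branches = [
    (lambda r: r["amount"] >= 1000, "t_large"),
    (lambda r: r["amount"] >= 100, "t_medium"),
    (lambda r: True, "t_other"),
]
print(ids(multi_table_insert(orders, branches, mode="ALL")))
# {'t_large': [3], 't_medium': [2, 3], 't_other': [1, 2, 3]}
print(ids(multi_table_insert(orders, branches, mode="FIRST")))
# {'t_large': [3], 't_medium': [2], 't_other': [1]}
```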
QUALIFY:
- In a SELECT statement, the QUALIFY clause filters the results of window functions. QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.
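To show what QUALIFY filters, here is a Python emulation of a common pattern, keeping only the latest row per key. In Snowflake SQL it could be written as `SELECT * FROM events QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) = 1`. The function `latest_per_key` and the sample data are hypothetical. The window function is computed first and the filter is applied to its result, just as HAVING is applied after aggregation.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, partition_key, order_key):
    """Emulate QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) = 1."""
    ordered = sorted(rows, key=itemgetter(partition_key))  # groupby needs sorted input
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        # Window function: number the rows in each partition, newest first.
        newest_first = sorted(group, key=itemgetter(order_key), reverse=True)
        numbered = enumerate(newest_first, start=1)
        # QUALIFY: keep only rows whose window-function value passes the filter.
        result.extend(row for row_number, row in numbered if row_number == 1)
    return result


events = [
    {"user_id": "u1", "ts": 3, "action": "logout"},
    {"user_id": "u2", "ts": 1, "action": "login"},
    {"user_id": "u1", "ts": 1, "action": "login"},
    {"user_id": "u2", "ts": 5, "action": "purchase"},
]
for row in latest_per_key(events, "user_id", "ts"):
    print(row["user_id"], row["action"])
# u1 logout
# u2 purchase
```

Without QUALIFY, the SQL version would need a subquery that computes `ROW_NUMBER()` and an outer query that filters on it.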
Pricing:
- You pay only for what you actually consume: storage for the data you keep, and compute for the time virtual warehouses are running.
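A back-of-the-envelope Python sketch of compute billing. It assumes standard virtual warehouses, which consume credits at a rate that doubles with each size (1 credit per hour for X-Small, 2 for Small, and so on), billed per second with a 60-second minimum each time a warehouse starts or resumes. The dollar price of a credit depends on edition, region, and contract, so the hypothetical `warehouse_credits` function returns credits only. Storage is billed separately and is not included.

```python
# Credits per hour for standard warehouse sizes; each size up doubles the rate.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}


def warehouse_credits(size, seconds_running):
    """Credits for one run: per-second billing with a 60-second minimum."""
    if seconds_running <= 0:
        return 0.0  # a suspended warehouse consumes no compute credits
    billed_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


runs = [("XSMALL", 20), ("MEDIUM", 90), ("LARGE", 3600)]
for size, seconds in runs:
    print(size, seconds, round(warehouse_credits(size, seconds), 4))
# XSMALL 20 0.0167
# MEDIUM 90 0.1
# LARGE 3600 8.0
```

The 20-second X-Small run is billed as a full minute, which is why suspending and resuming a warehouse many times in quick succession can cost more than the raw running time suggests.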