Snowpro Advanced Architect: Data Sharing

Image by OpenClipart-Vectors from Pixabay

Domain Objectives

  • Data sharing
    • Capabilities of data sharing
  • Shares
    • Create and configure shares
    • Privileges required to work with shares
  • Secure views

Sometimes an organisation or a department within an organisation might want to give outsiders access to their data. While the idea might sound nice, it’s often fraught with risk. You don’t really want to be giving outsiders their own logins to your database, and it might not be practically possible to do so (your internal security might be so tight that allowing people from external organisations just isn’t doable). Alternatively, you can migrate your data towards these outsiders, but that requires extensive effort and processing time.

What Snowflake does is it offers a solution that requires neither external access to your database, nor does it require any sort of data migration. This solution is called a data share.

Now, I’ve already written a practical guide to creating data shares here which you can consult for SQL syntax (even though that’s not part of the exam spec) and for a general sense of how you would go about creating and establishing shares. The blog also touches a lot on the requisite privileges required to work with shares.

A share allows read-only access to database objects in your account. An external account or a reader account will gain access to the share and can perform queries against it. Instead of migrating the data to the user’s database, they query your data itself. Part of the magic of Snowflake is that by decoupling storage and compute, this sort of thing becomes possible without impacting the performance of your queries.

How Data Sharing Works

Snowflake Data Sharing enables the read-only sharing of the following database objects in your account:

  • Tables
  • External tables
  • Secure views
  • Secure materialised views
  • Secure UDFs

As mentioned above, the share is simply ‘pointing’ to your data. No data actually gets moved.

Image Credit: Snowflake

A share consists of 3 things:

  • The privileges that grant access to the database(s) and the schema containing the objects to share.
  • The privileges that grant access to the specific objects in the database.
  • The consumer accounts with which the database and its objects are shared.

Image Credit: Snowflake

Every share is a transaction, of sorts, and in every transaction there are two parties. Snowflake has some specific terminology for these two parties:

Providers A data provider is any Snowflake account that creates shares and makes them available to other Snowflake accounts to consume.Providers can specify grants to each database object to provider granular access-level control. Snowflake does not impose any limits on the amount of objects you can share.

Consumers A data consumer is any account that chooses to create a database from a share made available by a data provider. Once a shared object is created it can then be queried and used by the consumer.Snowflake doesn’t put a hard limit on the amount of objects that can be shared with you, however, you can only create one database per share.

Now, a lot of consumers will be Snowflake customers who have their own accounts with databases, schemas and tables of their own that they manage themselves. However, you may want to share data with people who don’t have Snowflake themselves. Instead of jealously insisting that they cough up and pay, Snowflake allows you to create reader accounts. Reader accounts are read-only accounts (hence the name) that a Snowflake user can give access to shares. Note that these accounts can only read data from the provider account that created it.

Image Credit: Snowflake

Share Types

There are three types of shares in Snowflake:

Direct Shares The simplest form of data sharing, enables account-to-account sharing of data.

Data Marketplace Available to all Snowflake accounts hosted on non-VPS regions. Connects providers of data with consumers using secure data sharing.You can discover and access a variety of third-party data and have those datasets available in your Snowflake account and join it with your own data. You can also monetise your data by selling it in the marketplace.

Data Exchange Creates a data hub for securely collaborating between a select group of members that you invite. A use case could be letting your suppliers see stock so they can check everything they have sent you has arrived.

Share Privileges

Data sharing is two-sided and so there are two main permissions that serve in the administration of shares:

  • CREATE SHARE - which, in the provider account enables the creation and management of shares.

  • IMPORT SHARE - which, in the consumer account enables those who possess this privilege to view inbound shares shared with the account and create databases from them.

By default, both of these belong only to the ACCOUNTADMIN role, but they can be distributed to other roles and users.

Remember that for the CREATE SHARE privilege to work, the role used to perform the granting and revoking of privileges to a share must have OWNERSHIP or USAGE/SELECT WITH GRANT OPTION on:

  • Database
  • Tables
  • External tables
  • Secure views
  • Secure materialized views
  • Secure UDFs That are intended to be used by the share. For example, if you want to include demo_db.example_schema.fake_table in a share then the role of the person adding those items to the share must have OWNERSHIP or USAGE/SELECT WITH GRANT OPTION of demo_db, example_schema and fake_table.

By default, only an ACCOUNTADMIN can manage a Data Exchange, IMPORTED PRIVILEGES can be granted to other roles.

Secure Views

It might seem odd at first that the Snowflake certification team have put secure views in the ‘data sharing’ section – surely that exclusively relates to data shares? Well, first of all, I’d argue that dichotomising too much might be a mistake. Secure views are not only one of the database objects you can include in a share, but, I would say, are probably the best use case for shares. Secondly, they are – in a sense – a way of making restricted data more accessible, they take what cannot be shared and create a middle ground from where it can be shared.

This all probably sounds quite abstract if you’re not familiar with the concept of a secure view. A secure view is essentially a way of creating a view without exposing the underlying data and view definition to unauthorised users.

Sharing a simple, bog-standard SQL view can be dangerous for data security reasons:

  • User-defined functions, or other programmatic methods, can expose the underlying data by taking advantage of internal optimisations used to create these views. Snowflake does not use these optimisations to create secure views.
  • The query expression one uses to create the views is visible to users who have access to the share. Snowflake restricts this view definition is only visible to authorised users.

For the sake of adding extra levels of dynamic security, secure views can also use CURRENT_ROLE and CURRENT_USER to add row and column level security. Wowzer these views are looking very secure now!

Tightening the Security of Secure Views

  • Commonly, when one is creating surrogate keys people use a sequence or auto-increment column.E.g. “2017-01-07 15:22:14.810 -0700”. This could help a user guess details of the underlying data distribution. Therefore, you should either:
    • Hide sequence-generated columns as part of the view
    • Use randomised identifiers
    • Programmatically obfuscate the identifiers
  • Snowflake does not expose the amount of data scanned when querying a secure view.

  • Time-based assumptions can be made, however, and so if this data must be protected, it is advised to create separate views for separate users.

  • When using secure views with Sharing Data Securely in Snowflake, use the CURRENT_ACCOUNT function to authorize users from a specific account to access rows in a base table.

Further Reading

Sharing Data Securely

Working with Secure Views