Snowpro Advanced Architect: Security

Photo by FLY:D 🔶Art Photographer on Unsplash

Domain Objectives

Understand account parameter and the impact of settings
- Explain the access control framework
- Required privileges for tasks, shares, statements to research Role Based Access Control structures
- Role Hierarchy and Privilege Inheritance

Security in Snowflake, I would say, comes in two flavours: macro-security and micro-security. In my mind, it looks like a castle. On the one hand, the castle body is itself secured; it might be built on high ground, be surrounded by a moat, poses watchtowers… these defensive features are used to regulate who can and cannot enter the castle. But beyond that, once you’re inside the castle that doesn’t mean you have free reign to go wherever you want. Each section of the building will have its own access requirements, guards, doors, locks and keys will determine who can go where once they’re inside.

On the macro side (and I stress I’m using my own terminology here, not Snowflake’s) you have the features that defend your entire Snowflake account. One would assume you don’t want the entire world letting themselves in, you may want to restrict access to just members of staff, or a select few clients. Once your staff are inside, you may want to restrict access to databases, schemas and tables according to departments or levels of seniority.

For securing your account on the macro scale, in place of moats, hills and watchtowers, Snowflake gives you:
• Authentication methodology
• Network controls
• Encryption

And on the micro scale, in place of guards, locks and keys, Snowflake gives you:
• Roles
• Access Controls

Authentication

Authentication determines how people/programmes prove that they are allowed access to your Snowflake account. There are several ways of doing this, and the best way of doing it is by combining different methods based on whether users are people or not.

There are 5 ways to authenticate to Snowflake:

  1. Built-in username/password authentication password is stored in the Snowflake USER object and the user authenticates with Snowflake. The USER object is delivered as a string, or the password is typed in by the user. This option is not as secure as alternative options.
  2. Built-in username/password authentication with multi-factor authentication (MFA). multi-factor authentication for security. Note that this option only supports Duo MFA.
  3. SSO powered by SAMLv2 cases.
  4. Key Pair users, such as programmatic access or service accounts.
  5. External OAuth 2.0.

Option 1 isn’t recommended, I mean, who uses just a simple username and password these days? If you want to use username/password authentication, it’s recommended you opt for 2, which offers an extra layer of security. Option 3 is best for humans, while Option 4 is best for non-humans (service accounts, programmatic access etc). Option 5 is the only option that allows for SSO access by programmatic users.

The case Snowflake makes for option 3 is as follows:

“The benefit of configuring Snowflake for federated authentication is users need to log in just once with one set of credentials to get access to all corporate apps, websites, and data for which they have permission. This benefits users in the form of simplicity as they do not have to manage multiple passwords for access to SaaS applications, data, or websites required to perform their job duties.
The unification of user access management means there is a central directory to provision and deprovision users. Configuring SSO for Snowflake with any SAML 2.0 compliant IdP helps customers think of Snowflake as many other SaaS applications that use this common protocol.
SSO for human interactive use cases must consider the capabilities supported by Snowflake in concert with the authentication capabilities of the systems users need to authenticate to Snowflake. This matrix needs to be considered along with the three scenarios described to enable SSO for as many systems as possible.”

Source


And for option 4:

“The benefits of configuring Snowflake for Key Pair Authentication include:
<br>  1) The secret does not travel over the network
<br>  2) The user does not need the private key.
<br>  3) Snowflake allows for the aggressive rotation of key pairs. The outcome of key pair authentication is that it is more secure than username and password.
<br>The result of External OAuth authentication is centralized management of tokens issued to Snowflake, and service accounts or users used exclusively for programmatic access will only ever be able to use Snowflake data when going through the External OAuth configured service. Customers benefit from sessions initiated with Snowflake do not require a password and only initiate their sessions through external OAuth.”

Source

Snowflake supports SCIM 2.0 (An open specification to help facilitate automated management of user identities and groups (roles) in cloud applications using RESTful APIs) to integrate with Okta and Microsoft Azure AD, along with custom identity providers.

“The authentication process uses an OAuth Bearer token and this token is valid for six months. Customers must keep track of their authentication token and can generate a new token on demand. During each API request, the token is passed into the header as an authorization Bearer parameter. After the initial Snowflake SCIM configuration, you can use the Snowflake SCIM API to address the following use cases:
1.	User Lifecycle Management
2.	Map Active Directory Groups to Snowflake Roles”

Source

Networking

OK – you’ve decided who/what can access your Snowflake account, now we need to look at how they can access it. To go back to the castle analogy, are we letting them in through the front door? Or should we insist they paddle their way in by boat? Maybe we want underground tunnels? Equally, in Snowflake we want to make similar decisions, do we want to use private networking, CIDR range allow lists, or just to let Snowflake do it all for us?

Snowflake provides us three ‘connectivity patterns’ (which is what we’ll be calling them from now on):

  1. Leveraging Snowflake’s out-of-the-box network security. All Snowflake communications have multiple layers of built-in security (described below). This is the default, “do-nothing-extra” security option, which is the most common choice among Snowflake customers.
  2. Layering on built-in Network Policies. In addition to out-of-the-box security, security-sensitive organizations typically implement Network Policies to specify which IP addresses can connect to the Snowflake data platform.
  3. Integrating CSP capabilities that may add more security to network connectivity. A smaller number of organizations choose to use cloud service provider features such as private networking if they determine it’s appropriate.

Source

Within all of these, five basic connections are included:

  1. The connection between the Snowflake driver/connector and the Snowflake account URL, e.g. acme.us-east-1.snowflakecomputing.com
  2. The connection between the Snowflake driver/connector and one or more OCSP providers, e.g. ocsp.digicert.com
  3. The connection between the Snowflake driver/connector and the Snowflake Internal Stage, e.g. randomname1stg.blob.core.windows.net
  4. The connection between the Snowflake service and the customer-owned cloud storage e.g. a customer’s GCS bucket
  5. The connection between the users’ browsers and the Snowflake Apps layer e.g. apps.snowflake.com

And on these network paths, there are two types of data flowing:

• The first is the organization’s data (aka customer data), which is the information the organization is interested in protecting.
• The second is OCSP (Online Certificate Status Protocol) information, which is used to validate certificates used to establish TLS 1.2 tunnels for network communications. Only the OCSP traffic uses an unencrypted channel over port 80. There are patterns where your organization may use TLS inspection of some kind, which may make this OCSP communication moot, but those discussions are out of scope for this document.

A few details that could affect architectural considerations:

  1. Network Policies use IP CIDR ranges as inputs, and contain both Allow
  2. One can apply a Network Policy to the entire Snowflake account, to specific integrations that have endpoints for network communications exposed on channel 1 e.g. SCIM, or to specific Snowflake Users. The most specific Policy always wins.
  3. There can be only one active Network Policy in any given context at one time (e.g.only one per account, integration, or user).

Source

Misapplications to avoid:

  1. Any design where a Network Policy is being used for every user is likely on the wrong path based on all evidence as of March 2021.
  2. Many organizations will attempt to apply CSP private networking technologies to many communication channels where it provides little additional security, but does add a lot of operational overhead. Consider CSP private networking only where large volumes of data or extremely sensitive data is flowing.

As of March 2021, SAML-based SSO can only be used on either public URL or private URL (for CSP private networking integration) Snowflake endpoints at one time. This will be addressed in future releases.

Source

Encryption

Just imagine the situation, after a long-planned and carefully executed mission, you’ve managed to sneak into the castle and get right down to the vaults. Here, you’re assured by your nation’s spies, lie your enemy’s secrets. You look at one of the parchments, unroll it and all you see are indecipherable streams of letters and numbers that make absolutely no sense whatsoever. Darn! This is the principle behind encrypting your data in Snowflake. Even if malicious actors somehow manage to get in through the backdoor, they won’t find anything useful.

Snowflake supports two types of encryption:

End-to-End Encryption

Image credit: Snowflake

Here, no but the end users of the data can read the data (making it unreadable to Snowflake) when the data is loaded into an internal stage it is automatically encrypted by the local machine uploading the data – from there, it is converted to Snowflake’s proprietary file format and stored in a cloud storage contained (it is ‘at rest’). All data at rest is encrypted by Snowflake. When unloading data a similar process happens and the data is decrypted on the client-side.

Client-Side Encryption

Image credit: Snowflake

Client-side encryption means that the data is encrypted before it is stored on Snowflake. The data is encrypted with a random encryption key which is then encrypted with a customer master key. (Note that when using an external stage, Snowflake recommends using client-side encryption).

Keys

If any key is compromised, that presents a problem and the whole layer of defence can come crumbling down. For that reason, Snowflake makes some effort to protect those keys. First of all, there’s the Hierarchical Key Model which consists of four levels

Image credit: Snowflake

In this model, every parent key encrypts a child key in a process known as ‘wrapping’, the benefit of this is twofold:

  1. There’s not just one key that can be compromised and unlock everything – even if you got hold of the root key, that wouldn’t help you unlock a file unless you had every child key all the way down the hierarchy
  2. Each key is itself encrypted and so if it’s compromised, it’s useless without the parent key, which is, again, useless without its parent key

Keys are also rotated. After 30 days, account and table master keys are rotated, and once the old keys are no longer needed, they are destroyed. When a key is retired (meaning a new key has been created to replace it, but it’s still needed) it only ever decrypts data, never encrypts it.

Image credit: Snowflake

The final ‘destruction’ of the key only takes place if periodic_data_rekeying is set to true. Otherwise, it remains. Rekeying does not impact Time Travel or Fail-safe, but customers are charged for additional storage of Fail-safe data files that were rekeyed – a 7 day overlap exists.

Root keys are generated and stored using HSM (Hardware Security Module), this offers three benefits:

  1. The root key never leaves the HSM
  2. Lower-level keys cannot be unwrapped without access to the HSM
  3. HSM generates secure, random keys during key rotation and rekeying (unless you use GCP)

Image credit: Snowflake

Snowflake also allows you the option of tri-secret secure. Tri-Secret Secure lets you control access to your data using your own master encryption key that you maintain in your cloud provider’s key management service. Your key is combined with a Snowflake key to create a composite master key which encrypts all data in your account. Along with Snowflake’s user authentication, this provides three levels of data protection.

Roles

Snowflake uses a RBAC model (Role-Based Access Control), which they summarise in the following way:

“The Role-based Access Control (RBAC) model adopted in Snowflake prefers role inheritance to role composition when roles are granted to users. In combination with single role activation, this model makes separating duties easier and more accurate to implement, and violations easier to detect. Additionally, a single role activation simplifies object ownership assignment and allows the model to work with more traditional Discretionary Access Control.

Consider a scenario where Role A is granted to Role B and User 1 is granted Role B, which is also the default role used by User 1. In every session, User 1 can use the sum of privileges granted to Role A, Role B and PUBLIC without switching roles. This set of roles is called effective roles as all of them are in effect during any statement execution within the session. On the other hand, Role B is the current (active role) which controls the ownership of any objects created.”

Source

When granting access, it’s important to remember that Snowflake operates according to an object-hierarchy: databases contain schemas which contain tables, views etc… In order to access the lowest items in a hierarchy, one must have access to the higher items. Access cannot be granted out of order – e.g. if one is granted use on Table X, in Schema Y or Database Z, access to X, Y and Z are not automatically granted. It’s important to remember too that even if a role has privileges on an object, if it does not have the USAGE privilege on the database and schema containing the object it will not be visible to that role.

Access Controls

Snowflake combines approaches from two models:

  1. Discretionary Access Control (DAC) – each object has an owner who can grant access to that object
  2. Role-based Access Control (RBAC) – access privileges are assigned to roles, which are in turn assigned to users

(Personally, I would advise against DAC, too many times have I seen this – not just in Snowflake – cause all sorts of mess).

Here it’s important to introduce some key concepts:

Securable Object – an entity to which access can be granted. • Role – an entity to which privileges can be granted. Roles are assigned to users; more than one role can be assigned to a user. • Privilege – a defined level of access to an object. • User – an identity, whether a person or programme, recognised by Snowflake

Access to securable objects is allowed via privileges assigned to roles.

Image credit: Snowflake

Each object lies within a logical container in a hierarchy of containers, and is owned by a single role. If two users share a role, they share ownership. Ownership can also be transferred.

Image credit: Snowflake

Roles can be granted to other roles, creating a hierarchy of roles. Subordinate roles are inherited by all above in the hierarchy.

There are some system-defined roles in Snowflake:

ACCOUNTADMIN
Role that encapsulates the SYSADMIN and SECURITYADMIN system-defined roles. It is the top-level role in the system and should be granted only to a limited/controlled number of users in your account.

SECURITYADMIN
Role that can manage any object grant globally, as well as create, monitor, and manage users and roles. More specifically, this role:
• Is granted the MANAGE GRANTS security privilege to be able to modify any grant, including revoking it.
• Inherits the privileges of the USERADMIN role via the system role hierarchy (e.g. USERADMIN role is granted to SECURITYADMIN).

USERADMIN
Role that is dedicated to user and role management only. More specifically, this role:
• Is granted the CREATE USER and CREATE ROLE security privileges.
• Can create users and roles in the account.
• This role can also manage users and roles that it owns. Only the role with the OWNERSHIP privilege on an object (i.e. user or role), or a higher role, can modify the object properties. In addition, the role must have the global CREATE USER or CREATE ROLE privilege, respectively, to modify users or roles it owns.

SYSADMIN
Role that has privileges to create warehouses and databases (and other objects) in an account.
• If, as recommended, you create a role hierarchy that ultimately assigns all custom roles to the SYSADMIN role, this role also has the ability to grant privileges on warehouses, databases, and other objects to other roles.

PUBLIC
Pseudo-role that is automatically granted to every user and every role in your account. The PUBLIC role can own securable objects, just like any other role; however, the objects owned by the role are, by definition, available to every other user and role in your account.
• This role is typically used in cases where explicit access control is not needed and all users are viewed as equal with regard to their access rights.

CUSTOM ROLES
Custom roles (i.e. any roles other than the system-defined roles) can be created by the SECURITYADMIN roles as well as by any role to which the CREATE ROLE privilege has been granted. By default, the newly-created role is not assigned to any user, nor granted to any other role.
• When creating roles that will serve as the owners of objects in the system, we recommend creating a hierarchy of custom roles, with the top-most custom role assigned to the system role SYSADMIN. This role structure allows system administrators to manage all objects in the account, such as warehouses and database objects, while restricting management of users and roles to the SECURITYADMIN or ACCOUNTADMIN roles.
• Conversely, if a custom role is not assigned to SYSADMIN through a role hierarchy, the system administrators will not be able to manage the objects owned by the role. Only those roles granted the MANAGE GRANTS privilege (typically only the SECURITYADMIN role) will see the objects and be able to modify their access grants

Example of a role hierarchy:

Image credit: Snowflake

Every user has an ‘active role’ based upon the following (in this order):

  1. If a role is specified as part of the connection
  2. If a default role has been set for the user
  3. If neither of the above are true, the user gets PUBLIC

Any user can use USE ROLE to change their role. Snowflake will always check the role, its privileges and an objects’ required privileges before allowing an action.

Snowflake recommend the following in respect to the ACCOUNTADMIN role:
    • Assign it to a limited number of people.
    • All users who possess it should use MFA.
    • At least two users should possess it.
    • Avoid using it to create objects – it’s only for initial set up.
    • Avoid using it for automated scripts.

Some tips and tricks:
• To lock down object security, use managed access schemas; object owners lose the ability to make grant decisions, instead the schema owner grants privileges on objects in the schema.
• Use future grants, these allow you to define an initial set of privileges on objects in a schema and as new objects are created those privileges are automatically granted to roles, simplifying the whole process.
• Remember when cloning objects that the clone doesn’t inherit the source privileges.
• To grant access to objects at the bottom of an object hierarchy, the role should have access to the database and schema of those objects.
• Any users who will manage objects created by custom roles should also be assigned those custom roles.
• Align access to objects according to business functions.
• Grant highest-function roles (in role hierarchies) to the SYSADMIN role.

Conclusion

Hopefully, you understand a little more about how Snowflake defends your account from unauthorised access, the active part account holder must play in that defence, and how account holders can ensure that users are channeled towards the data that is suitable for their position and needs. The key phrase here is ‘active role’. Snowflake does not promise to act as the sole and ultimate guardian of your data. Whilst Snowflake does a lot of the heavy lifting in terms of securing your data, the account holder must be a part of that by tailoring it to business needs, requirements and procedures.

Doing so means understanding the models that Snowflake provide and I hope those models are a little clearer. Before getting stuck in in Snowflake, and before doing the exam this blog was written for, I definitely suggest digging into the further reading below.

Sources & Further Reading

Managing Your Snowflake Account

Managing Security in Snowflake