Microsoft Fabric Updates Blog

Building Common Data Architectures with OneLake in Microsoft Fabric

Introduction

OneLake can serve as a single data lake for your entire organization. It provides ease of use and helps eliminate data silos, and it can simplify security while keeping sensitive data secure. OneLake and Fabric provide several out-of-the-box capabilities to restrict data access to only those who need it. This article looks at some common data architecture patterns and how they can be secured with Microsoft Fabric.

Security structure

It is important to understand the basic building blocks of security in Microsoft Fabric before getting started. Fabric provides many different places where security can be set, which allows for both flexibility and scale in security configurations. There are three main types of security in Fabric:

  • Workspace roles
  • Item permissions
  • Compute permissions

We will take a close look at each of these and how they interact.

Workspace roles

The first and least granular level of security in Microsoft Fabric is workspace roles. Workspace roles are pre-configured sets of capabilities that can be granted to users or groups at the workspace level in Fabric. When a user is assigned to a role, they receive all the capabilities of that role within the confines of the workspace. These workspace permissions then apply to all items within the workspace.

Each workspace role contains permissions that allow users to perform certain actions. This blog focuses on data security, so the table below outlines the data access granted by each role.

| Role | Can add admins? | Can add members? | Can write data and create items? | Can read data? |
|---|---|---|---|---|
| Admin | Yes | Yes | Yes | Yes |
| Member | No | Yes | Yes | Yes |
| Contributor | No | No | Yes | Yes |
| Viewer | No | No | No | Yes |
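
As a minimal sketch, the role table above can be expressed as a lookup that picks the least privileged workspace role for a given need. The names `RoleCapabilities`, `WORKSPACE_ROLES`, and `least_privileged_role` are hypothetical helpers for illustration only; they are not part of any Fabric API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleCapabilities:
    can_add_admins: bool
    can_add_members: bool
    can_write_data: bool
    can_read_data: bool


# Capability matrix transcribed from the workspace role table in this article.
# Listed from least to most privileged (an ordering assumed for this sketch).
WORKSPACE_ROLES = {
    "Viewer": RoleCapabilities(False, False, False, True),
    "Contributor": RoleCapabilities(False, False, True, True),
    "Member": RoleCapabilities(False, True, True, True),
    "Admin": RoleCapabilities(True, True, True, True),
}


def least_privileged_role(write_data=False, add_members=False, add_admins=False):
    """Return the first role, from Viewer up to Admin, that covers the needs."""
    # Dicts preserve insertion order, so roles are tried from Viewer to Admin.
    for name, caps in WORKSPACE_ROLES.items():
        if write_data and not caps.can_write_data:
            continue
        if add_members and not caps.can_add_members:
            continue
        if add_admins and not caps.can_add_admins:
            continue
        return name
    raise ValueError("no workspace role covers the requested capabilities")


print(least_privileged_role())                 # read-only need
print(least_privileged_role(write_data=True))  # needs to create items
```

A lookup like this is only a planning aid: it helps decide which role to assign before doing so in the workspace settings.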

Item permissions

Next in the hierarchy are permissions that can be set on a specific item. Item permissions allow for adjusting the permissions set by a workspace role or giving a user access to a single item within a workspace without adding them to a workspace role. The easiest way to configure item permissions is to share an item with a user or group. During the sharing step, the person sharing the item chooses which permissions to grant to the recipient.

Sharing the item always grants the user the Read permission for that item. Read allows users to see the metadata for that item and view any reports associated with it but not access the underlying data in SQL or OneLake. To grant just the Read permission, leave all the boxes unchecked.

If the “Read all SQL endpoint data” box is checked, users will be given the ReadData permission. ReadData gives access to all Tables in the item when accessing through the SQL Endpoint. Users will not be able to access OneLake directly.

If the “Read all Apache Spark” box is checked, users will be given ReadAll. This permission allows users to access data in OneLake. This could be through direct OneLake access, Apache Spark queries, or the lakehouse UX.

The last checkbox is not relevant for this blog; it relates to the Build permission, which is covered in the Fabric documentation.
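
The checkbox-to-permission mapping described above can be sketched as a small function. This is an illustrative model only: `permissions_from_share` and its parameter names are hypothetical, and the Build checkbox is left out because this article does not cover it.

```python
def permissions_from_share(read_all_sql_endpoint_data=False,
                           read_all_apache_spark=False):
    """Map the sharing checkboxes described above to item permissions."""
    granted = {"Read"}  # sharing an item always grants Read
    if read_all_sql_endpoint_data:
        granted.add("ReadData")  # all Tables through the SQL Endpoint
    if read_all_apache_spark:
        granted.add("ReadAll")   # OneLake data: Spark, direct access, lakehouse UX
    return granted


# Show the result of each checkbox combination.
for sql_box in (False, True):
    for spark_box in (False, True):
        perms = permissions_from_share(sql_box, spark_box)
        print(f"SQL box={sql_box}, Spark box={spark_box}: {sorted(perms)}")
```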

Compute permissions (SQL and Semantic models)

The last place permissions can be set is within a specific compute engine in Fabric, specifically through the SQL Endpoint or semantic models. The SQL Endpoint provides direct SQL access to Tables in OneLake and can have security configured natively through SQL commands. SQL security allows for more granular permissions, such as table-level and row-level security. However, security set in this way only applies to queries made through SQL. Access to OneLake data through a Spark query (by users with the ReadAll permission) is not affected by the security restrictions in SQL. Likewise, semantic models allow security to be defined using DAX, and those restrictions apply to users querying through the semantic model or reports built on top of it.

In the example below, a Lakehouse is shared with a user and Read access is granted. The user is then given SELECT through the SQL Endpoint. When that user tries to read data through a Spark notebook, access is denied because they do not have ReadAll, but reads made through SQL SELECT statements succeed.
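
The scenario above can be modeled as a simple access check. This is a sketch under stated assumptions, not Fabric's actual evaluation logic: `can_read`, the engine labels, and the `sql_select_granted` flag are hypothetical names, and the case of a ReadAll-only user querying through SQL is not described in this article, so the sketch conservatively denies it.

```python
def can_read(item_permissions, engine, sql_select_granted=False):
    """Decide whether a read succeeds for a given access path."""
    if engine in ("spark", "onelake"):
        # Spark and direct OneLake access require ReadAll; SQL grants do not help.
        return "ReadAll" in item_permissions
    if engine == "sql":
        # ReadData covers all Tables through the SQL Endpoint.
        if "ReadData" in item_permissions:
            return True
        # Otherwise Read plus an explicit SQL grant (for example SELECT) is needed.
        return "Read" in item_permissions and sql_select_granted
    raise ValueError(f"unknown engine: {engine}")


# The user from the example: Lakehouse shared with Read, then SELECT granted in SQL.
user_permissions = {"Read"}
print("Spark notebook:", can_read(user_permissions, "spark", sql_select_granted=True))
print("SQL SELECT:", can_read(user_permissions, "sql", sql_select_granted=True))
```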

[Diagram: a user with Read on a Lakehouse and SELECT through the SQL Endpoint is denied through Spark but succeeds through SQL]

Shortcuts

Shortcuts are a feature of Microsoft OneLake that allow for data to be easily reused without making copies of the data. Shortcuts function like symbolic links where the data appears as if it is natively part of a Lakehouse, but the original data was not moved or copied. There are two primary types of shortcuts, and the security functions differently for each.

OneLake shortcuts: Shortcuts to another location in OneLake. These shortcuts require that the user accessing the shortcut has access to the location that the shortcut points to. For example, if Lakehouse1 has TableA that is a shortcut to Lakehouse2/TableB, any user accessing TableA will need access to Lakehouse2/TableB to see any data.

External (ADLS, AWS S3) shortcuts: Shortcuts to external locations outside of Fabric/OneLake use a service principal or account key to authenticate to the target destination. All users accessing the shortcut receive the same permissions as provided by the service principal/account key. To control access to data in external shortcuts, use the guidance from the Securing data section to configure permissions at the appropriate level within Fabric. This may require configuring SQL security to restrict access to the item.

Another important detail about shortcuts is how they interact with SQL Endpoint and SQL Warehouse queries. SQL engines in Fabric use a delegated model when accessing shortcuts. This means that the identity of the item's creator (the user who created the Lakehouse or warehouse), not the identity of the querying user, is checked against the shortcut target. The diagram below shows which identity is used when accessing a shortcut to another Lakehouse based on which engine is being queried.
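
The identity rules for shortcuts described above can be summarized in a short sketch. The function name, the `"onelake"`/`"external"` and `"sql"`/`"spark"` labels, and the example principals are all hypothetical; this only restates the rules from the text and does not call any Fabric API.

```python
def identity_checked_at_target(shortcut_type, engine, querying_user,
                               item_creator, connection_credential=None):
    """Return the identity evaluated against the shortcut's target."""
    if shortcut_type == "external":
        # External shortcuts authenticate with the configured service principal
        # or account key, so every user gets that credential's permissions.
        return connection_credential
    if shortcut_type == "onelake":
        if engine == "sql":
            # Delegated model: the creator of the Lakehouse or warehouse is checked.
            return item_creator
        # Spark and direct OneLake access check the querying user.
        return querying_user
    raise ValueError(f"unknown shortcut type: {shortcut_type}")


print(identity_checked_at_target("onelake", "spark", "analyst@contoso.com",
                                 "owner@contoso.com"))
print(identity_checked_at_target("onelake", "sql", "analyst@contoso.com",
                                 "owner@contoso.com"))
```

One consequence of the delegated model is that a querying user may see shortcut data through SQL that they could not read directly, which is why SQL security may be needed on the item.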

[Diagram: identity used when accessing a shortcut to another Lakehouse, by query engine]

Securing data

Now that we understand the tools Fabric provides for configuring access, how should the pieces be set up to work together?

The guiding principle is to grant users access to the data they need at the lowest possible level. If a user needs to read data from a report, they should only be given access to that report itself. With that guidance in mind, let us take a look at some common patterns.

Data Mesh

Data mesh is an architectural paradigm that treats data as a product, rather than a service or a resource. Data mesh aims to decentralize the ownership and governance of data across different domains and teams, while enabling interoperability and discoverability through a common platform. In a data mesh architecture, each decentralized team manages the ownership of the data that is part of their data product.

Microsoft Fabric supports organizing data into domains and lets data consumers filter and discover content by domain. It also enables federated governance, which means that some governance currently controlled at the tenant level can be delegated to domain-level control, enabling each business unit or department to define its own rules and restrictions according to its specific business needs. As a result, enterprise customers are empowered with the key tools they need to structure their tenant’s data estate along the principles of a data mesh.

Let’s take a look at building a data mesh in Fabric. First, using the domains feature, tenant admins can manage the creation and assignment of any domains and the associated workspaces. Next, each data team has its own Fabric workspace. The workspace will store the data and orchestration needed to build out the final data products for consumption. The users that build and create data products in the workspace are given a read/write workspace role such as Contributor. This will let them interact with all the items in the workspace and create new ones as needed.

Then, within the workspace, teams will have Lakehouses that are consumed by different downstream teams, for example data scientists, business analysts, and company leaders. To keep users aligned with their target experiences, each type of downstream user can be given access to a single Fabric data experience.

| Downstream user | Fabric experience |
|---|---|
| Data scientists | Spark notebooks |
| Business analysts | SQL Endpoint |
| Report creators | Semantic models |
| Company leaders | Power BI reports |

Using Fabric item permissions, we can assign each user group to a single experience by sharing the correct items. The admin can share the Lakehouse with “Read all SQL endpoint data” selected for the business analysts and with “Read all Apache Spark” selected for the data scientists. Lastly, they can share the Power BI reports with the company leaders to ensure they have access to the polished end products.
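
As a rough sketch, the sharing plan in the paragraph above can be written down as data and checked before it is applied. `SHARE_PLAN`, `OPTION_TO_PERMISSION`, and `resulting_permissions` are hypothetical names for illustration. Report creators are omitted because the paragraph does not say how semantic models are shared with them.

```python
# (user group, item shared, sharing option selected) as described above.
SHARE_PLAN = [
    ("Data scientists", "Lakehouse", "Read all Apache Spark"),
    ("Business analysts", "Lakehouse", "Read all SQL endpoint data"),
    ("Company leaders", "Power BI report", None),  # plain share, no extra option
]

# Permission granted by each sharing option, per the item permissions section.
OPTION_TO_PERMISSION = {
    "Read all Apache Spark": "ReadAll",
    "Read all SQL endpoint data": "ReadData",
}


def resulting_permissions(plan):
    """Return {group: (item, permissions)} for a sharing plan."""
    result = {}
    for group, item, option in plan:
        permissions = {"Read"}  # sharing always grants Read
        if option is not None:
            permissions.add(OPTION_TO_PERMISSION[option])
        result[group] = (item, permissions)
    return result


granted = resulting_permissions(SHARE_PLAN)
# Only groups that should see all OneLake data of the Lakehouse hold ReadAll.
groups_with_full_data_access = [g for g, (_, p) in granted.items() if "ReadAll" in p]
print(groups_with_full_data_access)
```

Listing who ends up with ReadAll is a quick way to confirm that full access to a Lakehouse's data stays limited to the intended group.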

Because the “Read all Apache Spark” setting gives full access to the data of a Lakehouse, there might be cases where multiple Lakehouses are needed. For example, if some of the data should be visible to only some data scientists because it contains PII or country-specific content, lakehouses can be created per downstream consumption group. Using shortcuts, new lakehouses can be created easily and can share data between them without creating additional data copies.

For business analysts, granular security such as row and column level security can be configured directly in the SQL Endpoint for the Lakehouse. This ensures that data is kept secure at a granular level. This same security can be reused for Power BI reports built over the SQL Endpoint as well.
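
To illustrate the effect of row and column level security on the SQL path, here is a conceptual model in plain Python. It is not how the security is configured in Fabric (that is done with SQL commands in the SQL Endpoint); the table contents, the policy dictionary, and both function names are invented for this sketch.

```python
# Hypothetical table contents for illustration.
ORDERS = [
    {"order_id": 1, "country": "US", "amount": 120, "customer_email": "a@example.com"},
    {"order_id": 2, "country": "DE", "amount": 80, "customer_email": "b@example.com"},
    {"order_id": 3, "country": "US", "amount": 45, "customer_email": "c@example.com"},
]

# Hypothetical policy for one analyst: only US rows, no email column.
ANALYST_POLICY = {"allowed_countries": {"US"}, "hidden_columns": {"customer_email"}}


def query_via_sql_endpoint(rows, policy):
    """Apply row-level and column-level restrictions, as the SQL path would."""
    visible = []
    for row in rows:
        if row["country"] not in policy["allowed_countries"]:
            continue  # row-level security filters the row out
        visible.append({col: val for col, val in row.items()
                        if col not in policy["hidden_columns"]})
    return visible


def read_via_spark(rows):
    """ReadAll access is not affected by SQL security: all rows and columns."""
    return [dict(row) for row in rows]


print(query_via_sql_endpoint(ORDERS, ANALYST_POLICY))
print(len(read_via_spark(ORDERS)))
```

The contrast between the two functions is the reason business analysts in this pattern are given SQL Endpoint access rather than ReadAll.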

Data mesh is unique in that each workspace will implement the same approach to managing security, as each team may need to consume data products from a variety of other teams. This interconnected approach is where the power of this architecture shines through. However, it’s important that each team understands how to correctly secure data since there is no central team managing all data security.

Hub and Spoke

Hub and spoke is a data architecture pattern that centralizes the data from different sources into a single hub, such as a data warehouse or a data lake. The hub serves as the source of truth for the data and provides standardized schemas and formats. The spokes are the various applications or services that consume the data from the hub for different purposes, such as analytics, reporting, or machine learning. The spokes can also perform transformations or aggregations on the data before presenting it to the end users. Hub and spoke aims to simplify the data integration and management process by reducing the complexity and redundancy of data pipelines.

As with the data mesh architecture, there is no single way to build a hub and spoke model. We will look at a common method of how to achieve this in Microsoft Fabric.

As in the data mesh approach, workspaces are the core level for defining groups of related items and securing them. It is common for a hub and spoke team to have a single workspace where all data products are created and managed, but you can just as easily use multiple workspaces. For the data engineers and teams creating the central data products, granting the Contributor workspace role works best.

Within the workspace, polished data products will be created for end users to consume. Some of these data products will need to be consumed by other teams for use in in-depth analysis and will require access to the underlying OneLake data. Others will consume data through a SQL warehouse or SQL Endpoint. Lastly, polished reports for company executives and decision makers are created and shared with those users.

Using item sharing, the hub and spoke admins can share select consumption experiences or artifacts with the downstream teams that need access to them.

  • For data science or ML teams, use “Read all Apache Spark” to give teams access to the OneLake data (raw files). Teams can then leverage shortcuts to access the data in their own workspace without creating copies of it.
  • For most business analysts, use “Read all SQL Endpoint data” to give access to the SQL queries. This can be augmented by defining SQL permissions. For businesses that don’t use lake data, you can simplify this by creating a Fabric warehouse instead of a Lakehouse and sharing the warehouse directly.
  • For users that need to consume sensitive data, creating pipelines against the SQL Endpoint will allow for RLS or other fine-grained security to be applied when that data is read. The data can be transformed or processed downstream without risking access to rows or columns that are not allowed for those users.
  • For all other users, Power BI reports can be created and shared from the workspace. You can create reports specific to downstream teams that will need them.

Recap

Microsoft Fabric provides a robust set of tools for managing access to data, while leveraging the power of OneLake to simplify your data estate. In this guide we looked at some common architectures and how you can use the capabilities in Fabric to build those models and keep your data secure.

Next steps

  1. Learn more about Fabric security features: Microsoft Fabric security – Microsoft Fabric | Microsoft Learn
  2. Get started with a Microsoft Fabric free trial: Sign up for a Fabric free trial.
  3. Create a Lakehouse to get started with exploring the features in this article: Create a Lakehouse – Microsoft Fabric | Microsoft Learn
