Microsoft Fabric Updates Blog

Databricks Unity Catalog tables available in Microsoft Fabric

You can now access Azure Databricks Unity Catalog tables directly from Fabric through the new Mirrored Azure Databricks Catalog feature, now in Public Preview. The feature uses shortcuts in OneLake, so Fabric does not move or duplicate any data. This lets you build on your existing investment in Azure Databricks and analyze the same data with all Microsoft Fabric workloads, including Power BI in Direct Lake mode.

Mirrored Azure Databricks Catalog

We have created a new item type inside Fabric called a “Mirrored Azure Databricks Catalog”. This can be found in the “Get data” section of the new items panel.

Each Mirrored Azure Databricks catalog item in Fabric is designed to map to an individual catalog within Azure Databricks Unity Catalog. When you create the item, you will first provide connection details for your Azure Databricks workspace. Then you will select the elements that you want to appear in the Fabric item. You can select an entire catalog or choose a subset of schemas or tables. Once you click create, all the file-based tables eligible for external access will appear in the Fabric item. These tables are now immediately ready to use by Fabric workloads.

Fabric automatically keeps your Mirrored Azure Databricks Catalog item in sync with the catalog you selected in Azure Databricks. As tables are added to or removed from the catalog in Unity Catalog, they are automatically added to or removed from the item in Fabric.
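The add-and-remove behavior described above can be pictured as a set difference between the tables in the source catalog and the tables currently shown in the mirrored item. The sketch below is only an illustration of that idea. The function name `plan_sync` and the table names are hypothetical, and nothing here reflects how Fabric implements synchronization internally.

```python
def plan_sync(catalog_tables: set[str], mirrored_tables: set[str]) -> tuple[set[str], set[str]]:
    """Return (tables to add, tables to remove) for a mirrored item.

    Hypothetical helper: both arguments are sets of fully qualified
    table names such as "sales.orders".
    """
    # Tables that exist in Unity Catalog but are not yet mirrored.
    to_add = catalog_tables - mirrored_tables
    # Tables still mirrored but no longer present in Unity Catalog.
    to_remove = mirrored_tables - catalog_tables
    return to_add, to_remove


if __name__ == "__main__":
    source = {"sales.orders", "sales.customers", "sales.returns"}
    mirrored = {"sales.orders", "sales.customers", "sales.legacy"}
    add, remove = plan_sync(source, mirrored)
    print("add:", sorted(add))
    print("remove:", sorted(remove))
```

In this example, `sales.returns` would be added to the mirrored item and `sales.legacy` would be removed.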

The Mirrored Azure Databricks catalog item also has a built-in SQL endpoint and default semantic model. From within the item, you can switch to the SQL endpoint view and start analyzing your Azure Databricks catalog data with Fabric SQL. With the default semantic model, you can easily define relationships between your tables and visualize your data through Power BI reports and take full advantage of Direct Lake mode.

The Mirrored Azure Databricks Catalog item is also fully integrated with OneLake, which makes it compatible with other workloads in Fabric. From a Fabric Lakehouse, you can create shortcuts to your Mirrored Azure Databricks Catalog item. This allows you to integrate and unify data across systems. It also lets you analyze your data with Spark notebooks, build ML models, and create AI skills.
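As a rough illustration of the notebook scenario, the sketch below builds a OneLake path for a table that has been surfaced in a Lakehouse through a shortcut. It assumes the commonly documented OneLake URI shape `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table>`. The workspace, lakehouse, and table names are placeholders, and the helper `onelake_table_path` is hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// path to a Lakehouse table (assumed URI layout)."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


if __name__ == "__main__":
    # Placeholder names; replace with your own workspace, lakehouse, and shortcut.
    path = onelake_table_path("MyWorkspace", "MyLakehouse", "orders")
    print(path)
    # Inside a Fabric Spark notebook, where a `spark` session is predefined,
    # the table could then be read with the standard Delta reader:
    #   df = spark.read.format("delta").load(path)
```

If the shortcut already appears under the Lakehouse attached to the notebook, reading it by table name may be simpler than building the full path.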

Supported Unity Catalog Object Types

The Mirrored Azure Databricks Catalog item will synchronize both Managed and External tables from Unity Catalog. However, some table types and table characteristics are not supported. The objects below will not be included in the Mirrored Azure Databricks Catalog item.

Unsupported Unity Catalog Types

  • Tables with row-level security (RLS) or column-level masking (CLM) policies
  • Lakehouse federated tables
  • Delta sharing tables
  • Streaming tables
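The eligibility rules above can be summarized as a simple filter: a table is mirrored only if it is a Managed or External table and has no RLS/CLM policy. The sketch below expresses that rule over made-up metadata. The `TableInfo` class, its field names, and the type labels are hypothetical and do not correspond to any Unity Catalog or Fabric API.

```python
from dataclasses import dataclass

# Hypothetical labels for the two table types the post says are synchronized.
SUPPORTED_TYPES = {"MANAGED", "EXTERNAL"}


@dataclass(frozen=True)
class TableInfo:
    name: str
    table_type: str  # e.g. "MANAGED", "EXTERNAL", "FEDERATED", "DELTA_SHARING", "STREAMING"
    has_rls_or_clm_policy: bool = False


def is_mirrorable(table: TableInfo) -> bool:
    """Apply the rule from the list above: supported type and no RLS/CLM policy."""
    return table.table_type in SUPPORTED_TYPES and not table.has_rls_or_clm_policy


def mirrorable_names(tables: list[TableInfo]) -> list[str]:
    """Return the names of the tables that would appear in the mirrored item."""
    return [t.name for t in tables if is_mirrorable(t)]


if __name__ == "__main__":
    tables = [
        TableInfo("orders", "MANAGED"),
        TableInfo("customers", "EXTERNAL", has_rls_or_clm_policy=True),
        TableInfo("events", "STREAMING"),
        TableInfo("partners", "DELTA_SHARING"),
        TableInfo("inventory", "EXTERNAL"),
    ]
    print(mirrorable_names(tables))
```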

One Copy

The Mirrored Azure Databricks Catalog item replicates the structure of your Unity Catalog, so all supported tables are reflected under the corresponding schemas and catalogs. However, none of the data is physically copied. Instead, a shortcut is created for every table in the item. These shortcuts access your source data stored in ADLS, so there is only ever “one copy” of the data. This means that you don’t have to wait for your data to be copied before you can start using it. It also removes the delay a copy would introduce, so you are never working with a stale replica.

Permissions and Governance

Authorization

Like other mirrored items in Fabric, the Mirrored Azure Databricks Catalog item uses a delegated authorization model. When you first set up the connection for your item, you provide an identity to connect as. This can be a user identity or a service principal. This identity is used for all interactions with the Azure Databricks workspace. In other words, data access is authorized only for tables that this identity can access in Unity Catalog.

Inside Fabric you can manage who has access to the Mirrored Azure Databricks catalog item using Fabric workspace roles. More granular security can be defined for the SQL endpoint and semantic models. Coming soon, you will also be able to further restrict access using OneLake data access roles and item sharing.

Configure External Access in Databricks

Azure Databricks provides a multi-tiered governance model for enabling external access. First, the metastore admin must enable the “External Data Access” setting on the metastore. This setting is off by default.

Next, the “EXTERNAL_USE_SCHEMA” permission must be granted to the delegated identity that was specified when the Mirrored Azure Databricks Catalog item was created. This permission can be set at the catalog level or on individual schemas. Note that it is not included in the “ALL PRIVILEGES” permission, so it must be granted explicitly.

Last, the delegated identity must have “SELECT” permission on the individual tables and the corresponding “USE” permissions (USE SCHEMA and USE CATALOG) on their schemas and catalogs. For instructions, see Control external access to data in Unity Catalog.
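The grants described above could be prepared with a small helper like the one below, which only builds the SQL text for a metastore or catalog admin to review and run in Azure Databricks. It is a sketch under assumptions: the SQL spelling `EXTERNAL USE SCHEMA` is assumed to correspond to the “EXTERNAL_USE_SCHEMA” permission named in this post, and the principal, catalog, schema, and table names are placeholders. Enabling “External Data Access” on the metastore is a separate admin setting and is not covered here.

```python
def quote(identifier: str) -> str:
    """Backtick-quote a SQL identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def external_access_grants(principal: str, catalog: str, schema: str, tables: list[str]) -> list[str]:
    """Build GRANT statements for the delegated identity (hypothetical helper).

    Assumes the Unity Catalog SQL privilege names USE CATALOG, USE SCHEMA,
    EXTERNAL USE SCHEMA, and SELECT.
    """
    p = quote(principal)
    c = quote(catalog)
    s = f"{c}.{quote(schema)}"
    statements = [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {s} TO {p};",
        # Not implied by ALL PRIVILEGES, so it is granted explicitly.
        f"GRANT EXTERNAL USE SCHEMA ON SCHEMA {s} TO {p};",
    ]
    statements += [f"GRANT SELECT ON TABLE {s}.{quote(t)} TO {p};" for t in tables]
    return statements


if __name__ == "__main__":
    # Placeholder identity and object names.
    for stmt in external_access_grants("fabric-sp@contoso.com", "main", "sales", ["orders", "customers"]):
        print(stmt)
```

Granting `EXTERNAL USE SCHEMA` per schema, as in this sketch, keeps external access narrow. The post notes it can also be granted at the catalog level when every schema should be exposed.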

Network Security

In order for Fabric to communicate with your Azure Databricks workspace, the workspace URL must be accessible to Fabric. If you utilize workspace IP filters, you will need to configure the filter to allow the Fabric service tags.

See Fabric service tags for more details.

Additionally, the ADLS storage account that is utilized by your Azure Databricks workspace must also be accessible to Fabric. Coming soon, you will be able to configure firewall rules in your ADLS storage account to allow Fabric to connect as a trusted service or resource instance.

Enable Preview in Fabric

To access the public preview of Mirrored Azure Databricks Catalog, you must enable the feature in the capacity settings. Open the Admin portal, select Capacity settings, then Fabric capacity. Select your capacity, choose Delegated tenant settings, and turn on the switch for Mirrored Azure Databricks Catalog (preview).

Try it today

The Mirrored Azure Databricks Catalog (Public Preview) in Fabric lets you keep building on your existing investments in Azure Databricks and Unity Catalog while using Microsoft Fabric features such as Power BI Direct Lake mode, OneLake shortcuts, and AI Skill Builder. For more information, see our documentation here.
