Microsoft Fabric Updates Blog

Organizing your tables with lakehouse schemas and more (Public Preview)

We’re thrilled to introduce a new feature in Fabric: Lakehouse schemas. This feature lets you arrange your lakehouse tables into a folder-like structure, which improves data discovery, among other benefits. Many users already know schemas from Fabric Data Warehouse, and we are bringing aligned capabilities to Lakehouse.

Schemas created in your lakehouses also appear in the SQL Analytics Endpoint, semantic models, shortcuts, and anywhere else lakehouse data is referenced, so your data stays consistently organized across engines.

With lakehouse schemas you can:

  • Organize your tables in a folder-like structure.
  • Reference your tables in Spark code using the four-part namespace “workspace.lakehouse.schema.table”.
  • Reference multiple tables at once with a schema shortcut.

How to get started?

Lakehouse schemas are now available in Public Preview. Existing lakehouses that do not have schemas enabled will continue to work as usual. To enable schema support, select “Lakehouse schemas (Public Preview)” next to the lakehouse name field when creating a new lakehouse. Once the lakehouse is created, a default schema named “dbo” appears in the “Tables” section; it cannot be renamed or removed.

To create a new schema, click “Tables” and select “New schema”. After entering a schema name, you’ll see it immediately created and listed under “Tables” in alphabetical order.

Organizing your tables with schemas

After creating your schemas, you can populate them with tables. In Notebook code, prefix the table name with the schema name, such as “marketing.promotions”. In the Pipelines Copy tool, you can select a schema while importing data. Note, however, that Dataflows can only import data into the default “dbo” schema and do not allow schema selection.
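As a minimal sketch of the Notebook case, assuming a PySpark notebook attached to a schema-enabled lakehouse: the helper below builds the schema-qualified name and passes it to the standard `DataFrameWriter.saveAsTable` call. The helper names and the `overwrite` write mode are illustrative choices, not part of the feature itself.

```python
def schema_table_name(schema: str, table: str) -> str:
    """Return 'schema.table', for example 'marketing.promotions'."""
    for part in (schema, table):
        # Reject empty parts and parts that already contain a dot.
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return f"{schema}.{table}"


def save_to_schema(df, schema: str, table: str) -> str:
    """Write a Spark DataFrame as a table inside the given lakehouse schema.

    `df` is assumed to be a pyspark.sql.DataFrame. The "overwrite" mode is
    an illustrative choice; pick the mode that fits your pipeline.
    """
    name = schema_table_name(schema, table)
    df.write.mode("overwrite").saveAsTable(name)
    return name


print(schema_table_name("marketing", "promotions"))
```

In a notebook, `save_to_schema(promotions_df, "marketing", "promotions")` would then land the table under the “marketing” schema, where `promotions_df` stands for any DataFrame you have already loaded.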

Another quick way to organize your tables is using Lakehouse Explorer. Simply drag a table name from your source schema to the target schema name, and the table will be instantly moved to it. Make sure to update all your references to the moved table, as its path has changed.

Referencing tables in Notebooks

As noted above, a table reference now includes the schema name. This update also adds something broader: the ability to reference tables outside your current workspace, using a name such as “myworkspace.mylakehouse.schema.table”. This enables tasks like joining tables that live in separate workspaces with Spark SQL.

Take this scenario: Your customer data resides in the “Sales” lakehouse within the “Corporate” workspace, while employee information is housed in the “HRM” lakehouse within the “Internal” workspace. If you need to identify which employees are also customers, you can execute a query that joins the “Employees” and “Customers” tables to obtain your answer.
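The scenario above could be sketched roughly as follows. The workspace, lakehouse, and table names come from the scenario, but two things are assumptions made only for illustration: that both tables live in the default “dbo” schema, and that they share a hypothetical `Email` column to join on.

```python
def four_part_name(workspace: str, lakehouse: str, schema: str, table: str) -> str:
    """Build a 'workspace.lakehouse.schema.table' reference."""
    return ".".join([workspace, lakehouse, schema, table])


def build_employee_customer_query(join_column: str = "Email") -> str:
    """Return Spark SQL that finds employees who are also customers.

    `join_column` is a hypothetical shared column; replace it with whatever
    key your tables actually have in common. The "dbo" schema is assumed.
    """
    customers = four_part_name("Corporate", "Sales", "dbo", "Customers")
    employees = four_part_name("Internal", "HRM", "dbo", "Employees")
    return (
        f"SELECT e.* "
        f"FROM {employees} AS e "
        f"INNER JOIN {customers} AS c "
        f"ON e.{join_column} = c.{join_column}"
    )


query = build_employee_customer_query()
print(query)

# In a Fabric notebook, where a `spark` session is already available,
# the query could then be run with:
#     matches = spark.sql(query)
```

Building the names in one helper keeps the four-part references in a single place, which helps if a table is later moved to another schema.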

Remember, if you omit the schema name, the system defaults to the “dbo” schema. Likewise, if you omit the lakehouse or workspace name, the default lakehouse and workspace are used.
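To make the fallback behavior concrete, here is a small illustration in plain Python. It is not how Spark resolves names internally; it only mirrors the rule described above, and the default workspace and lakehouse values passed in are hypothetical.

```python
def resolve_table_name(
    name: str,
    default_workspace: str,
    default_lakehouse: str,
    default_schema: str = "dbo",
) -> str:
    """Expand a partial table reference to its full four-part form.

    Missing leading parts (workspace, lakehouse, schema) are filled in
    from the supplied defaults.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 4 or any(not p for p in parts):
        raise ValueError(f"cannot resolve table name: {name!r}")
    defaults = [default_workspace, default_lakehouse, default_schema]
    missing = 4 - len(parts)
    # Take only as many defaults as there are missing leading parts.
    return ".".join(defaults[:missing] + parts)


# Hypothetical defaults for a notebook in the "Corporate" workspace
# attached to the "Sales" lakehouse.
for ref in ("Customers", "marketing.promotions", "Internal.HRM.dbo.Employees"):
    print(ref, "->", resolve_table_name(ref, "Corporate", "Sales"))
```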

Referencing all your data lake tables with five clicks

Yes, really: five clicks, and not a marketing trick. The schema shortcut lets you reference a folder that contains all your tables in a data lake, whether it is stored in ADLS Gen2, AWS S3, or another source supported by shortcuts. All of those tables then appear in your lakehouse immediately, without copying any data. Click “Tables”, select “Schema shortcut”, select your target location, pick the folder that contains all the tables, and click “Finish”.

What is coming up next?

As noted above, this feature is currently in Public Preview. Our team is actively working to address its limitations before General Availability.

In upcoming updates, we plan to introduce a tool for migrating existing lakehouses without schema support to schema-enabled ones. We are also committed to adding features that enhance data security in schemas.

We are also working on better ways to specify and use metadata. Users will be able to add their own metadata within the lakehouse, enabling more sophisticated data discovery and automation, including scenarios that leverage AI.

More information

You can read more about lakehouse schemas on our documentation page, “Lakehouse schemas (Preview)”, on Microsoft Learn.

We also encourage you to submit ideas about schemas or lakehouses in general in the Microsoft Fabric Ideas Portal.

Whether you love the feature or have criticism to share, please tag us in your social posts using #fabriclakehouse.
