Microsoft Fabric Updates Blog

Read data from Delta Lake tables with the DeltaLake.Table M function

We’re happy to announce a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.

Connecting to Delta Lake tables

DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a DeltaLake table like so:

You can read the data in this Delta Lake table using the following M code:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

The output in Power Query is a table with the contents of the Delta Lake table:

You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):

let
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta

Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.

Nested structures

The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:

As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:

Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:

By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:

DeltaLake.Table(Source, [MaxDepth=10])

If the number of levels of nesting exceeds the maximum depth then an error will be raised.

Use of statistics

Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:

DeltaLake.Table(Source, [UseStatistics=false])

Time travel

Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta)
in
    GetVersions

Returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:

The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.

If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:

let
    Source = Folder.Contents("C:\DeltaCoffee"),
    ToDelta = DeltaLake.Table(Source),
    GetVersions = Value.Versions(ToDelta),
    GetVersionNumberTwo = GetVersions{2}[Data],
    GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
    GetVersionIdentity

Related blog posts

Read data from Delta Lake tables with the DeltaLake.Table M function

April 9, 2024 by Meenal Srivastva

Create data pipelines in Fabric to access your firewall-enabled ADLS Gen2 storage accounts with ease and security. We are excited to announce a new feature in Fabric that enables you to create data pipelines to access your firewall-enabled Azure Data Lake Storage Gen2 (ADLS Gen2) accounts. This feature leverages the workspace identity to establish a … Continue reading “Introducing Trusted Workspace Access in Fabric Data Pipelines”

April 4, 2024 by Mark Kromer

We are always working on making the experience of building low-code data pipelines as easy as possible for our customers. Next week, we are going to roll-out a new experience in the Script activity in Fabric Data Factory pipelines to make it even easier to build expressions using the pipeline expression language.