Read data from Delta Lake tables with the DeltaLake.Table M function
We’re happy to announce a new function in Power Query’s M language for reading data from Delta Lake tables: the DeltaLake.Table function. This function is now available in Power Query in Power BI Desktop and in Dataflows Gen1 and Gen2 and replaces the need to use community-developed solutions like this one by Gerhard Brueckl. Let’s see some examples of how to use it.
Connecting to Delta Lake tables
DeltaLake.Table can connect to any folder that represents a Delta Lake table (which means, for example, that the folder you connect to should contain a subfolder called _delta_log) in either ADLSgen2 storage, a Fabric Lakehouse or a local file system. If you have a folder on your local PC with the path “C:\DeltaCoffee” that contains a DeltaLake table like so:
You can read the data in this Delta Lake table using the following M code:
let
Source = Folder.Contents("C:\DeltaCoffee"),
ToDelta = DeltaLake.Table(Source)
in
ToDelta
The output in Power Query is a table with the contents of the Delta Lake table:
You can connect to Delta Lake tables in ADLSgen2 or a Fabric Lakehouse in a very similar way, using the AzureStorage.DataLake function to connect to the DFS endpoint of the folder containing the Delta Lake table. Here’s an example of how to connect to a folder in a Fabric Lakehouse using AzureStorage.DataLake (which works because OneLake supports access via existing ADLSgen2 APIs):
let
Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/DeltaCoffee/", [HierarchicalNavigation=true]),
ToDelta = DeltaLake.Table(Source)
in
ToDelta
Note that you need to set the HierarchicalNavigation option to true in the AzureStorage.DataLake function for this to work.
Nested structures
The DeltaLake.Table function can handle nested structures in Delta Lake tables in a similar way to how Power Query handles nested structures in other data sources. In the example shown above the coffee_profile column contains a nested structure. Here’s what it looks like when viewed inside a Fabric Lakehouse:
As shown in the first screenshot above the coffee_profile column in the table returned by DeltaLake.Table in Power Query contains Record values, which can be expanded by clicking on the icon in the top right-hand corner of the column:
Here’s what the table looks like in Power Query after the coffee_profile column has been expanded:
By default DeltaLake.Table will convert up to 32 levels of nested structures in this way. The number of levels of nesting that is converted can be controlled using the MaxDepth option, for example:
DeltaLake.Table(Source, [MaxDepth=10])
If the number of levels of nesting exceeds the maximum depth then an error will be raised.
Use of statistics
Power Query will use statistics to prune partitions and compute aggregates by default. If you need to stop it using statistics, perhaps because you know they are incorrect, you can do so by setting the UseStatistics option to false like so:
DeltaLake.Table(Source, [UseStatistics=false])
Time travel
Different versions of the Delta Lake table can be accessed using the Value.Versions function on the table returned by DeltaLake.Table, for example:
let
Source = Folder.Contents("C:\DeltaCoffee"),
ToDelta = DeltaLake.Table(Source),
GetVersions = Value.Versions(ToDelta)
in
GetVersions
Returns the following table, with one row for each version of the table and an extra row with a value of null in the Version column for the latest version:
The Data column contains values of type Value, which return the data in the table for each version. In the screenshot above the latest version is version 3; the row with the version number null, the latest version, returns the same data as the row with the version number 3.
If you access a specific version of the data you can get the version number by using the Value.VersionIdentity function. For example, the following M query gets the data for version number 2 of the Delta Lake table in the GetVersionNumberTwo step, which means that when Value.VersionIdentity is used in the final step it returns the value 2:
let
Source = Folder.Contents("C:\DeltaCoffee"),
ToDelta = DeltaLake.Table(Source),
GetVersions = Value.Versions(ToDelta),
GetVersionNumberTwo = GetVersions{2}[Data],
GetVersionIdentity = Value.VersionIdentity(GetVersionNumberTwo)
in
GetVersionIdentity