Connecting to OneLake
How do I connect to OneLake? Do I need to use new tools or download a new SDK? These are natural questions from any app developer, ISV, or data engineer looking to get started with Microsoft Fabric and OneLake. Luckily, the answers are simpler than you may think. In this blog, we’ll dive into how to connect and interact with OneLake, including how OneLake achieves its compatibility with any tool used over ADLS Gen2!
Can I really use the same tools as ADLS Gen2?
OneLake supports the same DFS APIs as ADLS Gen2. This means API calls, SDKs, and tools that work over ADLS Gen2 can also connect to OneLake. In most cases, you simply replace your ADLS Gen2 URL with a OneLake URL, and everything just works! To demonstrate, let’s adapt the steps of the Azure Storage documentation walkthrough that uses Python to list files in an ADLS Gen2 account. By changing just a few lines of code, we’ll be able to read files in OneLake instead.
Reading OneLake using Python
The walkthrough for reading from ADLS Gen2 boils down to three simple steps:
- Install and import the necessary packages.
- Create an authorized DataLakeServiceClient to represent your storage account.
- List the files (or perform any other required steps).
To update this code to work with OneLake, the main change is how we create the service client; we can use all the same packages and extensions as with ADLS Gen2. Here’s the line where we create the service client:
service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
"https", storage_account_name), credential=default_credential)
To use this same code over OneLake, we just need a OneLake URL instead. In this case, we swap “dfs.core.windows.net” for the OneLake domain “dfs.fabric.microsoft.com”. Now the service client builds a OneLake URL instead of an ADLS Gen2 URL.
service_client = DataLakeServiceClient(account_url="{}://{}.dfs.fabric.microsoft.com".format(
"https", storage_account_name), credential=default_credential)
Next, we supply a file system name for the file system client and the path we want to read from. OneLake doesn’t have file systems like ADLS Gen2, but workspaces fill the same spot in OneLake’s hierarchy. When building the path, don’t forget to append the item type to your item name (for example, mylakehouse.Lakehouse)!
file_system_client = service_client.get_file_system_client(file_system="myworkspace")
paths = file_system_client.get_paths(path="mylakehouse.Lakehouse/Files/")
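The changes so far can be summarized in a small sketch that computes the values the SDK calls need: the account URL, the file system (a workspace in OneLake), and an item path that includes the item type. The helper names build_account_url and onelake_location, and the account name mystorageaccount, are hypothetical illustrations, not part of the SDK or the walkthrough.

```python
# Hypothetical helpers illustrating the ADLS Gen2 -> OneLake changes above.
ADLS_GEN2_DFS_DOMAIN = "dfs.core.windows.net"
ONELAKE_DFS_DOMAIN = "dfs.fabric.microsoft.com"


def build_account_url(account_name: str, domain: str) -> str:
    """Mirror the .format() call used to build account_url above."""
    return "{}://{}.{}".format("https", account_name, domain)


def onelake_location(workspace: str, item_name: str, item_type: str,
                     folder: str = "Files") -> tuple[str, str]:
    """Return the (file_system, path) arguments for the DataLake SDK calls.

    In OneLake the workspace plays the role of the ADLS Gen2 file system,
    and the item path must carry the item type, e.g. "mylakehouse.Lakehouse".
    """
    if not item_type:
        raise ValueError("OneLake item paths need an item type, e.g. 'Lakehouse'")
    return workspace, f"{item_name}.{item_type}/{folder}/"


# ADLS Gen2: the URL uses the name of a storage account you own.
print(build_account_url("mystorageaccount", ADLS_GEN2_DFS_DOMAIN))
# https://mystorageaccount.dfs.core.windows.net

# OneLake: the account name is always "onelake".
print(build_account_url("onelake", ONELAKE_DFS_DOMAIN))
# https://onelake.dfs.fabric.microsoft.com

print(onelake_location("myworkspace", "mylakehouse", "Lakehouse"))
# ('myworkspace', 'mylakehouse.Lakehouse/Files/')
```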
Finally, when running the script, we have to provide the name of our storage account. The account name for OneLake is always ‘onelake’, so our two calls will look like:
initialize_storage_account_ad("onelake")
list_directory_contents()
Running this script lists the contents of our Files folder. It can also be adapted to perform a variety of other actions, from creating new directories to deleting files; the walkthrough we used as a guide contains several examples! Just remember to use a OneLake endpoint, use a workspace in place of a file system, and include the item type in your path, and this SDK (and others) will work over OneLake.
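Putting the pieces together, here is a minimal end-to-end sketch of the adapted script. It assumes the azure-identity and azure-storage-file-datalake packages are installed and that you are signed in with an identity that has access to the workspace; it uses DefaultAzureCredential, matching the default_credential in the walkthrough’s code. The function name list_onelake_files and the names myworkspace and mylakehouse are placeholders, not part of the walkthrough.

```python
"""List a lakehouse's Files folder in OneLake using the ADLS Gen2 Python SDK.

Assumes: pip install azure-identity azure-storage-file-datalake, and a
signed-in identity with access to the (hypothetical) workspace below.
"""

# OneLake's DFS endpoint: the account name is always "onelake".
ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"
WORKSPACE = "myworkspace"                  # placeholder; acts as the file system
ITEM_PATH = "mylakehouse.Lakehouse/Files"  # placeholder item name + item type + folder


def list_onelake_files(workspace: str = WORKSPACE, path: str = ITEM_PATH) -> list[str]:
    """Return the names of the entries directly under `path` in `workspace`."""
    # Imported here so this module can be loaded without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service_client = DataLakeServiceClient(
        account_url=ONELAKE_ACCOUNT_URL, credential=DefaultAzureCredential()
    )
    # The workspace stands in for the ADLS Gen2 file system.
    file_system_client = service_client.get_file_system_client(file_system=workspace)
    # get_paths is recursive by default; recursive=False lists one level only.
    return [p.name for p in file_system_client.get_paths(path=path, recursive=False)]
```

Calling list_onelake_files() returns the entry names under the Files folder. Passing recursive=False keeps the listing to a single level; drop it to walk the whole folder tree, which is the SDK’s default.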
How do I build a OneLake endpoint?
As mentioned above, OneLake uses its own endpoint, onelake.dfs.fabric.microsoft.com, to distinguish itself from ADLS Gen2. OneLake URLs are structured similarly to ADLS Gen2 URLs to help with compatibility. Here are a few things to note about OneLake endpoints, especially when using them with ADLS Gen2-compatible tools:
- The account name of your OneLake is always ‘onelake’.
  - Some tools use the account name as part of building the endpoint.
- OneLake uses the ‘fabric.microsoft.com’ domain instead of Azure Storage’s ‘core.windows.net’.
  - If a tool supports a custom domain, just use ‘fabric.microsoft.com’.
  - Some tools validate URLs for ‘core.windows.net’ and block ‘fabric.microsoft.com’; we are addressing these as quickly as we find them!
- Workspaces in OneLake map to file systems (DFS) and containers (Blob).
  - Again, this mostly matters for tools that build endpoints from your account and file system names (like what we saw in the Python example!).
Note: Like ADLS Gen2, OneLake also supports a Blob endpoint, ‘onelake.blob.fabric.microsoft.com’, that you can use to call Blob APIs instead. OneLake has the same compatibility with Blob APIs as ADLS Gen2; for details about which Blob features ADLS Gen2 supports, see Blob Storage feature support in Azure Storage accounts | Microsoft Learn.
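As a sketch of how these pieces fit together, the helper below builds a full DFS or Blob URL from a workspace, an item, its item type, and a relative path. The function name onelake_url is hypothetical, and it percent-encodes the path with urllib.parse.quote on the assumption that workspace or item names may contain spaces.

```python
from urllib.parse import quote


def onelake_url(workspace: str, item_name: str, item_type: str,
                relative_path: str = "", api: str = "dfs") -> str:
    """Build a OneLake URL for the DFS or Blob endpoint (hypothetical helper)."""
    if api not in ("dfs", "blob"):
        raise ValueError("api must be 'dfs' or 'blob'")
    # The workspace is the file system (DFS) or container (Blob);
    # the item segment carries its item type, e.g. "mylakehouse.Lakehouse".
    segments = [workspace, f"{item_name}.{item_type}"]
    if relative_path:
        segments.append(relative_path.strip("/"))
    # quote() keeps "/" separators and encodes characters such as spaces.
    return f"https://onelake.{api}.fabric.microsoft.com/" + quote("/".join(segments))


print(onelake_url("myworkspace", "mylakehouse", "Lakehouse", "Files/data.csv"))
# https://onelake.dfs.fabric.microsoft.com/myworkspace/mylakehouse.Lakehouse/Files/data.csv
```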
What file formats does OneLake support?
As a data lake, OneLake can store data of any format or type. The best place to land unstructured data is the Files folder of a lakehouse. Neither the lakehouse nor OneLake enforces any format in this folder, making it a great area to store raw or unstructured data.
For tabular data, Fabric and OneLake are standardized on Delta Lake. Writing data in the Delta format to any of the managed locations in OneLake (like the Tables folder in a lakehouse) automatically registers that table and its metadata in Fabric’s metastore. From there, you can reference the data as a table and use Spark SQL to interact with it. Together, this means a single lakehouse can hold both your unstructured and structured data, making it easy to move from raw data to structured reports, within a single lakehouse or across multiple workspaces.
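Because a Delta table is stored as a folder of Parquet data files alongside a _delta_log transaction-log folder, you can spot tables in a file listing (such as the names returned by get_paths) by looking for that folder. This sketch is illustrative only: the function name find_delta_tables and the sample paths are hypothetical, and it checks folder layout, not whether Fabric has registered a table in its metastore.

```python
def find_delta_tables(path_names: list[str]) -> list[str]:
    """Return the root folders of Delta tables found in a list of path names."""
    tables = set()
    for name in path_names:
        parts = name.strip("/").split("/")
        if "_delta_log" in parts:
            idx = parts.index("_delta_log")
            if idx > 0:  # the table root is the folder that contains _delta_log
                tables.add("/".join(parts[:idx]))
    return sorted(tables)


# Hypothetical listing mixing raw files and two Delta tables.
listing = [
    "mylakehouse.Lakehouse/Files/raw/events.json",
    "mylakehouse.Lakehouse/Tables/sales/_delta_log/00000000000000000000.json",
    "mylakehouse.Lakehouse/Tables/sales/part-00000.snappy.parquet",
    "mylakehouse.Lakehouse/Tables/customers/_delta_log",
]
print(find_delta_tables(listing))
# ['mylakehouse.Lakehouse/Tables/customers', 'mylakehouse.Lakehouse/Tables/sales']
```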
How do I authorize requests to OneLake?
For compatibility with existing tools, OneLake automatically extracts your user identity from a submitted AAD token and maps that identity to the permissions granted in the Fabric portal. This means OneLake is compatible with any tool that uses AAD authorization.
When accessing OneLake from an existing tool, just sign in to your Azure account and tenant. Your subscription doesn’t matter, as OneLake only cares about your user identity. After signing in or supplying your credentials, authorizing to OneLake just works! As long as you have permissions on your workspace, you’ll be able to read and write within its items.
Here’s an example using Azure PowerShell (the Az module) to quickly get a valid AAD token:
Connect-AzAccount
Get-AzAccessToken -ResourceTypeName Storage
OneLake does not care which subscription your session uses, but it does require AAD tokens for the correct audience (guaranteed by ‘-ResourceTypeName Storage’).
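If you are working in Python rather than PowerShell, a rough equivalent is to request a token for the Azure Storage scope with azure-identity, then decode the token’s payload to check its aud (audience) claim. This is a sketch under assumptions: azure-identity must be installed for get_storage_token, the function names are hypothetical, and token_audience only inspects the claims without verifying the signature, so use it for troubleshooting, not validation.

```python
import base64
import json

# Azure Storage scope; tokens for this audience are what OneLake expects.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def get_storage_token() -> str:
    """Acquire an AAD access token for Azure Storage (requires azure-identity)."""
    from azure.identity import DefaultAzureCredential  # imported lazily
    return DefaultAzureCredential().get_token(STORAGE_SCOPE).token


def token_audience(jwt: str) -> str:
    """Return the 'aud' claim of a JWT WITHOUT verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["aud"]
```

For example, token_audience(get_storage_token()) should report the Azure Storage audience; if a tool’s requests are rejected, checking this claim is a quick way to rule out a token issued for the wrong resource.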
Please note that OneLake does not support SAS URI or account key authorization, and you cannot enable public access to your OneLake. Also, even if you have full permissions on an item or workspace, there are still some operations you can never perform via the DFS and Blob APIs. For details, see the Unauthorized operations section below.
Tips and Troubleshooting
Although OneLake aims for complete compatibility with ADLS Gen2, there are some issues that can occasionally prevent a tool from working over OneLake. Let’s look at a few of them:
URL Validation
To help ensure calls are only made to authorized domains, some tools validate storage URLs against known endpoints. Because OneLake’s endpoint differs from ADLS Gen2’s, these tools may block calls to OneLake or not recognize that they can use ADLS Gen2 protocols with it. One way around this is to use a custom endpoint, if the tool supports one (for example, New-AzStorageContext in the Azure Storage PowerShell module accepts -Endpoint fabric.microsoft.com). Otherwise, the fix is often as simple as adding OneLake’s domain (fabric.microsoft.com) to the tool’s list of allowed endpoints. If you find a URL validation issue or any other problem connecting to OneLake, please let us know at aka.ms/fabricideas!
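For tool authors, the fix can be as small as extending an allowlist. The sketch below shows a hypothetical host check that accepts both the ADLS Gen2 and OneLake DFS and Blob domains; it is not taken from any particular tool, and the suffix list is an assumption you would adapt to the endpoints your tool actually supports.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: Azure Storage plus OneLake DFS and Blob domains.
ALLOWED_STORAGE_SUFFIXES = (
    ".dfs.core.windows.net",
    ".blob.core.windows.net",
    ".dfs.fabric.microsoft.com",
    ".blob.fabric.microsoft.com",
)


def is_allowed_storage_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host ends with an allowed storage suffix."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Matching on the full ".<suffix>" ending rejects look-alike hosts such as
    # "onelake.dfs.fabric.microsoft.com.example.com".
    return parsed.scheme == "https" and host.endswith(ALLOWED_STORAGE_SUFFIXES)
```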
Unauthorized operations
As a SaaS service, OneLake enforces the structure of Fabric workspaces and items and manages some functionality on your behalf. This means some operations possible on an ADLS Gen2 account you own are unauthorized on OneLake. For example, you cannot manage workspaces, items, and Fabric-managed folders using DFS APIs.
For otherwise valid calls that include unsupported headers, OneLake ignores those headers and lets the call complete. Examples include headers for managing permissions and setting storage tiers.
Overall, these restrictions help to protect the security and structure of OneLake while still ensuring valid requests to OneLake don’t fail unnecessarily.
Learning more
To learn more about OneLake and for additional details about the scenarios discussed above, please check out these links:
- What is Microsoft Fabric?
- What is OneLake?
- Accessing OneLake via DFS and Blob APIs
- Accessing OneLake from the Azure Storage PowerShell module
Tags: Analytics, Microsoft Fabric, Data Engineering, Data Lake, OneLake