Microsoft Fabric Updates Blog

Getting the size of OneLake data items or folders

Understanding the size of your OneLake data can be important for managing and planning storage costs, especially if you have large amounts of data. Today, capacity admins can use the Microsoft Fabric Capacity Metrics app to find the total size of OneLake data stored in a given capacity or workspace, but you may also want to understand the size of data in a specific item or folder. In this blog, we’ll walk through a couple of Azure PowerShell commands that let you get the size of any item or folder in OneLake. As described in this blog post, because OneLake is compatible with ADLS tools, it’s as simple as replacing the ADLS Gen2 URL with a OneLake URL.

To get started, there are three simple steps to set up:

Step 1: Open Azure PowerShell and install the Azure Storage PowerShell module. 

Install-Module Az.Storage -Repository PSGallery -Force

Step 2: Sign in to your Azure account.

Connect-AzAccount

Step 3: Create the storage account context.

  • Storage account name is onelake.
  • Set -UseConnectedAccount to pass through your Azure credentials.
  • Set -Endpoint as fabric.microsoft.com.

$ctx = New-AzStorageContext -StorageAccountName 'onelake' -UseConnectedAccount -Endpoint 'fabric.microsoft.com'

Get the size of an item

This example gets the size of an item “mylakehouse.lakehouse” in the workspace “myworkspace”.   

$workspaceName = 'myworkspace'
$itemPath = 'mylakehouse.lakehouse'
$colitems = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $workspaceName -Path $itemPath -Recurse -FetchProperty | Measure-Object -property Length -sum
"Total file size: " + ($colitems.sum / 1GB) + " GB"

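Dividing by 1GB works well for large items, but a small item prints as a long fraction of a gigabyte. As a minimal sketch, the hypothetical helper below (Format-ByteSize is our own name, not part of Az.Storage) picks a readable unit for a byte count such as $colitems.Sum.

```powershell
# Hypothetical helper (not part of Az.Storage): choose a readable unit for a byte count.
function Format-ByteSize {
    param([double]$Bytes)

    $units = 'B', 'KB', 'MB', 'GB', 'TB', 'PB'
    $index = 0
    $value = $Bytes

    # Step up one unit at a time until the value drops below 1024
    # or the largest unit in the list is reached.
    while (($value -ge 1024) -and ($index -lt ($units.Count - 1))) {
        $value = $value / 1024
        $index++
    }

    # Use the invariant culture so the decimal separator is always '.'.
    $number = $value.ToString('F2', [cultureinfo]::InvariantCulture)
    "$number $($units[$index])"
}
```

With this in place, the last line of the example could be written as "Total file size: " + (Format-ByteSize $colitems.Sum).
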
Keep in mind that if the workspace name does not meet Azure Storage naming criteria (for example, it must use only lowercase letters), you need to replace the workspace and item names with their GUIDs. You can find the GUID for your workspace or item in the URL on the Fabric portal. You must use GUIDs for both the workspace and the item, and when you use GUIDs you don’t need the item type (such as .lakehouse).
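
The GUID form of the same call can be sketched as follows. Get-OneLakeSizeById is a hypothetical wrapper of our own, and the GUIDs in the usage comment are placeholders, so substitute the values from your Fabric portal URL. Typing both IDs as [guid] is a design choice that rejects a display name passed by mistake.

```powershell
# Hypothetical wrapper: measure an item (or a folder inside it) addressed by GUIDs.
# Assumes $Context is the OneLake storage context created in Step 3.
function Get-OneLakeSizeById {
    param(
        [Parameter(Mandatory)]$Context,
        [Parameter(Mandatory)][guid]$WorkspaceId,
        [Parameter(Mandatory)][guid]$ItemId,
        [string]$SubPath = ''
    )

    # With GUIDs the item type suffix (such as .lakehouse) is left off the path.
    $segments = @($ItemId.ToString(), $SubPath.Trim('/')) | Where-Object { $_ }
    $path = $segments -join '/'

    $measure = Get-AzDataLakeGen2ChildItem -Context $Context -FileSystem $WorkspaceId.ToString() -Path $path -Recurse -FetchProperty |
        Measure-Object -Property Length -Sum

    # Return the total in GB, matching the examples in this post.
    $measure.Sum / 1GB
}

# Usage (placeholder GUIDs, replace with your own):
# Get-OneLakeSizeById -Context $ctx `
#     -WorkspaceId '00000000-0000-0000-0000-000000000000' `
#     -ItemId '11111111-1111-1111-1111-111111111111' `
#     -SubPath 'Files/folder1'
```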

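Get the size of a folder

This example gets the size of the folder “folder1” under Files in the item “mylakehouse.lakehouse” in the workspace “myworkspace”.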

$workspaceName = 'myworkspace'
$itemPath = 'mylakehouse.lakehouse/Files/folder1'
$colitems = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $workspaceName -Path $itemPath -Recurse -FetchProperty | Measure-Object -property Length -sum
"Total file size: " + ($colitems.sum / 1GB) + " GB"

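If you want to see which subfolders account for the total, the same listing can be grouped instead of summed. The sketch below uses a hypothetical helper, Get-SizeByChildFolder, and assumes each listed object exposes Path, IsDirectory, and Length, and that Path is relative to the workspace, so it has the same number of leading segments as the path you passed in. It groups by segment position rather than by matching the path text, so it does not depend on whether the service returns names or GUIDs in Path.

```powershell
# Hypothetical helper: total file bytes per first-level child of $BasePath.
function Get-SizeByChildFolder {
    param(
        [Parameter(Mandatory)][string]$BasePath,
        [Parameter(ValueFromPipeline)]$Item
    )
    begin {
        # Number of path segments in the folder being measured.
        $depth = @($BasePath.Trim('/') -split '/').Count
        $totals = @{}
    }
    process {
        if ($Item.IsDirectory) { return }          # count files only
        $segments = @(([string]$Item.Path).Trim('/') -split '/')
        if ($segments.Count -le $depth) { return } # not below the base folder
        # Files sitting directly in the base folder are grouped under their own name.
        $child = $segments[$depth]
        $totals[$child] = [long]$totals[$child] + [long]$Item.Length
    }
    end {
        $totals.GetEnumerator() |
            Sort-Object Value -Descending |
            ForEach-Object { [pscustomobject]@{ Name = $_.Key; Bytes = $_.Value } }
    }
}

# Usage, reusing $ctx, $workspaceName, and $itemPath from the example above:
# Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $workspaceName -Path $itemPath -Recurse -FetchProperty |
#     Get-SizeByChildFolder -BasePath $itemPath
```
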
Additional Details

  • These PowerShell commands won’t work on shortcuts that point directly to ADLS containers. We recommend creating ADLS shortcuts to a directory that is at least one level below a container.
  • If you would like to fully automate these steps, you can obtain the workspace and item information with these APIs: Workspaces – List Workspaces – REST API and Items – List Items – REST API.
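
As a rough sketch of that automation, the functions below list workspaces and items through the REST APIs and measure each item by GUID. Several things here are assumptions rather than facts from this post: the function names are hypothetical, the base URL https://api.fabric.microsoft.com/v1 and the value, id, displayName, type, and continuationUri response fields reflect our reading of the API documentation, and the token is requested with Get-AzAccessToken for that same resource URL. Check each against the current API reference before relying on it.

```powershell
# Sketch only: assumes Connect-AzAccount has been run and that the caller
# passes the OneLake context created in Step 3. Names are hypothetical helpers.
function Get-FabricPagedResult {
    param([Parameter(Mandatory)][string]$Uri)

    $token = (Get-AzAccessToken -ResourceUrl 'https://api.fabric.microsoft.com').Token
    # Some Az.Accounts versions return the token as a SecureString; convert if so.
    if ($token -is [securestring]) {
        $token = [System.Net.NetworkCredential]::new('', $token).Password
    }
    $headers = @{ Authorization = "Bearer $token" }

    while ($Uri) {
        $page = Invoke-RestMethod -Method Get -Uri $Uri -Headers $headers
        $page.value                   # emit the entries on this page
        $Uri = $page.continuationUri  # assumed to be empty on the last page
    }
}

function Get-OneLakeItemSizeReport {
    param(
        [Parameter(Mandatory)]$Context,
        [string]$ItemType = 'Lakehouse'   # assumed type label; adjust as needed
    )

    $base = 'https://api.fabric.microsoft.com/v1'
    foreach ($ws in (Get-FabricPagedResult -Uri "$base/workspaces")) {
        foreach ($item in (Get-FabricPagedResult -Uri "$base/workspaces/$($ws.id)/items")) {
            if ($item.type -ne $ItemType) { continue }

            # GUIDs are used for both workspace and item, so no item type suffix.
            $sum = (Get-AzDataLakeGen2ChildItem -Context $Context -FileSystem $ws.id -Path $item.id -Recurse -FetchProperty |
                Measure-Object -Property Length -Sum).Sum

            [pscustomobject]@{
                Workspace = $ws.displayName
                Item      = $item.displayName
                Type      = $item.type
                GB        = $sum / 1GB
            }
        }
    }
}

# Usage:
# Get-OneLakeItemSizeReport -Context $ctx | Sort-Object GB -Descending
```

Filtering by item type keeps the run short; listing every item in every workspace recursively can take a long time on large tenants.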

Next steps

We hope you continue to explore the ways you can leverage ADLS tools and APIs with your OneLake data.  To learn more about how to connect to OneLake, check out these links:
