Semantic link in Microsoft Fabric: Bridging BI and Data Science
We are pleased to introduce the Public Preview of semantic link, an innovative feature that seamlessly connects Power BI datasets with Synapse Data Science within Microsoft Fabric. As the gold layer in a medallion architecture, Power BI datasets contain the most refined and valuable data in your organization. With semantic link, we unlock this data’s potential beyond traditional business intelligence by making it accessible to notebooks and Python in Microsoft Fabric.
Python has emerged as the go-to language for state-of-the-art machine learning and boasts a vast ecosystem of libraries for a wide range of tasks, including rich visualizations, statistical analysis, and data validation. By bridging the gap between Power BI and this ecosystem, we aim to empower business analysts to use modern data tools with their data, enable Power BI developers to streamline automation tasks, and facilitate seamless collaboration with data scientists.
Semantic link supports the popular pandas and Spark APIs, making it easy to join existing data and apply common libraries. You can compute Power BI measures, read tables, and execute DAX queries. Semantic link goes beyond plain data connectivity by propagating semantic information from Power BI to power new capabilities of Microsoft Fabric for data augmentation, validation and exploration, as well as an extendable set of semantic functions.
In this blog post, we’ll showcase semantic link’s capabilities for accessing Power BI datasets.
Use semantic link to bring your Power BI data to pandas
Semantic link offers easy-to-use Python methods for pandas users to discover and read data:
- discover Power BI datasets, tables and measures using list_datasets, list_tables and list_measures
- read data from tables using read_table
- evaluate measures using evaluate_measure
- and, for advanced use cases, evaluate DAX expressions using evaluate_dax and the %%dax cell magic
The following code snippets show how to install the Python library in Microsoft Fabric and evaluate Power BI measures. The resulting FabricDataFrame is a semantically aware pandas dataframe: it keeps all pandas functionality and adds features such as semantic propagation and semantic functions. Note that this sample assumes that the Power BI dataset “Customer Profitability Sample” is accessible in the Fabric workspace.
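A minimal sketch of these two steps, assuming the `semantic-link` package and its `sempy.fabric` module, and assuming the dataset exposes a `Total Revenue` measure along with the `Customer[Country/Region]` and `Industry[Industry]` columns mentioned later in this post. The function name and the optional `fabric` parameter are our own additions for illustration, so the call can be exercised with a stand-in outside Fabric.

```python
# In a Fabric notebook, install the library first with the notebook magic:
#   %pip install semantic-link

DATASET = "Customer Profitability Sample"


def revenue_by_segment(dataset=DATASET, fabric=None):
    """Evaluate the Total Revenue measure per country/region and industry.

    `fabric` is a hypothetical injection point for testing. When omitted,
    the real sempy.fabric module is imported, which only works where
    semantic-link is installed (for example, a Fabric notebook).
    """
    if fabric is None:
        import sempy.fabric as fabric
    return fabric.evaluate_measure(
        dataset,
        measure="Total Revenue",
        groupby_columns=["Customer[Country/Region]", "Industry[Industry]"],
    )
```

In a notebook, `df = revenue_by_segment()` should give you a FabricDataFrame that you can inspect like any pandas dataframe, for example with `df.head()`.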
To make your adventures into notebooks even easier, you can use the %%dax cell magic to execute DAX. The sample below queries a Dynamic Management View (DMV), and its output is available in the _ variable for further analysis using Python (see output caching). All underlying requests run at low priority, so your production workload is not impacted.
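As a rough Python counterpart to the cell magic, here is a sketch that goes through `evaluate_dax` instead. For simplicity it runs a plain `EVALUATE` query rather than a DMV query, and the `Customer` table name is taken from the Spark example later in this post. The helper names and the `fabric` parameter are hypothetical and only there to keep the example testable outside Fabric.

```python
DATASET = "Customer Profitability Sample"


def top_rows_query(table, n=5):
    """Build a DAX query that returns up to `n` rows of `table`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return f"EVALUATE TOPN({int(n)}, '{table}')"


def run_dax(query, dataset=DATASET, fabric=None):
    """Run a DAX query against a dataset and return the resulting dataframe."""
    if fabric is None:
        # Only importable where semantic-link is installed.
        import sempy.fabric as fabric
    return fabric.evaluate_dax(dataset, query)


# Assumed notebook-magic equivalent, split over two cells:
#   %load_ext sempy
#
#   %%dax "Customer Profitability Sample"
#   EVALUATE TOPN(5, 'Customer')
```

In a notebook you would call `run_dax(top_rows_query("Customer"))`. The cell magic is handier for ad hoc exploration, while the function form is easier to reuse in scripts.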
Use semantic link to bring your Power BI data to Spark
Spark users can access Power BI data from all languages supported in Fabric (Python, R, and Spark SQL) using the semantic link Spark native connector. Configure the Power BI catalog to gain access to all your datasets. In this example, we evaluate a measure using the special _Metrics table. All other tables are accessible under their own names, for example “pbi.`Customer Profitability Sample`.Customer”, and are ready to be combined with other Spark data sources.
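The sketch below shows roughly what the catalog configuration and a measure query could look like from PySpark. The catalog class name is an assumption based on the semantic link documentation and should be verified before use. The helper names, the `avg_total_revenue` alias, and the choice of grouping columns are ours.

```python
# Assumption: catalog implementation class for the Spark native connector.
CATALOG_CLASS = "com.microsoft.azure.synapse.ml.powerbi.PowerBICatalog"


def configure_pbi_catalog(spark, name="pbi"):
    """Register the Power BI catalog under `name` on a Spark session."""
    spark.conf.set(f"spark.sql.catalog.{name}", CATALOG_CLASS)


def revenue_query(dataset="Customer Profitability Sample", catalog="pbi"):
    """Build a Spark SQL query that evaluates a measure via the _Metrics table."""
    return (
        "SELECT `Customer[Country/Region]`, `Industry[Industry]`, "
        "AVG(`Total Revenue`) AS avg_total_revenue "
        f"FROM {catalog}.`{dataset}`.`_Metrics` "
        "GROUP BY `Customer[Country/Region]`, `Industry[Industry]`"
    )
```

In a Fabric notebook, where a `spark` session is already defined, this would be used as `configure_pbi_catalog(spark)` followed by `spark.sql(revenue_query()).show()`.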
Use semantic propagation for data augmentation
Semantic link’s Python API returns a FabricDataFrame when accessing Power BI data, which enables data augmentation and semantic functions. Here’s a brief example of how you can augment an existing dataframe with Power BI data. Instead of computing the measure for a set of dimensions, joining the dataframes, and filtering the result yourself, you can let the add_measure function do the work: it matches the dataframe’s columns to the Power BI dataset (here Customer[Country/Region] and Industry[Industry]), computes the measures Total Revenue and Total COGS at these levels, and automatically adds them as new columns.
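A hedged sketch of that augmentation step follows. It assumes `FabricDataFrame` can be built like a pandas dataframe and that `add_measure` accepts a list of measure names plus a `dataset` keyword, so check the exact calling convention against the API reference. The sample rows, including the `Sales Agent` column, are invented for illustration.

```python
DATASET = "Customer Profitability Sample"


def sample_frame():
    """Build a small frame whose column names match Power BI columns."""
    # Only importable where semantic-link is installed.
    from sempy.fabric import FabricDataFrame

    return FabricDataFrame({
        "Sales Agent": ["Agent 1", "Agent 1", "Agent 2"],  # made-up values
        "Customer[Country/Region]": ["US", "GB", "US"],
        "Industry[Industry]": ["Services", "CPG", "Manufacturing"],
    })


def add_revenue_and_cogs(df, dataset=DATASET):
    """Append Total Revenue and Total COGS, resolved at the frame's granularity."""
    return df.add_measure(["Total Revenue", "Total COGS"], dataset=dataset)
```

In a notebook, `add_revenue_and_cogs(sample_frame())` would be expected to return the original rows with two extra measure columns.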
Discover semantic functions with intelligent code auto-completion
Semantic functions enable intelligent auto-complete by matching function parameters with column metadata. For example, the to_geopandas function provides suggestions to bind the lat_col and long_col parameters to the latitude and longitude columns based on Power BI data categories.
A semantic function is a regular Python function that is exposed on FabricDataFrames and accompanied by metadata to enable intelligent auto-completion. While semantic link provides a few semantic functions, available on GitHub, you can define your own semantic functions using Python decorators. The @semantic_function decorator applied to the _is_capital function makes it available for intelligent code auto-completion.
Explore and validate data in Power BI from Python
Ensuring data quality is a crucial task and semantic link provides tools to support this. In this example we visualize existing relationships defined in your Power BI dataset.
To understand the data in even more detail, the find_dependencies and plot_dependencies_metadata methods help you understand and visualize functional dependencies present in your data:
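To make the idea of a functional dependency concrete, here is a small standard-library sketch: column A determines column B if every value of A always appears with the same value of B. This is an exact check over invented rows, intended only to illustrate the concept. It is not how find_dependencies is implemented.

```python
def determines(rows, determinant, dependent):
    """Return True if each `determinant` value maps to exactly one `dependent` value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample rows for illustration.
rows = [
    {"zip": "98052", "city": "Redmond", "customer": "A"},
    {"zip": "98052", "city": "Redmond", "customer": "B"},
    {"zip": "10001", "city": "New York", "customer": "C"},
]

print(determines(rows, "zip", "city"))       # zip -> city holds in this sample
print(determines(rows, "city", "customer"))  # Redmond maps to two customers
```

A dependency that you expect to hold but that fails on real data, such as a ZIP code mapped to two different cities, is often a sign of a data quality issue worth investigating.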
To learn more about data validation and exploration, visit our docs.
Get coding!
In summary, semantic link is a powerful tool that enables business analysts and data scientists to use data effectively in a comprehensive data science environment. By using semantic link, you can:
- Eliminate duplicated business logic by empowering data scientists to directly access your semantic model in Power BI datasets
- Do even more with the semantic information present in Power BI datasets using semantic functions, data augmentation, validation, and exploration
We hope you find semantic link useful. To try it, follow our how-to guides. We’d love to hear your feedback and suggestions in the comments and on Fabric Ideas!