Microsoft Fabric Updates Blog

Introducing Synapse Data Science in Microsoft Fabric

See Arun Ulagaratchagan’s blog post to read the full Microsoft Fabric preview announcement.
 
Data science is a powerful tool for unlocking the value of data in any organization’s analytics workflow. Through the use of data science, organizations can make more informed decisions and gain predictive insights that would otherwise be unattainable.
 
Today, we are thrilled to announce the preview of the new Synapse Data Science experience in Microsoft Fabric! With data science in Microsoft Fabric, you can utilize the power of machine learning features to seamlessly enrich data as part of your data and analytics workflows.
  

What is Microsoft Fabric?

Microsoft Fabric is our next-generation data platform for analytics. It integrates Power BI, Data Factory, and the next generation of Synapse experiences, exposing easy-to-use analytics experiences for a variety of roles. This unified platform allows users to securely share data, code, models, and experiments across the team, and it simplifies many aspects of data science, from data ingestion to serving predictive insights.
  

The value of data science integrated into the analytics workflow

Synapse Data Science in Microsoft Fabric allows data science practitioners to work seamlessly on top of the same secured and governed data that data engineering teams have prepared. This eliminates the need to copy data and to find ways of giving data science teams secure access to it. In Microsoft Fabric, open Delta Lake support allows data science users to version datasets and write reproducible machine learning code. Data science users also have access to a wide range of easy-to-use getting-started experiences, low-code tools, and code authoring experiences with notebooks and Visual Studio Code. Synapse Data Science in Microsoft Fabric also provides a rich set of built-in ML tools. For example, MLflow model and experiment tracking, powered by Azure Machine Learning, is built in. The SynapseML Spark library provides scalable ML tools, and users can serve predictions swiftly to Power BI with the new Power BI Direct Lake capability. Finally, streamlined collaboration across different analytics roles makes hand-offs seamless and teams more productive.

Next, we will cover how Microsoft Fabric provides users with a variety of features to help complete end-to-end data science workflows.


Let’s first walk through a core data science scenario in Fabric, in the context of a typical data science process, to illustrate how data science in Fabric accelerates predictive business insights:

Problem formulation and ideation

The process starts with formulating a question. Answering such questions requires collaboration across multiple roles, and this step is aided by easy access to the same source of truth, such as business metrics and logic, and data analysis tools. Semantic Link is a new feature we are launching that will drastically simplify hand-offs and ease collaboration between data scientists and stakeholders.

Data discovery and pre-processing

Data engineering teams will build Lakehouses that data scientists can consume. Data scientists will need to further pre-process data to solve problems with ML tools. We are adding a new tool called Data Wrangler to help boost productivity during this tedious step.

Experiment and build ML models

To build ML models, users can create and track ML experiments and models using MLflow. They can use library management to build environments with third-party libraries for developing ML solutions. The rich SynapseML Spark library, which we own and maintain, enables model training and ML feature construction at large scale.

Enrich and operationalize

Finally, to enrich and operationalize data with predictive models, data science users can schedule their batch prediction scripts and use our scalable PREDICT function to speed up the process. There are multiple options for operationalizing batch scoring. For example, users can use lightweight notebook scheduling to run notebooks on a regular basis, or schedule Spark jobs that run as steps in a data pipeline.

Gain insights

With Power BI Direct Lake mode, predicted values stored in Lakehouse tables are accessible seamlessly, without first loading the data. Your BI reports automatically have access to the latest enriched data, helping to accelerate your predictive business insights!

 
Through a combination of well-integrated experiences available to a wide range of analytics roles, Microsoft Fabric enables users to complete their data science projects end-to-end.


  

What’s included in Synapse Data Science?

Now that you have a better understanding of how Microsoft Fabric integrates data science with analytics and BI, let's take a closer look at some of the new features and experiences we are introducing.
 

Data prep and code generation with Data Wrangler

Data Wrangler is a powerful, intuitive tool for data wrangling and preparation. It makes data cleansing and preparation easier than ever before, while still allowing users to take advantage of the power of code and the reproducibility of Python. Its dynamic data display, built-in statistics, and chart-rendering capabilities, along with the ability to get started with pandas data in just a few clicks, make the tool accessible to a range of experience levels, from novice developers to seasoned professionals. Future updates will include support for Spark and natural-language-to-code functionality via Azure OpenAI.
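To make "the reproducibility of Python" concrete, here is a hand-written sketch of the kind of pandas cleaning code a tool like Data Wrangler emits for operations applied through its UI. It is not actual Data Wrangler output. The function name `clean_data`, the column names, and the sample data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning steps, written by hand for illustration."""
    # Drop exact duplicate rows
    df = df.drop_duplicates()
    # Fill missing ages with the median of the remaining ages
    df = df.assign(age=df["age"].fillna(df["age"].median()))
    # Parse the signup date strings into datetime values
    df = df.assign(signup_date=pd.to_datetime(df["signup_date"]))
    # Drop rows that are still missing the label column
    df = df.dropna(subset=["churned"])
    return df.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "age": [34, None, 34, 51],
        "signup_date": ["2023-01-05", "2023-02-11", "2023-01-05", "2023-03-20"],
        "churned": [0, 1, 0, None],
    }
)
clean = clean_data(raw)
print(clean)
```

Keeping the steps in a single function means the same preparation can be rerun on fresh data, or reviewed like any other code, rather than living only in a UI session.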


 

ML models and experiments as first-class citizens with MLflow

We are also making machine learning models and experiments first-class citizens in Fabric. Built-in support for ML models and experiments allows users to manage models and track experiment runs using standard MLflow APIs. Comparison experiences make it easy to compare different experiment runs, and autologging captures key metrics automatically as users author code to train models. The Microsoft Fabric MLflow tracking store is powered by Azure Machine Learning, which opens the possibility of valuable integrated experiences in the future.


 

SynapseML, a comprehensive machine learning library for Spark

Additionally, we bring you the SynapseML library, the richest machine learning library for Spark, owned and maintained by Microsoft. With the goal of simplifying distributed and scalable machine learning, this library provides access to many different ML tools and easy-to-use APIs for applying ML and enriching data at scale. Core capabilities include distributed ML with performant and popular algorithms like LightGBM, as well as full MLflow support for SynapseML models. Spark operators help users work with pre-trained AI models from Azure Cognitive Services, including the new Azure OpenAI features, to apply foundation-model-powered transformations directly to data with Spark.

 

Enrich data in your Lakehouse with scalable PREDICT

We facilitate the operationalization of ML models with the scalable PREDICT function for distributed batch scoring on Spark, which lets users generate predictions without moving any data. Users can write the enriched data to the Lakehouse and serve it seamlessly to BI reports with the powerful Power BI Direct Lake capability. We are also introducing an easy-to-use guided experience that helps users quickly generate code to apply their ML models.


 

R Language support

We understand that many users depend on code authoring with R. That is why we also bring you native support for the R language on Apache Spark. Through both notebooks and Spark job definitions, users can author and run code with SparkR and sparklyr. Library management capabilities for R allow installation of R libraries, including the tidyverse, so that data scientists can use familiar Spark and R interfaces to process data and develop machine learning models. We hope you enjoy the added flexibility of using R with Apache Spark in Microsoft Fabric.


Going forward, we plan to release many more experiences to help you build data science solutions as part of your analytics workflows. There is a long list of upcoming features and experiences; here are some highlights from our roadmap.
  

Upcoming features and experiences

Semantic Link (Preview)

Semantic Link offers a powerful set of tools to bridge data science and BI. With Semantic Link, data science users can tap into the semantic data model using familiar tools like Python and Spark, which helps them understand both the data and the problem to solve. Analysts and business users who define the semantic model, key measures, and business logic can now be confident that data science users will be working from the same source of truth. This drastically improves collaboration across roles and avoids duplicated effort. Semantic Link also helps validate data and detect data quality issues. Sign up for the private preview for early access and use Semantic Link to explore Power BI datasets from Python and Spark, read measures and measure definitions, and detect data quality issues.

 

Hyperparameter tuning and AutoML (Preview)

Hyperparameter tuning and AutoML will allow users to automate the process of optimizing machine learning models with the flexibility of FLAML. The process can also easily be applied to SparkML and SynapseML models, and code-first integration lets users parallelize AutoML and hyperparameter-tuning trials with Spark to reduce costs. MLflow can be used to automatically capture the parameters and metrics of each hyperparameter trial. All of this is designed to make it easier to build machine learning models.
 

Pre-trained AI models (Coming soon in preview)

Pre-trained AI models from Azure Cognitive Services will be integrated into Microsoft Fabric, allowing users to access Text Analytics, Anomaly Detection, Text Translator, and other AI models, including foundation models from Azure OpenAI, out of the box without pre-provisioning any resources in Azure. This makes it seamless to apply AI-powered transformations to data in Lakehouses.
 

Copilot experiences in Notebooks (Coming soon in preview)

Developers in Microsoft Fabric will also get a wide array of built-in Copilot experiences that boost developer productivity across the entire analytics workflow. For example, these experiences help notebook users generate, explain, and document code, and they also provide troubleshooting and migration assistance. Through integration with best-of-breed foundation models from Azure OpenAI, the Microsoft Fabric Copilot experiences will be contextualized and relevant to the data the user has access to. Stay tuned for more details about these upcoming experiences!
 
We hope you are excited to try out the new Synapse Data Science experience in Microsoft Fabric. Check out the links below to learn more and get started with our new data science experiences. You can also sign up for ongoing and upcoming private previews of data science and AI features in Fabric here.
  

Get started with Microsoft Fabric

Microsoft Fabric is currently in preview. Try out everything Fabric has to offer by signing up for the free trial—no credit card information required. Everyone who signs up gets a fixed Fabric trial capacity, which may be used for any feature or capability from integrating data to creating machine learning models. Existing Power BI Premium customers can simply turn on Fabric through the Power BI admin portal. After July 1, 2023, Fabric will be enabled for all Power BI tenants.
 
Sign up for the free trial. For more information, read the Fabric trial docs.
  
