Introducing Synapse Data Science in Microsoft Fabric
See Arun Ulagaratchagan’s blog post to read the full Microsoft Fabric preview announcement.
Data science is a powerful tool for unlocking the value of data in any organization’s analytics workflow. Through the use of data science, organizations can make more informed decisions and gain predictive insights that would otherwise be unattainable.
Today, we are thrilled to announce the preview of the new Synapse Data Science experience in Microsoft Fabric! With data science in Microsoft Fabric, you can utilize the power of machine learning features to seamlessly enrich data as part of your data and analytics workflows.
What is Microsoft Fabric?
Microsoft Fabric is our next-generation data platform for analytics. It integrates Power BI, Data Factory, and the next generation of Synapse experiences, and exposes easy-to-use analytics experiences for a variety of roles. This unified platform allows users to securely share data, code, models, and experiments across the team and simplifies many aspects of data science from data ingestion to serving predictive insights.
Value of data science integrated in the analytics workflow
Synapse Data Science in Microsoft Fabric allows data science practitioners to work seamlessly on top of the same secured and governed data that has been prepared by data engineering teams. This eliminates the need to copy data and figure out ways to give your data science teams secure access to data. In Microsoft Fabric, open Delta Lake support allows data science users to version datasets to create reproducible machine learning code. Additionally, data science users have access to a wide range of easy-to-use getting started experiences, low-code tools and code authoring experiences with Notebooks and Visual Studio Code. Synapse Data Science in Microsoft Fabric also provides a rich set of built-in ML tools. For example, MLflow model and experiment tracking, powered by Azure Machine Learning, is built in. The SynapseML Spark library provides scalable ML tools, and users can serve predictions swiftly to Power BI with the new Power BI Direct Lake capability. Finally, streamlined collaboration across different analytics roles makes hand-offs seamless and teams more productive.
Next, we will cover how Microsoft Fabric provides users with a variety of features to help complete end-to-end data science workflows.
Let’s first walk through a core data science scenario in Fabric, in the context of a typical data science process, to illustrate how data science in Fabric accelerates predictive business insights:
Problem formulation and ideation
The process starts with formulating a question. Answering these questions requires collaboration across multiple roles. This step is aided by easy access to the same source of truth, such as business metrics, logic, and data analysis tools. Semantic link is a new feature we are launching that will drastically simplify handoffs and ease collaboration between data scientists and stakeholders.
Data discovery and pre-processing
Data engineering teams will build Lakehouses that data scientists can consume. Data scientists will need to further pre-process data to solve problems with ML tools. We are adding a new tool called Data Wrangler to help boost productivity during this tedious step.
Experiment and build ML models
For building ML models, we allow users to create and track ML experiments and models using MLflow. Users can leverage library management to build environments with third-party libraries for developing ML solutions. The rich SynapseML Spark library, which we own and maintain, enables model training and ML feature construction at large scale.
Enrich and operationalize
Finally, to enrich and operationalize data with predictive models, data science users can schedule their batch prediction scripts and leverage our scalable PREDICT function to speed up the process. Multiple options exist for operationalizing batch scoring. For example, users can schedule Notebooks to run on a regular basis as a lightweight option, or schedule Spark jobs that run as part of data pipeline steps.
With the Power BI Direct Lake mode, access to predicted values in Lakehouse tables is seamless without the need to load data. Your BI reports will have automatic access to the latest enriched data to help accelerate your predictive business insights!
Through a combination of various well integrated experiences available to a wide range of analytics roles, Microsoft Fabric enables users to successfully complete their data science projects end-to-end.
What’s included in Synapse Data Science?
Now that you have a clearer picture of how Microsoft Fabric integrates data science with analytics and BI, let’s take a closer look at some of the new features and experiences we are introducing.
Data prep and code generation with Data Wrangler
Data Wrangler is a powerful, intuitive tool for data wrangling and preparation. It makes data cleansing and preparation easier than ever before, while still allowing users to take advantage of the power of coding and the reproducibility of Python. The dynamic data display, built-in statistics, and chart-rendering capabilities, along with the ability to get started with pandas data in just a few clicks, make this tool easily accessible to a range of experience levels, from novice developers to seasoned professionals. Future updates will include support for Spark and natural language “to code” functionality via Azure OpenAI.
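To give a sense of what reproducible data preparation in Python looks like, here is a minimal sketch of a pandas cleaning function. It is written by hand for illustration and is not actual Data Wrangler output. The column names, the sample data, and the `clean_data` function name are all assumptions.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a fixed, repeatable sequence of cleaning steps."""
    # Drop rows where the hypothetical 'age' column is missing
    df = df.dropna(subset=["age"])
    # Drop rows that are exact duplicates of an earlier row
    df = df.drop_duplicates()
    # Store 'age' as an integer now that no missing values remain
    df = df.astype({"age": "int64"})
    return df


# Illustrative sample data (assumption, not from a real dataset)
raw = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Ben", "Cy"],
        "age": [34.0, 29.0, 29.0, None],
    }
)

cleaned = clean_data(raw)
print(cleaned)
```

Because every step lives in one function that returns a new DataFrame, the same preparation can be re-run on fresh data or reviewed by a teammate.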
ML models and experiments as first-class citizens with MLflow
We are also making machine learning models and experiments first-class citizens in Fabric. Built-in support for ML models and experiments allows users to manage models and track experiment runs using standard MLflow APIs. Comparison experiences make it easy to compare different experiment runs, and auto logging helps capture key metrics automatically as users author code to train models. The Microsoft Fabric MLflow tracking store is powered by Azure Machine Learning, which opens the possibility of valuable integrated experiences in the future.
SynapseML, a comprehensive machine learning library for Spark
Additionally, we bring you the SynapseML library, the richest machine learning library for Spark, owned and maintained by Microsoft. With the goal of simplifying distributed and scalable machine learning, this library provides access to many different ML tools and easy-to-use APIs for applying ML and enriching data at scale. Core capabilities include distributed ML with performant and popular algorithms like LightGBM, as well as full MLflow support for SynapseML models. Spark operators help users work with pre-trained AI models from Azure Cognitive Services, including the new Azure OpenAI features, for applying foundation-model-powered transformations directly on data with Spark.
Enrich data in your Lakehouse with scalable PREDICT
We facilitate the operationalization of ML models with the scalable PREDICT function for distributed batch scoring on Spark, allowing users to process predictions without moving any data. Users can write the enriched data to the Lakehouse and serve it seamlessly to BI reports with the powerful Power BI Direct Lake capability. Additionally, we introduce an easy-to-use guided experience that helps users quickly and easily generate code to apply their ML models.
R Language support
We understand that many users depend on code authoring with R. That is why we also bring you native support for the R language on Apache Spark. Through both notebooks and Spark job definitions, users can author and run code with SparkR and sparklyr. Library management capabilities for R allow installation of R libraries, including Tidyverse, so that data scientists can use familiar Spark and R interfaces to process data and develop machine learning models. We hope you enjoy the added flexibility of using R with Apache Spark in Microsoft Fabric.
Going forward, we plan to release many more valuable experiences to help you build data science solutions as part of your analytics workflows. There is a long list of upcoming features and experiences to be aware of. Here are some highlights on our roadmap.
Upcoming features and experiences
Semantic Link (Preview)
Semantic Link offers a powerful set of tools to bridge data science and BI. With Semantic Link, data science users can tap into the semantic data model using familiar tools like Python and Spark. This helps them gain a good understanding of the data and the problem to solve. Analysts and business users who define the semantic model, key measures, and business logic can now be confident that data science users will be able to tap into the same source of truth. This drastically improves collaboration across roles and avoids duplication of effort. Semantic Link also helps to validate data and detect data quality issues. Sign up for the private preview for early access and use Semantic Link to explore Power BI datasets from Python and Spark, read measures and measure definitions, and detect data quality issues.
Hyperparameter tuning and AutoML (Preview)
Hyperparameter tuning and AutoML will allow users to automate the process of optimizing machine learning models with the flexibility of FLAML. This process can also be applied to SparkML and SynapseML models and is further supported by code-first integration to parallelize AutoML trials with Spark. Parallelizing hyperparameter trials with Spark can also reduce costs, and MLflow can be used to automatically capture hyperparameter metrics and parameters. All of this is designed to make it easier to build machine learning models.
Pre-trained AI models (Coming soon in preview)
Azure Cognitive Services pre-trained AI models will be integrated into Microsoft Fabric, allowing users to access Text Analytics, Anomaly Detection, Text Translator, and other AI models, including foundation models from Azure OpenAI, out of the box without pre-provisioning any resources in Azure. This makes it seamless to apply AI-powered transformations on data in Lakehouses.
Copilot experiences in Notebooks (Coming soon in preview)
Developers in Microsoft Fabric will also get a wide array of built-in Copilot experiences that boost developer productivity across the entire analytics workflow. For example, these experiences help notebook users generate, explain, and document code, and also provide troubleshooting and migration assistance. Through integration with best-of-breed foundation models from Azure OpenAI, the Microsoft Fabric Copilot experiences will be contextualized and relevant to the data the user has access to. Stay tuned for more details about these upcoming experiences!
We hope you are excited to try out the new Synapse Data Science experience in Microsoft Fabric. Check out the links below to learn more and get started with our new data science experiences. You can also sign up for ongoing and upcoming private previews of data science and AI features in Fabric here.
Get started with Microsoft Fabric
Microsoft Fabric is currently in preview. Try out everything Fabric has to offer by signing up for the free trial—no credit card information required. Everyone who signs up gets a fixed Fabric trial capacity, which may be used for any feature or capability from integrating data to creating machine learning models. Existing Power BI Premium customers can simply turn on Fabric through the Power BI admin portal. After July 1, 2023, Fabric will be enabled for all Power BI tenants.
Sign up for the free trial. For more information, read the Fabric trial docs.
If you want to learn more about Microsoft Fabric, consider:
- Signing up for the Microsoft Fabric free trial
- Visiting the Microsoft Fabric website
- Reading the more in-depth Fabric experience announcement blogs:
  - Data Factory experience in Fabric blog
  - Synapse Data Engineering experience in Fabric blog
  - Synapse Data Warehousing experience in Fabric blog
  - Synapse Real-Time Analytics experience in Fabric blog
  - Power BI announcement blog
  - Data Activator experience in Fabric blog
  - Administration and governance in Fabric blog
  - OneLake in Fabric blog
  - Microsoft 365 data integration in Fabric blog
  - Dataverse and Microsoft Fabric integration blog
- Exploring the Fabric technical documentation
- Reading the free e-book on getting started with Fabric
- Exploring Fabric learning modules
- Exploring Fabric through the Guided Tour
- Watching the free Fabric webinar series
- Joining the Fabric community to post your questions, share your feedback, and learn from others
- Visiting Microsoft Fabric Ideas to submit suggestions for improvements and vote on your peers’ ideas
To help you get started with Microsoft Fabric, there are several resources we recommend:
- Microsoft Fabric Learning Paths: experience a high-level tour of Microsoft Fabric and how to get started
- Microsoft Fabric Tutorials: get detailed tutorials with a step-by-step guide on how to create an end-to-end solution in Microsoft Fabric. These tutorials focus on a few different common patterns including a lakehouse architecture, data warehouse architecture, real-time analytics, and data science projects.
- Microsoft Fabric Documentation: read Fabric docs to see detailed documentation for all aspects of Microsoft Fabric.