Microsoft Fabric Updates Blog

Data Factory Announcements at Fabric Community Conference Recap

Last week was such an exciting week for Fabric during the Fabric Community Conference, filled with several product announcements and sneak previews of upcoming new features.

Thanks to all of you who participated in the conference, either in person or by being part of the many virtual conversations through blogs, Community forums, social media and other channels. Thank you also for all your product feedback and Ideas forum suggestions that help us define the next wave of product enhancements.

To make sure you didn’t miss any of the Data Factory in Fabric announcements, here is a recap of all the new features.

New Data Pipelines capabilities

  • Data Pipelines accessing on-premises data using the On-premises data gateway [Announcement]
  • CI/CD support for Data Pipelines [Announcement]
  • Data Pipelines activity limit increased from 40 to 80 activities [Announcement]
  • Public APIs for Data Pipelines [Announcement]
  • Semantic Model Refresh activity for Data Pipelines [Announcement]
  • Unity Catalog support in Azure Databricks activity [Announcement]
  • Improved Performance tuning tips experience [Announcement]

New Dataflows Gen2 capabilities

New Get Data & Authentication capabilities

  • Modern Get Data UX to browse Azure resources [Announcement]
  • Azure Service Principal (SPN) support for on-premises and VNET data gateways [Announcement]
  • Block sharing of Shareable Cloud Connections at tenant level [Announcement]
  • Mirroring for Azure SQL DB, Cosmos DB and Snowflake in Fabric [Announcement]

You can continue reading below for more information about each of these capabilities.

Data Pipelines accessing on-premises data

We are thrilled to announce the public preview of on-premises connectivity for Data pipelines in Microsoft Fabric.

Using the On-premises Data Gateway, customers can now connect to on-premises data sources using data pipelines with Data Factory in Microsoft Fabric. This enhancement significantly broadens the scope of data integration capabilities. In essence, by using an on-premises Data Gateway, organizations can keep databases and other data sources on their on-premises networks while securely integrating them and orchestrating them using data pipelines in Microsoft Fabric.

Check out the following resources to help you get started:

CI/CD support for Data Pipelines

When building successful data analytics projects, it is very important to have source control, continuous integration, continuous deployment, and collaborative development environments. Many Fabric engineers with previous Azure Synapse Analytics and Azure Data Factory experience have utilized the Git integration included in those PaaS offerings for those important capabilities. Now, we’re excited to share that we have added Git Integration and integration with built-in Deployment Pipelines to Data Factory data pipelines in Fabric as a public preview!

CI/CD features that let you use your own Git repo in Azure DevOps, or the built-in Deployment Pipelines in Fabric, will eventually light up and become available for all Fabric items. Now that data pipelines can be used with these features, you can read more about the current preview capabilities and limitations in the online documentation.

Source control enabled at the workspace level showing support for Data Pipelines

Learn more about this enhancement here: REST APIs for Fabric Data Factory pipelines now available | Microsoft Fabric Blog | Microsoft Fabric

Public APIs for Data Pipelines

REST APIs for CRUD operations on Fabric Data Factory are now available in public preview.

The ability to create and execute pipelines through a REST endpoint is an important feature that many of our Azure Data Factory (ADF) customers have used for powerful patterns over the years, and it is now enabled for Fabric Data Factory. With the public REST APIs published, you can automate pipeline creation, management, and execution, and you can run pipelines from other pipelines by calling the REST endpoints from a Web activity.

To get started with the create, read, update, delete, and list operations in the new REST API, see our online documentation: Fabric data pipeline public REST API (Preview) – Microsoft Fabric | Microsoft Learn
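As a rough illustration of what automating pipelines over REST can look like, here is a minimal Python sketch that builds (but does not send) two requests: one to create an empty data pipeline and one to start an on-demand run. The base URL, the endpoint paths, the `jobType=Pipeline` query parameter, and the helper names are assumptions for illustration; please verify them against the REST API documentation before relying on them. The workspace ID, pipeline ID, and token values are placeholders.

```python
import json
from urllib.request import Request

# Assumed base URL for the Fabric REST API; verify against the documentation.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def _headers(token):
    """Common headers: a bearer token and a JSON content type."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def build_create_pipeline_request(workspace_id, display_name, token):
    """Build (but do not send) a request that creates an empty data pipeline."""
    body = {"displayName": display_name, "type": "DataPipeline"}
    return Request(
        f"{FABRIC_API}/workspaces/{workspace_id}/items",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(token),
        method="POST",
    )


def build_run_pipeline_request(workspace_id, pipeline_id, token):
    """Build (but do not send) a request that starts an on-demand pipeline run."""
    return Request(
        f"{FABRIC_API}/workspaces/{workspace_id}/items/{pipeline_id}"
        "/jobs/instances?jobType=Pipeline",
        data=b"{}",
        headers=_headers(token),
        method="POST",
    )


# Placeholder identifiers, for illustration only.
create_req = build_create_pipeline_request("my-workspace-id", "Nightly load", "<token>")
run_req = build_run_pipeline_request("my-workspace-id", "my-pipeline-id", "<token>")
print(create_req.get_method(), create_req.full_url)
print(run_req.get_method(), run_req.full_url)
```

To send either request, pass it to `urllib.request.urlopen` along with a valid access token. The same URL and body could also be configured in a Web activity to trigger one pipeline from another.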

Semantic Model Refresh activity for Data Pipelines

We are excited to announce the availability of the Semantic Model Refresh activity for data pipelines. With this new activity, you can create connections to your Power BI semantic models (datasets) and refresh them.

New Semantic model refresh activity for Data pipelines

To learn more about this activity, read https://aka.ms/SemanticModelRefreshActivity

Unity Catalog support in Azure Databricks activity

We are excited to announce Unity Catalog support in the Azure Databricks activity. With this update, you can configure the Unity Catalog access mode for added data security.

Find this update under Additional cluster settings.

Azure Databricks activity with the new additional cluster settings section
Unity catalog access mode new option inside of the Additional cluster settings

For more information about this activity, read https://aka.ms/AzureDatabricksActivity

Improved “Performance tuning tips” experience

A more intuitive user experience and more insightful performance tuning tips are now available in Data Factory data pipelines. These tips provide useful and accurate advice on settings such as staging and degree of copy parallelism to help you optimize your pipeline performance.

Copy data details dialog showing how to copy data from an Azure SQL Database to an Azure Blob Storage

Fast Copy

Dataflows help with ingesting and transforming data. With the introduction of Dataflow Gen2 High-Scale Data Transformations, we are able to transform your data at scale. However, to do this at high scale, your data needs to be ingested first.

With the introduction of Fast Copy, you can ingest terabytes of data with the easy experience of dataflows, but with the scalable backend and high throughput of Pipeline’s Copy activity.

As part of the initial release of Fast Copy, we support Azure Data Lake Storage Gen2, Azure Blob Storage, Azure SQL Database, Azure PostgreSQL and Fabric Lakehouse as sources. We will continue expanding the breadth of Fast Copy sources in future updates.

You can learn more about Dataflows Fast Copy here: Fast copy in Dataflows Gen2

Output destinations support for schema changes for Lakehouse & Azure SQL database

One of the most requested enhancements for Output Destinations in Dataflows Gen2 has been having the ability to modify the schema of the destination table based on the schema of the latest results from your dataflow evaluations.

When loading into a new table, the automatic settings are on by default. With automatic settings, Dataflows Gen2 manages the mapping for you. This gives you the following behavior:

  • Update method replace: Data is replaced at every dataflow refresh. Any existing data in the destination is removed and replaced with the output data of the dataflow.
  • Managed mapping: Mapping is managed for you. When you change your data or query to add a column or change a data type, the mapping is adjusted automatically when you republish your dataflow. You do not have to go into the data destination experience every time you make changes to your dataflow, which allows for easy schema changes when you republish.
  • Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Note that your dataflow refresh will fail if you have any relationships or measures depending on your table.

Automatic settings when using the choose destination dialog
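To make the three behaviors above more concrete, here is a small conceptual model in Python. It is only a sketch of the semantics described in this section, not how Dataflows Gen2 is implemented; the `DestinationTable` class and the `refresh_with_automatic_settings` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DestinationTable:
    """Hypothetical in-memory stand-in for a destination table."""
    columns: dict = field(default_factory=dict)  # column name -> type name
    rows: list = field(default_factory=list)


def refresh_with_automatic_settings(query_schema, query_rows):
    """Model one refresh under automatic settings.

    Drop and recreate: a new table is built on every refresh.
    Managed mapping: its columns follow the query's current schema.
    Replace: it holds only the latest output rows.
    """
    table = DestinationTable(columns=dict(query_schema))
    table.rows = [{name: row.get(name) for name in query_schema} for row in query_rows]
    return table


# First refresh: two columns.
first = refresh_with_automatic_settings(
    {"id": "int", "name": "text"},
    [{"id": 1, "name": "a"}],
)

# The query is republished with an extra column; the next refresh picks it up
# without any manual remapping, and the earlier rows are gone.
second = refresh_with_automatic_settings(
    {"id": "int", "name": "text", "region": "text"},
    [{"id": 2, "name": "b", "region": "EU"}],
)
print(second.columns)
print(second.rows)
```

Because the table is rebuilt from scratch each time, anything that depends on the old table definition (such as relationships or measures) has nothing stable to hang on to, which is why the refresh fails in that case.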

Manual settings

By turning off the use automatic settings toggle, you get full control over how your data is loaded into the data destination. You can change the column mapping by changing the source type or by excluding any column that you do not need in your data destination.

New behavior for choose destination settings when using the manual settings

Cancel dataflow refresh

Canceling a dataflow refresh is useful when you want to stop a refresh during peak time, if a capacity is nearing its limits, or if refresh is taking longer than expected. Use the refresh cancellation feature to stop refreshing dataflows.

To cancel a dataflow refresh, select the Cancel refresh option, found in the workspace list or lineage views for a dataflow with an in-progress refresh.

New cancel refresh operation available inside of the workspace list

Once a dataflow refresh is canceled, the dataflow’s refresh history status is updated to reflect the cancellation status.

A refresh history showing an entry where the refresh activity was canceled

Privacy Levels support

You can now set privacy levels for your connections in Dataflows. Privacy levels are critical to configure correctly so that sensitive data is only viewed by authorized users.

Furthermore, data sources must also be isolated from other data sources so that combining data has no undesirable data transfer impact. Incorrectly setting privacy levels may lead to sensitive data being leaked outside of a trusted environment. You can set this privacy level when creating a new connection:

New privacy level configuration available inside of the Get Data experience

Learn more about Privacy Levels in this article: Behind the scenes of the Data Privacy Firewall – Power Query | Microsoft Learn
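As a rough sketch of the isolation idea, the snippet below encodes which privacy levels may have their data sent (folded) to a source at another level. The rules are an assumption paraphrased from the Power Query privacy levels documentation, the `may_fold_into` helper is a hypothetical name, and the "None" level is left out; treat this as an illustration rather than a description of the firewall itself.

```python
# Assumed folding rules (paraphrased from the Power Query privacy levels docs):
# - Public data may be folded into any other source.
# - Organizational data may be folded into Organizational and Private sources,
#   but not into Public ones.
# - Private data is never folded into another source, not even a Private one.
ALLOWED_TARGETS = {
    "Public": {"Public", "Organizational", "Private"},
    "Organizational": {"Organizational", "Private"},
    "Private": set(),
}


def may_fold_into(source_level, target_level):
    """Return True if data from a source_level source may be sent to a target_level source."""
    return target_level in ALLOWED_TARGETS[source_level]


for source in ALLOWED_TARGETS:
    for target in ALLOWED_TARGETS:
        print(f"{source} -> {target}: {may_fold_into(source, target)}")
```

Under these assumed rules, marking a sensitive source as Public by mistake would allow its data to flow to any other source, which is the kind of leak the paragraph above warns about.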

Manage Connections experience enhancements

Manage connections is a feature that allows you to see at-a-glance the connections that you have in use for your Dataflows and the general information about those connections.

We are happy to release a new enhancement to this experience: you can now see a list of all the data sources available in your Dataflow, including the ones without a connection set for them!

For the data sources without a connection, you can set a new connection from within the manage connections experience by clicking the plus sign in the specific row of your source.

Manage connections dialog showing connections that are linked and data sources that don't have any connections linked yet

Furthermore, when you unlink a connection, the data source no longer disappears from this list if it still exists in your Dataflow definition. It simply appears as a data source without a connection until you link one, either in this dialog or in the Power Query editor.

Test Framework for Custom Connectors SDK in VS Code

We’re excited to announce the availability of a new Test Framework in the latest release of Power Query SDK! The Test Framework gives Power Query SDK developers access to standard tests and a test harness to verify the direct query (DQ) capabilities of an extension connector. With this new capability, developers have a standard way of verifying connectors and a platform for adding custom tests. We envision this as the first step in enhancing the developer workflow with increased flexibility and productivity in the testing capabilities provided by the Power Query SDK.

The Power Query SDK Test Framework is available on GitHub. It requires the latest release of Power Query SDK, which wraps the Microsoft.PowerQuery.SdkTools NuGet package containing the PQTest compare command.

What is the Power Query SDK Test Framework?

Power Query SDK Test Framework is a ready-to-go test harness with pre-built tests that standardizes the testing of new and existing extension connectors. It provides the ability to perform functional, compliance, and regression testing that can be extended to testing at scale, and it helps address the need for a comprehensive test framework for extension connectors.

Diagram of the new Power Query SDK Test Framework and all of its components

Follow the links below to get started:

Modern Get Data UX to browse Azure resources

Using the regular path in Get Data to create a new connection, you always need to fill in your endpoint, URL, or server and database name when connecting to Azure resources like Azure Blob Storage, Azure Data Lake Storage Gen2, and Synapse. This is a tedious process and does not allow for easy data discovery.

With the new ‘browse Azure’ functionality in Get Data, you can easily browse all your Azure resources and automatically connect to them without manually setting up a connection, saving you a lot of time.

Azure resource picker inside of the Get Data experience

Azure Service Principal (SPN) support for on-premises and VNET data gateways

You can now authenticate your on-premises and VNET data gateway connections using SPNs. Learn more about SPN in Data Factory.

Azure service principal (SPN) is a security identity that’s application based and can be assigned permissions to access your data sources. Service principals are used to safely connect to data, without a user identity.

Within Microsoft Fabric, service principal authentication is supported in Semantic Models, dataflows (both Dataflow Gen1 and Dataflow Gen2), and Datamarts.
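To illustrate what connecting "without a user identity" means, here is a minimal Python sketch of the OAuth 2.0 client-credentials request a service principal uses to obtain a token from the Microsoft identity platform. This is not how you configure a gateway connection (that is done in the connection settings); it only shows the kind of application-based sign-in that happens behind the scenes. The `build_token_request` helper and all identifier values are hypothetical placeholders, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret, resource):
    """Build (but do not send) an OAuth 2.0 client-credentials token request.

    The application authenticates with its own ID and secret; no user
    name or password is involved.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the permissions already granted to the application.
        "scope": f"{resource}/.default",
    }
    return Request(url, data=urlencode(form).encode("utf-8"), method="POST")


# Placeholder values, for illustration only.
req = build_token_request(
    tenant_id="my-tenant-id",
    client_id="my-app-id",
    client_secret="my-secret",
    resource="https://database.windows.net",
)
print(req.get_method(), req.full_url)
```

In practice, the secret should come from a secure store rather than source code, and the permissions assigned to the service principal determine which data sources the resulting token can access.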

Block sharing of Shareable Cloud Connections at tenant level

By default, any user in Fabric can share their connections if they have one of the following user roles on the connection:

  • Connection owner or admin
  • Connection user with sharing

Sharing a connection in Fabric is sometimes needed for collaboration within the same workload or when sharing the workload with others. Connection sharing in Fabric makes this easy by providing a secure way to share connections with others for collaboration, but without exposing the secrets at any time. These connections can only be used within the Fabric environment.

If your organization does not allow connection sharing or wants to limit the sharing of connections, a tenant admin can restrict sharing as a tenant policy. The policy allows you to block sharing within the entire tenant.

Manage cloud connection sharing dialog when using a principal

General Availability of VNET Data Gateway

The VNET Data Gateway is a network security offering that lets you connect your Azure and other data services to Microsoft Fabric and the Power Platform. You can run Dataflow Gen2, Power BI Semantic Models, Power Platform Dataflows, and Power BI Paginated Reports on top of a VNET Data Gateway to ensure that no traffic is exposed to public endpoints. In addition, you can force all traffic to your data source to go through a gateway, allowing for comprehensive auditing of secure data sources.

Diagram of the VNET Service and the VNET Gateway

To learn more and get started, read this article: VNET Data Gateways.

Mirroring for Azure SQL DB, Cosmos DB and Snowflake in Fabric

We are excited to announce that Mirroring, previously announced at Ignite in November 2023, is now available to customers in Public Preview. You can now bring your databases into OneLake in Microsoft Fabric, enabling seamless zero-ETL, near real-time insights on your data and unlocking warehousing, BI, AI, and more.

Data-driven insights are important for every business. With the critical need to make smart decisions, create new things, and improve your products or services, time to value is everything. Yet this can be difficult when you have data in different places, like apps, databases, and data warehouses. These places typically store data differently, so you can’t easily analyze and cross-reference them; you have to laboriously move their data to a place where you can analyze and harmonize it at scale. Doing this takes time, money, and typically costly expertise to build complex, connected solutions. By the time you do this, your data is old and your insights are out of date. Decision makers need to be able to ask questions about their data without time-consuming complexity that adds risk and can impact mission-critical workloads.

Mirroring simplifies this process into clicks and seconds, not complex processes and hours, days, or weeks. You get a modern, fast, and safe way of accessing and ingesting data continuously and seamlessly from databases or data warehouses into Fabric’s OneLake, without the need for cumbersome pipelines – in near real time. Combined with the rest of your organization’s data in OneLake, you can quickly unify and govern your data estate, removing data silos. 

As part of the initial Public Preview launch, Azure Cosmos DB, Azure SQL DB, and Snowflake customers on any cloud are able to mirror their data in OneLake and unlock all the capabilities of Fabric’s Data Warehouse, Direct Lake Mode in Power BI, Notebooks and much more. Beyond Azure Cosmos DB, Azure SQL Database, and Snowflake, many more data sources will be added to Mirroring in the future based on your feedback.

Learn more about Mirroring in Fabric by reading this article: Mirroring – Microsoft Fabric | Microsoft Learn

Thank You for your feedback, keep it coming!

We want to thank you for your support, usage, excitement, and feedback around Data Factory in Fabric. We’re very excited to continue learning from you regarding your Data Integration needs and how Data Factory in Fabric can be enhanced to empower you to achieve more with data.

Please continue to share your feedback and feature ideas with us via our official Community channels, and stay tuned to our public roadmap page for updates on what will come next.
