Construct a data analytics workflow with a Fabric Data Factory data pipeline
Microsoft Fabric Data Factory provides an easy way to create low-code data integration and ETL projects for cloud-scale data analytics. Today, I want to focus on data pipelines in Data Factory and the advantages you’ll find by using pipelines to orchestrate your Fabric data analytics projects and activities.
What is a data pipeline?
For Azure Data Factory and Azure Synapse users, data pipelines will be very familiar as we’ve had data pipelines in those products for many years. Now that Data Factory and data pipelines are available in the SaaS orientation of Fabric, you will find the experience to be nearly identical. However, if you are primarily a Power BI or Power Platform user, you may not have experience with data pipelines. So, today, I’d like to take a few minutes to explain what a data pipeline is.
In the context of Fabric data analytics, you will use a data pipeline to build automated workflows that combine the different artifacts you’ve created in your workspace to build your analytics. As an example, in the screenshot below, you can see that I’ve built a pipeline that performs the following tasks:
- Find files in a storage folder
- Iterate over the files found
- Copy each file’s contents to the bronze layer in my Lakehouse
- After the data has been loaded to bronze, run a Spark Notebook to transform the data and load it into the silver layer (see the sketch after this list)
- If the Notebook was successful, send an email to the team and continue
- If the Notebook failed, notify the team via a Teams channel and then fail the pipeline
- Execute a Dataflow to combine and clean the data, preparing it for the gold layer
- Finally, issue a Copy command to load the cleaned data into the gold layer for reporting
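To make the Notebook step above more concrete, here is a minimal PySpark sketch of what the bronze-to-silver transformation might look like. The folder path (Files/bronze/), the table name (silver_sales), and the column names are placeholders I’ve assumed for illustration; your schema and cleanup rules will differ.

```python
# Minimal sketch of the bronze-to-silver Notebook step, assuming the Copy
# activity landed CSV files in the Lakehouse "Files/bronze/" folder and that
# "silver_sales" is the target Delta table. In a Fabric Notebook the `spark`
# session is already available.
from pyspark.sql import functions as F

# Read the raw files written to the bronze layer by the Copy activity
bronze_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("Files/bronze/")
)

# Apply lightweight transformations: cast types and drop duplicate rows
# (the column names here are hypothetical)
silver_df = (
    bronze_df
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropDuplicates()
)

# Write the cleaned data to a Delta table in the silver layer
silver_df.write.format("delta").mode("append").saveAsTable("silver_sales")
```

In the pipeline, this Notebook runs as a Notebook activity after the ForEach/Copy step completes, and its success or failure drives the email and Teams notification branches described above.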
Why would you use a data pipeline?
I created that pipeline design entirely in the web UI in Fabric without writing any code. Now I can automate the execution of my logic on a regular cadence by clicking the Schedule button in the designer UI and setting a schedule. How often you update your Lakehouse will depend on your business requirements and how frequently new data arrives at your sources.
Separately, inside of Fabric, I can create and manage the artifacts that I just orchestrated above. My Notebook is created and tested in the Data Engineering app, while I used the Data Factory app to create a Dataflow. Data Factory data pipelines in Fabric bring them all together into a single cohesive logical “pipeline”. In other words, I’ve created an end-to-end workflow that runs on a schedule, fully automated. As an added benefit, I can now use the central Monitoring Hub feature in Fabric to watch the execution of my pipelines, Notebooks, Dataflows, etc. all from a single pane of glass:
So as you build your analytics project in Fabric, you’ll use data pipelines to piece those artifacts together into an automated workflow to keep your Lakehouse (and subsequently, your business reporting users) updated, refreshed, and cleaned.
How to get started
I hope that this gives you a sense of the value that data pipelines from the Data Factory app inside of Microsoft Fabric can bring to your data analytics projects. To get started, switch over to Data Factory in Fabric and choose New > Data Pipeline. You’ll land on the page shown in the screenshot below, where you can begin adding activities to the low-code design surface and build your own workflows!
Other resources
- Join the Fabric community to post your questions, share your feedback, and learn from others.
- Visit Microsoft Fabric Ideas to submit feedback and suggestions for improvements and vote on your peers’ ideas!
- Check our Known Issues page to stay up to date on product fixes!
Have any questions or feedback? Leave a comment below!