Data Factory Spotlight: Semantic model refresh activity
Overview
Data Factory empowers you to ingest, prepare, and transform data across your data estate with a modern data integration experience. Whether you are a citizen developer or a professional developer, Data Factory is your one-stop shop to move or transform data. It offers intelligent transformations, a rich set of activities, connectivity to hundreds of cloud and on-premises data sources, and a growing list of output destinations that span both Microsoft and third-party databases, services, and applications.
There are two primary high-level features Data Factory implements: dataflows and pipelines.
- Data pipelines enable you to leverage rich out-of-the-box data orchestration capabilities to compose flexible data workflows that meet your enterprise needs. You can learn more about their capabilities in the Data Factory Spotlight: data pipelines blog post.
- Dataflows Gen2 enable you to leverage a low-code interface and 300+ data and AI-based transformations, letting you transform data more easily and with more flexibility than any other tool.
In this article, we will focus on the much-requested Semantic model refresh activity in Data pipelines and how you can now create a complete end-to-end solution that spans the entire pipeline lifecycle in just a few clicks. Additionally, you have the flexibility to configure advanced settings tailored for enterprise-grade scenarios.
Semantic model refresh activity
With the data pipeline user interface (UI), users can now seamlessly connect to and configure their Power BI semantic model refreshes using the Semantic model refresh activity.
From a new pipeline, select the Pipeline activity card on the canvas and then choose Semantic model refresh from the list. For an existing pipeline, navigate to the Activities tab and select the Semantic model refresh activity icon.
How does the Semantic model refresh work?
Built on the enhanced refresh capability of the Power BI REST API, the Semantic model refresh activity is optimized for carrying out refresh operations asynchronously and provides customization options, including the following helpful features:
- Timeout and retry attempts
- Refresh operation parameters
- Response details
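Under the hood, enhanced refresh is triggered by POSTing to the Power BI REST API's refreshes endpoint, which accepts the request and returns immediately rather than blocking until the refresh finishes. The sketch below shows how such a request could be constructed; the workspace and dataset IDs are placeholders, and token acquisition is omitted.

```python
# Sketch: constructing an enhanced refresh request for the Power BI REST API.
# The IDs below are placeholders; authentication (an Azure AD bearer token)
# is omitted for brevity.

def build_refresh_request(group_id: str, dataset_id: str) -> tuple[str, dict]:
    """Return the URL and JSON body for an asynchronous enhanced refresh."""
    url = (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{group_id}/datasets/{dataset_id}/refreshes"
    )
    body = {"type": "Full", "commitMode": "transactional"}
    return url, body

url, body = build_refresh_request("<workspace-id>", "<dataset-id>")
# A successful POST returns HTTP 202 Accepted, and the refresh request ID can
# then be used to poll the operation's status later, e.g.:
# requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
print(url)
```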
Timeout and retry attempts
When developing robust solutions for production, it is important to manage your end-to-end pipelines ensuring their reliability, performance, and resilience. One essential aspect of this process is identifying and handling long-running operations effectively. By doing so, you can minimize downtime and promptly address any underlying issues.
Within the activity properties pane, you will find the General tab, where you can configure the activity's execution behavior. Here are the key settings you should be aware of:
- The Timeout duration specifies the maximum allowable time for an operation to complete successfully. If the operation exceeds this duration, a timeout error is returned.
- A common practice when operationalizing data pipelines in Data Factory is to set the activity Retry to a value greater than 0. Because Data Factory is a cloud-based service, transient background service or service call failures can occur; setting retries enables the pipeline manager to automatically retry a failed activity up to the number of times you specify.
- Additionally, setting the duration between each retry attempt using the Retry interval (sec) ensures that retries are spaced out appropriately, allowing time for potential recovery.
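Conceptually, the Retry and Retry interval settings behave like the following loop. This is a simplified sketch of the semantics, not the actual pipeline manager implementation.

```python
import time

def run_with_retry(operation, retries: int = 1, interval_sec: int = 30):
    """Run `operation`, retrying up to `retries` additional times on failure,
    sleeping `interval_sec` seconds between attempts."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the failure for investigation
            time.sleep(interval_sec)

# Example: a refresh that fails once with a transient error, then succeeds
# on the automatic retry.
calls = {"n": 0}
def flaky_refresh():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient refresh failure")
    return "Completed"

print(run_with_retry(flaky_refresh, retries=1, interval_sec=0))  # → Completed
```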
In the example below, we have configured the Timeout duration to run for no longer than 1 hour and 30 minutes, including one Retry attempt, with a Retry interval starting 30 seconds after the original operation has failed.
In the pipeline output, the initial refresh encountered a failure, prompting an automatic retry. This is particularly beneficial when dealing with a transient issue, before definitively classifying it as a failure requiring investigation.
Learn more about refresh operation time limits.
Refresh operation parameters
With the semantic model refresh operation parameters, you can configure advanced settings for the type of processing, commit mode, max parallelism, retry count and more. Here are the current settings you should be aware of:
- Wait on completion: Determines whether the activity should wait for the refresh operation to complete before proceeding. This is the default setting and is recommended only if you want the pipeline manager to continuously poll your semantic model refresh for completion. For long-running refreshes, an asynchronous refresh (that is, clearing Wait on completion) can be preferable so that the pipeline does not keep executing in the background.
- Commit mode: Determines whether to commit objects in batches or only when complete.
- At present, the semantic model refresh activity supports the transactional commit mode and full refresh type. This approach ensures that existing data remains intact during the operation, committing only fully processed objects upon a successful refresh completion.
- Max parallelism: Controls the maximum number of threads that run processing commands in parallel. By default, this value is set to 10 threads.
- Retry count: Specifies how many times the refresh operation should retry before it is considered a failure. The default value is 0, meaning no retries; adjust this value based on your tolerance for transient errors.
- Note: refresh operations report one of the following statuses: Completed, Failed, Unknown, Disabled, or Cancelled.
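The settings above correspond to fields in the enhanced refresh request body. The sketch below shows one plausible mapping using the documented field names and the activity's stated defaults; treat it as illustrative rather than the activity's exact internal request.

```python
def build_refresh_parameters(max_parallelism: int = 10, retry_count: int = 0) -> dict:
    """Request-body fields that the activity's settings map onto in the
    Power BI enhanced refresh API (defaults shown match the activity)."""
    return {
        "type": "Full",                 # currently the supported refresh type
        "commitMode": "transactional",  # commit only fully processed objects
        "maxParallelism": max_parallelism,
        "retryCount": retry_count,
    }

print(build_refresh_parameters())
```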
Learn more about parameters.
Response details
After executing the refresh operation, details such as the unique request identifier, the status of the operation, specifics about the objects involved (such as tables and partitions), the execution duration, and more are available within the output.
You can access this information after each data pipeline is run by navigating to the monitoring hub, where you will find a comprehensive breakdown of the refresh. Alternatively, if you prefer a more customized approach, consider extracting the relevant values and storing them in a designated storage location, such as a Fabric KQL Database, which is specifically tailored for telemetry data. This flexibility allows you to fine-tune your monitoring and analysis workflow to meet your unique requirements.
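If you take the customized route, a small helper can pull the key fields out of the refresh output before writing them to your telemetry store. The field names below follow the shape of the enhanced refresh status response, but the sample values are purely illustrative.

```python
def summarize_refresh(response: dict) -> dict:
    """Extract the key fields from an enhanced refresh status payload."""
    return {
        "requestId": response.get("requestId"),
        "status": response.get("status"),
        "objects_refreshed": len(response.get("objects", [])),
    }

# Sample payload shaped like an enhanced refresh status response
# (identifiers and values are made up for illustration).
sample = {
    "requestId": "87f31ef7-1e3a-4006-9b0b-191693e79e9e",
    "status": "Completed",
    "objects": [
        {"table": "Sales", "partition": "Sales-2024", "status": "Completed"},
        {"table": "Customers", "status": "Completed"},
    ],
}
print(summarize_refresh(sample))
```

A summary like this is compact enough to append to a Fabric KQL Database table per run, keeping one row per refresh for trend analysis.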
Learn more about response properties.
Conclusion
We hope that you have enjoyed this overview and look forward to sharing more Data Factory content in our spotlight series. To read more about the Semantic model refresh pipeline activity, please visit the official Learn documentation.