Automating Fabric items with Real-Time Intelligence
Until now, you could schedule Fabric jobs that run data pipelines, notebooks, and Spark jobs to process data, for example to load data into OneLake or to train ML models. With the recently announced Real-Time Intelligence in Fabric, you can now make these items react in an event-driven way. This works much like triggers in Azure Data Factory pipelines, which run data pipelines in response to events from Azure Storage accounts. You can use it for scenarios such as:
- When new files are loaded to Azure storage accounts, start a data pipeline to load data to OneLake.
- Start a Fabric notebook when issues with data quality are found in Power BI reports.
A number of events are now surfaced from the Fabric and Azure platforms in the Real-Time hub, meaning you can listen to changes from Azure Storage such as when a new file is created in a particular folder. You can find these by browsing to the ‘Fabric events’ pivot in the Real-Time hub. For more information about using the Real-Time hub and the Fabric events that you can find there please see our blog post.
The ‘Azure Blob Storage Events’ category lets you subscribe to changes in your Azure storage accounts, and ‘Fabric workspace item events’ are emitted when items are created, updated, read, or deleted in Fabric. On the details page for either set of events, you can choose the ‘Set alert’ option, which opens a pane where you configure the alert.
First, select the specific events and source you want to monitor. In the wizard that opens, you can also scope the events to a particular workspace (for Fabric events) or specify filters such as a container or files of a particular type (for Azure events).
At present, the ‘Condition’ section only supports triggering on every event received. Checking for conditions on fields in the event will be supported in the future, as more sources are added. We’d love your feedback on the conditions you want to look for in these events; please submit Ideas at https://aka.ms/rtiidea
Finally, you can choose the ‘Run a Fabric item’ action and browse to the workspace and item that you want to run when the events are detected. You’ll also need to specify a workspace to save the reflex item that will manage the alert (this needs to be in a workspace on a Fabric capacity but doesn’t need to be in the same workspace as the item you are running).
Triggering data pipelines with Azure storage events
A very common scenario for data integration and ETL jobs is to invoke a data pipeline when a file arrives or is deleted in your Azure storage accounts. Now you can configure this directly from the data pipeline canvas. In the toolbar you’ll see a ‘Trigger (preview)’ button which will allow you to configure a storage trigger based on Reflex:
That will open the pane with the current pipeline already selected as the item to run. When you select “Select events” you can connect to the appropriate storage account and select the events you want to listen for, and even filter to specific files or locations in your storage account.
You’ll notice that the list of events from your storage account has many more options than ADF, where you can only configure blob created and blob deleted. Reflex also supports a large set of event sources beyond Azure storage. We’ll bring these into data pipelines soon as well, opening up more ETL and data integration scenarios.
You can also filter on the event subject, which lets you listen only for files with certain names and types, similar to what ADF enables. When you’ve finished specifying the events, the trigger will be ready to run.
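The subject filter described above can be pictured as a prefix and suffix match on the event's subject string. The following is a minimal Python sketch, not the Reflex implementation. It assumes the subject follows the Azure Blob Storage event format `/blobServices/default/containers/<container>/blobs/<path>`. The function name `subject_matches` and the sample subjects are hypothetical, and the match is case-sensitive here for simplicity, which may differ from the service's behavior.

```python
def subject_matches(subject: str, begins_with: str = "", ends_with: str = "") -> bool:
    """Return True when the subject passes both the prefix and the suffix filter.

    An empty filter string matches every subject.
    """
    return subject.startswith(begins_with) and subject.endswith(ends_with)


# Hypothetical sample subjects, for illustration only.
subjects = [
    "/blobServices/default/containers/landing/blobs/sales/2024/orders.csv",
    "/blobServices/default/containers/landing/blobs/sales/2024/orders.json",
    "/blobServices/default/containers/archive/blobs/sales/2023/orders.csv",
]

# Keep only CSV files under the 'sales' folder of the 'landing' container.
prefix = "/blobServices/default/containers/landing/blobs/sales/"
matched = [s for s in subjects if subject_matches(s, begins_with=prefix, ends_with=".csv")]
```

In this sketch only the first subject passes, because it is the only one that is both under the chosen container and folder and ends in `.csv`.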
Use the file and folder names in your pipeline
The events created by Azure storage include information about the file that triggered the event, including the file type and the storage container it’s in. These are passed through to the pipeline as parameters that you can use, for example, to process the specific file that started the pipeline. The ‘subject’ parameter contains the path to the file, which you can use in your copy or processing tasks.
Once your pipeline has been invoked from a storage event, you can use the file name and folder name in your pipeline expression logic. Fabric Data Factory automatically parses the event for you and stores the file name and folder name in a new “Trigger parameters” area of the expression builder.
To make troubleshooting and testing easy, we honor the “?” syntax, which allows your expressions to accept NULL values from these built-in parameters so that your pipelines will still execute even if no event has occurred or if the event did not supply that data:
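Fabric Data Factory does this parsing for you, so no code is required. To show what the subject path contains, here is a rough Python sketch that splits a subject into container, folder, and file name. It assumes the Azure Blob Storage subject format `/blobServices/default/containers/<container>/blobs/<path>`. The names `BlobLocation` and `parse_subject` are hypothetical and are not part of Fabric.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class BlobLocation(NamedTuple):
    container: str
    folder: str      # empty string when the blob sits at the container root
    file_name: str


def parse_subject(subject: str) -> BlobLocation:
    """Split a blob event subject into container, folder, and file name."""
    containers_marker = "/containers/"
    blobs_marker = "/blobs/"
    # str.index raises ValueError if the marker is missing.
    start = subject.index(containers_marker) + len(containers_marker)
    container, _, blob_path = subject[start:].partition(blobs_marker)
    if not blob_path:
        raise ValueError(f"no blob path found in subject: {subject!r}")
    path = PurePosixPath(blob_path)
    folder = "" if str(path.parent) == "." else str(path.parent)
    return BlobLocation(container, folder, path.name)


# Hypothetical subject, for illustration only.
location = parse_subject(
    "/blobServices/default/containers/landing/blobs/sales/2024/orders.csv"
)
```

Here `location.folder` and `location.file_name` correspond roughly to the folder name and file name that the expression builder exposes.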
@pipeline()?.TriggerEvent?.FileName
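The effect of the “?” syntax can be pictured as null-safe navigation: if any step along the path is missing, the whole expression yields NULL instead of failing. The Python sketch below is an analogy only, not how Fabric evaluates expressions. The helper `safe_get` and the dictionary shapes are hypothetical.

```python
from typing import Any, Optional


def safe_get(obj: Optional[dict], *keys: str) -> Any:
    """Walk nested dictionaries, returning None as soon as a step is missing."""
    current: Any = obj
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# Pipeline started by a storage event (hypothetical shape).
triggered = {"TriggerEvent": {"FileName": "orders.csv"}}
# Pipeline started manually for testing, with no event attached.
manual = {}

file_from_event = safe_get(triggered, "TriggerEvent", "FileName")
file_from_manual = safe_get(manual, "TriggerEvent", "FileName")
```

In this analogy the manual run gets `None` rather than an error, which is why a pipeline can still be tested before any event has fired.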
Anomaly detection triggers in Private Preview
In the announcement of Real-Time Intelligence at Build, we also announced that we are bringing automated anomaly detection to Data Activator triggers. These will help find ‘unknown unknowns’ in your streaming and real-time data, going beyond basic rules to look for individual data points that lie outside the normal ranges.
If you’re interested in learning more, you can fill out this form. We’ll be accepting a small number of customers in the Private Preview phase based on factors such as region and use case. We’ll reach out via email if we are able to enroll you in the preview and will let you know when it is more broadly available.
Learn more and help us with your feedback
To find out more about Real-Time Intelligence, read Yitzhak Kesselman’s announcement. As we launch our preview, we’d love to hear what you think and how you’re using the product. The best way to get in touch with us is to post in our community forum or submit an idea. For detailed how-tos, tutorials, and other resources, check out the documentation.
This is part of a series of blog posts that dive into all the capabilities of Real-Time Intelligence. Stay tuned for more!