Introducing High Concurrency Mode for Notebooks in Pipelines for Fabric Spark
We’re excited to introduce high concurrency mode for notebooks in pipelines, bringing session sharing to one of the most popular orchestration mechanisms for enterprise data ingestion and transformation. Notebooks are now automatically packed into an active high concurrency session without compromising performance or security, and you pay for only a single session.
Key Benefits:
- Faster Session Start: High concurrency mode offers a significantly faster session start, reducing start time to about 5 seconds for notebooks that attach to a shared session. This is approximately 30 times faster than starting a separate Spark session for each notebook, resulting in substantial performance gains in pipeline execution.
- Session Tags: We’ve also introduced support for session tags, allowing users to target notebooks to specific high concurrency sessions for better session management.
Why Use High Concurrency Mode?
- Rapid Spark Session Start: Notebook steps no longer need to wait for on-demand Spark pool spin-up when using custom pool configurations. By leveraging pre-warmed high concurrency sessions, notebook steps can quickly attach to an existing Spark session, significantly boosting overall pipeline performance.
- Cost Savings: Reduce compute costs by sharing a single session across multiple notebooks for your data engineering or data science workloads. You are billed only for the single session, and because fewer sessions are requested, sharing can also help avoid queuing during peak usage hours.
Example: Consider a pipeline with five notebook steps that run one after another, each taking 5 minutes to execute. If every step starts its own Spark session (3 minutes), the total runtime is approximately 5 × (3 + 5) = 40 minutes. With high concurrency mode, only the first step pays the 3-minute session start and the remaining steps attach to the shared session within seconds, so the runtime drops to about 3 + (5 × 5) = 28 minutes, a 30% reduction.
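The arithmetic behind this example can be checked with a short Python sketch. It is only an illustration: the function name and parameters are hypothetical, and it assumes the notebook steps run sequentially and that the shared session takes the same 3 minutes to start once.

```python
def pipeline_runtime_minutes(steps, exec_min, session_start_min,
                             attach_min=0.0, shared=False):
    """Estimate total runtime of sequential notebook steps (hypothetical helper).

    shared=False: every step starts its own Spark session.
    shared=True:  the first step starts the session; each later step only
                  pays the (much smaller) attach time.
    """
    if shared:
        startup = session_start_min + (steps - 1) * attach_min
    else:
        startup = steps * session_start_min
    return steps * exec_min + startup


# Numbers from the example: 5 steps, 5 minutes each, 3-minute session start.
traditional = pipeline_runtime_minutes(5, 5, 3)                  # 40 minutes
shared = pipeline_runtime_minutes(5, 5, 3, shared=True)          # 28 minutes
improvement = (traditional - shared) / traditional               # 0.30

# Including the ~5 second attach time for the four later steps
# adds only about a third of a minute.
shared_with_attach = pipeline_runtime_minutes(
    5, 5, 3, attach_min=5 / 60, shared=True
)

print(f"traditional: {traditional} min")
print(f"shared: {shared} min ({improvement:.0%} faster)")
print(f"shared incl. attach time: {shared_with_attach:.1f} min")
```

The saving grows with the number of notebook steps, since each additional step avoids one full session start.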
How to Enable High Concurrency Mode for Notebooks in Pipelines?
To enable high concurrency mode for your Fabric Spark workspace, follow these steps:
1. Go to the workspace settings in your Fabric workspace.
2. Navigate to the Data Engineering/Science section.
3. Select the Spark Compute menu.
4. Navigate to the High concurrency tab.
5. Enable the option “For pipeline running multiple notebooks”.
6. Save your changes.
Once you enable high concurrency mode for pipelines in a workspace, all Spark sessions triggered by notebook steps in a pipeline run as high concurrency sessions, and the system automatically starts packing notebooks into the shared session.
By adopting high concurrency mode, you can enjoy faster pipeline execution, reduced costs, and improved overall efficiency for your data-driven workloads.
To learn more about using high concurrency for notebooks in pipelines, please refer to our documentation: High Concurrency Mode for Notebooks in Pipelines.
For more information on high concurrency mode, please read: Overview of High Concurrency Mode in Microsoft Fabric.