Microsoft Fabric Updates Blog

Introducing Code-First AutoML and Hyperparameter Tuning: Now in Public Preview for Fabric Data Science

At the recent Fabric Conference, we announced that both code-first automated machine learning (AutoML) and hyperparameter tuning are now in Public Preview, a key step in making machine learning more complete and widely accessible in Fabric Data Science.

Fabric integrates the open-source Fast Library for Automated Machine Learning & Tuning (FLAML) and ships a tailored version directly within the default Fabric 1.2 runtime. This means that users can access both AutoML and Tune capabilities without any extra installation or configuration steps.

What is Tune?

Hyperparameter tuning is the process of optimizing the settings that control how a machine learning model learns. Hyperparameters such as learning rate and batch size aren’t learned during training; they’re set by the user. The right hyperparameters can substantially improve model performance, which makes this step important for accuracy and generalization.
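To make the distinction concrete, here is a minimal sketch in plain Python (it does not use FLAML and is not taken from the Fabric samples). The weight `w` is learned during training, while `learning_rate` is a hyperparameter chosen from outside. The toy data, the `train` helper, and the candidate values are all illustrative assumptions.

```python
def train(learning_rate, steps=50):
    """Fit y = w * x by gradient descent.

    Returns the learned weight and the final mean squared error.
    """
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with a true weight of 2
    w = 0.0  # learned parameter: updated by the training loop
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return w, loss


# Hyperparameter candidates: set by the user, never updated by training.
candidates = [0.001, 0.01, 0.1, 0.2]
results = {lr: train(lr) for lr in candidates}

# Manual "tuning": keep the learning rate with the lowest final loss.
best_lr = min(results, key=lambda lr: results[lr][1])

for lr, (w, loss) in results.items():
    print(f"learning_rate={lr}: w={w:.4g}, loss={loss:.4g}")
print("best learning rate:", best_lr)
```

On this toy problem the smallest rate converges too slowly and the largest one diverges, which is the kind of sensitivity that tuning tools are meant to search over automatically.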

In Fabric Data Science notebooks, users can tune their machine learning models using flaml.tune. This feature searches a user-defined hyperparameter space for the configuration that performs best on a chosen metric, so users can explore large spaces without writing the search loop themselves.

Screenshot of the run list view within a machine learning experiment. Users can browse the various runs that were generated from a tuning trial.

The integration of flaml.tune into Fabric also takes advantage of Apache Spark to enable hyperparameter tuning at scale, providing users with capabilities such as:

  • Parallel Hyperparameter Tuning Trials: For tuning tasks involving single-node learners like Scikit-Learn and XGBoost, flaml.tune allows for parallel processing. By setting use_spark = True, users can utilize their Spark cluster to conduct numerous trials simultaneously, greatly reducing the time required for tuning.
  • Tuning for Spark-based Models: Beyond traditional models, flaml.tune also supports hyperparameter tuning for SparkML and SynapseML models, expanding its utility.
  • Experiment Tracking: Each tuning attempt is meticulously logged as part of a Machine Learning Experiment in Fabric. This includes detailed records of the trials, covering key metrics and parameter configurations, providing a comprehensive overview of the tuning process.
  • Visualization Tools: To aid in the analysis of tuning trials, Fabric includes specialized visualization tools. This capability is unique to the Fabric fork of FLAML. With this feature, users can generate and examine various plots, such as parallel coordinates and contour plots, to assess trial outcomes and make informed decisions about their models.
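The parallel-trials capability described above can be sketched as follows. This is a hedged illustration based on the open-source FLAML `tune.run` API (`config`, `metric`, `mode`, `num_samples`, `use_spark`, `n_concurrent_trials`); the Fabric build may behave differently. The `evaluate` objective, the `run_tuning` wrapper, and the search-space bounds are hypothetical stand-ins for training a real model.

```python
def evaluate(config):
    """Toy objective standing in for 'train a model, return validation loss'.

    The loss is smallest at x = 3, y = -1.
    """
    loss = (config["x"] - 3) ** 2 + (config["y"] + 1) ** 2
    return {"loss": loss}


def run_tuning(use_spark=False, n_concurrent_trials=1, num_samples=50):
    """Search the space with flaml.tune, optionally spreading trials over Spark."""
    # Imported here so the objective above can be exercised without FLAML installed.
    from flaml import tune

    # Hypothetical search space: two continuous hyperparameters.
    search_space = {
        "x": tune.uniform(-10, 10),
        "y": tune.uniform(-10, 10),
    }
    analysis = tune.run(
        evaluate,
        config=search_space,
        metric="loss",        # key in the dict returned by evaluate
        mode="min",           # lower loss is better
        num_samples=num_samples,
        use_spark=use_spark,  # True: run trials as parallel Spark jobs
        n_concurrent_trials=n_concurrent_trials,
    )
    return analysis.best_config, analysis.best_trial.last_result
```

In a Fabric notebook, a call such as `run_tuning(use_spark=True, n_concurrent_trials=4)` would request up to four trials at a time on the cluster, while the defaults run trials one after another on the driver.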

What is AutoML?

Automated Machine Learning (AutoML) streamlines the development of machine learning models by automating training and optimization, reducing the need for deep technical expertise. It simplifies the traditionally complex and time-consuming work of selecting algorithms, tuning hyperparameters, and validating models, which makes machine learning accessible to both experts and novices across industries.

A screenshot of the run details view of an AutoML trial.

With AutoML, users can take their training data and provide a machine learning task to find the best model. The integration of flaml.automl into Fabric takes advantage of Apache Spark to enable AutoML at scale, providing users with capabilities such as:

  • Versatile ML Task Support: Whether you’re tackling binary classification, multi-class classification, regression, or forecasting, AutoML within Fabric provides a straightforward path to embark on various machine learning projects. Start your AutoML journey by simply specifying your data and the task at hand.
  • Customizable AutoML configurations: Gain the flexibility to mold your AutoML trials to your preferences and project requirements. Select your desired optimization metric and fine-tune settings such as trial duration and the degree of parallelism for a customized experience.
  • Parallel AutoML trials: Utilize the power of Apache Spark to run multiple AutoML trials in parallel across a Spark cluster. This not only speeds up the process but also explores a broader range of model options for your data.
  • Integration with Pandas on Spark dataframes: AutoML trials can process Pandas on Spark dataframes directly, so the search can cover a wide range of Apache Spark models, including SparkML and SynapseML, and work with large datasets.
  • Comprehensive Experiment Tracking: Each AutoML run is logged as part of a Machine Learning Experiment in Fabric. This includes extensive details on key metrics and parameter configurations, offering a clear view of your AutoML trial.
  • Advanced Visualization Tools: Fabric enhances the analysis of AutoML trials with advanced visualization tools. These tools empower users to create and review a variety of plots, including parallel coordinates and feature importance charts, enabling a deeper understanding of AutoML results.
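As a rough sketch of how the task, metric, time budget, and parallelism settings listed above fit together, the following uses the open-source FLAML `AutoML.fit` parameters (`task`, `metric`, `time_budget`, `n_concurrent_trials`, `use_spark`, `seed`). The `build_automl_settings` and `run_automl` helpers are hypothetical names introduced for illustration, and the Fabric build may expose additional or different options.

```python
def build_automl_settings(task, metric, time_budget_s=60,
                          n_concurrent_trials=1, use_spark=False):
    """Collect the AutoML options into one dict that can be passed to fit()."""
    return {
        "task": task,                # e.g. "classification" or "regression"
        "metric": metric,            # optimization metric, e.g. "accuracy"
        "time_budget": time_budget_s,  # total search time in seconds
        "n_concurrent_trials": n_concurrent_trials,
        "use_spark": use_spark,      # True: run trials as parallel Spark jobs
        "seed": 42,                  # fixed seed for repeatable searches
    }


def run_automl(X_train, y_train, settings):
    """Run an AutoML search and return the best learner name and its config."""
    # Imported here so the settings helper can be exercised without FLAML installed.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)
    return automl.best_estimator, automl.best_config


# Example settings: a two-minute classification search with four parallel trials.
settings = build_automl_settings(
    "classification", "accuracy",
    time_budget_s=120, n_concurrent_trials=4, use_spark=True,
)
print(settings)
```

Passing these settings to `run_automl` with your own training features and labels would start the search. Keeping `n_concurrent_trials` at 1 with `use_spark=False` runs the trials sequentially.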

Get started today

Begin your journey with hyperparameter tuning and AutoML directly from the Fabric Data Science homepage by exploring the AI Samples gallery. These tutorials guide you through utilizing AutoML and Tune within Fabric Notebooks, streamlining the process to efficiently optimize and develop your ML models.

GIF showing the new samples added to the samples gallery.
