Using Microsoft Fabric’s Lakehouse Data and prompt flow in Azure Machine Learning Service to create RAG applications
Microsoft Fabric’s Lakehouse helps us manage enterprise-level data environments in a unified way. The transition to AI depends on this enterprise data. In my previous blog, I described how to build RAG applications on data in the Microsoft Fabric environment. In this post, I will show how to build a RAG application with prompt flow in a more specialized machine learning environment: Azure Machine Learning Service combined with Microsoft Fabric’s Lakehouse data.
Azure Machine Learning Service is a machine learning platform that I enjoy using. It covers the whole machine learning process, including data, training, testing, deployment, and monitoring. A short script is enough to bring Microsoft Fabric Lakehouse data into Azure Machine Learning Service.
1. Get the ABFS path of the Lakehouse in Microsoft Fabric.
Choose your Microsoft Fabric Lakehouse, then click Files -> Properties.
Copy the ABFS path:
abfss://<One Lake workspace name>@msit-onelake.dfs.fabric.microsoft.com/<Lakehouse ID>/Files
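The copied path carries three values that the next step reuses: the OneLake workspace name, the endpoint host, and the Lakehouse ID. As a minimal sketch, the path can be split with plain string handling. The helper name `parse_abfs_path` and the sample values are hypothetical and are not part of any Azure SDK.

```python
def parse_abfs_path(abfs_path: str) -> dict:
    """Split abfss://<workspace>@<endpoint>/<lakehouse id>/<folder> into parts.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    prefix = "abfss://"
    abfs_path = abfs_path.strip()
    if not abfs_path.startswith(prefix):
        raise ValueError("expected a path starting with abfss://")
    # Everything up to the first '/' is "<workspace>@<endpoint>".
    authority, _, path = abfs_path[len(prefix):].partition("/")
    workspace, sep, endpoint = authority.partition("@")
    if not sep or not workspace or not endpoint:
        raise ValueError("expected <workspace>@<endpoint> in the path")
    # The first path segment is the Lakehouse ID; the rest is the folder.
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("missing Lakehouse ID")
    return {
        "workspace": workspace,
        "endpoint": endpoint,
        "lakehouse_id": segments[0],
        "folder": "/".join(segments[1:]),
    }


# Sample values are placeholders, not real resources.
info = parse_abfs_path(
    "abfss://my-workspace@msit-onelake.dfs.fabric.microsoft.com/lakehouse-123/Files"
)
print(info)
```

The `workspace`, `endpoint`, and `lakehouse_id` values correspond to the `one_lake_workspace_name`, `endpoint`, and artifact `name` arguments used when the datastore is created.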
2. Create a new notebook on your local machine. Run the following code to import the Lakehouse data into Azure Machine Learning Service.
! pip install azure-ai-ml -U
! pip install mltable azureml-dataprep[pandas] -U
! pip install azureml-fsspec -U
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
subscription_id = "Your Azure Subscription ID"
resource_group = "Your Azure Machine Learning Service Workspace Resource Group"
workspace = "Your Azure Machine Learning Service Workspace Name"
# Connect to the Azure Machine Learning workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
# Point at the Lakehouse artifact inside the OneLake workspace
artifact = OneLakeArtifact(
    name="<Lakehouse ID>",
    type="lake_house"
)
# Define a credential-less datastore backed by the Lakehouse
store = OneLakeDatastore(
    name="onelake_lh_for_azureml",
    description="Credential-less OneLake datastore.",
    endpoint="msit-onelake.dfs.fabric.microsoft.com",
    artifact=artifact,
    one_lake_workspace_name="<One Lake workspace name>",
)
ml_client.create_or_update(store)
3. Test the data to see if it is imported successfully.
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azureml.fsspec import AzureMachineLearningFileSystem
uri = 'azureml://subscriptions/<Your Azure Subscription ID>/resourcegroups/<Your Azure Machine Learning Service Resource Group>/workspaces/<Your Azure Machine Learning Service Workspace Name>/datastores/onelake_lh_for_azureml'
# create the filesystem
fs = AzureMachineLearningFileSystem(uri)
fs.ls()
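# read the first lines of the CSV to confirm access;
# the with block closes the file, so no explicit close() is needed
with fs.open('Files/csv/sales.csv') as f:
    data = f.readlines()
    print(data[0:5])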
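The long-form `azureml://` URI used above is easy to mistype, since a stray space or a missing segment only fails once the filesystem is opened. As a minimal sketch, it can be assembled from its parts. The helper name `build_datastore_uri` and the sample values are hypothetical.

```python
def build_datastore_uri(subscription_id, resource_group, workspace,
                        datastore, path=""):
    """Assemble a long-form azureml:// datastore URI.

    Hypothetical helper for illustration; it only formats a string.
    """
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace": workspace,
        "datastore": datastore,
    }
    # Strip stray whitespace and reject empty values early.
    cleaned = {key: value.strip() for key, value in parts.items()}
    for key, value in cleaned.items():
        if not value:
            raise ValueError(f"{key} must not be empty")
    uri = (
        f"azureml://subscriptions/{cleaned['subscription_id']}"
        f"/resourcegroups/{cleaned['resource_group']}"
        f"/workspaces/{cleaned['workspace']}"
        f"/datastores/{cleaned['datastore']}"
    )
    # An optional path inside the datastore goes under /paths/.
    if path:
        uri += "/paths/" + path.strip().lstrip("/")
    return uri


# Placeholder values, not real resources.
print(build_datastore_uri("sub-id", "my-rg", "my-ws", "onelake_lh_for_azureml"))
```

The result without a `path` has the same shape as the `uri` string passed to `AzureMachineLearningFileSystem`.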
You can also register the data as a data asset in Azure Machine Learning Service and read it back to confirm that the import succeeded.
from azure.ai.ml.entities import Data
import pandas as pd
import mltable
csv_path = 'azureml://datastores/onelake_lh_for_azureml/paths/Files/csv'
my_csv_data = Data(
    path=csv_path,
    type=AssetTypes.URI_FOLDER,
    description="demo",
    name="csv_data_source",
    version="1.0.0"
)
ml_client.data.create_or_update(my_csv_data)
csv_data = ml_client.data.get("csv_data_source", version="1.0.0")
path = {
    'folder': csv_data.path
}
tbl = mltable.from_delimited_files(paths=[path])
df = pd.read_csv( csv_data.path + '/sales.csv')
df
Of course, you can also check the data in the Azure Machine Learning Service workspace to confirm that it was synchronized correctly.
The previous post used Semantic Kernel. In this post, we use prompt flow to build the application. Prompt flow is a development tool designed to streamline the entire development cycle of AI applications powered by Large Language Models (LLMs). As the momentum for LLM-based AI applications continues to grow across the globe, prompt flow provides a comprehensive solution that simplifies prototyping, experimenting, iterating on, and deploying your AI applications. If you’re looking for a versatile and intuitive tool for LLM-based AI application development, prompt flow is a good fit.
The biggest strength of prompt flow is that it helps integrate prompt engineering into the overall project. In particular, for stabilizing LLM output, it lets you choose the best prompt and combine it with the LLM so that the two work together effectively.
Prompt flow applications can be developed on Azure Machine Learning Service, on the command line, or in Visual Studio Code. I recommend developing in Visual Studio Code. First, install the prompt flow for VS Code extension.
After successful installation, click the prompt flow extension in the left sidebar and select Installation Dependencies. Once the environment is configured, you can create and build a prompt flow application.
Prompt flow supports different connections, such as Azure OpenAI Service, Azure Cognitive Search, and Azure Content Safety, as well as Custom Connections. You can configure them according to your needs.
Custom Connections are used often. They hold connection settings, mainly in the form of key-value pairs.
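To make the key-value idea concrete, here is a minimal plain-Python sketch that separates ordinary settings from sensitive ones, which, as far as I know, mirrors how a prompt flow custom connection distinguishes `configs` from `secrets`. It does not call the prompt flow SDK. The helper names and all setting names and values are hypothetical.

```python
def split_connection_settings(settings, secret_keys):
    """Split key-value settings into plain configs and secrets."""
    configs = {k: v for k, v in settings.items() if k not in secret_keys}
    secrets = {k: v for k, v in settings.items() if k in secret_keys}
    return configs, secrets


def mask(secrets):
    """Return a copy of the secrets that is safe to print or log."""
    return {key: "****" for key in secrets}


# Hypothetical settings for a service that a flow might call.
settings = {
    "endpoint": "https://example.invalid/api",
    "api_version": "v1",
    "api_key": "<your-api-key>",
}
configs, secrets = split_connection_settings(settings, secret_keys={"api_key"})

print(configs)        # plain values
print(mask(secrets))  # secret values are masked before printing
# Assumption: with the promptflow package installed, these two dictionaries
# would map to the configs and secrets of a custom connection.
```

Keeping the secret keys in a separate dictionary makes it harder to log or commit them by accident.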
Use prompt flow to quickly build a flow for enterprise data. The following are implementations for structured data and unstructured data, as well as a simple example of a chat flow. All of this data comes from our Azure Machine Learning Service (imported from Microsoft Fabric Lakehouse).
This is a RAG application for unstructured and structured data built with prompt flow.
You can download the samples from my GitHub repo.
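For readers new to RAG, the core idea behind such a flow for structured data can be sketched in plain Python: retrieve the rows most relevant to a question, then place them in the prompt sent to the LLM. This is a toy illustration under stated assumptions and not the flow from this post. The sample CSV stands in for `sales.csv` with made-up columns, and the keyword-overlap scoring replaces whatever retrieval the real flow uses.

```python
import csv
import io

# Hypothetical stand-in for Files/csv/sales.csv; columns and values are invented.
SAMPLE_CSV = """product,region,amount
Laptop,East,1200
Phone,West,800
Laptop,West,1500
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def retrieve(rows, question, top_k=2):
    """Rank rows by how many question words match one of their values."""
    words = {w.strip("?.,").lower() for w in question.split()}
    scored = []
    for row in rows:
        values = {str(v).lower() for v in row.values()}
        score = len(words & values)
        if score:
            scored.append((score, row))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:top_k]]


def build_prompt(question, context_rows):
    """Place the retrieved rows in front of the question."""
    context = "\n".join(
        ", ".join(f"{k}={v}" for k, v in row.items()) for row in context_rows
    )
    return f"Answer using only this data:\n{context}\n\nQuestion: {question}"


rows = load_rows(SAMPLE_CSV)
question = "What are Laptop sales in the West?"
hits = retrieve(rows, question)
print(build_prompt(question, hits))
```

In a prompt flow application, the retrieval step and the prompt step would typically be separate nodes, with the final prompt passed to an LLM node through a connection.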