Microsoft Fabric Updates Blog

Chat your data in Microsoft Fabric with Semantic Kernel

Using Microsoft Fabric’s Lakehouse, we can manage different data sources. Microsoft Copilot is very popular today, and we hope Microsoft Fabric can become an indispensable part of enterprise data management, making it easier to connect enterprise data with LLMs. This blog combines data engineering and data science perspectives to build Copilot tools based on business data in Microsoft Fabric.

Semantic Kernel

Frameworks for LLM-based applications mainly come down to LangChain and Semantic Kernel. I personally prefer Semantic Kernel for the finer control it gives over the connection between prompts and code. And for multi-language application scenarios, Semantic Kernel is an excellent choice: it supports not only .NET but also Python and Java, allowing more developers and enterprises to bring their existing skills into Copilot applications.

Chat with your structured enterprise data

How do we let an LLM communicate with data? Enterprises have both unstructured and structured data. For unstructured data we have many solutions: through embeddings or fine-tuning, we can quickly build Copilot applications. For structured data, we may need some more traditional techniques. In past scenarios, we extracted and analyzed such data with Python libraries (pandas, Spark, Matplotlib, etc.) and T-SQL. So we can also use prompts to generate those statements for us, enabling chat scenarios over structured data. GPT-4 has very powerful code-generation capabilities, and we can make good use of role-based prompts to have GPT-4 generate the data language (pandas, Spark SQL, or T-SQL) for structured data.
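As a sketch of this idea, a role-based prompt for generating pandas code could look like the following. The template text and the `build_prompt` helper are hypothetical illustrations of the approach; the actual prompts used later in this post live in the skills directory of the sample repo.

```python
# Hypothetical prompt template illustrating the "code generation" role prompt.
PANDAS_PROMPT = """You are a data analyst. A pandas DataFrame named df is already loaded.
Its columns are: {columns}.
Answer the question below by returning ONLY runnable Python pandas code, no explanation.

Question: {question}
"""

def build_prompt(columns: list, question: str) -> str:
    # Fill the template with the table schema and the user's natural-language question.
    return PANDAS_PROMPT.format(columns=", ".join(columns), question=question)

prompt = build_prompt(["Product Type", "Product Name"],
                      "How many products are there per Product Type?")
```

The LLM's reply is then plain pandas code that the notebook can execute, which is exactly the pattern the skills below implement.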

This is the application architecture diagram

Let’s chat with your enterprise data

A. Choose Data Engineering, create a Lakehouse, and upload your CSV files and data prompts to it

# create a Lakehouse named 'mydatasource', then upload the files from https://github.com/kinfey/MSFabricwithSKSamples/tree/main/skills and https://github.com/kinfey/MSFabricwithSKSamples/tree/main/datasets

# load the data from sales.csv into a table




# check your Lakehouse


B. Choose Data Science, create a Notebook

# sync Lakehouse to your Data Science environment



# install semantic-kernel in your Notebook with pip

! pip install semantic-kernel

# import library

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion


# init your Semantic Kernel instance and connect it to your Azure OpenAI Service deployment

kernel = sk.Kernel()

deployment = 'Your Azure OpenAI Service GPT-4-32k model deployment name'
api_key = 'Your Azure OpenAI Service API Key'
endpoint = 'Your Azure OpenAI Service Endpoint'


kernel.add_chat_service("chat", AzureChatCompletion(deployment, endpoint, api_key))

base_skills_directory = '/lakehouse/default/Files/skills'

skills = {
    **kernel.import_semantic_skill_from_directory(base_skills_directory, "dataskill"),
}

# one function per data language in the skill directory
csv_skill = skills["csv"]
pysql_skill = skills["spark"]
sql_skill = skills["sql"]


C. Chat with your CSV data using pandas

import pandas as pd
df = pd.read_csv("/lakehouse/default/Files/csv/ProductList.csv")


You can now chat with your CSV data in Copilot

Data statistics scenario

panda_context_variables = sk.ContextVariables(variables={
    "question": "Generate a bar chart to list Product Type and display the number of products owned by the corresponding Product Type"
})
panda_result = await csv_skill.invoke_async(variables=panda_context_variables)
pandasql = panda_result.result.replace("\\n\\n", "").replace("\\n", "")
exec(pandasql)
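Model output often arrives wrapped in markdown code fences or with escaped newlines, which is why the code above strips "\n" sequences before calling exec. A slightly more general cleanup helper might look like the following sketch (the function name is my own, not part of Semantic Kernel):

```python
def clean_llm_code(text: str) -> str:
    """Strip markdown code fences and unescape newlines in LLM-generated code."""
    # turn escaped newlines into real line breaks first
    text = text.replace("\\n", "\n").strip()
    if text.startswith("```"):
        # drop the opening fence (possibly with a language tag) and the closing fence
        lines = [l for l in text.splitlines() if not l.strip().startswith("```")]
        text = "\n".join(lines)
    return text

cleaned = clean_llm_code("```python\\nimport pandas as pd\\n```")
# cleaned is now just: import pandas as pd
```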




Query data scenario

context_variables = sk.ContextVariables(variables={
    "question": "Find Product Type belonging to Car audio and display them as table"
})
result = await csv_skill.invoke_async(variables=context_variables)
sql = result.result.replace("\\n\\n", "").replace("\\n", "")
exec(sql)
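exec runs the generated code directly in the notebook’s global namespace. As a defensive sketch (the helper name is my own invention), you can instead give generated code an explicit namespace so it only sees the objects you hand it:

```python
def run_generated(code: str, env: dict) -> dict:
    """Execute LLM-generated code in its own namespace and return the resulting bindings."""
    namespace = dict(env)  # copy, so the caller's dict is left untouched
    exec(code, namespace)
    return namespace

# e.g. expose only pandas and the DataFrame to the generated code:
# ns = run_generated(sql, {"pd": pd, "df": df})
ns = run_generated("answer = x * 2", {"x": 21})
# ns["answer"] == 42
```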



D. Chat with your CSV data using Spark

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *

df = spark.read.load('Files/csv/sales.csv',
                     format='csv',
                     header=True
)

You can now manage your Spark data in Copilot Chat

sparksql_context_variables = sk.ContextVariables(variables={
    "question": "Search OrderDate on '2019-07-01' and counts SalesOrderLineNumber"
})
sparksql_result = await pysql_skill.invoke_async(variables=sparksql_context_variables)
sparksql = sparksql_result.result.replace("\n", "")
exec(sparksql)




E. Chat with your table under T-SQL

You can now generate your T-SQL in Copilot Chat

sql_context_variables = sk.ContextVariables(variables={
    "datasource": "mydatasource",
    "question": "Count the total Quantity of the Item column in Sales for Road-150 Red, 44"
})
sql_result = await sql_skill.invoke_async(variables=sql_context_variables)
sql_result.result.replace("\n", " ")


Here you need to copy the generated T-SQL into a new cell and run it manually, for example:

%%sql
SELECT SUM(Quantity)  FROM mydatasource.Sales  WHERE Item = 'Road-150 Red, 44'




Through the above steps, Semantic Kernel connects enterprise data in Microsoft Fabric with large language models, letting LLMs better serve enterprise intelligence. If you are interested in this content, you can download the complete example from my GitHub repo (https://github.com/kinfey/MSFabricwithSKSamples) and give me feedback through the issues.


