Introducing Synapse Data Warehouse in Microsoft Fabric
See Arun Ulagaratchagan’s blog post to read the full Microsoft Fabric preview announcement.
In today’s data-driven world, data warehousing is a critical component of any business strategy. Its goal is to assist companies in effectively managing and analyzing vast amounts of data, enabling them to make informed decisions and drive growth. However, with the ever-increasing amount of data being collected, the challenges associated with its management are also growing and this is compounded by the evolution of the era of AI. Traditional data warehousing solutions have become complex and costly, often resulting in data duplication, vendor lock-ins and governance issues.
We are thrilled to announce the preview of Synapse Data Warehouse in Microsoft Fabric! Synapse Data Warehouse is the next generation of data warehousing in Microsoft Fabric that is the first transactional data warehouse to natively support an open data format enabling IT teams, data engineers and business users to collaborate seamlessly and extract actionable insights from their data, all without compromising enterprise security or governance. Just like the previous data warehouse generation, SQL provides multi-table ACID transactional guarantees. It is built on the well-established SQL Server Query Optimizer and Distributed Query Processing engine but is bolstered with the key improvements below that add significant new value to enterprises –
- Fully managed: this new data warehouse is a fully managed SaaS solution and effortlessly extends modern data architectures to both professional developers who love to write code and citizen enthusiasts with no coding skills. What previously took enterprises months to accomplish can now be done in minutes efficiently.
- No provisioning and managing of resources: instead of provisioning dedicated clusters, it is based on a fully serverless compute infrastructure where resources are provisioned in milliseconds as jobs requests come in. Enterprises benefit from resource efficiencies and only pay for what they use.
- Separation of storage and compute: compute nodes used are independent of storage enabling enterprises to scale and pay for either one separately.
- Open data standards: data is not locked-in the proprietary SQL Server format but is stored in the open data standard of Delta-Parquet in the Microsoft OneLake providing interoperability not only with all workloads in Fabric but also the Spark ecosystem without requiring any data movement.
- Cross-querying: as a result of the open data standard support, data in the lake whether processed by a Fabric workload or any other compute engine can be queried and cross-joined without making any copies of the data.
- Auto-scaling: it automatically scales resources instantly as query and usage requirements increase and down-scales when there is no more need for these resources, all without any user intervention.
- Self-optimizing: it automatically detects and isolates workloads to deliver predictable performance. The best performance is based on caching which is automatic and multi-tiered based on activity. Query plans generated are optimal. There is no need to hire highly skilled engineers to manage workload groups or tune the data warehouse.
- Fully integrated: it is fully integrated with all Fabric workloads right out of the box for any developer. Users can continue to benefit from the rich capabilities of the SQL engine using the T-SQL language or a simple user interface. All this with the continued benefits of the SQL ecosystem.
Let’s dive deeper into its capabilities.
Analytics is a continuum of activities requiring the ability to ingest, prep and analyze data, creating business semantics, machine learning models and BI reports. It requires collaboration across IT, data engineers, business analysts and data scientists across the organization. As data is shared or discovered, it needs to be secured and governed. Synapse Data Warehouse makes this easy through the key capabilities below.
Simple and intuitive Warehouse experience
Fabric has a dedicated home page for data warehouse, where a new Warehouse can be created with just a name and sensitivity label. There’s no provisioning or setup required. The user interface is a familiar relational database experience. The Warehouse Explorer exposes schemas, tables, stored procedures, and all other database objects. Anyone new to warehousing can start with a Warehouse sample!
Easily ingest and prep data
Data can be loaded into the Warehouse writing T-SQL queries using the COPY command. It can also be ingested using a Data Factory Pipeline which is available now or Dataflow Gen2 which will be available soon, from right within the Warehouse Editor. Pipelines offer connectivity to several data sources, ability to sub-select tables and preview data. Tables are automatically created, and column data types are automatically mapped from source type to Parquet. Data ingested into the Warehouse is stored in OneLake. Table transactions are guaranteed by the SQL compute engine and Delta logs are published periodically. Delta-Parquet in OneLake can be viewed using OneLake Explorer and accessed effortlessly using Notebooks.
Analyze data visually or by writing code
The professional developer can continue writing T-SQL code for querying and analyzing data. Citizen developers can use the Visual Query Editor which is a drag-and-drop user interface for compiling queries and even performing complex joins and groupings. T-SQL is automatically generated, which can be edited as well.
Virtual warehouses for cross-joining warehouses and lakehouses
Traditionally, when users wanted to merge data from both the data warehouse and the lake, they were faced with a cumbersome process of creating pipelines, transferring data, and duplicating their data. In Microsoft Fabric, users can create virtual warehouses containing data from any source in Fabric whether a Warehouse or Lakehouse and across any storage or any cloud. As long as the data is in Delta tables, a shortcut can be created to it and queried or cross-joined using T-SQL three-part-naming conventions or the Visual Query Editor.
Add business semantics to data and build Power BI reports that serve insights
Synapse Data Warehouse is fully integrated with Power BI in Microsoft Fabric. A Power BI dataset is automatically generated and kept in sync with the data in the Warehouse. Users do not have to worry about Power BI modes and make trade-offs based on their data volumes increasing or needing to meet performance or security requirements. The experiences to create relationships between tables, add Power BI semantics known as measures is all within the Warehouse Editor. A new Power BI report is a one-click experience!
Manage security and govern data
Data warehouse supports traditional T-SQL security constructs. GRANTS, REVOKE or DENY can be used to secure objects with the Warehouse. Object level security allows for controlling granular access for both collaboration and consumption.
Data governance is critical for enterprises. Like the rest of Microsoft Fabric experiences, sensitivity labels can be applied to the Warehouse which carries forward to any item downstream. End-to-end lineage information can be viewed as well.
Performance and scale
Data Warehousing in Microsoft Fabric is currently in preview. The focus of this preview is on providing a rich set of SaaS features and functionality tailored to all skill levels. The preview delivers on the promise of providing simplified experiences through an open data format over a single copy of data. While this release is not focused on performance, concurrency, and scale, additional functionality to handle complex workloads and provide industry-leading performance will land incrementally as we progress towards General Availability of data warehousing in Microsoft Fabric.
Azure Synapse Dedicated SQL Pools update
Existing Azure Synapse Dedicated SQL Pools will continue to provide a robust, enterprise-grade PaaS solution. Synapse Data Warehouse in Microsoft Fabric is an evolution in the form of a simplified SaaS solution that can connect to the existing PaaS offering. Customers will be able to upgrade from their current products into Fabric at their own pace.
In addition to what is shipping at preview, rich capabilities will be released every month and details updated via monthly blog posts. Here are just a few features to give you a sneak peek:
- Automatic statistics: statistics are automatically computed in the Warehouse as queries are executed ensuring users get optimal performance.
- Zero copy Table clones: users can create zero copy Table clones using a T-SQL command.
- Data warehouse in Deployment Pipelines: users can use Warehouses in Deployment Pipelines and deploy to Dev, Test and Production workspaces. They can compare schemas, rollback changes and automate via the use of REST APIs.
- Data warehouse Git integration: users can connect to a Git repository, develop their warehouse SQL scripts and code, manage versions, commits, and pull requests and download SQL projects.
- Data warehouse REST APIs: users can use public REST APIs to automate creation, management, and administration of their data warehouses.
- Warehouse integration with Microsoft Fabric Monitoring Hub: users can view query details, monitor, and troubleshoot performance of their solution end-to-end using the Monitoring Hub.
- Dataflows Gen2: users can use Dataflows Gen2 with familiar Power Query experiences to transform data and load into the Warehouse.
Get started with Microsoft Fabric
Microsoft Fabric is currently in preview. Try out everything Fabric has to offer by signing up for the free trial—no credit card information required. Everyone who signs up gets a fixed Fabric trial capacity, which may be used for any feature or capability from integrating data to creating machine learning models. Existing Power BI Premium customers can simply turn on Fabric through the Power BI admin portal. After July 1, 2023, Fabric will be enabled for all Power BI tenants.
Learn more about Synapse Data Warehouse in Fabric and how customers are using it at the following Build 2023 session:
If you want to learn more about Microsoft Fabric, consider:
- Signing up for the Microsoft Fabric free trial
- Visiting the Microsoft Fabric website
- Reading the more in-depth Fabric experience announcement blogs:
- Data Factory experience in Fabric blog
- Synapse Data Engineering experience in Fabric blog
- Synapse Data Science experience in Fabric blog
- Synapse Real-Time Analytics experience in Fabric blog
- Power BI announcement blog
- Data Activator experience in Fabric blog
- Administration and governance in Fabric blog
- OneLake in Fabric blog
- Microsoft 365 data integration in Fabric blog
- Dataverse and Microsoft Fabric integration blog
- Exploring the Fabric technical documentation
- Reading the free e-book on getting started with Fabric
- Exploring Fabric through the Guided Tour
- Watching the free Fabric webinar series
- Joining the Fabric community to post your questions, share your feedback, and learn from others
- Visiting Microsoft Fabric Ideas to submit suggestions for improvements and vote on your peers’ ideas