Microsoft Fabric Updates Blog

Announcing public preview of VARCHAR(MAX) and VARBINARY(MAX) types in Fabric Data Warehouse

Fabric Data Warehouse is a modern data warehouse optimized for analytical data models, primarily focused on the smaller numeric, datetime, and string types that are suitable for analytics. For the textual data, Fabric DW supports the VARCHAR type that can store up to 8KB of text, which is suitable for most of the textual values that you are using in analytics (for example names of towns/countries, job titles, addresses, phone numbers, etc.) However, you are not able to store large textual values for descriptions, comments, and other kinds of free text values, without a risk of potential data truncation.

We are happy to share that we are increasing the size limit for the string and binary columns in Fabric DW. We are starting a public preview for VARCHAR(MAX) and VARBINARY(MAX) types, allowing you to store up to 1MB of data per cell.

To store large textual or binary data in tables, you can use VARCHAR(MAX) and VARBINARY(MAX) columns in your table definitions:

CREATE TABLE Product (
       ProductID INT,
       Name VARCHAR(256),
       Description VARCHAR(MAX),
       Picture VARBINARY(MAX)
)

We are also introducing several performance enhancements to accelerate operations on string and binary columns, such as the faster LIKE operator, string column filtering, and batch mode execution on large textual values. Therefore, you should experience minimal overhead when using the VARCHAR(MAX) type compared to the VARCHAR(N) type, under the assumption that you are working with similar data. While we still recommend optimizing your data types, we are ensuring that sub-optimal and large data types introduce minimal overhead.

In addition to storing unstructured textual and binary data, the VARCHAR(MAX) and VARBINARY(MAX) types enable new scenarios involving:

  • Working with JSON data: JSON (JavaScript Object Notation) is widely used format for representing semi-structured data. Storing JSON data directly in your database allows more flexible and efficient data management, especially for applications that require dynamic, complex, and hierarchical data structures.
  • Working with spatial data: Spatial data represents the physical location and shape of objects, which is crucial for geographic information systems (GIS), mapping applications, and location-based services. Supporting spatial data types allows you to store and query geographic information, enabling advanced spatial analysis and visualization.

Working with JSON data

The VARCHAR(MAX) type allows you to store JSON data formatted as text in DW tables. JSON is a flexible storage format that enables you to easily store arrays of addresses, phone numbers, or custom properties without a predefined schema, as shown in the following example:

CREATE TABLE Person (
       id INT,
       name VARCHAR(255),
       surname VARCHAR(255),
       addresses VARCHAR(MAX),
       additionalDetails VARCHAR(MAX)
);

VARCHAR(MAX) columns formatted as JSON text are commonly used to store complex and semi-structured data. If you need to store structures such as arrays or maps, the JSON format provides a flexible way to store them in Data Warehouse tables. With the ability to store up to 1MB of text, the VARCHAR(MAX) type ensures you can handle large amounts of complex data with nested fields, without the risk of truncation.

Fabric DW supports T-SQL JSON functions like JSON_VALUE and OPENJSON, enabling you to efficiently extract information from VARCHAR(MAX) columns containing JSON text.

Working with spatial data

Another scenario that Fabric DW supports is the use of various T-SQL spatial functions to perform complex spatial calculations, such as finding distances or intersections between spatial objects.

While the most commonly used spatial objects are points stored as (latitude, longitude) pairs, there may be cases where you need to store more complex spatial objects that exceed 8KB in size. For example, you might need to store information about the boundaries of taxi zones. The common formats for serializing spatial objects are well-known binary (WKB) and well-known text (WKT). In such cases, you can store spatial information in a VARBINARY(MAX) column using the well-known binary format:

CREATE TABLE dbo.TaxiZones (
    OBJECTID int,
    Shape_Leng numeric(19, 11),
    Shape_Area numeric(19, 11),
    zone varchar(8000),
    borough varchar(256),
    geometry varbinary(max)
)

In this example, the geometry column might contain the boundary of a taxi zone stored in the WKB format. You can use geometry::STGeomFromWKB T-SQL function to convert well-known binary to geometry objects.

The VARBINARY(MAX) column type, with its 1MB per cell limit, allows you to store large and complex spatial objects without worrying about truncation, which could corrupt your spatial data.

If you prefer to use well-known text for storing spatial data, the VARCHAR(MAX) type can accommodate most spatial objects you might need in your scenarios. In this case you could use geometry::STGeomFromText function to convert well-known text to geometry objects.

Once you convert VARBINARY or VARCHAR data to geometry objects, you can use a variety of T-SQL spatial functions with these spatial objects.

Conclusion

The VARCHAR(MAX) and VARBINARY(MAX) types in the Fabric Datawarehouse enable you to store large amounts of text and binary data. With these new types, you can efficiently manage and process extensive textual data thanks to performance enhancements we have implemented. Additionally, the support for larger data types opens new possibilities for handling complex semi-structured data formatted as JSON and working with spatial data. These advancements ensure that your data warehouse is more versatile and capable of meeting diverse analytical needs.

Liittyvät blogikirjoitukset

Announcing public preview of VARCHAR(MAX) and VARBINARY(MAX) types in Fabric Data Warehouse

lokakuuta 29, 2024 tekijä Dandan Zhang

Managed private endpoints allow Fabric experiences to securely access data sources without exposing them to the public network or requiring complex network configurations. We announced General Availability for Managed Private Endpoint in Fabric in May of this year. Learn more here: Announcing General Availability of Fabric Private Links, Trusted Workspace Access, and Managed Private Endpoints. … Continue reading “APIs for Managed Private Endpoint are now available”

lokakuuta 28, 2024 tekijä Gali Reznick

The Data Activator team has rolled out usage reporting to help you better understand your capacity consumption and future charges. When you look at the Capacity Metrics App you’ll now see operations for the reflex items included. Our usage reporting is based on the following four meters: Rule uptime per hour: This is a flat … Continue reading “Usage reporting for Data Activator is now live”