Microsoft Fabric Updates Blog

Announcing improvements to CSV data ingestion in Synapse Data Warehouse in Microsoft Fabric

CSV files are widely used for data exchange and for ingesting data into data warehouses, but they often pose performance challenges. According to a study from Microsoft Research, up to 90% of the total time spent in data ingestion goes to parsing non-binary data such as JSON when conventional file parsers are used. This is illustrated in Figure 1.

A bar chart comparing percentage of time spent on parsing versus query processing, for four different queries. The chart shows that more than 90% of the cost comes from parsing for queries one, two, and three, and over 80% of the cost comes from parsing for query number four.
Figure 1: Parsing vs. Query processing Cost
Twitter Dataset, Queries from [30], Spark+Jackson
Source: “Mison: A Fast JSON Parser for Data Analytics”

Today, we’re excited to announce a new, faster way to ingest data from CSV files into Data Warehouse in Microsoft Fabric: introducing CSV file parser version 2.0 for COPY INTO. The new CSV file parser builds on innovation from Microsoft Research’s Data Platform and Analytics group to make CSV file ingestion blazing fast on Data Warehouse.


The performance gains you see with the new CSV file parser vary depending on the number of source files, their size, and the data layout. Our testing showed an overall improvement of 38% in ingestion times across a diverse set of scenarios, and in some cases ingestion was more than 4 times faster than with the legacy CSV parser.

How it works

To let you select the CSV file parser, we have added a new option to the COPY INTO statement: PARSER_VERSION. When this option is set to '2.0', the new CSV file parser is used. For example:

COPY INTO mytable
FROM '<external_location>' -- placeholder: location of the source CSV file(s)
WITH (
    FILE_TYPE = 'CSV',
    PARSER_VERSION = '2.0' -- this parameter is optional; '2.0' is the new default
)

The new CSV file parser performs so well that we have made it the default for COPY INTO, so you don’t even have to specify the option to benefit from it. In some rare cases, however, the new CSV parser is not supported, and you may need to fall back to the legacy CSV file parser by specifying PARSER_VERSION = '1.0' with COPY INTO. For more information on unsupported scenarios and the full syntax, refer to our documentation.
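
As a minimal sketch of that fallback, the statement below pins COPY INTO to the legacy parser. The table name mytable is reused from the example above, and the storage URL is a placeholder, not a real location. Substitute your own source path and any other options your files need.

```sql
-- Sketch only: the storage URL below is a placeholder, not a real location.
COPY INTO mytable
FROM 'https://<storage_account>.blob.core.windows.net/<container>/<folder>/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    PARSER_VERSION = '1.0' -- explicitly request the legacy CSV parser
)
```

Pinning the version explicitly can also be useful in scripts that must keep behaving the same way regardless of which parser is the default.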

Next steps

The new CSV file parser is now globally available. As mentioned, it is the new default file parser for CSV files during ingestion, so you do not need to do anything to enjoy its benefits.

To learn more about the Mison parser, visit Mison: A Fast JSON Parser for Data Analytics. Although this research was originally published with a focus on JSON, the work has since been extended to other file formats, such as CSV.
