How to read and write Apache Parquet files in MuleSoft

Apache Parquet is a file format designed to support fast data processing for complex data. Unlike row-based formats like CSV, Parquet is column-oriented, meaning the values of each table column are stored next to each other rather than those of each record. Each Parquet file also carries metadata describing its schema and structure, which makes it a self-describing format.
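To make the row-versus-column idea concrete, here is a minimal sketch in plain Python. It is only a toy illustration: the sample records and the `to_columns` helper are hypothetical, and the resulting lists are not the actual Parquet on-disk layout. It simply shows what "the values of each column stored next to each other" looks like.

```python
# Toy illustration only: plain Python lists, not the Parquet on-disk format.
# The sample records and the helper name are made up for this example.
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "US", "amount": 12.5},
    {"id": 3, "country": "DE", "amount": 7.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            # Values for the same column end up side by side in one list.
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
print(columns["country"])
```

In the row-based form, reading every `country` means walking through each full record. In the column-based form, all the `country` values already sit together in one list.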

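Using the Parquet format has two main advantages:

  • Reduced storage: similar values stored together in a column compress well
  • Query performance: a query can read only the columns it needs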
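A rough sketch of why a column layout helps with both points, again in plain Python. The column data and the `run_length_encode` helper are hypothetical and greatly simplified; this is not how the Parquet libraries are implemented, just the intuition behind them.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Two hypothetical columns from the same table, each stored as its own list.
country = ["US", "US", "US", "DE", "DE", "US"]
amount = [10.0, 12.5, 3.0, 7.0, 1.5, 4.0]

# Storage: repeated neighbouring values in a column shrink to a few pairs.
encoded = run_length_encode(country)

# Query: summing one column never touches the other column's values.
total = sum(amount)

print(encoded)
print(total)
```

Here six `country` values collapse into three pairs, and the sum over `amount` is computed without reading `country` at all. A row-based file would have to scan every full record to do the same.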

Out of the box, MuleSoft doesn’t currently support reading and writing Parquet files, but the platform is flexible enough that developers can easily extend it to do so. Using the Mule SDK and the Apache Parquet libraries, I created a community connector, which you can find here:

https://github.com/djuang1/parquet

Check out this video that walks you through an example Mule application that leverages the Parquet connector.
