Read and write Apache Parquet files using MuleSoft

UPDATE: 10/24/2022 – The connector has been updated to support InputStream, so there's no need to read from a local file store. You can read or write directly from a stream to an endpoint like AWS S3.

Apache Parquet is a file format designed to support fast data processing for complex data. Unlike row-based formats like CSV, Parquet is column-oriented: the values of each table column are stored next to each other rather than the values of each record. Each Parquet file also embeds metadata describing its schema and structure, making it a self-describing format.

Using the Parquet format has two main advantages:

  • Reduced storage footprint
  • Improved query performance
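
To see the self-describing nature of the format in practice, here is a minimal Java sketch using the parquet-hadoop library (the local file name is hypothetical) that reads the schema and row count straight out of a file's footer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.schema.MessageType;

public class InspectParquetSchema {

    public static void main(String[] args) throws Exception {
        // Hypothetical local file, used only for illustration
        Path path = new Path("data/customers.parquet");

        // The schema lives in the file footer, so no external definition is needed
        try (ParquetFileReader reader =
                 ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {

            MessageType schema = reader.getFooter().getFileMetaData().getSchema();
            System.out.println(schema);                          // column names and types
            System.out.println("Rows: " + reader.getRecordCount());
        }
    }
}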

Out of the box, MuleSoft doesn't currently support reading and writing Parquet files, but the flexibility of the platform allows developers to easily extend it to support Parquet. Using the Mule SDK and the Apache Parquet libraries, I created a community connector, which you can find here:

https://github.com/djuang1/parquet
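
The connector's internals are in the repository above, but to give a feel for the underlying Apache Parquet libraries it builds on, here is a minimal sketch (not the connector's API) that writes a couple of records with the parquet-avro AvroParquetWriter; the Order schema and output file name are made up for illustration:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class WriteParquetExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical Avro schema describing the records to write
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(new Path("orders.parquet"))
                     .withSchema(schema)
                     .withCompressionCodec(CompressionCodecName.SNAPPY)
                     .build()) {

            // Write a single record; real flows would loop over a payload
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", "A-1001");
            record.put("amount", 42.50);
            writer.write(record);
        }
    }
}

The Hadoop and Avro dependencies pulled in here are also what account for the larger application size mentioned in the comments below.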

Check out this video that walks you through an example Mule application that leverages the Parquet connector.


Comments

6 responses to “Read and write Apache Parquet files using MuleSoft”

  1. Hi, can you please tell me if 1.0.28-snapshot (main branch) is the latest, or if 1.0.31 in the release branch is the one we should use?

    1. Go with the 1.0.28-snapshot.

  2. Leonardo: Hi djuang, is it possible to publish the connector to Anypoint Exchange?

    1. Yes, I’m working through the process of getting it published.

  3. Hi Dejim, we are using the Parquet connector in our project and have observed that the jar file becomes bulky with the connector included (~180MB). Can you please tell me if you faced the same issue?

    1. Yes, unfortunately that’s the size of all the additional files that are needed to handle the Parquet format.
