About the developer

sunchao
141 Stars 18 Forks Apache License 2.0 337 Commits 27 Opened issues

parquet-rs

An Apache Parquet implementation in Rust.

NOTE: this project has merged into Apache Arrow, and development will continue there. To file an issue or pull request, please file a JIRA in the Arrow project.

Usage

Add this to your Cargo.toml:

```toml
[dependencies]
parquet = "0.4"
```

and this to your crate root:

```rust
extern crate parquet;
```

Example usage of reading data:

```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

// Open the Parquet file and wrap it in a reader.
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();

// Iterate over all records; `None` means no column projection.
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
    println!("{}", record);
}
```

See the crate documentation for the available API.
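Before handing a file to the reader, it can be useful to check that it looks like Parquet at all. The following is a minimal sketch using only the standard library, not part of the `parquet` crate: it relies on the fact that a Parquet file begins and ends with the 4-byte magic `PAR1`. The function names `has_parquet_magic` and `file_has_parquet_magic` are hypothetical and chosen for illustration.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Magic bytes that open and close a Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Returns true when `source` starts and ends with the Parquet magic bytes.
/// Generic over `Read + Seek` so it works on files and in-memory buffers.
/// (Hypothetical helper, not an API of the `parquet` crate.)
pub fn has_parquet_magic<R: Read + Seek>(source: &mut R) -> io::Result<bool> {
    // A valid file needs at least: header magic (4 bytes),
    // footer length (4 bytes) and footer magic (4 bytes).
    let len = source.seek(SeekFrom::End(0))?;
    if len < 12 {
        return Ok(false);
    }

    // Read the first four bytes.
    let mut head = [0u8; 4];
    source.seek(SeekFrom::Start(0))?;
    source.read_exact(&mut head)?;

    // Read the last four bytes.
    let mut tail = [0u8; 4];
    source.seek(SeekFrom::End(-4))?;
    source.read_exact(&mut tail)?;

    // Leave the cursor at the start so the caller can reuse the source.
    source.seek(SeekFrom::Start(0))?;
    Ok(&head == PARQUET_MAGIC && &tail == PARQUET_MAGIC)
}

/// Hypothetical convenience wrapper for a path on disk.
pub fn file_has_parquet_magic<P: AsRef<Path>>(path: P) -> io::Result<bool> {
    let mut file = File::open(path)?;
    has_parquet_magic(&mut file)
}
```

A passing check only says the file is framed like Parquet. The footer metadata can still be invalid, so errors returned by the reader should be handled as well.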

Supported Parquet Version

  • Parquet-format 2.4.0

To update the Parquet format to a newer version, check whether a matching `parquet-format` version is available. Then simply update the version of the `parquet-format` crate in Cargo.toml.

Features

  • [X] All encodings supported
  • [X] All compression codecs supported
  • [X] Read support
    • [X] Primitive column value readers
    • [X] Row record reader
    • [ ] Arrow record reader
  • [X] Statistics support
  • [X] Write support
    • [X] Primitive column value writers
    • [ ] Row record writer
    • [ ] Arrow record writer
  • [ ] Predicate pushdown
  • [ ] Parquet format 2.5 support
  • [ ] HDFS support

Requirements

  • Rust nightly

See Working with nightly Rust to install the nightly toolchain and set it as the default.

Build

Run `cargo build`, or `cargo build --release` to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the `cargo build` command.
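To confirm that the SSE4.2 flag actually took effect, a small standard-library check can help. This is a sketch, not code from this crate: `cfg!(target_feature = "sse4.2")` reports what the binary was compiled with, and `is_x86_feature_detected!` reports what the running CPU supports. The function names are hypothetical.

```rust
/// True when this binary was compiled with SSE4.2 enabled,
/// e.g. via RUSTFLAGS="-C target-feature=+sse4.2".
pub fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// True when the CPU running this binary reports SSE4.2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn cpu_supports_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// SSE4.2 is an x86 extension, so other architectures never report it.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn cpu_supports_sse42() -> bool {
    false
}

/// One-line summary, handy to print at startup.
pub fn sse42_report() -> String {
    format!(
        "compiled with SSE4.2: {}, CPU supports SSE4.2: {}",
        compiled_with_sse42(),
        cpu_supports_sse42()
    )
}
```

Calling `println!("{}", sse42_report());` once with and once without the `RUSTFLAGS` setting shows whether the compile-time value changes.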

Test

Run `cargo test` for unit tests.

Binaries

The following binaries are provided (use `cargo install` to install them):

- `parquet-schema` for printing Parquet file schema and metadata. Usage: `parquet-schema <file-path> [verbose]`, where `file-path` is the path to a Parquet file, and the optional `verbose` is a boolean flag that selects between printing full metadata and printing the schema only (when not specified, only the schema is printed).
- `parquet-read` for reading records from a Parquet file. Usage: `parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file, and `num-records` is the number of records to read from the file (when not specified, all records are printed).

If you see a `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly:

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
```
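The `<file-path> [num-records]` argument convention used by `parquet-read` can be sketched as below. This is not the binary's actual source, only an illustration of the documented behaviour, where an omitted `num-records` means "read everything". The type `ReadArgs` and the function `parse_read_args` are hypothetical names.

```rust
/// Parsed form of `<file-path> [num-records]` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ReadArgs {
    pub file_path: String,
    /// `None` means "read every record".
    pub num_records: Option<usize>,
}

/// Parses the arguments that follow the program name.
pub fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    match args {
        // Only the path was given: no record limit.
        [path] => Ok(ReadArgs {
            file_path: path.clone(),
            num_records: None,
        }),
        // Path plus a record count, which must be a non-negative integer.
        [path, count] => {
            let n = count
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", count, e))?;
            Ok(ReadArgs {
                file_path: path.clone(),
                num_records: Some(n),
            })
        }
        // Anything else is a usage error.
        _ => Err("Usage: parquet-read <file-path> [num-records]".to_string()),
    }
}
```

In a real binary the input would typically come from `std::env::args().skip(1).collect::<Vec<String>>()`.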

Benchmarks

Run `cargo bench` for benchmarks.

Docs

To build documentation, run `cargo doc --no-deps`. To compile and view it in the browser, run `cargo doc --no-deps --open`.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.
