parquet-rs

by sunchao


Apache Parquet implementation in Rust

135 Stars 18 Forks Last release: almost 2 years ago (0.4.2) Apache License 2.0 337 Commits 7 Releases


An Apache Parquet implementation in Rust.

NOTE: this project has merged into Apache Arrow, and development will continue there. To file an issue or pull request, please file a JIRA in the Arrow project.

Usage

Add this to your Cargo.toml:

```toml
[dependencies]
parquet = "0.4"
```

and this to your crate root:

```rust
extern crate parquet;
```

Example usage of reading data:

```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

// Open the Parquet file and wrap it in a reader.
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();

// Iterate over all rows; `None` requests no projection, so every column is read.
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
    println!("{}", record);
}
```

See the crate documentation for the available API.
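Before handing a file to the reader, it can help to check that it looks like a Parquet file at all. The sketch below relies on the fact that Parquet files begin and end with the 4-byte magic `PAR1`. It uses only the standard library, and `looks_like_parquet` is a hypothetical helper name chosen for this illustration, not part of the parquet crate's API.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Magic bytes that open and close a Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Returns `Ok(true)` if `source` starts and ends with the Parquet magic bytes.
/// Hypothetical helper for illustration; not provided by the parquet crate.
pub fn looks_like_parquet<R: Read + Seek>(source: &mut R) -> io::Result<bool> {
    // Smallest possible layout: header magic (4) + footer length (4) + footer magic (4).
    let len = source.seek(SeekFrom::End(0))?;
    if len < 12 {
        return Ok(false);
    }

    let mut buf = [0u8; 4];

    // Check the leading magic.
    source.seek(SeekFrom::Start(0))?;
    source.read_exact(&mut buf)?;
    if &buf != PARQUET_MAGIC {
        return Ok(false);
    }

    // Check the trailing magic.
    source.seek(SeekFrom::End(-4))?;
    source.read_exact(&mut buf)?;
    Ok(&buf == PARQUET_MAGIC)
}
```

The function accepts anything that implements `Read + Seek`, so it works on a `std::fs::File` as well as an in-memory `std::io::Cursor`. A `true` result only means the framing bytes are present; it does not validate the file metadata.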

Supported Parquet Version

  • Parquet-format 2.4.0

To update the Parquet format to a newer version, check whether a newer version of the `parquet-format` crate is available, then update the version of `parquet-format` in Cargo.toml.

Features

  • [X] All encodings supported
  • [X] All compression codecs supported
  • [X] Read support
    • [X] Primitive column value readers
    • [X] Row record reader
    • [ ] Arrow record reader
  • [X] Statistics support
  • [X] Write support
    • [X] Primitive column value writers
    • [ ] Row record writer
    • [ ] Arrow record writer
  • [ ] Predicate pushdown
  • [ ] Parquet format 2.5 support
  • [ ] HDFS support

Requirements

  • Rust nightly

See Working with nightly Rust to install the nightly toolchain and set it as the default.

Build

Run `cargo build`, or `cargo build --release` to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the `cargo build` command.
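To decide whether the SSE4.2 flag is worth setting, one option is to ask the CPU at runtime and compare that with what the binary was compiled for. The following is a minimal sketch using only the standard library (`is_x86_feature_detected!` and `cfg!`); the function names `cpu_has_sse42`, `compiled_with_sse42`, and `sse42_status` are hypothetical and are not part of this crate.

```rust
/// Reports whether the CPU running this program supports SSE4.2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn cpu_has_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On non-x86 targets SSE4.2 does not exist, so always report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn cpu_has_sse42() -> bool {
    false
}

/// Reports whether this binary was compiled with SSE4.2 enabled,
/// for example via RUSTFLAGS="-C target-feature=+sse4.2".
pub fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// Summarizes the two checks as a human-readable hint.
pub fn sse42_status() -> &'static str {
    match (compiled_with_sse42(), cpu_has_sse42()) {
        (true, _) => "built with SSE4.2 enabled",
        (false, true) => "CPU supports SSE4.2, but the build did not enable it",
        (false, false) => "CPU does not report SSE4.2; build without the flag",
    }
}
```

Printing `sse42_status()` from a small test program built with and without the `RUSTFLAGS` setting shows whether the flag took effect.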

Test

Run `cargo test` for unit tests.

Binaries

The following binaries are provided (use `cargo install` to install them):

  • parquet-schema for printing Parquet file schema and metadata.
    Usage: `parquet-schema <file-path> [verbose]`, where `file-path` is the path to a Parquet file, and the optional `verbose` is a boolean flag that selects between printing the full metadata and the schema only (when not specified, only the schema is printed).
  • parquet-read for reading records from a Parquet file.
    Usage: `parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file, and `num-records` is the number of records to read from the file (when not specified, all records are printed).

If you see a `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly:

```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
```
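The argument convention described for parquet-read (a required file path followed by an optional record count) can be sketched with the standard library alone. This is an illustration of that convention under stated assumptions, not the binary's actual source; `ReadArgs` and `parse_read_args` are hypothetical names.

```rust
/// Parsed command line for a parquet-read style tool (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ReadArgs {
    /// Path to the Parquet file.
    pub file_path: String,
    /// Number of records to print; `None` means print all records.
    pub num_records: Option<usize>,
}

const USAGE: &str = "Usage: parquet-read <file-path> [num-records]";

/// Parses the arguments that follow the program name.
pub fn parse_read_args<I>(args: I) -> Result<ReadArgs, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();

    // The file path is required.
    let file_path = args.next().ok_or_else(|| USAGE.to_string())?;

    // The record count is optional, but must be a non-negative integer if given.
    let num_records = match args.next() {
        Some(raw) => Some(
            raw.parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", raw, e))?,
        ),
        None => None,
    };

    // Reject anything beyond the two documented arguments.
    if args.next().is_some() {
        return Err(USAGE.to_string());
    }

    Ok(ReadArgs { file_path, num_records })
}
```

In a real `main`, the function would be called as `parse_read_args(std::env::args().skip(1))`, printing the error string and exiting with a non-zero status on failure.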

Benchmarks

Run `cargo bench` for benchmarks.

Docs

To build documentation, run `cargo doc --no-deps`. To compile and view in the browser, run `cargo doc --no-deps --open`.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.
