Ballista: Distributed Compute Platform
Ballista is a distributed compute platform primarily implemented in Rust, using Apache Arrow as the memory model. It is
built on an architecture that allows other programming languages to be supported as first-class citizens without paying
a penalty for serialization costs.
The foundational technologies in Ballista are:
Ballista can be deployed in Kubernetes, or as a standalone cluster using etcd for discovery.
The following diagram highlights some of the integrations that will be possible with this unique architecture. Note
that not all components shown here are available yet.
How does this compare to Apache Spark?
Although Ballista is largely inspired by Apache Spark, there are some key differences.
- The choice of Rust as the main execution language means that memory usage is deterministic and avoids the overhead of
  garbage collection.
- Ballista is designed from the ground up to use columnar data, enabling a number of efficiencies such as vectorized
processing (SIMD and GPU) and efficient compression. Although Spark does have some columnar support, it is still
largely row-based today.
- The combination of Rust and Arrow provides excellent memory efficiency: memory usage can be 5x to 10x lower than
Apache Spark in some cases, which means that more processing can fit on a single node, reducing the overhead of
distributed execution.
- The use of Apache Arrow as the memory model and network protocol means that data can be exchanged between executors
in any programming language with minimal serialization overhead.
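To make the columnar point above concrete, here is a minimal sketch in plain Rust that contrasts a row-based layout with a columnar one. The `Row` and `ColumnBatch` types and the helper functions are hypothetical names invented for this illustration; they are not part of Ballista or Apache Arrow, and real Arrow arrays carry additional metadata such as validity bitmaps.

```rust
// Hypothetical types for illustration only; these are not Ballista or Arrow APIs.

/// Row-based layout: each record stores all of its fields together.
struct Row {
    id: u32,
    price: f64,
}

/// Columnar layout: each field lives in its own contiguous buffer.
struct ColumnBatch {
    ids: Vec<u32>,
    prices: Vec<f64>,
}

/// Pivot a slice of rows into one buffer per column.
fn to_columns(rows: &[Row]) -> ColumnBatch {
    ColumnBatch {
        ids: rows.iter().map(|r| r.id).collect(),
        prices: rows.iter().map(|r| r.price).collect(),
    }
}

/// Summing over rows strides across every field of every record.
fn sum_prices_rows(rows: &[Row]) -> f64 {
    rows.iter().map(|r| r.price).sum()
}

/// Summing a column scans a single dense buffer of f64 values,
/// which is the access pattern that vectorized (SIMD) code favors.
fn sum_prices_columnar(batch: &ColumnBatch) -> f64 {
    batch.prices.iter().sum()
}

fn main() {
    let rows = vec![
        Row { id: 1, price: 10.0 },
        Row { id: 2, price: 20.5 },
        Row { id: 3, price: 30.25 },
    ];
    let batch = to_columns(&rows);

    // Both layouts hold the same data, so the totals agree.
    assert_eq!(sum_prices_rows(&rows), sum_prices_columnar(&batch));
    println!(
        "rows in batch: {}, total price: {}",
        batch.ids.len(),
        sum_prices_columnar(&batch)
    );
}
```

The two functions compute the same total. The difference is only in memory layout: the columnar version reads one tightly packed buffer, which is also the kind of buffer that can be handed to another process with little or no per-record serialization.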
The following examples should help illustrate the current capabilities of Ballista.
Ballista releases are now available on crates.io, Maven Central, and Docker Hub. Please refer to the user guide for
instructions on using a released version of Ballista.
We are currently working on performance tuning and adding support for more complex operators, particularly joins, using the
TPC-H benchmarks to drive requirements. The full roadmap is available
The user guide is hosted at https://ballistacompute.org,
along with the blog where news and release notes are posted.
Developer documentation can be found in the docs directory.
See CONTRIBUTING.md for information on contributing to this project.