Multidimensional arrays are core to a wide variety of applications. Some are well-suited to single machines, especially when datasets fit in memory. Others can benefit from distributed computing, especially for out-of-memory datasets and complex workflows.
Bolt is an Python project designed to faciliate working with ndarrays in local and distributed settings. It exposes array operations through either local or distributed implementations, and makes it easy to switch between them. The distributed implementation is currently backed by Spark, but can support other backends in the future. The distributed operations leverage efficient data structures for multidimensional array manipulation, and aim to cover most of NumPy’s ndarray interface in a distributed setting.