Brussels / 3 & 4 February 2018


The Magnificent Modular Mahout

An extensible library for distributed math and HPC

Open source big data engines and HPC libraries seem to be proliferating at an increasing rate. Statistical and machine learning algorithms incur technical debt when they demand highly specialized knowledge both of the algorithm at hand and of the distributed engine or HPC library the method was written against. The Apache Mahout project presents a highly modular stack that introduces levels of abstraction between the mathematical implementation of an algorithm (written in an R-like Scala DSL) and the execution of the code. Users can interchange the Apache Spark, Apache Flink (batch), and H2O distributed engines, as well as native solvers: ViennaCL (for OpenCL on GPU and for OpenMP) and CUDA. Users can also port high-level algorithms to new distributed engines or native solvers by defining a handful of BLAS operations.
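The idea of porting an algorithm by supplying only a few BLAS-style operations can be illustrated with a minimal sketch. This is not Mahout's API, and it is written in Python rather than the Scala DSL: the `Backend` interface, the two toy backends, and the `gram` function are all hypothetical names chosen for illustration. The algorithm (computing the Gram matrix A^T A) is written once against the interface, and either backend can execute it.

```python
from abc import ABC, abstractmethod

Matrix = list[list[float]]


class Backend(ABC):
    """Hypothetical interface: the handful of operations a backend must supply."""

    @abstractmethod
    def transpose(self, a: Matrix) -> Matrix: ...

    @abstractmethod
    def matmul(self, a: Matrix, b: Matrix) -> Matrix: ...


class LoopBackend(Backend):
    """Toy backend using explicit index loops."""

    def transpose(self, a: Matrix) -> Matrix:
        return [[a[i][j] for i in range(len(a))] for j in range(len(a[0]))]

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        n, k, m = len(a), len(b), len(b[0])
        out = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                for p in range(k):
                    out[i][j] += a[i][p] * b[p][j]
        return out


class ZipBackend(Backend):
    """Second toy backend: same operations, different implementation."""

    def transpose(self, a: Matrix) -> Matrix:
        return [list(col) for col in zip(*a)]

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        cols = list(zip(*b))
        return [[float(sum(x * y for x, y in zip(row, col))) for col in cols]
                for row in a]


def gram(backend: Backend, a: Matrix) -> Matrix:
    """Backend-agnostic algorithm: the Gram matrix A^T A."""
    return backend.matmul(backend.transpose(a), a)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    # The same algorithm runs unchanged on either backend.
    print(gram(LoopBackend(), a))
    print(gram(ZipBackend(), a))
```

Adding a third backend in this sketch means implementing only `transpose` and `matmul`; `gram` and any other algorithm written against `Backend` would not need to change.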

Audience members will ideally have some familiarity with distributed engines such as Apache Spark, and a basic understanding of BLAS packs and linear algebra. (A basic understanding of linear algebra meaning they remember that matrix times matrix, matrix times vector, matrix transposes, and matrix decompositions are things that exist.)

Research is under way on a quantum BLAS pack for Apache Mahout, which will be a prototype of the next generation of high performance computing. However, trying to even begin to explore the topic of quantum computing in 20 minutes is absurd, so research and progress as of the conference will be mentioned in passing only.


Trevor Grant