Brussels / 31 January & 1 February 2015


Distributed tile processing with GeoTrellis and Spark


GeoTrellis is a Scala library and framework for high-performance geospatial processing in a distributed environment. Over the past year, the GeoTrellis developers have built extensions to the Apache Spark cluster computing platform that ingest and process raster data stored in Accumulo and HDFS. Together, Spark and GeoTrellis can process raster data and serve it through web services as TMS tile layers for use on web maps. The framework works with both spatial-only tiles and spatio-temporal tiles, such as climate model data.

In this talk I'll describe the process of using GeoTrellis to ingest raster data into Accumulo, and give examples of how we can manipulate that data using Spark. I will cover the architecture of the GeoTrellis core library and how it leverages Scala's powerful type system to make geospatial coding a lot easier. I will also cover the architecture that has allowed us to geospatially enable the Apache Spark cluster computing engine, the difficulties we faced while working with the three libraries, and how we overcame those challenges. Finally, I will demonstrate working with both spatial-only and spatio-temporal raster data using the framework on an AWS cluster with Spark and Accumulo.
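To make the distinction between spatial-only and spatio-temporal tiles a little more concrete, here is a minimal sketch in plain Python. It is not GeoTrellis code and does not use the GeoTrellis or Spark APIs: the names TileKey, TimedTileKey, tms_tile and with_time are hypothetical, chosen only for illustration. It assumes the common Web Mercator tiling scheme, with the TMS convention that row 0 is the southernmost row.

```python
import math
from collections import namedtuple
from datetime import datetime

# Hypothetical key types for illustration only (not the GeoTrellis API).
# A spatial-only tile is addressed by column and row at a zoom level;
# a spatio-temporal tile carries a timestamp as well.
TileKey = namedtuple("TileKey", ["col", "row"])
TimedTileKey = namedtuple("TimedTileKey", ["col", "row", "time"])

# Latitude limit of the Web Mercator projection (assumed tiling scheme).
MAX_LAT = 85.0511287798


def tms_tile(lon, lat, zoom):
    """Return the TMS tile containing a lon/lat point at the given zoom."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))

    col = int((lon + 180.0) / 360.0 * n)

    # Row counted from the top (north), as in the XYZ scheme.
    lat_rad = math.radians(lat)
    row_from_top = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)

    # Clamp to the valid range, then flip: TMS counts rows from the south.
    col = max(0, min(n - 1, col))
    row_from_top = max(0, min(n - 1, row_from_top))
    return TileKey(col, n - 1 - row_from_top)


def with_time(key, time):
    """Extend a spatial-only key with a timestamp."""
    return TimedTileKey(key.col, key.row, time)


if __name__ == "__main__":
    brussels = tms_tile(4.35, 50.85, zoom=8)
    print(brussels)
    print(with_time(brussels, datetime(2015, 1, 31)))
```

In a sketch like this, a spatial-only layer maps each TileKey to one raster tile, while a spatio-temporal layer, such as climate model output, maps each TimedTileKey to one tile per time step.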

Speakers

Rob Emanuele

Links