Brussels / 2 & 3 February 2019

schedule

Multiplex graph analysis with GraphBLAS


Introduction

Graph analysis workloads present resource-intensive computations that require a large amount of memory and CPU time. Consequently, there an abundance of graph processing tools which build on distributed data processing frameworks, including Spark GraphX, Flink Gelly and Giraph (which runs on Hadoop). According to a recent survey, most of these systems build on the vertex-centric programming model, originally introduced in Google’s Pregel paper. This model defines graph analytical algorithms in terms of vertices communicating with their neighbours through message passing, which allows both easy parallelization (for the systems) and intuitive formalization of the computation (for developers). While these systems indeed exhibit horizontal scalability, they introduce numerous inefficiencies requiring a large amount of resources even for moderately sized graphs. Most practical applications only use graphs up to a few hundred million vertices and edges, which can now be stored comfortably on a single machine. For such graphs, it is worth investigating techniques that allow their evaluation without the additional cost and complexity of operating a distributed cluster.

GraphBLAS

The GraphBLAS initiative is an effort to design a set of standard building blocks that allow users to formulate graph computations in the language of linear algebra, using operations on sparse adjacency matrices defined on custom semirings. Since its inception, GraphBLAS has been implemented for multiple languages (e.g. C, C++, and Java). Additionally, GraphBLAS is being designed in collaboration with hardware vendors (such as Intel and Nvidia) to define a standardized set of interfaces, which will allow building specialized hardware components for graph processing in the future.

Multiplex graph metrics

Graph analysis has a significant overlap with network science, a field that aims to uncover the hidden structural properties of graphs and determine the interplay between their vertices. Most works in network science only study homogeneous (monoplex) graphs, and do not distinguish between different types of vertices and edges. We believe this abstraction is wasteful for most real-life networks, which are heterogeneous (multiplex) and emerge by different types of interactions. To illustrate such analyses, we calculated multiplex clustering metrics on the Paradise papers data set to find interesting entities that were engaged in disproportionately high levels of activities with their interconnected neighbours. We found that even on this relatively small data set (2M vertices and 3M edges), naive implementations did not terminate in days. Hence, we adapted techniques from GraphBLAS to optimize the computations to finish in a few minutes.

Outline of the talk

This talk gives a brief overview of how linear algebra can be used to define graph computations on monoplex graphs, and how we applied it to speedup the calculation of multiplex graph metrics. We present the lessons learnt while experimenting with sparse matrix libraries in C, Java, and Julia. Our graph analyzer framework is available as open-source.

Intended audience: users interested in applying multiplex graph analytical techniques for their problems and developers who strive to implement high-performing graph analytical computations.

Speaker biography.

Gabor Szarnyas is a researcher working on graph processing techniques. His core research areas are live graph pattern matching, benchmarking graph queries, and analyzing large-scale multiplex networks. His main research project is ingraph, an openCypher-compatible query engine supporting live query evaluation. His research team was the first to publish a formalisation that captures the semantics of a core subset of the openCypher language.

Gabor works at the Budapest University of Technology and Economics, teaching system modelling and database theory. He conducted research visits at the University of York, McGill University and the University of Waterloo. He is a member of the openCypher Implementers Group and the LDBC Social Network Benchmark task force. He received 1st prize in the MODELS 2016 ACM Student Research Competition conference and 2nd prize at the SIGMOD 2018 competition. He is a frequent speaker at industrial conferences (FOSDEM, GraphConnect) and meetups (openCypher meetup NYC, Budapest Neo4j meetup).

Speakers

Gabor Szarnyas

Links