Brussels / 2 & 3 February 2019

Schedule

HPC, Big Data and Data Science devroom



High Performance Computing (HPC) and Big Data are two important approaches to scientific computing. HPC typically deals with smaller, highly structured data sets and huge amounts of computation, while Big Data, not surprisingly, deals with gigantic, unstructured data sets and focuses on I/O bottlenecks. With the Big Data trend unlocking access to an unprecedented amount of data, Data Science has emerged to create the processes and approaches for extracting knowledge and insights from these data sets. Machine learning and predictive analytics algorithms have joined the family of more traditional HPC algorithms and are pushing the requirements of cluster and data scalability.

Free and Open Source software has been foundational to both the HPC and Big Data communities for some time. In the HPC community, it should come as no surprise that 100% of the Top500 supercomputers in the world currently run (some variant of) Linux. On the Big Data side, the Hadoop ecosystem has received a tremendous amount of Open Source contribution from a wide range of organizations coming together under the Apache Software Foundation.

Our goal is to bring the communities together, share expertise, learn how we can benefit from each other's work and foster further joint research and collaboration. We welcome talks about Free and Open Source solutions to the challenges presented by large scale computing, data management and data analysis.

Event | Speaker | Start | End

Sunday

RAPIDS – Data Science on GPUs | chau | 09:00 | 09:45
OpenHPC Update | Adrian Reber | 09:50 | 10:15
CK: an open-source framework to automate, reproduce, crowdsource and reuse experiments at HPC conferences | Grigori Fursin | 10:15 | 10:40
Coupling scientific simulation codes with preCICE – A journey towards sustainable research software | Gerasimos Chourdakis | 10:45 | 11:10
ReFrame: A Regression Testing and Continuous Integration Framework for HPC systems | Victor Holanda | 11:15 | 11:40
Reproducible science with containers on HPC through Singularity – Singularity containers | Eduardo Arango | 11:45 | 12:10
Nakadi: Streaming Events for 100s of Teams – Serving all sorts of users and use cases, the sane way | Lionel Montrieux | 12:15 | 12:25
NUMAPROF, A NUMA Memory Profiler | Sébastien Valat | 12:25 | 12:35
Setting up an HPC lab from scratch – with Mr-Provisioner, Jenkins and Ansible | Renato Golin | 12:40 | 12:50
Speeding up Programs with OpenACC in GCC | Thomas Schwinge | 12:50 | 13:00
The 8 Principles of Production Data Science | Alejandro | 13:00 | 13:10
The convergence of HPC and BigData – What does it mean for HPC sysadmins? | Damien François | 13:15 | 13:40
Introducing Kubeflow – (w. Special Guests Tensorflow and Apache Spark) | Trevor Grant | 13:40 | 14:05
Validating Big Data Jobs – An exploration with Spark & Airflow (+ friends) | Holden Karau | 14:05 | 14:30
From Zero to Portability – Apache Beam's Journey to Cross-Language Data Processing | Maximilian Michels | 14:30 | 14:55
Streaming Pipelines for Neural Machine Translation | joern | 15:30 | 15:55
Deep Learning on Massively Parallel Processing Databases | Frank McQuillan | 16:00 | 16:25
Condition Monitoring & Transfer Learning – Good predictions in situations with (initially) almost no data | Daniel Germanus | 16:30 | 16:55