Brussels / 3 & 4 February 2018


AMENDMENT DNA sequencing performance in Go, C++, and Java

While Go is not primarily designed for parallel programming, it nevertheless has features that benefit parallelism as well, in particular a work-stealing scheduler for goroutines and a concurrent, parallel garbage collector. For this reason, we recently included Go as one of several candidate programming languages in an evaluation of their suitability for expressing sequencing pipelines. The other languages we evaluated were C++ and Java. Go hits a sweet spot: it performs very close to the best results with little programming effort and few compromises in terms of safety and generality.

This talk will present highlights of these experiments and the most important insights.

A DNA sequencer takes a DNA sample, such as human tissue, and applies chemical processes to read small fragments of the sample, which it outputs as large files. These files are then fed into software pipelines that, among other things, reconstruct the original DNA sequence from those fragments. Such sequencing pipelines need large amounts of storage, on disk and/or in RAM, and can benefit strongly from parallel execution to improve runtime performance. Data sets for human DNA samples are usually on the order of several hundred GB of uncompressed data, and runtimes are typically on the order of several hours for a single sample.

Go is usually presented as a language designed for concurrent applications, and it is often stressed that support for parallel programming was not a major design goal. It may therefore surprise a wider audience that Go actually fares very well on parallel programming tasks, with relatively little effort. This talk may encourage Go developers to explore the language for application areas that can benefit from parallelism, beyond the domains where Go is typically used.

Please note that this talk replaces one entitled "Writing a structured syslog parser in Go" that was due to have been given by Brian Knox, who has sent his apologies but is now unable to attend FOSDEM this year.


Pascal Costanza