Brussels / 2 & 3 February 2013

schedule

Divide and conquer in the cloud: one big server or many small ones?


The MySQL RDBMS is traditionally tuned for OLTP workloads. OLAP (analytic) workloads have traditionally been sub-optimal. While it is true that Amazon EC2 and other cloud providers now provide large machines with SSD disks, these are still best for OLTP workloads.

The largest EC2 server is limited to 64GB. MySQL queries are single threaded. Perhaps spreading your data over eight 17.1GB servers might cost the same(or less) and perform significantly better. So, the question is, is one big server with lots of very fast storage the best option for analytic queries?

This talk will introduce Shard-Query which can spread data over many servers but treat the set as one big server. Basic features will be discussed, but the focus is on performance, not how Shard-Query works.

The talk will compare the price/performance difference of OLAP queries on one ""Quadruple Extra Large High-IO"" server compared with eight ""Extra Large High Memory"" servers. While eight servers increase operational complexity, the performance improvement trade-off may very well be acceptable.

Shard-Query is an open source (bsd licensed) MPP query engine for MySQL: http://code.google.com/p/shard-query

Speakers

Justin Swanhart

Attachments