FOSDEM '10 is a free and non-commercial event organized by the community, for the community. Its goal is to provide Free and Open Source developers a place to meet. No registration necessary.

   

Interview: Bernard Li

Bernard Li will give a talk about Ganglia at FOSDEM 2010.

Could you briefly introduce yourself? And how are you involved in the Ganglia project?

I have worked in the High Performance Computing field for the past six years and have been involved with various HPC related deployment/provisioning and monitoring open source software. I started out in the Ganglia project by maintaining the RPM spec file and working on the web frontend. I am currently one of the project administrators.

What will your talk be about, exactly?

The talk will give a brief introduction to Ganglia's humble beginnings and how it has evolved into the de facto standard for monitoring a large collection of computers. We will then discuss the technical architecture and dive into more advanced topics such as scalability issues with large installations, writing C/Python metrics, etc. We will also have a user testimonial from a member of the Ganglia community who has deployed it in a tier-1 investment bank.

What do you hope to accomplish by giving this talk? What do you expect?

I hope to introduce Ganglia to more people and attract more users/developers to contribute to the project. One group whose attention I would particularly like to attract is web frontend/AJAX developers: we have a great backend that collects a lot of data, and it would be nice to see what we can do to improve the visualization of these data.

What's the history of the Ganglia project? How did it evolve?

Ganglia was started around 1999 by Matt Massie as part of the Berkeley Millennium Project. Since then the project has seen 40+ releases and 299,208 total downloads recorded by SourceForge.net. Currently it is one of the de facto standards for monitoring clusters and grids. The project started out as a monitoring tool for HPC clusters and grids; it is now pervasive in web farms and large enterprises, and is also gaining popularity in cloud computing environments.

Why would one choose Ganglia over the countless other monitoring systems? What's the unique selling point?

Ganglia has been around for the past 10 years and is known for its scalable design and easy installation/setup. It has a very small system resource footprint and can be easily deployed to a wide range of platforms without interfering with other applications. Besides the 30+ default system resource metrics such as CPU, memory, load and networking statistics, additional user-defined monitoring metrics can be easily added to the system by using gmetric or the new C/Python plugin interface. Thanks to its modular design, the PHP-based frontend has also received a number of contributed enhancements. Ganglia is in general great for real-time monitoring of systems as well as trending for purposes of capacity planning or system-level troubleshooting.
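To give a feel for what a user-defined metric written against the Python plugin interface might look like, here is a minimal sketch. It assumes the module convention used by Ganglia 3.1's Python metric modules (a `metric_init` function returning a list of metric descriptors, a per-metric callback, and a `metric_cleanup` function); the metric name `example_load_one` and the group `example` are made up for illustration, and the exact descriptor fields should be checked against the Ganglia documentation for your version.

```python
import os


def load_one_handler(name):
    """Callback invoked for the metric; returns the 1-minute load average."""
    try:
        return float(os.getloadavg()[0])
    except (AttributeError, OSError):
        # Load average is not available on every platform; report 0.0 then.
        return 0.0


def metric_init(params):
    """Return the list of metric descriptors this module provides.

    `params` would carry settings from the module's configuration;
    it is unused in this sketch.
    """
    descriptor = {
        "name": "example_load_one",      # hypothetical metric name
        "call_back": load_one_handler,   # function called to collect the value
        "time_max": 90,
        "value_type": "float",
        "units": "load",
        "slope": "both",
        "format": "%f",
        "description": "Example 1-minute load average",
        "groups": "example",             # hypothetical metric group
    }
    return [descriptor]


def metric_cleanup():
    """Called on shutdown; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Stand-alone check, so the module can be tried without the daemon.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

Running the file directly prints the metric name and its current value, which is a convenient way to test a module before handing it to the monitoring daemon.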

Does Ganglia also handle notification, e.g. when a node in the cluster goes down?

Not by design. Software such as Nagios is designed to do that. However, there is ongoing work to make the metrics aggregation daemon (gmetad) modular so that you can write your own code to manipulate the collected data. Some examples are storing the metric data in a SQL database or sending out alerts when a metric passes a certain threshold. There is a proof-of-concept rewrite of gmetad in Python currently in our source code repository's development trunk.
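As a rough illustration of the threshold idea, the sketch below checks metric values in a Ganglia-style XML dump against user-supplied limits. It is not the modular gmetad interface mentioned above, which is still in development; it is a stand-alone example. The sample XML, host names, and the `find_alerts` helper are hypothetical, and the XML is a trimmed-down imitation of the report format (`HOST` and `METRIC` elements with `NAME` and `VAL` attributes).

```python
import xml.etree.ElementTree as ET

# Illustrative, hand-written sample; real reports carry many more attributes.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node01">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="load_one" VAL="9.75" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def find_alerts(xml_text, thresholds):
    """Return (host, metric, value) tuples for metrics above their threshold.

    `thresholds` maps a metric name to its upper limit.
    """
    alerts = []
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME")
            if name not in thresholds:
                continue
            try:
                value = float(metric.get("VAL"))
            except (TypeError, ValueError):
                continue  # skip non-numeric or missing values
            if value > thresholds[name]:
                alerts.append((host.get("NAME"), name, value))
    return alerts


if __name__ == "__main__":
    for host, name, value in find_alerts(SAMPLE_XML, {"load_one": 4.0}):
        print(f"ALERT {host}: {name}={value}")
```

In practice the notification itself (mail, pager, and so on) would more likely be delegated to a tool built for it, such as Nagios, with a check like this only supplying the data.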

On the Ganglia wiki, I read that it "can scale to handle clusters with 2000 nodes." Is this a hard limit or just the biggest cluster being monitored by Ganglia at this moment?

This is by no means the hard limit. It was simply the largest cluster that was reported to us at the time the wiki page was updated. Come to the talk to learn about one of the largest installations and see how the size of your installation stacks up.

How many developers are working on Ganglia?

Over the years we've had numerous developers work on the project. Currently we have around 4-5 active developers.

What new features will we see in Ganglia in 2010?

Since the release of Ganglia 3.1 with the modular plugin interface for metric monitoring, we have seen a steady influx of user-contributed code for monitoring various metrics with the new interface, so I have little doubt that such enhancements will continue to come in. Other ongoing initiatives, such as a native Windows client, a modular interface for gmetad, and further enhancements to the web frontend, are under way.

Have you enjoyed previous FOSDEM editions?

No, this will be my first FOSDEM. I look forward to a great meeting!

Creative Commons License
This interview is licensed under a Creative Commons Attribution 2.0 Belgium License.