Brussels / 2 & 3 February 2013


CRIU: Checkpoint and Restore (mostly) In Userspace


Checkpoint/restore is a feature that freezes a set of running processes and saves their complete state to disk. Unfortunately, many attempts to merge such functionality into the upstream Linux kernel have failed miserably, mostly because of code complexity. The OpenVZ kernel development team found a way around this inability to merge the code upstream: implementing most of the required pieces in userspace, with minimal intervention in the kernel.

Checkpoint/restore is a feature that freezes a set of running processes and saves their complete state to disk. This state can later be restored, and the processes resume exactly as they were running before. The feature opens up a whole set of possibilities, from live migration to fast startup of huge applications.

Unfortunately, many attempts to merge such functionality into the upstream Linux kernel have failed miserably, mostly because of code complexity. That leaves the Linux community with the poor option of using non-upstreamed kernel patches, available from e.g. OpenVZ or Oren Laadan. The OpenVZ kernel development team found a way around this inability to merge the code upstream: implementing most of the required pieces in userspace, with minimal intervention in the kernel. The project started about a year ago, but it is already quite capable: CRIU can now dump an LXC container running Apache and MySQL.

This talk will describe the basic design of CRIU and highlight some of its interesting parts, such as dumping and restoring TCP connections. It will also describe some interesting usage scenarios, such as rebooting into a newer kernel in a few seconds without losing the state of processes and network connections. The talk will be of interest to system and distro developers, advanced users, and anyone interested in containers, virtualization, and high availability.

Speakers

Andrey Vagin

Links