Unlocking extra cluster capacity with enhanced Linux cgroup scheduling
- Track: Kernel
- Room: UA2.114 (Baudoux)
- Day: Sunday
- Start: 16:40
- End: 17:00
- Video only: ua2114
- Chat: Join the conversation!
Cluster orchestrators such as Kubernetes rely on an accurate model of the resources available on each worker node in a cluster and on the resources a given job requires, using this information to place the job onto a suitable worker node in the cluster. If either is inaccurate, the orchestrator will make poor job placement decisions, resulting in poor performance.
I observe that Linux kernel scheduling overheads can, for workloads making heavy use of Linux's group scheduling (cgroups) which include common serverless workloads, become so significant as to make the orchestrator model of worker node resources inaccurate. In practice this effect is mitigated by over-provisioning the cluster.
I propose and evaluate an enhancement to the Linux Completely Fair Scheduler (CFS) that mitigates these effects. By prioritising task completion over strict fairness, the enhanced scheduler is able to drain contended CPU run queues more rapidly and reduce time lost to context switching. Experimental results show that this approach can deliver equivalent performance using up at least 10% fewer worker nodes, significantly improving cluster efficiency.
Speakers
| Al Amjad Isstaif |