Brussels / 4 & 5 February 2023


InnoDB change buffer: Unsafe at any speed

The tale of some corruption bugs and how they were found

One of the innovations in InnoDB was the change buffer (originally, insert buffer), which aims to convert random I/O to more sequential I/O, by buffering certain changes to secondary index B-tree leaf pages.
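
The buffering idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not InnoDB's actual implementation: the class name ToyChangeBuffer, its fields, and the representation of a leaf page as a sorted list of keys are all hypothetical. It shows inserts for pages that are not in memory being queued, and then merged when the page is eventually read.

```python
import bisect
from collections import defaultdict


class ToyChangeBuffer:
    """Toy model (hypothetical, not InnoDB code): defer inserts to
    secondary index leaf pages that are not currently cached in memory."""

    def __init__(self):
        self.cached_pages = {}             # page_no -> sorted keys held in memory
        self.on_disk = defaultdict(list)   # page_no -> sorted keys on "disk"
        self.buffered = defaultdict(list)  # page_no -> pending (unmerged) inserts
        self.random_reads = 0              # counts simulated random page reads

    def insert(self, page_no, key):
        if page_no in self.cached_pages:
            # The page is already in memory: apply the change directly.
            bisect.insort(self.cached_pages[page_no], key)
        else:
            # The page is not in memory: buffer the change instead of
            # paying for a random read right now.
            self.buffered[page_no].append(key)

    def read_page(self, page_no):
        if page_no not in self.cached_pages:
            self.random_reads += 1
            page = list(self.on_disk[page_no])
            # Merge any buffered changes before exposing the page.
            for key in self.buffered.pop(page_no, []):
                bisect.insort(page, key)
            self.cached_pages[page_no] = page
        return self.cached_pages[page_no]
```

In this model, any number of inserts to an uncached page cost no page read until someone actually needs the page. The sketch leaves out the hard parts (persistence of the buffer, crash recovery, delete-marking and purge), which is where real bugs tend to arise.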

Due to its design and nature, any bugs related to the change buffer are extremely hard to reproduce. The change buffer is also becoming irrelevant, as the difference between random and sequential I/O is disappearing along with rotational storage (HDDs).

Thanks to the rr debugger and some improvements to InnoDB data structures, we have been able to reproduce and fix several tricky bugs related to the InnoDB change buffer.

We briefly explain how MVCC works for InnoDB secondary indexes and how the change buffer is supposed to work.

We describe some bug scenarios at a high level, possibly showing some code snippets or procedure call stacks.

Finally, we show how, instead of debugging a core dump and guessing what led to the problem, we can use "rr record" to capture a deterministic execution trace and then step through the events leading to the failure in "rr replay". We can set breakpoints and data watchpoints and examine the state of the traced process at any point in its execution. This even works across process boundaries, for crash recovery bugs.


Marko Mäkelä