The long road to lazy preemption

submitted by
Style Pass
2024-11-01 09:30:04

Current kernels have four different modes that regulate when one task can be preempted in favor of another. PREEMPT_NONE, the simplest mode, only allows preemption to happen when the running task has exhausted its time slice. PREEMPT_VOLUNTARY adds a large number of points within the kernel where preemption can happen if needed. PREEMPT_FULL allows preemption at almost any point except places in the kernel that prevent it, such as when a spinlock is held. Finally, PREEMPT_RT prioritizes preemption over most other things, even making most spinlock-holding code preemptible.

A higher level of preemption enables the system to respond more quickly to events; whether an event is the movement of a mouse or an "imminent meltdown" signal from a nuclear reactor, faster response tends to be more gratifying. But a higher level of preemption can hurt the overall throughput of the system; workloads with a lot of long-running, CPU-intensive tasks tend to benefit from being disturbed as little as possible. More frequent preemption can also lead to higher lock contention. That is why the different modes exist; the optimal preemption mode will vary for different workloads.

Most distributions ship kernels built with the PREEMPT_DYNAMIC pseudo-mode, which allows any of the first three modes to be selected at boot time, with PREEMPT_VOLUNTARY being the default. On systems with debugfs mounted, the current mode can be read from /sys/kernel/debug/sched/preempt.
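As a rough illustration, the current mode could be read with a short script like the one below. This is only a sketch under stated assumptions: it assumes that the file lists the available modes on one line with the active one wrapped in parentheses (for example `none (voluntary) full`), that debugfs is mounted at /sys/kernel/debug, and that the caller has permission to read it (typically root). The function names are hypothetical and chosen for this example.

```python
import re
from pathlib import Path
from typing import Optional

# Location given in the article; it only exists when debugfs is mounted.
PREEMPT_PATH = Path("/sys/kernel/debug/sched/preempt")


def parse_preempt_mode(text: str) -> Optional[str]:
    """Return the mode wrapped in parentheses, or None if there is none.

    Assumes the active mode is the one shown in parentheses,
    as in "none (voluntary) full".
    """
    match = re.search(r"\(([^)\s]+)\)", text)
    return match.group(1) if match else None


def current_preempt_mode(path: Path = PREEMPT_PATH) -> Optional[str]:
    """Read the debugfs file and return the active mode, or None on failure."""
    try:
        return parse_preempt_mode(path.read_text())
    except OSError:
        # Missing file, debugfs not mounted, or insufficient permissions.
        return None


if __name__ == "__main__":
    print(current_preempt_mode() or "unknown")
```

On a kernel built without PREEMPT_DYNAMIC the file may not exist at all, in which case the sketch simply reports "unknown".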
