How I helped fix sleep-wake hangs on Linux with AMD GPUs

submitted by
Style Pass
2024-12-30 14:30:03

I dual-boot my desktop between Windows and Linux. Over the past few years, Linux would often crash when I tried to sleep my computer with high RAM usage. Upon waking, it would show a black screen with a moving cursor, or enter a "vegetative" state with no image on-screen, responding only to magic SysRq or a hard reset. I traced this behavior to a power/memory management bug in the amdgpu driver. Brainstorming and implementing solutions for it took over a year.

I started debugging this issue in 2023-09. My setup was a Gigabyte B550M DS3H motherboard with an AMD RX 570 GPU and a 1TB Kingston A2000 NVMe SSD, running Arch Linux with systemd-boot and Linux 6.4.

The first thing I do after a system crash is to check the journals. For example, journalctl --system -b -1 will print system logs from the previous boot (dmesg and system services, excluding logs from my user account's apps).
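To make that check repeatable, the same journalctl invocation can be wrapped in a small script that keeps only the lines mentioning the GPU driver. This is a minimal sketch, assuming Python 3 and a systemd machine. The helper names (journal_command, matching_lines, previous_boot_matches) and the default search term are illustrative placeholders, not part of any tool mentioned here.

```python
import subprocess


def journal_command(boot_offset=-1, kernel_only=False):
    """Build the journalctl argv for the system journal of one boot.

    boot_offset=-1 selects the previous boot, matching `journalctl --system -b -1`.
    """
    cmd = ["journalctl", "--system", "-b", str(boot_offset), "--no-pager"]
    if kernel_only:
        cmd.append("-k")  # restrict output to kernel messages (dmesg)
    return cmd


def matching_lines(lines, needle="amdgpu"):
    """Return only the lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]


def previous_boot_matches(needle="amdgpu"):
    """Run journalctl for the previous boot and filter its output."""
    result = subprocess.run(
        journal_command(), capture_output=True, text=True, check=True
    )
    return matching_lines(result.stdout.splitlines(), needle)
```

Calling previous_boot_matches("amdgpu_device_suspend") would narrow the previous boot's log to lines naming that function. If the journal holds no record of the previous boot, journalctl may exit with an error, and check=True turns that into an exception.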

The output showed that some sleep attempts hit out-of-memory (OOM) errors in kernel code under amdgpu_device_suspend, and that it took one or more failed attempts before the system crashed. Often, though, journalctl would print no logs whatsoever of the broken system waking up, with the journal terminating at:
