drm/v3d: Address race-condition between per-fd GPU stats and fd release #6973

Merged

Conversation

mairacanal
Contributor

This patch series was initially motivated by a race condition (exposed in PATCH 4/4) where we lacked synchronization for job->file access. This led to use-after-free issues when a file descriptor was closed while a job was still running.

However, beyond fixing this specific race, the series introduces broader improvements to active-job management and locking. While PATCH 1/4 and 2/4 are primarily code refactors, PATCH 3/4 changes the locking scheme. Previously, all queues shared the same spinlock, which caused unnecessary contention whenever several queues were busy at once. PATCH 3/4 allows queues to operate more independently.

[1] Upstream series: https://lore.kernel.org/dri-devel/[email protected]/T/

Closes #6958

Instead of storing a pointer to the DRM file data, store a pointer
directly to the private V3D file struct.

Signed-off-by: Maíra Canal <[email protected]>
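
A minimal sketch of the change PATCH 1/4 describes, assuming the field layout of the mainline v3d driver (`struct v3d_job`, `driver_priv`); the actual diff may differ:

```c
#include <linux/kref.h>
#include <drm/gpu_scheduler.h>

/* Sketch only: illustrates the refactor, not the literal patch. */
struct v3d_job {
	struct drm_sched_job base;
	struct kref refcount;
	struct v3d_dev *v3d;

	/*
	 * Before: struct drm_file *file, which forced every per-fd
	 * stats update to chase file->driver_priv at completion time.
	 * Now the V3D-private file struct is resolved once, when the
	 * job is created, and cached here.
	 */
	struct v3d_file_priv *file_priv;	/* was: struct drm_file *file */
};
```

Caching the private struct directly also gives the later patches a single field to synchronize and clear when the file descriptor goes away.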
Instead of storing the queue's active job in four different variables,
store the active job inside the queue's state. This way, it's possible
to access all active jobs using an index based on `enum v3d_queue`.

Signed-off-by: Maíra Canal <[email protected]>
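
Sketched concretely, with names assumed from the mainline `v3d_queue_state` (the `v3d_active_job()` helper is illustrative, not necessarily part of the patch):

```c
#include <drm/gpu_scheduler.h>

/* Sketch: each queue's state owns its active job. */
struct v3d_queue_state {
	struct drm_gpu_scheduler sched;
	struct v3d_job *active_job;	/* was v3d->{bin,render,tfu,csd}_job */
};

struct v3d_dev {
	/* ... other device state ... */
	struct v3d_queue_state queue[V3D_MAX_QUEUES];
};

/* Any queue's active job is now one array index away: */
static inline struct v3d_job *
v3d_active_job(struct v3d_dev *v3d, enum v3d_queue q)
{
	return v3d->queue[q].active_job;
}
```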
Each V3D queue works independently and all the dependencies between the
jobs are handled through the DRM scheduler. Therefore, there is no need
to use a single lock for all queues. Sharing one lock creates unnecessary
contention between queues that could otherwise operate independently.

Replace the global spinlock with per-queue locks to improve parallelism
and reduce contention between different V3D queues (BIN, RENDER, TFU,
CSD). This allows independent queues to operate concurrently while
maintaining proper synchronization within each queue.

Signed-off-by: Maíra Canal <[email protected]>
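
A sketch of the locking change under the same assumptions (the `queue_lock` field name follows the upstream series [1]; the retire helper is illustrative):

```c
#include <linux/spinlock.h>
#include <drm/gpu_scheduler.h>

/* Sketch: the per-queue state grows its own lock, replacing the
 * single global v3d->job_lock. */
struct v3d_queue_state {
	struct drm_gpu_scheduler sched;
	struct v3d_job *active_job;
	spinlock_t queue_lock;	/* protects active_job (and, with
				 * PATCH 4/4, job->file_priv) */
};

/* Illustrative completion path: a TFU completion now takes only the
 * TFU queue's lock, so it no longer serializes against BIN, RENDER,
 * or CSD activity. */
static void v3d_queue_retire_job(struct v3d_dev *v3d, enum v3d_queue q)
{
	struct v3d_queue_state *queue = &v3d->queue[q];
	unsigned long flags;

	spin_lock_irqsave(&queue->queue_lock, flags);
	queue->active_job = NULL;
	spin_unlock_irqrestore(&queue->queue_lock, flags);
}
```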
When the file descriptor is closed while a job is still running,
there's a race condition between the job completion callback and the
file descriptor cleanup. This can lead to accessing freed memory when
updating the per-fd GPU stats, as in the following oops:

[56120.512903] Unable to handle kernel paging request at virtual address 0000330a92b9688a
[56120.520881] Mem abort info:
[56120.523687] ESR = 0x0000000096000005
[56120.527454] EC = 0x25: DABT (current EL), IL = 32 bits
[56120.532785] SET = 0, FnV = 0
[56120.535847] EA = 0, S1PTW = 0
[56120.538995] FSC = 0x05: level 1 translation fault
[56120.543891] Data abort info:
[56120.546778] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[56120.552289] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[56120.557362] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[56120.562690] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000023f54000
[56120.569239] [0000330a92b9688a] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[56120.577975] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
               CPU: 0 UID: 1000 PID: 1497409 Comm: mpv Not tainted 6.12.37-ncvm5+ #1
               Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
               pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
               pc : v3d_job_update_stats+0x64/0x168 [v3d]
               lr : v3d_job_update_stats+0x40/0x168 [v3d]
               sp : ffffc00080003e60
               x29: ffffc00080003e60 x28: ffff800002860000 x27: 0000000000000000
               x26: 0000000000000000 x25: ffff800002860000 x24: ffff800002630800
               x23: ffff800060786000 x22: 0000330a933c31fb x21: 0000000000000001
               x20: 0000330a92b96302 x19: ffff800060786b10 x18: 0000000000000000
               x17: ffffaf90506a0000 x16: ffffd06fce57c360 x15: 0000000000000000
               x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
               x11: 0000000000000000 x10: 0000000000000000 x9 : ffffd06f5d0fec40
               x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000002978dbd535a
               x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000300001fddf88
               x2 : 0000000000000020 x1 : 0000000000010001 x0 : 0000330a92b96872
               Call trace:
                 v3d_job_update_stats+0x64/0x168 [v3d]
                 v3d_irq+0x118/0x2e0 [v3d]
                 __handle_irq_event_percpu+0x60/0x220

Fix such an issue by protecting all accesses to `job->file_priv` with
the queue's lock. With that, we can clear `job->file_priv` before the
V3D per-fd structure is freed and ensure that `job->file_priv` is still
valid during the per-fd GPU stats updates.

Fixes: e1bc3a1 ("drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`")
Signed-off-by: Maíra Canal <[email protected]>
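
Putting the pieces together, the fix might take the shape sketched below. `v3d_stats_update()` exists in the mainline driver, but the helper names and exact placement here are assumptions, not the literal patch:

```c
#include <linux/spinlock.h>
#include <linux/sched/clock.h>

/* fd-release side (sketch): detach any in-flight job from the per-fd
 * state, under each queue's lock, before that state is freed. */
static void v3d_detach_file_priv(struct v3d_dev *v3d,
				 struct v3d_file_priv *file_priv)
{
	enum v3d_queue q;

	for (q = 0; q < V3D_MAX_QUEUES; q++) {
		struct v3d_queue_state *queue = &v3d->queue[q];
		unsigned long flags;

		spin_lock_irqsave(&queue->queue_lock, flags);
		if (queue->active_job &&
		    queue->active_job->file_priv == file_priv)
			queue->active_job->file_priv = NULL;
		spin_unlock_irqrestore(&queue->queue_lock, flags);
	}
	/* Only now is it safe to free file_priv. */
}

/* Completion side (sketch): the IRQ-path stats update holds the same
 * lock and tolerates a cleared file_priv instead of dereferencing
 * freed memory. */
static void v3d_job_update_stats_sketch(struct v3d_job *job,
					enum v3d_queue q)
{
	struct v3d_queue_state *queue = &job->v3d->queue[q];
	u64 now = local_clock();
	unsigned long flags;

	spin_lock_irqsave(&queue->queue_lock, flags);
	if (job->file_priv)	/* fd may have closed mid-flight */
		v3d_stats_update(&job->file_priv->stats[q], now);
	spin_unlock_irqrestore(&queue->queue_lock, flags);
}
```

Because both paths take the same per-queue lock, `job->file_priv` is either valid or NULL whenever the stats update runs; it can no longer point at freed memory.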
@popcornmix
Collaborator

Changes look sane, and a test of kodi, neverball and webgl aquarium worked fine.

@mairacanal
Contributor Author

> Changes look sane, and a test of kodi, neverball and webgl aquarium worked fine.

I also built a kernel out of this branch and submitted it to Mesa CI to run the full CTS testing suite. No regressions were identified.

Thanks for your review!

@pelwell pelwell merged commit 5a72e3a into raspberrypi:rpi-6.12.y Jul 25, 2025
11 of 12 checks passed
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Aug 1, 2025
kernel: drm/v3d: Address race-condition between per-fd GPU stats and fd release
See: raspberrypi/linux#6973

kernel: pwm: rp1: use pwmchip_get_drvdata() instead of container_of()
See: raspberrypi/linux#6972
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Aug 1, 2025
kernel: drm/v3d: Address race-condition between per-fd GPU stats and fd release
See: raspberrypi/linux#6973

kernel: pwm: rp1: use pwmchip_get_drvdata() instead of container_of()
See: raspberrypi/linux#6972
Successfully merging this pull request may close these issues.

Pi 5: Crash after heavy GPU load for hours.