author: Rik van Riel <riel@surriel.com> 2026-06-16 16:38:17 -0400
committer: Thomas Gleixner <tglx@kernel.org> 2026-06-19 21:44:16 +0200
commit: de3ab9bd3133899efb92e4cd05ba4203e58fc0a3
tree: fae58f6a39cd018000ae4f5743c983961b177c76 /kernel/sched
parent: a552c81ff4a16738ca5a44a177d552eb38d552ce
sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit. In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in. With only the cid_in_transit() check,
the guard passes for MM_CID_UNSET (no transit bit), converts it to
MM_CID_UNSET | MM_CID_TRANSIT and stores it back; later
mm_cid_schedout() feeds this to clear_bit() with MM_CID_UNSET as the
bit number, triggering an out-of-bounds write.

Symptoms: this is genuine memory corruption, but a bounded
out-of-bounds write, not an arbitrary one. MM_CID_UNSET is the fixed
sentinel BIT(31), so once the bad value reaches mm_cid_schedout() the
cid_from_transit_cid() strip leaves MM_CID_UNSET, which fails the
"cid < max_cids" convergence test and falls into mm_drop_cid() ->
clear_bit(MM_CID_UNSET, mm_cidmask(mm)). The cid bitmap is embedded in
the mm_struct slab object (after cpu_bitmap and mm_cpus_allowed) and is
only num_possible_cpus() bits wide, so clearing bit number 2^31 (the
value of BIT(31)) is a deterministic OOB bit-clear at a fixed offset of
2^31 / 8 bytes == 256 MiB past the bitmap base. The address is not
attacker-influenced (fixed sentinel -> fixed offset) and the op only
clears a single bit; what sits 256 MiB further along the direct map is
whatever kernel object happens to live there, so this corrupts one bit
of unpredictable kernel memory -- it is not an arbitrary-address or
arbitrary-value write.

It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET -- the
fork()/execve() window before that task's next schedule-in assigns it a
real CID -- and a per-CPU -> per-task fixup walks over it (the mode
fallback driven by a thread exit, sched_mm_cid_exit(), or by the
deferred max_cids recompute in mm_cid_work_fn()).

In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is
inlined via mm_cid_schedout() -> mm_drop_cid().

Guard the transition-bit assignment against MM_CID_UNSET, in addition
to the existing cid_in_transit() check, so the bit is only set on a
genuine task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running
active task is handled by the cid_on_cpu(pcp->cid) branch above and
never reaches this path, so excluding MM_CID_UNSET (and the
already-transitioning case) is sufficient.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:claude-opus-4-8 syzkaller
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com
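
The offset arithmetic in the message above can be checked with a small userspace sketch. It is not kernel code: bit_byte_offset() is a hypothetical helper introduced only for illustration, and the only value taken from the commit message is MM_CID_UNSET == BIT(31).

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE	8u

/* From the commit message: MM_CID_UNSET is the fixed sentinel BIT(31). */
#define MM_CID_UNSET	(UINT32_C(1) << 31)

/*
 * Hypothetical helper (not a kernel function): byte offset, relative to
 * the bitmap base, of the byte that holds bit number nr. This is the
 * byte a clear_bit(nr, base) style operation would modify.
 */
static uint64_t bit_byte_offset(uint32_t nr)
{
	return (uint64_t)nr / BITS_PER_BYTE;
}

/* Convert a byte count to whole MiB (1 MiB == 2^20 bytes). */
static uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}
```

Calling bytes_to_mib(bit_byte_offset(MM_CID_UNSET)) yields 256, matching the 256 MiB figure, while any valid CID in a bitmap that is num_possible_cpus() bits wide stays within the first few bytes.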
Diffstat (limited to 'kernel/sched')
-rw-r--r-- kernel/sched/core.c | 15
1 files changed, 13 insertions, 2 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b791e9e9f67..3cc6fb1d2054 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10909,8 +10909,19 @@ static void mm_cid_fixup_cpus_to_tasks(struct mm_struct *mm)
 		} else if (rq->curr->mm == mm && rq->curr->mm_cid.active) {
 			unsigned int cid = rq->curr->mm_cid.cid;
 
-			/* Ensure it has the transition bit set */
-			if (!cid_in_transit(cid)) {
+			/*
+			 * Set the transition bit only on a genuine task-owned
+			 * CID. A running active task can legitimately have
+			 * MM_CID_UNSET here: in per-CPU mode CIDs are assigned
+			 * lazily on schedule-in, so the fork()/execve() window
+			 * leaves the task active with no owned CID. Setting the
+			 * transition bit on MM_CID_UNSET would later feed
+			 * clear_bit() an out-of-bounds bit number via
+			 * mm_cid_schedout(), so exclude it. A CPU-owned
+			 * (MM_CID_ONCPU) CID is handled by the cid_on_cpu()
+			 * branch above and never reaches here.
+			 */
+			if (cid != MM_CID_UNSET && !cid_in_transit(cid)) {
 				cid = cid_to_transit_cid(cid);
 				rq->curr->mm_cid.cid = cid;
 				pcp->cid = cid;
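
The effect of the changed guard can be modelled in userspace. The following is a minimal sketch under stated assumptions, not the kernel implementation. MM_CID_UNSET == BIT(31) comes from the commit message, but the MM_CID_TRANSIT bit position is assumed for illustration. fixup_old(), fixup_new() and schedout_clears_oob() are hypothetical names that mirror the pre-patch guard, the post-patch guard, and the schedule-out flow as the commit message describes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the commit message. */
#define MM_CID_UNSET	(UINT32_C(1) << 31)
/* ASSUMPTION: bit position chosen for illustration only. */
#define MM_CID_TRANSIT	(UINT32_C(1) << 29)

static bool cid_in_transit(uint32_t cid)
{
	return (cid & MM_CID_TRANSIT) != 0;
}

static uint32_t cid_to_transit_cid(uint32_t cid)
{
	return cid | MM_CID_TRANSIT;
}

static uint32_t cid_from_transit_cid(uint32_t cid)
{
	return cid & ~MM_CID_TRANSIT;
}

/* Hypothetical model of the pre-patch guard: only checks the transit bit. */
static uint32_t fixup_old(uint32_t cid)
{
	if (!cid_in_transit(cid))
		cid = cid_to_transit_cid(cid);
	return cid;
}

/* Hypothetical model of the patched guard: also excludes MM_CID_UNSET. */
static uint32_t fixup_new(uint32_t cid)
{
	if (cid != MM_CID_UNSET && !cid_in_transit(cid))
		cid = cid_to_transit_cid(cid);
	return cid;
}

/*
 * Hypothetical model of the schedule-out flow from the commit message:
 * strip the transit bit, keep the CID if it passes the "cid < max_cids"
 * convergence test, otherwise drop it with a clear_bit() on a bitmap
 * that is nbits wide. Returns true when that clear_bit() would be
 * out of bounds.
 */
static bool schedout_clears_oob(uint32_t cid, uint32_t max_cids, uint32_t nbits)
{
	if (!cid_in_transit(cid))
		return false;		/* not in transition: nothing to drop */
	cid = cid_from_transit_cid(cid);
	if (cid < max_cids)
		return false;		/* converged: CID is kept */
	return cid >= nbits;		/* dropped: bit number vs bitmap width */
}
```

In this model, fixup_old(MM_CID_UNSET) produces MM_CID_UNSET | MM_CID_TRANSIT, which schedout_clears_oob() flags as out of bounds for any realistic bitmap width. fixup_new(MM_CID_UNSET) leaves the sentinel untouched, so the schedule-out path returns before reaching the drop.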