linux-toradex.git/kernel/entry/common.c, branch v6.16-rc6

entry: Inline syscall_exit_to_user_mode()

2025-04-29T06:27:10+00:00

Similar to commit 221a164035fd ("entry: Move syscall_enter_from_user_mode()
to header file"), move syscall_exit_to_user_mode() to the header file as
well.

Testing was done with the byte-unixbench syscall benchmark (which calls
getpid) and QEMU. On riscv I measured a 7.09246% improvement, on x86 a
2.98843% improvement, on loongarch a 6.07954% improvement, and on s390 a
11.1328% improvement.

The Intel bot also reported "kernel test robot noticed a 1.9% improvement
of stress-ng.seek.ops_per_sec".

Signed-off-by: Charlie Jenkins 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Alexandre Ghiti 
Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-4-63e187e26041@rivosinc.com
Link: https://lore.kernel.org/linux-riscv/202502051555.85ae6844-lkp@intel.com/

seccomp: remove the 'sd' argument from __secure_computing()

2025-02-10T17:26:22+00:00

After the previous changes 'sd' is always NULL.

Signed-off-by: Oleg Nesterov 
Reviewed-by: Kees Cook 
Link: https://lore.kernel.org/r/20250128150313.GA15336@redhat.com
Signed-off-by: Kees Cook

sched: Add TIF_NEED_RESCHED_LAZY infrastructure

2024-11-05T11:55:37+00:00

Add the basic infrastructure to split the TIF_NEED_RESCHED bit in two.
Either bit will cause a resched on return-to-user, but only
TIF_NEED_RESCHED will drive IRQ preemption.

No behavioural change intended.

Suggested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Sebastian Andrzej Siewior 
Link: https://lkml.kernel.org/r/20241007075055.219540785@infradead.org
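
As an illustration of the split described in this commit, here is a minimal user-space sketch. It is not the kernel's code: the SKETCH_* bit values and both helper names are hypothetical, and the real flag positions are defined per architecture. It only models the stated rule that either bit triggers a reschedule on return-to-user while only the non-lazy bit drives IRQ preemption.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real TIF_* values are per-architecture. */
#define SKETCH_TIF_NEED_RESCHED      (1UL << 0)
#define SKETCH_TIF_NEED_RESCHED_LAZY (1UL << 1)

/* Return-to-user path: either bit requests a reschedule. */
static bool sketch_resched_on_exit_to_user(unsigned long ti_work)
{
	return (ti_work & (SKETCH_TIF_NEED_RESCHED |
			   SKETCH_TIF_NEED_RESCHED_LAZY)) != 0;
}

/* IRQ-return path: only the non-lazy bit drives preemption. */
static bool sketch_preempt_on_irq_exit(unsigned long ti_work)
{
	return (ti_work & SKETCH_TIF_NEED_RESCHED) != 0;
}
```

With only the lazy bit set, the first helper returns true and the second returns false, which is the asymmetry the commit message describes.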

treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*

2024-07-29T02:03:10+00:00

Context tracking state related symbols currently use a mix of the
CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_STATE_ (e.g. CT_STATE_MASK) prefixes.

Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix.

Suggested-by: Frederic Weisbecker 
Signed-off-by: Valentin Schneider 
Acked-by: Frederic Weisbecker 
Acked-by: Thomas Gleixner 
Signed-off-by: Neeraj Upadhyay

entry: Respect changes to system call number by trace_sys_enter()

2024-03-12T12:23:32+00:00

When a probe is registered at the trace_sys_enter() tracepoint, and that
probe changes the system call number, the old system call still gets
executed.  This worked correctly until commit b6ec41346103 ("core/entry:
Report syscall correctly for trace and audit"), which removed the
re-evaluation of the syscall number after the trace point.

Restore the original semantics by re-evaluating the system call number
after trace_sys_enter(). 

The performance impact of this re-evaluation is minimal because it only
takes place when a trace point is active, and compared to the actual trace
point overhead the read from a cache hot variable is negligible.

Fixes: b6ec41346103 ("core/entry: Report syscall correctly for trace and audit")
Signed-off-by: André Rösti 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20240311211704.7262-1-an.roesti@gmail.com
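
The restored semantics can be sketched in user space roughly as follows. All names here (struct sketch_regs, sketch_get_nr(), sketch_syscall_trace_enter(), the probe type, and the value 39) are hypothetical stand-ins, not kernel APIs; the sketch only shows the pattern of re-reading the syscall number after the probe has run, and only when a probe is attached.

```c
#include <stddef.h>

/* Hypothetical stand-in for the register state holding the syscall number. */
struct sketch_regs {
	long nr;
};

/* Hypothetical probe signature: a probe may modify regs. */
typedef void (*sketch_probe_fn)(struct sketch_regs *regs, long nr);

static long sketch_get_nr(const struct sketch_regs *regs)
{
	return regs->nr;
}

/* Returns the syscall number that should actually be dispatched. */
static long sketch_syscall_trace_enter(struct sketch_regs *regs, long syscall,
				       sketch_probe_fn probe)
{
	if (probe) {
		probe(regs, syscall);
		/* The probe may have rewritten the number: re-evaluate it. */
		syscall = sketch_get_nr(regs);
	}
	return syscall;
}

/* Example probe that redirects every call to (arbitrary) number 39. */
static void sketch_redirect_probe(struct sketch_regs *regs, long nr)
{
	(void)nr;
	regs->nr = 39;
}
```

Without the re-read inside the `if` block, the function would return the stale number passed in by the caller, which is the behaviour this commit fixes. The extra read happens only on the path where a probe is active, matching the performance argument above.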

entry: Move syscall_enter_from_user_mode() to header file

2023-12-21T22:12:18+00:00

To allow inlining of syscall_enter_from_user_mode(), move it
to entry-common.h.

Signed-off-by: Sven Schnelle 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20231218074520.1998026-4-svens@linux.ibm.com

entry: Move enter_from_user_mode() to header file

2023-12-21T22:12:18+00:00

To allow inlining of enter_from_user_mode(), move it to
entry-common.h.

Signed-off-by: Sven Schnelle 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20231218074520.1998026-3-svens@linux.ibm.com

entry: Move exit to usermode functions to header file

2023-12-21T22:12:18+00:00

To allow inlining, move exit_to_user_mode() to
entry-common.h.

Signed-off-by: Sven Schnelle 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20231218074520.1998026-2-svens@linux.ibm.com

entry: Remove empty addr_limit_user_check()

2023-08-23T08:32:39+00:00

Back when set_fs() was a generic API for altering the address limit,
addr_limit_user_check() was a safety measure to prevent userspace being
able to issue syscalls with an unbound limit.

With the removal of set_fs() as a generic API, the last user of
addr_limit_user_check() was removed in commit:

  b5a5a01d8e9a44ec ("arm64: uaccess: remove addr_limit_user_check()")

... as since that commit, no architecture defines TIF_FSCHECK, and hence
addr_limit_user_check() always expands to nothing.

Remove addr_limit_user_check(), updating the comment in
exit_to_user_mode_prepare() to no longer refer to it. At the same time,
the comment is reworded to be a little more generic so as to cover
kmap_assert_nomap() in addition to lockdep_sys_exit().

No functional change.

Signed-off-by: Mark Rutland 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20230821163526.2319443-1-mark.rutland@arm.com

entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up

2023-03-21T14:13:15+00:00

RCU sometimes needs to perform a delayed wake up for specific kthreads
handling offloaded callbacks (RCU_NOCB).  These wakeups are performed
by timers and upon entry to idle (also to guest and to user on nohz_full).

However the delayed wake-up on kernel exit is actually performed after
the thread flags are fetched towards the fast path check for work to
do on exit to user. As a result, and if there is no other pending work
to do upon that kernel exit, the current task will resume to userspace
with TIF_RESCHED set and the pending wake up ignored.

Fix this by fetching the thread flags _after_ the delayed RCU-nocb
kthread wake-up.

Fixes: 47b8ff194c1f ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20230315194349.10798-3-joel@joelfernandes.org
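
The ordering problem can be modelled with a small user-space sketch. This is a simplification under stated assumptions, not the kernel implementation: struct sketch_task, the SKETCH_TIF_NEED_RESCHED bit, and all function names are hypothetical, and the sketch assumes that flushing the delayed wake-up sets the reschedule flag on the current task.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real value is architecture-specific. */
#define SKETCH_TIF_NEED_RESCHED (1UL << 0)

/* Hypothetical model of the exiting task's state. */
struct sketch_task {
	unsigned long flags;
	bool rcu_wakeup_pending;
};

/* Assumption: flushing the delayed wake-up may set the resched flag. */
static void sketch_flush_rcu_wakeup(struct sketch_task *t)
{
	if (t->rcu_wakeup_pending) {
		t->rcu_wakeup_pending = false;
		t->flags |= SKETCH_TIF_NEED_RESCHED;
	}
}

/* Old order: flags are snapshotted before the wake-up is flushed. */
static bool sketch_exit_sees_resched_old(struct sketch_task *t)
{
	unsigned long ti_work = t->flags;

	sketch_flush_rcu_wakeup(t);
	return (ti_work & SKETCH_TIF_NEED_RESCHED) != 0;
}

/* Fixed order: flush the wake-up first, then fetch the flags. */
static bool sketch_exit_sees_resched_fixed(struct sketch_task *t)
{
	unsigned long ti_work;

	sketch_flush_rcu_wakeup(t);
	ti_work = t->flags;
	return (ti_work & SKETCH_TIF_NEED_RESCHED) != 0;
}
```

In the old order the snapshot misses the flag even though it ends up set on the task, so the exit path would return to userspace without rescheduling; in the fixed order the flag is observed.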