diff options
| author | Maoyi Xie <maoyixie.tju@gmail.com> | 2026-05-04 23:37:54 +0800 |
|---|---|---|
| committer | Jens Axboe <axboe@kernel.dk> | 2026-05-06 04:58:56 -0600 |
| commit | 9cc6bac1bebf8310d2950d1411a91479e86d69a1 (patch) | |
| tree | 70a1d09443ebd04a84a547f1b6342f3620d4f6cb /include | |
| parent | 04fe9aeb4f3c0999e6715385664c677469dfd8f4 (diff) | |
io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.
A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:
kernel/time/posix-timers.c -- timer_settime(TIMER_ABSTIME)
kernel/time/posix-stubs.c -- clock_nanosleep(TIMER_ABSTIME)
kernel/time/alarmtimer.c -- alarm_timer_nsleep(TIMER_ABSTIME)
fs/timerfd.c -- timerfd_settime(TFD_TIMER_ABSTIME)
io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.
Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after <1ms instead of
the expected ~1s.
Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.
timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.
SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current->nsproxy->time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-2-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Diffstat (limited to 'include')
0 files changed, 0 insertions, 0 deletions
