linux-toradex.git/io_uring, branch v6.4-rc7

io_uring/io-wq: clear current->worker_private on exit

2023-06-14T18:54:55+00:00

A recent fix stopped clearing PF_IO_WORKER from current->flags on exit,
which meant that we can now call inc/dec running on the worker after it
has been removed if it ends up scheduling in/out as part of exit.

If this happens after an RCU grace period has passed, then the struct
pointed to by current->worker_private may have been freed, and we can
now be accessing memory that is freed.

Ensure this doesn't happen by clearing the task worker_private field.
Both io_wq_worker_running() and io_wq_worker_sleeping() check this
field before going any further, and we don't need any accounting etc
done after this worker has exited.

Fixes: fd37b884003c ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
Reported-by: Zorro Lang 
Signed-off-by: Jens Axboe

io_uring/net: save msghdr->msg_control for retries

2023-06-14T01:26:42+00:00

If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski 
Signed-off-by: Jens Axboe

io_uring/io-wq: don't clear PF_IO_WORKER on exit

2023-06-12T18:16:09+00:00

A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().

The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.

Reported-by: Zorro Lang 
Reported-by: Dave Chinner 
Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe 
Signed-off-by: Linus Torvalds

io_uring: undeprecate epoll_ctl support

2023-05-27T02:22:41+00:00

Libuv recently started using it so there is at least one consumer now.

Cc: stable@vger.kernel.org
Fixes: 61a2732af4b0 ("io_uring: deprecate epoll_ctl support")
Link: https://github.com/libuv/libuv/pull/3979
Signed-off-by: Ben Noordhuis 
Link: https://lore.kernel.org/r/20230506095502.13401-1-info@bnoordhuis.nl
Signed-off-by: Jens Axboe

io_uring: unlock sqd->lock before sq thread release CPU

2023-05-25T15:30:13+00:00

The sq thread actively releases CPU resources by calling the
cond_resched() and schedule() interfaces when it is idle. Therefore,
more resources are available for other threads to run.

There exists a problem in sq thread: it does not unlock sqd->lock before
releasing CPU resources every time. This makes other threads pending on
sqd->lock for a long time. For example, the following interfaces all
require sqd->lock: io_sq_offload_create(), io_register_iowq_max_workers()
and io_ring_exit_work().

Before the sq thread releases CPU resources, unlocking sqd->lock will
provide the user a better experience because it can respond quickly to
user requests.

Signed-off-by: Kanchan Joshi
Signed-off-by: Wenwen Chen
Link: https://lore.kernel.org/r/20230525082626.577862-1-wenwen.chen@samsung.com
Signed-off-by: Jens Axboe

Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux

2023-05-07T17:00:09+00:00

Pull more io_uring updates from Jens Axboe:
 "Nothing major in here, just two different parts:

   - A small series from Breno that enables passing the full SQE down
     for ->uring_cmd().

     This is a prerequisite for enabling full network socket operations.
     Queued up a bit late because of some stylistic concerns that got
     resolved, would be nice to have this in 6.4-rc1 so the dependent
     work will be easier to handle for 6.5.

   - Fix for the huge page coalescing, which was a regression introduced
     in the 6.3 kernel release (Tobias)"

* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
  io_uring: Remove unnecessary BUILD_BUG_ON
  io_uring: Pass whole sqe to commands
  io_uring: Create a helper to return the SQE size
  io_uring/rsrc: check for nonconsecutive pages

io_uring: Remove unnecessary BUILD_BUG_ON

2023-05-04T14:19:05+00:00

In the io_uring_cmd_prep_async() there is an unnecessary compilation time
check to check if cmd is correctly placed at field 48 of the SQE.

This is unnecessary, since this check is already in place at
io_uring_init():

          BUILD_BUG_SQE_ELEM(48, __u64,  addr3);

Remove it and the uring_cmd_pdu_size() function, which is not used
anymore.

Keith started a discussion about this topic in the following thread:
Link: https://lore.kernel.org/lkml/ZDBmQOhbyU0iLhMw@kbusch-mbp.dhcp.thefacebook.com/

Signed-off-by: Breno Leitao 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/20230504121856.904491-4-leitao@debian.org
Signed-off-by: Jens Axboe

io_uring: Pass whole sqe to commands

2023-05-04T14:19:05+00:00

Currently uring CMD operation relies on having large SQEs, but future
operations might want to use normal SQE.

The io_uring_cmd currently only saves the payload (cmd) part of the SQE,
but, for commands that use normal SQE size, it might be necessary to
access the initial SQE fields outside of the payload/cmd block.  So,
saves the whole SQE other than just the pdu.

This changes slightly how the io_uring_cmd works, since the cmd
structures and callbacks are not opaque to io_uring anymore. I.e, the
callbacks can look at the SQE entries, not only, in the cmd structure.

The main advantage is that we don't need to create custom structures for
simple commands.

Creates io_uring_sqe_cmd() that returns the cmd private data as a null
pointer and avoids casting in the callee side.
Also, make most of ublk_drv's sqe->cmd priv structure into const, and use
io_uring_sqe_cmd() to get the private structure, removing the unwanted
cast. (There is one case where the cast is still needed since the
header->{len,addr} is updated in the private structure)

Suggested-by: Pavel Begunkov 
Signed-off-by: Breno Leitao 
Reviewed-by: Keith Busch 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/20230504121856.904491-3-leitao@debian.org
Signed-off-by: Jens Axboe

io_uring: Create a helper to return the SQE size

2023-05-04T14:19:05+00:00

Create a simple helper that returns the size of the SQE. The SQE could
have two size, depending of the flags.

If IO_URING_SETUP_SQE128 flag is set, then return a double SQE,
otherwise returns the sizeof of io_uring_sqe (64 bytes).

Signed-off-by: Breno Leitao 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/20230504121856.904491-2-leitao@debian.org
Signed-off-by: Jens Axboe

io_uring/rsrc: check for nonconsecutive pages

2023-05-03T15:00:22+00:00

Pages that are from the same folio do not necessarily need to be
consecutive. In that case, we cannot consolidate them into a single bvec
entry. Before applying the huge page optimization from commit 57bebf807e2a
("io_uring/rsrc: optimise registered huge pages"), check that the memory
is actually consecutive.

Cc: stable@vger.kernel.org
Fixes: 57bebf807e2a ("io_uring/rsrc: optimise registered huge pages")
Signed-off-by: Tobias Holl 
[axboe: formatting]
Signed-off-by: Jens Axboe