<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/proc, branch v5.12-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>mm: use is_cow_mapping() across tree where proper</title>
<updated>2021-03-13T19:27:30+00:00</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2021-03-13T05:07:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ca6eb14d6453bea85ac66fa4c6ab75dfe93eaf45'/>
<id>ca6eb14d6453bea85ac66fa4c6ab75dfe93eaf45</id>
<content type='text'>
After is_cow_mapping() is exported in mm.h, replace the manual checks
elsewhere throughout the tree and start to use the new helper.
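
For reference, the helper tests for a private mapping that may be
written, i.e. one that can copy-on-write; a sketch of the replacement
(based on the is_cow_mapping() definition in include/linux/mm.h; the
exact call sites vary):

	/* open-coded check, repeated across the tree */
	if ((vma-&gt;vm_flags &amp; (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE)
		...

	/* becomes */
	if (is_cow_mapping(vma-&gt;vm_flags))
		...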

Link: https://lkml.kernel.org/r/20210217233547.93892-5-peterx@redhat.com
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: VMware Graphics &lt;linux-graphics-maintainer@vmware.com&gt;
Cc: Roland Scheidegger &lt;sroland@vmware.com&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: Gal Pressman &lt;galpress@amazon.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Kirill Shutemov &lt;kirill@shutemov.name&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Wei Zhang &lt;wzam@amazon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block</title>
<updated>2021-02-27T16:29:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-02-27T16:29:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5695e51619745d4fe3ec2506a2f0cd982c5e27a4'/>
<id>5695e51619745d4fe3ec2506a2f0cd982c5e27a4</id>
<content type='text'>
Pull io_uring thread rewrite from Jens Axboe:
 "This converts the io-wq workers to be forked off the tasks in question
  instead of being kernel threads that assume various bits of the
  original task identity.

  This kills &gt; 400 lines of code from io_uring/io-wq, and it's the worst
  part of the code. We've had several bugs in this area, and the worry
  is always that we could be missing some pieces for file types doing
  unusual things (recent /dev/tty example comes to mind, userfaultfd
  reads installing file descriptors is another fun one... - both of
  which need special handling, and I bet it's not the last weird oddity
  we'll find).

  With these identical workers, we can have full confidence that we're
  never missing anything. That, in itself, is a huge win. Outside of
  that, it's also more efficient since we're not wasting space and code
  on tracking state, or switching between different states.

  I'm sure we're going to find little things to patch up after this
  series, but testing has been pretty thorough, from the usual
  regression suite to production. Any issue that may crop up should be
  manageable.

  There's also a nice series of further reductions we can do on top of
  this, but I wanted to get the meat of it out sooner rather than later.
  The general worry here isn't that it's fundamentally broken. Most of
  the little issues we've found over the last week have been related to
  just changes in how thread startup/exit is done, since that's the main
  difference between using kthreads and these kinds of threads. In fact,
  if all goes according to plan, I want to get this into the 5.10 and
  5.11 stable branches as well.

  That said, the changes outside of io_uring/io-wq are:

   - arch setup, simple one-liner to each arch copy_thread()
     implementation.

   - Removal of net and proc restrictions for io_uring, they are no
     longer needed or useful"

* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
  io-wq: remove now unused IO_WQ_BIT_ERROR
  io_uring: fix SQPOLL thread handling over exec
  io-wq: improve manager/worker handling over exec
  io_uring: ensure SQPOLL startup is triggered before error shutdown
  io-wq: make buffered file write hashed work map per-ctx
  io-wq: fix race around io_worker grabbing
  io-wq: fix races around manager/worker creation and task exit
  io_uring: ensure io-wq context is always destroyed for tasks
  arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
  io_uring: cleanup -&gt;user usage
  io-wq: remove nr_process accounting
  io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
  net: remove cmsg restriction from io_uring based send/recvmsg calls
  Revert "proc: don't allow async path resolution of /proc/self components"
  Revert "proc: don't allow async path resolution of /proc/thread-self components"
  io_uring: move SQPOLL thread io-wq forked worker
  io-wq: make io_wq_fork_thread() available to other users
  io-wq: only remove worker from free_list, if it was there
  io_uring: remove io_identity
  io_uring: remove any grabbing of context
  ...
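
For the "arch setup" item above, the one-liner is the common pattern
below (a sketch; each architecture's copy_thread() differs in detail):

	/* treat PF_IO_WORKER like PF_KTHREAD when setting up thread regs */
-	if (unlikely(p-&gt;flags &amp; PF_KTHREAD)) {
+	if (unlikely(p-&gt;flags &amp; (PF_IO_WORKER | PF_KTHREAD))) {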
</content>
</entry>
<entry>
<title>proc: use kvzalloc for our kernel buffer</title>
<updated>2021-02-26T17:41:03+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2021-02-26T01:20:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4508943794efdd94171549c0bd52810e2f4ad9fe'/>
<id>4508943794efdd94171549c0bd52810e2f4ad9fe</id>
<content type='text'>
Since

  sysctl: pass kernel pointers to -&gt;proc_handler

we have been pre-allocating a buffer to copy the data from the proc
handlers into, and then copying that to userspace.  The problem is that
this just blindly kzalloc()'s the buffer size passed in from the read,
which in the case of our 'cat' binary was 64KiB.  Order-4 allocations
are not awesome, and since we can potentially allocate up to our maximum
order, use kvzalloc() for these buffers.
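
Since kvzalloc() falls back to vmalloc() for larger sizes, the
high-order page allocation is avoided.  A sketch of the change (hedged;
assuming the allocation site in fs/proc/proc_sysctl.c, the verbatim
hunk may differ):

-	kbuf = kzalloc(count + 1, GFP_KERNEL);
+	kbuf = kvzalloc(count + 1, GFP_KERNEL);
	...
-	kfree(kbuf);
+	kvfree(kbuf);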

[willy@infradead.org: changelog tweaks]

Link: https://lkml.kernel.org/r/6345270a2c1160b89dd5e6715461f388176899d1.1612972413.git.josef@toxicpanda.com
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
CC: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc/wchan: use printk format instead of lookup_symbol_name()</title>
<updated>2021-02-26T17:41:03+00:00</updated>
<author>
<name>Helge Deller</name>
<email>deller@gmx.de</email>
</author>
<published>2021-02-26T01:20:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=152c432b128cb043fc107e8f211195fe94b2159c'/>
<id>152c432b128cb043fc107e8f211195fe94b2159c</id>
<content type='text'>
To resolve the symbol function name for wchan, use the printk format
specifier %ps instead of manually looking up the symbol function name
via lookup_symbol_name().
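
A sketch of the simplification in proc_pid_wchan() (hedged; the actual
hunk may differ in detail):

-	char symname[KSYM_NAME_LEN];
-
-	if (wchan &amp;&amp; !lookup_symbol_name(wchan, symname))
-		seq_puts(m, symname);
+	if (wchan)
+		seq_printf(m, "%ps", (void *)wchan);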

Link: https://lkml.kernel.org/r/20201217165413.GA1959@ls3530.fritz.box
Signed-off-by: Helge Deller &lt;deller@gmx.de&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmalloc: remove redundant NULL check</title>
<updated>2021-02-24T21:38:30+00:00</updated>
<author>
<name>Yang Li</name>
<email>abaci-bugfix@linux.alibaba.com</email>
</author>
<published>2021-02-24T20:05:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fb9bf0484af4770240342f4d1b3dd054889cc31e'/>
<id>fb9bf0484af4770240342f4d1b3dd054889cc31e</id>
<content type='text'>
Fix the below warning reported by coccicheck:

  fs/proc/vmcore.c:1503:2-7: WARNING: NULL check before some freeing functions is not needed.
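
kfree(), vfree() and kvfree() all accept a NULL pointer, so the guard
adds nothing.  A hypothetical illustration of the pattern coccicheck
flags (the variable name in vmcore.c may differ):

-	if (buf)
-		vfree(buf);
+	vfree(buf);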

Link: https://lkml.kernel.org/r/1611216753-44598-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Yang Li &lt;abaci-bugfix@linux.alibaba.com&gt;
Reported-by: Abaci Robot &lt;abaci@linux.alibaba.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages</title>
<updated>2021-02-24T21:38:29+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-02-24T20:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=380780e71895ae301505ffcec8f954ab3666a4c7'/>
<id>380780e71895ae301505ffcec8f954ab3666a4c7</id>
<content type='text'>
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics, especially for the THP vmstat
counters.  On systems with hundreds of processors the cached amount can
reach GBs of memory.  For example, on a 96-CPU system the per-cpu
threshold sits at its maximum of 125, so the per-cpu counters can cache
up to 96 CPUs * 125 * 2 MiB per THP = 23.4375 GB in total.

A THP update is already a form of batched addition (it adds 512 pages'
worth of memory in one go), so skipping the per-cpu batching seems
sensible.  Every THP stats update then overflows the per-cpu counter and
resorts to an atomic global update, but this makes the THP vmstat
counters more accurate.

So we convert the NR_FILE_PMDMAPPED account to pages.  This patch is
consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound
page arrival").  Doing this also makes the units of the vmstat counters
more uniform.  After this series, a vmstat counter is in pages, kB or
bytes: a B or KB suffix tells us the unit is bytes or kB, and the rest,
without a suffix, are pages.
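
Since the counter now holds pages instead of THPs, readers that used to
scale by HPAGE_PMD_NR drop the multiplication; a sketch of the fs/proc
side (hedged, the verbatim hunk may differ):

-	show_val_kb(m, "FilePmdMapped: ",
-		    global_node_page_state(NR_FILE_PMDMAPPED) * HPAGE_PMD_NR);
+	show_val_kb(m, "FilePmdMapped: ",
+		    global_node_page_state(NR_FILE_PMDMAPPED));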

Link: https://lkml.kernel.org/r/20201228164110.2838-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta@cloud.ionos.com&gt;
Cc: Rafael. J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages</title>
<updated>2021-02-24T21:38:29+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-02-24T20:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a1528e21f8915e16252cda1137fe29672c918361'/>
<id>a1528e21f8915e16252cda1137fe29672c918361</id>
<content type='text'>
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics, especially for the THP vmstat
counters.  On systems with hundreds of processors the cached amount can
reach GBs of memory.  For example, on a 96-CPU system the per-cpu
threshold sits at its maximum of 125, so the per-cpu counters can cache
up to 96 CPUs * 125 * 2 MiB per THP = 23.4375 GB in total.

A THP update is already a form of batched addition (it adds 512 pages'
worth of memory in one go), so skipping the per-cpu batching seems
sensible.  Every THP stats update then overflows the per-cpu counter and
resorts to an atomic global update, but this makes the THP vmstat
counters more accurate.

So we convert the NR_SHMEM_PMDMAPPED account to pages.  This patch is
consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound
page arrival").  Doing this also makes the units of the vmstat counters
more uniform.  After this series, a vmstat counter is in pages, kB or
bytes: a B or KB suffix tells us the unit is bytes or kB, and the rest,
without a suffix, are pages.
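
As elsewhere in this series, the fs/proc reader drops its HPAGE_PMD_NR
scaling now that the counter is in pages (a sketch, not the verbatim
hunk):

-	show_val_kb(m, "ShmemPmdMapped: ",
-		    global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR);
+	show_val_kb(m, "ShmemPmdMapped: ",
+		    global_node_page_state(NR_SHMEM_PMDMAPPED));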

Link: https://lkml.kernel.org/r/20201228164110.2838-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta@cloud.ionos.com&gt;
Cc: Rafael. J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: convert NR_SHMEM_THPS account to pages</title>
<updated>2021-02-24T21:38:29+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-02-24T20:03:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=57b2847d3c1dc154923578efb47a12302a57d700'/>
<id>57b2847d3c1dc154923578efb47a12302a57d700</id>
<content type='text'>
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics, especially for the THP vmstat
counters.  On systems with hundreds of processors the cached amount can
reach GBs of memory.  For example, on a 96-CPU system the per-cpu
threshold sits at its maximum of 125, so the per-cpu counters can cache
up to 96 CPUs * 125 * 2 MiB per THP = 23.4375 GB in total.

A THP update is already a form of batched addition (it adds 512 pages'
worth of memory in one go), so skipping the per-cpu batching seems
sensible.  Every THP stats update then overflows the per-cpu counter and
resorts to an atomic global update, but this makes the THP vmstat
counters more accurate.

So we convert the NR_SHMEM_THPS account to pages.  This patch is
consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound
page arrival").  Doing this also makes the units of the vmstat counters
more uniform.  After this series, a vmstat counter is in pages, kB or
bytes: a B or KB suffix tells us the unit is bytes or kB, and the rest,
without a suffix, are pages.
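
The fs/proc side of the conversion, sketched (the counter is now in
pages, so the reader's HPAGE_PMD_NR scaling goes away; details may
differ from the verbatim hunk):

-	show_val_kb(m, "ShmemHugePages: ",
-		    global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
+	show_val_kb(m, "ShmemHugePages: ",
+		    global_node_page_state(NR_SHMEM_THPS));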

Link: https://lkml.kernel.org/r/20201228164110.2838-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta@cloud.ionos.com&gt;
Cc: Rafael. J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: convert NR_FILE_THPS account to pages</title>
<updated>2021-02-24T21:38:29+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-02-24T20:03:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf9ecead53c89d3d2cf60acbc460174ebbcf0027'/>
<id>bf9ecead53c89d3d2cf60acbc460174ebbcf0027</id>
<content type='text'>
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics, especially for the THP vmstat
counters.  On systems with hundreds of processors the cached amount can
reach GBs of memory.  For example, on a 96-CPU system the per-cpu
threshold sits at its maximum of 125, so the per-cpu counters can cache
up to 96 CPUs * 125 * 2 MiB per THP = 23.4375 GB in total.

A THP update is already a form of batched addition (it adds 512 pages'
worth of memory in one go), so skipping the per-cpu batching seems
sensible.  Every THP stats update then overflows the per-cpu counter and
resorts to an atomic global update, but this makes the THP vmstat
counters more accurate.

So we convert the NR_FILE_THPS account to pages.  This patch is
consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound
page arrival").  Doing this also makes the units of the vmstat counters
more uniform.  After this series, a vmstat counter is in pages, kB or
bytes: a B or KB suffix tells us the unit is bytes or kB, and the rest,
without a suffix, are pages.
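
Again the fs/proc reader loses its HPAGE_PMD_NR factor (sketch; the
verbatim hunk may differ):

-	show_val_kb(m, "FileHugePages: ",
-		    global_node_page_state(NR_FILE_THPS) * HPAGE_PMD_NR);
+	show_val_kb(m, "FileHugePages: ",
+		    global_node_page_state(NR_FILE_THPS));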

Link: https://lkml.kernel.org/r/20201228164110.2838-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta@cloud.ionos.com&gt;
Cc: Rafael. J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: convert NR_ANON_THPS account to pages</title>
<updated>2021-02-24T21:38:29+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-02-24T20:03:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=69473e5de87389be6c0fa4a5d574a50c8f904fb3'/>
<id>69473e5de87389be6c0fa4a5d574a50c8f904fb3</id>
<content type='text'>
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics, especially for the THP vmstat
counters.  On systems with hundreds of processors the cached amount can
reach GBs of memory.  For example, on a 96-CPU system the per-cpu
threshold sits at its maximum of 125, so the per-cpu counters can cache
up to 96 CPUs * 125 * 2 MiB per THP = 23.4375 GB in total.

A THP update is already a form of batched addition (it adds 512 pages'
worth of memory in one go), so skipping the per-cpu batching seems
sensible.  Every THP stats update then overflows the per-cpu counter and
resorts to an atomic global update, but this makes the THP vmstat
counters more accurate.

So we convert the NR_ANON_THPS account to pages.  This patch is
consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound
page arrival").  Doing this also makes the units of the vmstat counters
more uniform.  After this series, a vmstat counter is in pages, kB or
bytes: a B or KB suffix tells us the unit is bytes or kB, and the rest,
without a suffix, are pages.
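
On the writer side the increment becomes page-sized, and the fs/proc
reader drops its scaling; a sketch of both halves (hedged, function and
field names per my reading of the series, the verbatim hunks may
differ):

	/* e.g. mm/rmap.c */
-	__inc_lruvec_page_state(page, NR_ANON_THPS);
+	__mod_lruvec_page_state(page, NR_ANON_THPS, HPAGE_PMD_NR);

	/* fs/proc/meminfo.c */
-	show_val_kb(m, "AnonHugePages: ",
-		    global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR);
+	show_val_kb(m, "AnonHugePages: ",
+		    global_node_page_state(NR_ANON_THPS));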

Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Rafael. J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta@cloud.ionos.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
