linux-toradex.git/include/linux/sched, branch v4.16-rc4

Merge branch 'akpm' (patches from Andrew)

2018-02-22T18:45:46+00:00

Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton :
  mm: don't defer struct page initialization for Xen pv guests
  lib/Kconfig.debug: enable RUNTIME_TESTING_MENU
  vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems
  selftests/memfd: add run_fuse_test.sh to TEST_FILES
  bug.h: work around GCC PR82365 in BUG()
  mm/swap.c: make functions and their kernel-doc agree (again)
  mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc
  ida: do zeroing in ida_pre_get()
  mm, swap, frontswap: fix THP swap if frontswap enabled
  certs/blacklist_nohashes.c: fix const confusion in certs blacklist
  kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
  mm, mlock, vmscan: no more skipping pagevecs
  mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
  Kbuild: always define endianess in kconfig.h
  include/linux/sched/mm.h: re-inline mmdrop()
  tools: fix cross-compile var clobbering

efivarfs: Limit the rate for non-root to read files

2018-02-22T18:21:02+00:00

Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).

On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.

Linus suggested per-user rate limit to solve this.

So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.

In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.

Signed-off-by: Tony Luck 
Acked-by: Ard Biesheuvel 
Signed-off-by: Linus Torvalds

include/linux/sched/mm.h: re-inline mmdrop()

2018-02-21T23:35:42+00:00

As Peter points out, Doing a CALL+RET for just the decrement is a bit silly.

Fixes: d70f2a14b72a4bc ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
Acked-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Cc: Michal Hocko 
Cc: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'linus' into sched/urgent, to resolve conflicts

2018-02-06T20:12:31+00:00

 Conflicts:
	arch/arm64/kernel/entry.S
	arch/x86/Kconfig
	include/linux/sched/mm.h
	kernel/fork.c

Signed-off-by: Ingo Molnar

membarrier: Provide core serializing command, *_SYNC_CORE

2018-02-05T20:35:03+00:00

Provide core serializing membarrier command to support memory reclaim
by JIT.

Each architecture needs to explicitly opt into that support by
documenting in their architecture code how they provide the core
serializing instructions required when returning from the membarrier
IPI, and after the scheduler has updated the curr->mm pointer (before
going back to user-space). They should then select
ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on
their architecture.

Architectures selecting this feature need to either document that
they issue core serializing instructions when returning to user-space,
or implement their architecture-specific sync_core_before_usermode().

Signed-off-by: Mathieu Desnoyers 
Acked-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: Andrea Parri 
Cc: Andrew Hunter 
Cc: Andy Lutomirski 
Cc: Avi Kivity 
Cc: Benjamin Herrenschmidt 
Cc: Boqun Feng 
Cc: Dave Watson 
Cc: David Sehr 
Cc: Greg Hackmann 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Maged Michael 
Cc: Michael Ellerman 
Cc: Paul E. McKenney 
Cc: Paul Mackerras 
Cc: Russell King 
Cc: Will Deacon 
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar

membarrier: Provide GLOBAL_EXPEDITED command

2018-02-05T20:34:31+00:00

Allow expedited membarrier to be used for data shared between processes
through shared memory.

Processes wishing to receive the membarriers register with
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue
membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This allows extremely simple kernel-level implementation: we have almost
everything we need with the PRIVATE_EXPEDITED barrier code. All we need
to do is to add a flag in the mm_struct that will be used to check
whether we need to send the IPI to the current thread of each CPU.

There is a slight downside to this approach compared to targeting
specific shared memory users: when performing a membarrier operation,
all registered "global" receivers will get the barrier, even if they
don't share a memory mapping with the sender issuing
MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This registration approach seems to fit the requirement of not
disturbing processes that really deeply care about real-time: they
simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
compatibility.

Signed-off-by: Mathieu Desnoyers 
Acked-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: Andrea Parri 
Cc: Andrew Hunter 
Cc: Andy Lutomirski 
Cc: Avi Kivity 
Cc: Benjamin Herrenschmidt 
Cc: Boqun Feng 
Cc: Dave Watson 
Cc: David Sehr 
Cc: Greg Hackmann 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Maged Michael 
Cc: Michael Ellerman 
Cc: Paul E. McKenney 
Cc: Paul Mackerras 
Cc: Russell King 
Cc: Will Deacon 
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar

membarrier: Document scheduler barrier requirements

2018-02-05T20:34:21+00:00

Document the membarrier requirement on having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
It is provided by smp_mb__after_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread. We currently have an implicit barrier
from atomic_dec_and_test() in mmdrop() that ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many
cpumask update operations as well as write_cr3(). Document that
write_cr3() provides this barrier.

Signed-off-by: Mathieu Desnoyers 
Acked-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: Andrea Parri 
Cc: Andrew Hunter 
Cc: Andy Lutomirski 
Cc: Avi Kivity 
Cc: Benjamin Herrenschmidt 
Cc: Boqun Feng 
Cc: Dave Watson 
Cc: David Sehr 
Cc: Greg Hackmann 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Maged Michael 
Cc: Michael Ellerman 
Cc: Paul E. McKenney 
Cc: Paul Mackerras 
Cc: Russell King 
Cc: Will Deacon 
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar

powerpc, membarrier: Skip memory barrier in switch_mm()

2018-02-05T20:34:02+00:00

Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Threads targeting the same VM but which belong to different thread
groups is a tricky case. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Signed-off-by: Mathieu Desnoyers 
Acked-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: Alan Stern 
Cc: Alexander Viro 
Cc: Andrea Parri 
Cc: Andrew Hunter 
Cc: Andy Lutomirski 
Cc: Avi Kivity 
Cc: Benjamin Herrenschmidt 
Cc: Boqun Feng 
Cc: Dave Watson 
Cc: David Sehr 
Cc: Greg Hackmann 
Cc: Linus Torvalds 
Cc: Maged Michael 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul E. McKenney 
Cc: Paul Mackerras 
Cc: Russell King 
Cc: Will Deacon 
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar

Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

2018-02-04T00:25:42+00:00

Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab
  cache objects. This is good, but still leaves a lot of kernel memory
  available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates
  a way to whitelist specific areas of a given slab cache object for
  copying to/from userspace, allowing much finer granularity of access
  control.

  Slab caches that are never exposed to userspace can declare no
  whitelist for their objects, thereby keeping them unavailable to
  userspace via dynamic copy operations. (Note, an implicit form of
  whitelisting is the use of constant sizes in usercopy operations and
  get_user()/put_user(); these bypass all hardened usercopy checks since
  these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over
  the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test overage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...

include/linux/sched/mm.h: uninline mmdrop_async(), etc

2018-02-01T01:18:36+00:00

mmdrop_async() is only used in fork.c.  Move that and its support
functions into fork.c, uninline it all.

Quite a lot of code gets moved around to avoid forward declarations.

Cc: Ingo Molnar 
Cc: Michal Hocko 
Cc: Peter Zijlstra 
Cc: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds