<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/uapi/asm-generic, branch v4.10-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING</title>
<updated>2016-11-30T15:04:25+00:00</updated>
<author>
<name>Francis Yan</name>
<email>francisyyan@gmail.com</email>
</author>
<published>2016-11-28T07:07:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1c885808e45601b2b6f68b30ac1d999e10b6f606'/>
<id>1c885808e45601b2b6f68b30ac1d999e10b6f606</id>
<content type='text'>
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan &lt;francisyyan@gmail.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan &lt;francisyyan@gmail.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>generic syscalls: kill cruft from removed pkey syscalls</title>
<updated>2016-10-17T16:50:56+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@intel.com</email>
</author>
<published>2016-10-17T15:18:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=71757904efadefdf5505712f675218ce59483c5d'/>
<id>71757904efadefdf5505712f675218ce59483c5d</id>
<content type='text'>
pkey_set() and pkey_get() were syscalls present in older versions
of the protection keys patches.  They were fully excised from the
x86 code, but some cruft was left in the generic syscall code.  The
C++ comments were intended to help to make it more glaring to me to
fix them before actually submitting them.  That technique worked,
but later than I would have liked.

I test-compiled this for arm64.

Fixes: a60f7b69d92c0 ("generic syscalls: Wire up memory protection keys syscalls")
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
pkey_set() and pkey_get() were syscalls present in older versions
of the protection keys patches.  They were fully excised from the
x86 code, but some cruft was left in the generic syscall code.  The
C++ comments were intended to help to make it more glaring to me to
fix them before actually submitting them.  That technique worked,
but later than I would have liked.

I test-compiled this for arm64.

Fixes: a60f7b69d92c0 ("generic syscalls: Wire up memory protection keys syscalls")
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>generic syscalls: Wire up memory protection keys syscalls</title>
<updated>2016-09-09T11:02:27+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2016-07-29T16:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a60f7b69d92c0142c80a30d669a76b617b7f6879'/>
<id>a60f7b69d92c0142c80a30d669a76b617b7f6879</id>
<content type='text'>
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.

According to Arnd:

  Even if the support is x86 specific for the forseeable future, it may be
  good to reserve the number just in case.  The other architecture specific
  syscall lists are usually left to the individual arch maintainers, most a
  lot of the newer architectures share this table.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163018.505A6875@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.

According to Arnd:

  Even if the support is x86 specific for the forseeable future, it may be
  good to reserve the number just in case.  The other architecture specific
  syscall lists are usually left to the individual arch maintainers, most a
  lot of the newer architectures share this table.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163018.505A6875@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/pkeys: Allocation/free syscalls</title>
<updated>2016-09-09T11:02:27+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2016-07-29T16:30:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e8c24d3a23a469f1f40d4de24d872ca7023ced0a'/>
<id>e8c24d3a23a469f1f40d4de24d872ca7023ced0a</id>
<content type='text'>
This patch adds two new system calls:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
	int pkey_free(int pkey);

These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allocator
that the kernel has for file descriptors.  The kernel tracks
which numbers are in use, and only allows operations on keys that
are valid.  A key which was not obtained by pkey_alloc() may not,
for instance, be passed to pkey_mprotect().

These system calls are also very important given the kernel's use
of pkeys to implement execute-only support.  These help ensure
that userspace can never assume that it has control of a key
unless it first asks the kernel.  The kernel does not promise to
preserve PKRU (right register) contents except for allocated
pkeys.

The 'init_access_rights' argument to pkey_alloc() specifies the
rights that will be established for the returned pkey.  For
instance:

	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);

will allocate 'pkey', but also sets the bits in PKRU[1] such that
writing to 'pkey' is already denied.

The kernel does not prevent pkey_free() from successfully freeing
in-use pkeys (those still assigned to a memory range by
pkey_mprotect()).  It would be expensive to implement the checks
for this, so we instead say, "Just don't do it" since sane
software will never do it anyway.

Any piece of userspace calling pkey_alloc() needs to be prepared
for it to fail.  Why?  pkey_alloc() returns the same error code
(ENOSPC) when there are no pkeys and when pkeys are unsupported.
They can be unsupported for a whole host of reasons, so apps must
be prepared for this.  Also, libraries or LD_PRELOADs might steal
keys before an application gets access to them.

This allocation mechanism could be implemented in userspace.
Even if we did it in userspace, we would still need additional
user/kernel interfaces to tell userspace which keys are being
used by the kernel internally (such as for execute-only
mappings).  Having the kernel provide this facility completely
removes the need for these additional interfaces, or having an
implementation of this in userspace at all.

Note that we have to make changes to all of the architectures
that do not use mman-common.h because we use the new
PKEY_DENY_ACCESS/WRITE macros in arch-independent code.

1. PKRU is the Protection Key Rights User register.  It is a
   usermode-accessible register that controls whether writes
   and/or access to each individual pkey is allowed or denied.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds two new system calls:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
	int pkey_free(int pkey);

These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allocator
that the kernel has for file descriptors.  The kernel tracks
which numbers are in use, and only allows operations on keys that
are valid.  A key which was not obtained by pkey_alloc() may not,
for instance, be passed to pkey_mprotect().

These system calls are also very important given the kernel's use
of pkeys to implement execute-only support.  These help ensure
that userspace can never assume that it has control of a key
unless it first asks the kernel.  The kernel does not promise to
preserve PKRU (right register) contents except for allocated
pkeys.

The 'init_access_rights' argument to pkey_alloc() specifies the
rights that will be established for the returned pkey.  For
instance:

	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);

will allocate 'pkey', but also sets the bits in PKRU[1] such that
writing to 'pkey' is already denied.

The kernel does not prevent pkey_free() from successfully freeing
in-use pkeys (those still assigned to a memory range by
pkey_mprotect()).  It would be expensive to implement the checks
for this, so we instead say, "Just don't do it" since sane
software will never do it anyway.

Any piece of userspace calling pkey_alloc() needs to be prepared
for it to fail.  Why?  pkey_alloc() returns the same error code
(ENOSPC) when there are no pkeys and when pkeys are unsupported.
They can be unsupported for a whole host of reasons, so apps must
be prepared for this.  Also, libraries or LD_PRELOADs might steal
keys before an application gets access to them.

This allocation mechanism could be implemented in userspace.
Even if we did it in userspace, we would still need additional
user/kernel interfaces to tell userspace which keys are being
used by the kernel internally (such as for execute-only
mappings).  Having the kernel provide this facility completely
removes the need for these additional interfaces, or having an
implementation of this in userspace at all.

Note that we have to make changes to all of the architectures
that do not use mman-common.h because we use the new
PKEY_DENY_ACCESS/WRITE macros in arch-independent code.

1. PKRU is the Protection Key Rights User register.  It is a
   usermode-accessible register that controls whether writes
   and/or access to each individual pkey is allowed or denied.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: Drop renameat syscall from default list</title>
<updated>2016-05-04T22:42:21+00:00</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2016-04-29T21:29:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b0da6d44157aa6e652de7634343708251ba64146'/>
<id>b0da6d44157aa6e652de7634343708251ba64146</id>
<content type='text'>
The newer renameat2 syscall provides all the functionality provided by
the renameat syscall and adds flags, so future architectures won't need
to include renameat.

Therefore drop the renameat syscall from the generic syscall list unless
__ARCH_WANT_RENAMEAT is defined by the architecture's unistd.h prior to
including asm-generic/unistd.h, and adjust all architectures using the
generic syscall list to define it so that no in-tree architectures are
affected.

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Acked-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Aurelien Jacquiot &lt;a-jacquiot@ti.com&gt;
Cc: linux-c6x-dev@linux-c6x.org
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: linux@lists.openrisc.net
Cc: Chen Liqin &lt;liqin.linux@gmail.com&gt;
Cc: Lennox Wu &lt;lennox.wu@gmail.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Guan Xuetao &lt;gxt@mprc.pku.edu.cn&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: nios2-dev@lists.rocketboards.org
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: uclinux-h8-devel@lists.sourceforge.jp
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The newer renameat2 syscall provides all the functionality provided by
the renameat syscall and adds flags, so future architectures won't need
to include renameat.

Therefore drop the renameat syscall from the generic syscall list unless
__ARCH_WANT_RENAMEAT is defined by the architecture's unistd.h prior to
including asm-generic/unistd.h, and adjust all architectures using the
generic syscall list to define it so that no in-tree architectures are
affected.

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Acked-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: linux-arch@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Aurelien Jacquiot &lt;a-jacquiot@ti.com&gt;
Cc: linux-c6x-dev@linux-c6x.org
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: linux@lists.openrisc.net
Cc: Chen Liqin &lt;liqin.linux@gmail.com&gt;
Cc: Lennox Wu &lt;lennox.wu@gmail.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Guan Xuetao &lt;gxt@mprc.pku.edu.cn&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: nios2-dev@lists.rocketboards.org
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: uclinux-h8-devel@lists.sourceforge.jp
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>asm-generic: use compat version for preadv2 and pwritev2</title>
<updated>2016-05-04T22:42:20+00:00</updated>
<author>
<name>Yury Norov</name>
<email>ynorov@caviumnetworks.com</email>
</author>
<published>2016-05-02T16:12:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1f93e9f2318b598e6775a1fc9701604993c512b1'/>
<id>1f93e9f2318b598e6775a1fc9701604993c512b1</id>
<content type='text'>
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.

[arnd: this initially slipped through the review and an
 incorrect patch got merged. arch/tile/ is the only architecture
 that could be affected for their 32-bit compat mode, every
 other architecture we support today is fine.]

Signed-off-by: Yury Norov &lt;ynorov@caviumnetworks.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.

[arnd: this initially slipped through the review and an
 incorrect patch got merged. arch/tile/ is the only architecture
 that could be affected for their 32-bit compat mode, every
 other architecture we support today is fine.]

Signed-off-by: Yury Norov &lt;ynorov@caviumnetworks.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>generic syscalls: wire up preadv2 and pwritev2 syscalls</title>
<updated>2016-04-23T20:38:08+00:00</updated>
<author>
<name>Andre Przywara</name>
<email>andre.przywara@arm.com</email>
</author>
<published>2016-04-11T09:17:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=987aedb5d6f6e10c5203c6d0aab9a60ec22c7e93'/>
<id>987aedb5d6f6e10c5203c6d0aab9a60ec22c7e93</id>
<content type='text'>
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.

Signed-off-by: Andre Przywara &lt;andre.przywara@arm.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.

Signed-off-by: Andre Przywara &lt;andre.przywara@arm.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2016-03-21T02:08:56+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-03-21T02:08:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=643ad15d47410d37d43daf3ef1c8ac52c281efa5'/>
<id>643ad15d47410d37d43daf3ef1c8ac52c281efa5</id>
<content type='text'>
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/pkeys: Fix siginfo ABI breakage caused by new u64 field</title>
<updated>2016-03-05T14:00:06+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2016-03-01T12:54:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=49cd53bf14aeb471c4a2682300dfc05ef2fd09f2'/>
<id>49cd53bf14aeb471c4a2682300dfc05ef2fd09f2</id>
<content type='text'>
Stephen Rothwell reported this linux-next build failure:

	http://lkml.kernel.org/r/20160226164406.065a1ffc@canb.auug.org.au

... caused by the Memory Protection Keys patches from the tip tree triggering
a newly introduced build-time sanity check on an ARM build, because they changed
the ABI of siginfo in an unexpected way.

If u64 has a natural alignment of 8 bytes (which is the case on most mainstream
platforms, with the notable exception of x86-32), then the leadup to the
_sifields union matters:

typedef struct siginfo {
        int si_signo;
        int si_errno;
        int si_code;

        union {
	...
        } _sifields;
} __ARCH_SI_ATTRIBUTES siginfo_t;

Note how the first 3 fields give us 12 bytes, so _sifields is not 8
naturally bytes aligned.

Before the _pkey field addition the largest element of _sifields (on
32-bit platforms) was 32 bits. With the u64 added, the minimum alignment
requirement increased to 8 bytes on those (rare) 32-bit platforms. Thus
GCC padded the space after si_code with 4 extra bytes, and shifted all
_sifields offsets by 4 bytes - breaking the ABI of all of those
remaining fields.

On 64-bit platforms this problem was hidden due to _sifields already
having numerous fields with natural 8 bytes alignment (pointers).

To fix this, we replace the u64 with an '__u32'.  The __u32 does not
increase the minimum alignment requirement of the union, and it is
also large enough to store the 16-bit pkey we have today on x86.

Reported-by: Stehen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Stehen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-next@vger.kernel.org
Fixes: cd0ea35ff551 ("signals, pkeys: Notify userspace about protection key faults")
Link: http://lkml.kernel.org/r/20160301125451.02C7426D@viggo.jf.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Stephen Rothwell reported this linux-next build failure:

	http://lkml.kernel.org/r/20160226164406.065a1ffc@canb.auug.org.au

... caused by the Memory Protection Keys patches from the tip tree triggering
a newly introduced build-time sanity check on an ARM build, because they changed
the ABI of siginfo in an unexpected way.

If u64 has a natural alignment of 8 bytes (which is the case on most mainstream
platforms, with the notable exception of x86-32), then the leadup to the
_sifields union matters:

typedef struct siginfo {
        int si_signo;
        int si_errno;
        int si_code;

        union {
	...
        } _sifields;
} __ARCH_SI_ATTRIBUTES siginfo_t;

Note how the first 3 fields give us 12 bytes, so _sifields is not 8
naturally bytes aligned.

Before the _pkey field addition the largest element of _sifields (on
32-bit platforms) was 32 bits. With the u64 added, the minimum alignment
requirement increased to 8 bytes on those (rare) 32-bit platforms. Thus
GCC padded the space after si_code with 4 extra bytes, and shifted all
_sifields offsets by 4 bytes - breaking the ABI of all of those
remaining fields.

On 64-bit platforms this problem was hidden due to _sifields already
having numerous fields with natural 8 bytes alignment (pointers).

To fix this, we replace the u64 with an '__u32'.  The __u32 does not
increase the minimum alignment requirement of the union, and it is
also large enough to store the 16-bit pkey we have today on x86.

Reported-by: Stehen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Stehen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-next@vger.kernel.org
Fixes: cd0ea35ff551 ("signals, pkeys: Notify userspace about protection key faults")
Link: http://lkml.kernel.org/r/20160301125451.02C7426D@viggo.jf.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Facility to report route quality of connected sockets</title>
<updated>2016-02-26T03:01:22+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>tom@herbertland.com</email>
</author>
<published>2016-02-24T18:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a87cb3e48ee86d29868d3f59cfb9ce1a8fa63314'/>
<id>a87cb3e48ee86d29868d3f59cfb9ce1a8fa63314</id>
<content type='text'>
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
