<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux, branch v4.9.25</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>new privimitive: iov_iter_revert()</title>
<updated>2017-04-21T07:31:21+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2017-02-17T23:42:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ff76ab9e03a50a4df26329e547e75f865a2bfa9f'/>
<id>ff76ab9e03a50a4df26329e547e75f865a2bfa9f</id>
<content type='text'>
commit 27c0e3748e41ca79171ffa3e97415a20af6facd0 upstream.

opposite to iov_iter_advance(); the caller is responsible for never
using it to move back past the initial position.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 27c0e3748e41ca79171ffa3e97415a20af6facd0 upstream.

opposite to iov_iter_advance(); the caller is responsible for never
using it to move back past the initial position.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups</title>
<updated>2017-04-21T07:31:18+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-03-16T20:54:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f44236a1b05b583862eb8b06dd8412660fdc7fad'/>
<id>f44236a1b05b583862eb8b06dd8412660fdc7fad</id>
<content type='text'>
commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream.

Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator.  Once the new kthread starts
running, it initializes itself and wakes up the creator.  The creator
then can further configure the kthread and then let it start doing its
job by waking it up.

In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland.  When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity.  This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.

Unfortunately, the cgroup side of protection is racy.  While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.

This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one.  Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.

This patch adds task-&gt;no_cgroup_migration which disallows the task to
be migrated by userland.  kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed.  The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.

It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side.  Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.

v2: Switch to a simpler implementation using a new task_struct bit
    field suggested by Oleg.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-and-debugged-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream.

Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator.  Once the new kthread starts
running, it initializes itself and wakes up the creator.  The creator
then can further configure the kthread and then let it start doing its
job by waking it up.

In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland.  When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity.  This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.

Unfortunately, the cgroup side of protection is racy.  While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.

This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one.  Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.

This patch adds task-&gt;no_cgroup_migration which disallows the task to
be migrated by userland.  kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed.  The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.

It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side.  Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.

v2: Switch to a simpler implementation using a new task_struct bit
    field suggested by Oleg.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-and-debugged-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: Disable MSI for HiSilicon Hip06/Hip07 Root Ports</title>
<updated>2017-04-12T10:41:21+00:00</updated>
<author>
<name>Dongdong Liu</name>
<email>liudongdong3@huawei.com</email>
</author>
<published>2017-04-04T19:32:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0ddf07d2a130adda9bfe7ec1fd0be88426862ccb'/>
<id>0ddf07d2a130adda9bfe7ec1fd0be88426862ccb</id>
<content type='text'>
[ Upstream commit 72f2ff0deb870145a5a2d24cd75b4f9936159a62 ]

The PCIe Root Port in Hip06/Hip07 SoCs advertises an MSI capability, but it
cannot generate MSIs.  It can transfer MSI/MSI-X from downstream devices,
but does not support MSI/MSI-X itself.

Add a quirk to prevent use of MSI/MSI-X by the Root Port.

[bhelgaas: changelog, sort vendor ID #define, drop device ID #define]
Signed-off-by: Dongdong Liu &lt;liudongdong3@huawei.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Gabriele Paoloni &lt;gabriele.paoloni@huawei.com&gt;
Reviewed-by: Zhou Wang &lt;wangzhou1@hisilicon.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 72f2ff0deb870145a5a2d24cd75b4f9936159a62 ]

The PCIe Root Port in Hip06/Hip07 SoCs advertises an MSI capability, but it
cannot generate MSIs.  It can transfer MSI/MSI-X from downstream devices,
but does not support MSI/MSI-X itself.

Add a quirk to prevent use of MSI/MSI-X by the Root Port.

[bhelgaas: changelog, sort vendor ID #define, drop device ID #define]
Signed-off-by: Dongdong Liu &lt;liudongdong3@huawei.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Gabriele Paoloni &lt;gabriele.paoloni@huawei.com&gt;
Reviewed-by: Zhou Wang &lt;wangzhou1@hisilicon.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: smccc: Update HVC comment to describe new quirk parameter</title>
<updated>2017-04-12T10:41:21+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2017-04-04T19:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=35b366d584da2a82444f73d8c17d5afb84b3f8e2'/>
<id>35b366d584da2a82444f73d8c17d5afb84b3f8e2</id>
<content type='text'>
[ Upstream commit 3046ec674d441562c6bb3e4284cd866743042ef3 ]

Commit 680a0873e193 ("arm: kernel: Add SMC structure parameter") added
a new "quirk" parameter to the SMC and HVC SMCCC backends, but only
updated the comment for the SMC version. This patch adds the new
paramater to the comment describing the HVC version too.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 3046ec674d441562c6bb3e4284cd866743042ef3 ]

Commit 680a0873e193 ("arm: kernel: Add SMC structure parameter") added
a new "quirk" parameter to the SMC and HVC SMCCC backends, but only
updated the comment for the SMC version. This patch adds the new
paramater to the comment describing the HVC version too.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>firmware: qcom: scm: Fix interrupted SCM calls</title>
<updated>2017-04-12T10:41:21+00:00</updated>
<author>
<name>Andy Gross</name>
<email>andy.gross@linaro.org</email>
</author>
<published>2017-04-04T19:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bec9918bb4dae2099c37791133403c32aa2738c7'/>
<id>bec9918bb4dae2099c37791133403c32aa2738c7</id>
<content type='text'>
[ Upstream commit 82bcd087029f6056506ea929f11af02622230901 ]

This patch adds a Qualcomm specific quirk to the arm_smccc_smc call.

On Qualcomm ARM64 platforms, the SMC call can return before it has
completed.  If this occurs, the call can be restarted, but it requires
using the returned session ID value from the interrupted SMC call.

The quirk stores off the session ID from the interrupted call in the
quirk structure so that it can be used by the caller.

This patch folds in a fix given by Sricharan R:
https://lkml.org/lkml/2016/9/28/272

Signed-off-by: Andy Gross &lt;andy.gross@linaro.org&gt;
Reviewed-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 82bcd087029f6056506ea929f11af02622230901 ]

This patch adds a Qualcomm specific quirk to the arm_smccc_smc call.

On Qualcomm ARM64 platforms, the SMC call can return before it has
completed.  If this occurs, the call can be restarted, but it requires
using the returned session ID value from the interrupted SMC call.

The quirk stores off the session ID from the interrupted call in the
quirk structure so that it can be used by the caller.

This patch folds in a fix given by Sricharan R:
https://lkml.org/lkml/2016/9/28/272

Signed-off-by: Andy Gross &lt;andy.gross@linaro.org&gt;
Reviewed-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm: kernel: Add SMC structure parameter</title>
<updated>2017-04-12T10:41:21+00:00</updated>
<author>
<name>Andy Gross</name>
<email>andy.gross@linaro.org</email>
</author>
<published>2017-04-04T19:32:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=007f0a2f2c0fcfa9ecef016c0910aebd0b784fdd'/>
<id>007f0a2f2c0fcfa9ecef016c0910aebd0b784fdd</id>
<content type='text'>
[ Upstream commit 680a0873e193bae666439f4b5e32c758e68f114c ]

This patch adds a quirk parameter to the arm_smccc_(smc/hvc) calls.
The quirk structure allows for specialized SMC operations due to SoC
specific requirements.  The current arm_smccc_(smc/hvc) is renamed and
macros are used instead to specify the standard arm_smccc_(smc/hvc) or
the arm_smccc_(smc/hvc)_quirk function.

This patch and partial implementation was suggested by Will Deacon.

Signed-off-by: Andy Gross &lt;andy.gross@linaro.org&gt;
Reviewed-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 680a0873e193bae666439f4b5e32c758e68f114c ]

This patch adds a quirk parameter to the arm_smccc_(smc/hvc) calls.
The quirk structure allows for specialized SMC operations due to SoC
specific requirements.  The current arm_smccc_(smc/hvc) is renamed and
macros are used instead to specify the standard arm_smccc_(smc/hvc) or
the arm_smccc_(smc/hvc)_quirk function.

This patch and partial implementation was suggested by Will Deacon.

Signed-off-by: Andy Gross &lt;andy.gross@linaro.org&gt;
Reviewed-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>random: use chacha20 for get_random_int/long</title>
<updated>2017-04-12T10:41:15+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2017-01-06T18:32:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7c03613344663982a27c49d5951c80c575714ab8'/>
<id>7c03613344663982a27c49d5951c80c575714ab8</id>
<content type='text'>
commit f5b98461cb8167ba362ad9f74c41d126b7becea7 upstream.

Now that our crng uses chacha20, we can rely on its speedy
characteristics for replacing MD5, while simultaneously achieving a
higher security guarantee. Before the idea was to use these functions if
you wanted random integers that aren't stupidly insecure but aren't
necessarily secure either, a vague gray zone, that hopefully was "good
enough" for its users. With chacha20, we can strengthen this claim,
since either we're using an rdrand-like instruction, or we're using the
same crng as /dev/urandom. And it's faster than what was before.

We could have chosen to replace this with a SipHash-derived function,
which might be slightly faster, but at the cost of having yet another
RNG construction in the kernel. By moving to chacha20, we have a single
RNG to analyze and verify, and we also already get good performance
improvements on all platforms.

Implementation-wise, rather than use a generic buffer for both
get_random_int/long and memcpy based on the size needs, we use a
specific buffer for 32-bit reads and for 64-bit reads. This way, we're
guaranteed to always have aligned accesses on all platforms. While
slightly more verbose in C, the assembly this generates is a lot
simpler than otherwise.

Finally, on 32-bit platforms where longs and ints are the same size,
we simply alias get_random_int to get_random_long.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Suggested-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f5b98461cb8167ba362ad9f74c41d126b7becea7 upstream.

Now that our crng uses chacha20, we can rely on its speedy
characteristics for replacing MD5, while simultaneously achieving a
higher security guarantee. Before the idea was to use these functions if
you wanted random integers that aren't stupidly insecure but aren't
necessarily secure either, a vague gray zone, that hopefully was "good
enough" for its users. With chacha20, we can strengthen this claim,
since either we're using an rdrand-like instruction, or we're using the
same crng as /dev/urandom. And it's faster than what was before.

We could have chosen to replace this with a SipHash-derived function,
which might be slightly faster, but at the cost of having yet another
RNG construction in the kernel. By moving to chacha20, we have a single
RNG to analyze and verify, and we also already get good performance
improvements on all platforms.

Implementation-wise, rather than use a generic buffer for both
get_random_int/long and memcpy based on the size needs, we use a
specific buffer for 32-bit reads and for 64-bit reads. This way, we're
guaranteed to always have aligned accesses on all platforms. While
slightly more verbose in C, the assembly this generates is a lot
simpler than otherwise.

Finally, on 32-bit platforms where longs and ints are the same size,
we simply alias get_random_int to get_random_long.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Suggested-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rmap: fix huge file mmap accounting in the memcg stats</title>
<updated>2017-04-08T07:30:35+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2017-03-31T22:11:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b5707920e4d8f9df0ed2db4e6fce24ede50eb4a2'/>
<id>b5707920e4d8f9df0ed2db4e6fce24ede50eb4a2</id>
<content type='text'>
commit 553af430e7c981e6e8fa5007c5b7b5773acc63dd upstream.

Huge pages are accounted as single units in the memcg's "file_mapped"
counter.  Account the correct number of base pages, like we do in the
corresponding node counter.

Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 553af430e7c981e6e8fa5007c5b7b5773acc63dd upstream.

Huge pages are accounted as single units in the memcg's "file_mapped"
counter.  Account the correct number of base pages, like we do in the
corresponding node counter.

Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>KVM: kvm_io_bus_unregister_dev() should never fail</title>
<updated>2017-04-08T07:30:34+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2017-03-23T17:24:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1563625c717c3682f3dcbc12cb96ef4f7ce147f9'/>
<id>1563625c717c3682f3dcbc12cb96ef4f7ce147f9</id>
<content type='text'>
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Cornelia Huck &lt;cornelia.huck@de.ibm.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Cornelia Huck &lt;cornelia.huck@de.ibm.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>fscrypt: remove broken support for detecting keyring key revocation</title>
<updated>2017-03-31T08:31:46+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2017-02-21T23:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2984e52c75c657db7901f6189f02e0251ca963c2'/>
<id>2984e52c75c657db7901f6189f02e0251ca963c2</id>
<content type='text'>
commit 1b53cf9815bb4744958d41f3795d5d5a1d365e2d upstream.

Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again.  This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.

This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired.  Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory.  Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 &gt; /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem.  In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.

In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches.  But this is a hard problem, and for now this bug must be fixed.

This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.

Fixes: b7236e21d55f ("ext4 crypto: reorganize how we store keys in the inode")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Acked-by: Michael Halcrow &lt;mhalcrow@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1b53cf9815bb4744958d41f3795d5d5a1d365e2d upstream.

Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again.  This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.

This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired.  Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory.  Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 &gt; /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem.  In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.

In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches.  But this is a hard problem, and for now this bug must be fixed.

This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.

Fixes: b7236e21d55f ("ext4 crypto: reorganize how we store keys in the inode")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Acked-by: Michael Halcrow &lt;mhalcrow@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
