<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux, branch v4.9.13</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ptr_ring: fix race conditions when resizing</title>
<updated>2017-02-26T10:10:51+00:00</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2017-02-19T05:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7c56012e92b51030b13c4652a9494d422562d5d5'/>
<id>7c56012e92b51030b13c4652a9494d422562d5d5</id>
<content type='text'>
[ Upstream commit e71695307114335be1ed912f4a347396c2ed0e69 ]

Resizing currently drops consumer lock.  This can cause entries to be
reordered, which isn't good in itself.  More importantly, consumer can
detect a false ring empty condition and block forever.

Further, nesting of consumer within producer lock is problematic for
tun, since it produces entries in a BH, which causes a lock order
reversal:

       CPU0                    CPU1
       ----                    ----
  consume:
  lock(&amp;(&amp;r-&gt;consumer_lock)-&gt;rlock);
                               resize:
                               local_irq_disable();
                               lock(&amp;(&amp;r-&gt;producer_lock)-&gt;rlock);
                               lock(&amp;(&amp;r-&gt;consumer_lock)-&gt;rlock);
  &lt;Interrupt&gt;
  produce:
  lock(&amp;(&amp;r-&gt;producer_lock)-&gt;rlock);

To fix, nest producer lock within consumer lock during resize,
and keep consumer lock during the whole swap operation.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: stable@vger.kernel.org
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit e71695307114335be1ed912f4a347396c2ed0e69 ]

Resizing currently drops consumer lock.  This can cause entries to be
reordered, which isn't good in itself.  More importantly, consumer can
detect a false ring empty condition and block forever.

Further, nesting of consumer within producer lock is problematic for
tun, since it produces entries in a BH, which causes a lock order
reversal:

       CPU0                    CPU1
       ----                    ----
  consume:
  lock(&amp;(&amp;r-&gt;consumer_lock)-&gt;rlock);
                               resize:
                               local_irq_disable();
                               lock(&amp;(&amp;r-&gt;producer_lock)-&gt;rlock);
                               lock(&amp;(&amp;r-&gt;consumer_lock)-&gt;rlock);
  &lt;Interrupt&gt;
  produce:
  lock(&amp;(&amp;r-&gt;producer_lock)-&gt;rlock);

To fix, nest producer lock within consumer lock during resize,
and keep consumer lock during the whole swap operation.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: stable@vger.kernel.org
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: introduce device min_header_len</title>
<updated>2017-02-18T14:11:43+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-02-07T20:57:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ebde312a8ed469172fc9694ca0c8411994d47ff'/>
<id>6ebde312a8ed469172fc9694ca0c8411994d47ff</id>
<content type='text'>
[ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]

The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev-&gt;hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev-&gt;min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]

The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev-&gt;hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev-&gt;min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>can: Fix kernel panic at security_sock_rcv_skb</title>
<updated>2017-02-18T14:11:40+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-01-27T16:11:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=adf86d59bb9b08d9eb67054251d29484c5ec102c'/>
<id>adf86d59bb9b08d9eb67054251d29484c5ec102c</id>
<content type='text'>
[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [&lt;ffffffff81495e25&gt;] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:
 &lt;IRQ&gt;
 [&lt;ffffffff81485d8c&gt;] security_sock_rcv_skb+0x4c/0x60
 [&lt;ffffffff81d55771&gt;] sk_filter+0x41/0x210
 [&lt;ffffffff81d12913&gt;] sock_queue_rcv_skb+0x53/0x3a0
 [&lt;ffffffff81f0a2b3&gt;] raw_rcv+0x2a3/0x3c0
 [&lt;ffffffff81f06eab&gt;] can_rcv_filter+0x12b/0x370
 [&lt;ffffffff81f07af9&gt;] can_receive+0xd9/0x120
 [&lt;ffffffff81f07beb&gt;] can_rcv+0xab/0x100
 [&lt;ffffffff81d362ac&gt;] __netif_receive_skb_core+0xd8c/0x11f0
 [&lt;ffffffff81d36734&gt;] __netif_receive_skb+0x24/0xb0
 [&lt;ffffffff81d37f67&gt;] process_backlog+0x127/0x280
 [&lt;ffffffff81d36f7b&gt;] net_rx_action+0x33b/0x4f0
 [&lt;ffffffff810c88d4&gt;] __do_softirq+0x184/0x440
 [&lt;ffffffff81f9e86c&gt;] do_softirq_own_stack+0x1c/0x30
 &lt;EOI&gt;
 [&lt;ffffffff810c76fb&gt;] do_softirq.part.18+0x3b/0x40
 [&lt;ffffffff810c8bed&gt;] do_softirq+0x1d/0x20
 [&lt;ffffffff81d30085&gt;] netif_rx_ni+0xe5/0x110
 [&lt;ffffffff8199cc87&gt;] slcan_receive_buf+0x507/0x520
 [&lt;ffffffff8167ef7c&gt;] flush_to_ldisc+0x21c/0x230
 [&lt;ffffffff810e3baf&gt;] process_one_work+0x24f/0x670
 [&lt;ffffffff810e44ed&gt;] worker_thread+0x9d/0x6f0
 [&lt;ffffffff810e4450&gt;] ? rescuer_thread+0x480/0x480
 [&lt;ffffffff810ebafc&gt;] kthread+0x12c/0x150
 [&lt;ffffffff81f9ccef&gt;] ret_from_fork+0x3f/0x70

Reported-by: Zhang Yanmin &lt;yanmin.zhang@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Oliver Hartkopp &lt;socketcan@hartkopp.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [&lt;ffffffff81495e25&gt;] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:
 &lt;IRQ&gt;
 [&lt;ffffffff81485d8c&gt;] security_sock_rcv_skb+0x4c/0x60
 [&lt;ffffffff81d55771&gt;] sk_filter+0x41/0x210
 [&lt;ffffffff81d12913&gt;] sock_queue_rcv_skb+0x53/0x3a0
 [&lt;ffffffff81f0a2b3&gt;] raw_rcv+0x2a3/0x3c0
 [&lt;ffffffff81f06eab&gt;] can_rcv_filter+0x12b/0x370
 [&lt;ffffffff81f07af9&gt;] can_receive+0xd9/0x120
 [&lt;ffffffff81f07beb&gt;] can_rcv+0xab/0x100
 [&lt;ffffffff81d362ac&gt;] __netif_receive_skb_core+0xd8c/0x11f0
 [&lt;ffffffff81d36734&gt;] __netif_receive_skb+0x24/0xb0
 [&lt;ffffffff81d37f67&gt;] process_backlog+0x127/0x280
 [&lt;ffffffff81d36f7b&gt;] net_rx_action+0x33b/0x4f0
 [&lt;ffffffff810c88d4&gt;] __do_softirq+0x184/0x440
 [&lt;ffffffff81f9e86c&gt;] do_softirq_own_stack+0x1c/0x30
 &lt;EOI&gt;
 [&lt;ffffffff810c76fb&gt;] do_softirq.part.18+0x3b/0x40
 [&lt;ffffffff810c8bed&gt;] do_softirq+0x1d/0x20
 [&lt;ffffffff81d30085&gt;] netif_rx_ni+0xe5/0x110
 [&lt;ffffffff8199cc87&gt;] slcan_receive_buf+0x507/0x520
 [&lt;ffffffff8167ef7c&gt;] flush_to_ldisc+0x21c/0x230
 [&lt;ffffffff810e3baf&gt;] process_one_work+0x24f/0x670
 [&lt;ffffffff810e44ed&gt;] worker_thread+0x9d/0x6f0
 [&lt;ffffffff810e4450&gt;] ? rescuer_thread+0x480/0x480
 [&lt;ffffffff810ebafc&gt;] kthread+0x12c/0x150
 [&lt;ffffffff81f9ccef&gt;] ret_from_fork+0x3f/0x70

Reported-by: Zhang Yanmin &lt;yanmin.zhang@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Oliver Hartkopp &lt;socketcan@hartkopp.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()</title>
<updated>2017-02-14T23:25:39+00:00</updated>
<author>
<name>Dexuan Cui</name>
<email>decui@microsoft.com</email>
</author>
<published>2017-01-28T18:46:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1cf897fcc5a99e5ecf2f6fb12adec6d485a17e14'/>
<id>1cf897fcc5a99e5ecf2f6fb12adec6d485a17e14</id>
<content type='text'>
commit 433e19cf33d34bb6751c874a9c00980552fe508c upstream.

Commit a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")
added the proper mb(), but removed the test "prev_write_sz &lt; pending_sz"
when making the signal decision.

As a result, the guest can signal the host unnecessarily,
and then the host can throttle the guest because the host
thinks the guest is buggy or malicious; finally the user
running stress test can perceive intermittent freeze of
the guest.

This patch brings back the test, and properly handles the
in-place consumption APIs used by NetVSC (see get_next_pkt_raw(),
put_pkt_raw() and commit_rd_index()).

Fixes: a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")

Signed-off-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Reported-by: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Tested-by: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 433e19cf33d34bb6751c874a9c00980552fe508c upstream.

Commit a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")
added the proper mb(), but removed the test "prev_write_sz &lt; pending_sz"
when making the signal decision.

As a result, the guest can signal the host unnecessarily,
and then the host can throttle the guest because the host
thinks the guest is buggy or malicious; finally the user
running stress test can perceive intermittent freeze of
the guest.

This patch brings back the test, and properly handles the
in-place consumption APIs used by NetVSC (see get_next_pkt_raw(),
put_pkt_raw() and commit_rd_index()).

Fixes: a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")

Signed-off-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Reported-by: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Tested-by: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the host</title>
<updated>2017-02-14T23:25:38+00:00</updated>
<author>
<name>K. Y. Srinivasan</name>
<email>kys@microsoft.com</email>
</author>
<published>2016-11-06T21:14:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=964dfbe3dd2d36f9d35018568e303d9847fc1026'/>
<id>964dfbe3dd2d36f9d35018568e303d9847fc1026</id>
<content type='text'>
commit 3372592a140db69fd63837e81f048ab4abf8111e upstream.

Signal the host when we determine the host is to be signaled -
on th read path. The currrent code determines the need to signal in the
ringbuffer code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3372592a140db69fd63837e81f048ab4abf8111e upstream.

Signal the host when we determine the host is to be signaled -
on th read path. The currrent code determines the need to signal in the
ringbuffer code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: vmbus: On write cleanup the logic to interrupt the host</title>
<updated>2017-02-14T23:25:38+00:00</updated>
<author>
<name>K. Y. Srinivasan</name>
<email>kys@microsoft.com</email>
</author>
<published>2016-11-06T21:14:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e2fdf7841cb32128685ddcd6db7a51d0e3c3c739'/>
<id>e2fdf7841cb32128685ddcd6db7a51d0e3c3c739</id>
<content type='text'>
commit 1f6ee4e7d83586c8b10bd4f2f4346353d04ce884 upstream.

Signal the host when we determine the host is to be signaled.
The currrent code determines the need to signal in the ringbuffer
code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1f6ee4e7d83586c8b10bd4f2f4346353d04ce884 upstream.

Signal the host when we determine the host is to be signaled.
The currrent code determines the need to signal in the ringbuffer
code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan &lt;kys@microsoft.com&gt;
Cc: Rolf Neugebauer &lt;rolf.neugebauer@docker.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cpumask: use nr_cpumask_bits for parsing functions</title>
<updated>2017-02-14T23:25:34+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-02-08T22:30:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c4236b0c71169b6e5fb5f2272dd0292156c81e97'/>
<id>c4236b0c71169b6e5fb5f2272dd0292156c81e97</id>
<content type='text'>
commit 4d59b6ccf000862beed6fc0765d3209f98a8d8a2 upstream.

Commit 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.

nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland.  As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.

This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.

Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.

Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Martin Steigerwald &lt;martin.steigerwald@teamix.de&gt;
Debugged-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4d59b6ccf000862beed6fc0765d3209f98a8d8a2 upstream.

Commit 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.

nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland.  As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.

This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.

Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.

Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Martin Steigerwald &lt;martin.steigerwald@teamix.de&gt;
Debugged-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>irqdomain: Avoid activating interrupts more than once</title>
<updated>2017-02-09T07:08:31+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2017-01-17T16:00:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e02136282296dbc90f3c88b1cc5202ec0d5ed9f1'/>
<id>e02136282296dbc90f3c88b1cc5202ec0d5ed9f1</id>
<content type='text'>
commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c upstream.

Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).

This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").

While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.

Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: Andre Przywara &lt;andre.przywara@arm.com&gt;
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c upstream.

Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).

This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").

While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.

Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: Andre Przywara &lt;andre.przywara@arm.com&gt;
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>percpu-refcount: fix reference leak during percpu-atomic transition</title>
<updated>2017-02-09T07:08:28+00:00</updated>
<author>
<name>Douglas Miller</name>
<email>dougmill@linux.vnet.ibm.com</email>
</author>
<published>2017-01-28T12:42:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=12f822d23deee45421bf65dc9f5ff0fdcc783701'/>
<id>12f822d23deee45421bf65dc9f5ff0fdcc783701</id>
<content type='text'>
commit 966d2b04e070bc040319aaebfec09e0144dc3341 upstream.

percpu_ref_tryget() and percpu_ref_tryget_live() should return
"true" IFF they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set,
e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
is bool so the reference may actually be acquired but the routines
return "false" which results in a reference leak since the caller
assumes it does not need to do a corresponding percpu_ref_put().

This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).
Sample stack trace:

__switch_to+0x2c0/0x450
__schedule+0x2f8/0x970
schedule+0x48/0xc0
blk_mq_freeze_queue_wait+0x94/0x120
blk_mq_queue_reinit_work+0xb8/0x180
blk_mq_queue_reinit_prepare+0x84/0xa0
cpuhp_invoke_callback+0x17c/0x600
cpuhp_up_callbacks+0x58/0x150
_cpu_up+0xf0/0x1c0
do_cpu_up+0x120/0x150
cpu_subsys_online+0x64/0xe0
device_online+0xb4/0x120
online_store+0xb4/0xc0
dev_attr_store+0x68/0xa0
sysfs_kf_write+0x80/0xb0
kernfs_fop_write+0x17c/0x250
__vfs_write+0x6c/0x1e0
vfs_write+0xd0/0x270
SyS_write+0x6c/0x110
system_call+0x38/0xe0

Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

The fix is to make the tryget routines use an actual boolean internally instead
of the atomic long result truncated to a int.

Fixes: e625305b3907 percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
Signed-off-by: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: e625305b3907 ("percpu-refcount: make percpu_ref based on longs instead of ints")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 966d2b04e070bc040319aaebfec09e0144dc3341 upstream.

percpu_ref_tryget() and percpu_ref_tryget_live() should return
"true" IFF they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set,
e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
is bool so the reference may actually be acquired but the routines
return "false" which results in a reference leak since the caller
assumes it does not need to do a corresponding percpu_ref_put().

This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).
Sample stack trace:

__switch_to+0x2c0/0x450
__schedule+0x2f8/0x970
schedule+0x48/0xc0
blk_mq_freeze_queue_wait+0x94/0x120
blk_mq_queue_reinit_work+0xb8/0x180
blk_mq_queue_reinit_prepare+0x84/0xa0
cpuhp_invoke_callback+0x17c/0x600
cpuhp_up_callbacks+0x58/0x150
_cpu_up+0xf0/0x1c0
do_cpu_up+0x120/0x150
cpu_subsys_online+0x64/0xe0
device_online+0xb4/0x120
online_store+0xb4/0xc0
dev_attr_store+0x68/0xa0
sysfs_kf_write+0x80/0xb0
kernfs_fop_write+0x17c/0x250
__vfs_write+0x6c/0x1e0
vfs_write+0xd0/0x270
SyS_write+0x6c/0x110
system_call+0x38/0xe0

Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

The fix is to make the tryget routines use an actual boolean internally instead
of the atomic long result truncated to a int.

Fixes: e625305b3907 percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
Signed-off-by: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: e625305b3907 ("percpu-refcount: make percpu_ref based on longs instead of ints")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>base/memory, hotplug: fix a kernel oops in show_valid_zones()</title>
<updated>2017-02-09T07:08:28+00:00</updated>
<author>
<name>Toshi Kani</name>
<email>toshi.kani@hpe.com</email>
</author>
<published>2017-02-03T21:13:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6cb0497aec810617388dfe674209cd417f509844'/>
<id>6cb0497aec810617388dfe674209cd417f509844</id>
<content type='text'>
commit a96dfddbcc04336bbed50dc2b24823e45e09e80c upstream.

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Zhang Zhen &lt;zhenzhang.zhang@huawei.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a96dfddbcc04336bbed50dc2b24823e45e09e80c upstream.

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Zhang Zhen &lt;zhenzhang.zhang@huawei.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
