linux-toradex.git/include/linux, branch v4.9.13

ptr_ring: fix race conditions when resizing

2017-02-26T10:10:51+00:00

[ Upstream commit e71695307114335be1ed912f4a347396c2ed0e69 ]

Resizing currently drops consumer lock.  This can cause entries to be
reordered, which isn't good in itself.  More importantly, consumer can
detect a false ring empty condition and block forever.

Further, nesting of consumer within producer lock is problematic for
tun, since it produces entries in a BH, which causes a lock order
reversal:

       CPU0                    CPU1
       ----                    ----
  consume:
  lock(&(&r->consumer_lock)->rlock);
                               resize:
                               local_irq_disable();
                               lock(&(&r->producer_lock)->rlock);
                               lock(&(&r->consumer_lock)->rlock);
  
  produce:
  lock(&(&r->producer_lock)->rlock);

To fix, nest producer lock within consumer lock during resize,
and keep consumer lock during the whole swap operation.

Reported-by: Dmitry Vyukov 
Cc: stable@vger.kernel.org
Cc: "David S. Miller" 
Acked-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: introduce device min_header_len

2017-02-18T14:11:43+00:00

[ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]

The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan 
Signed-off-by: Willem de Bruijn 
Acked-by: Eric Dumazet 
Acked-by: Sowmini Varadhan 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

can: Fix kernel panic at security_sock_rcv_skb

2017-02-18T14:11:40+00:00

[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:
 
 [] security_sock_rcv_skb+0x4c/0x60
 [] sk_filter+0x41/0x210
 [] sock_queue_rcv_skb+0x53/0x3a0
 [] raw_rcv+0x2a3/0x3c0
 [] can_rcv_filter+0x12b/0x370
 [] can_receive+0xd9/0x120
 [] can_rcv+0xab/0x100
 [] __netif_receive_skb_core+0xd8c/0x11f0
 [] __netif_receive_skb+0x24/0xb0
 [] process_backlog+0x127/0x280
 [] net_rx_action+0x33b/0x4f0
 [] __do_softirq+0x184/0x440
 [] do_softirq_own_stack+0x1c/0x30
 
 [] do_softirq.part.18+0x3b/0x40
 [] do_softirq+0x1d/0x20
 [] netif_rx_ni+0xe5/0x110
 [] slcan_receive_buf+0x507/0x520
 [] flush_to_ldisc+0x21c/0x230
 [] process_one_work+0x24f/0x670
 [] worker_thread+0x9d/0x6f0
 [] ? rescuer_thread+0x480/0x480
 [] kthread+0x12c/0x150
 [] ret_from_fork+0x3f/0x70

Reported-by: Zhang Yanmin 
Signed-off-by: Eric Dumazet 
Acked-by: Oliver Hartkopp 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()

2017-02-14T23:25:39+00:00

commit 433e19cf33d34bb6751c874a9c00980552fe508c upstream.

Commit a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")
added the proper mb(), but removed the test "prev_write_sz < pending_sz"
when making the signal decision.

As a result, the guest can signal the host unnecessarily,
and then the host can throttle the guest because the host
thinks the guest is buggy or malicious; finally the user
running stress test can perceive intermittent freeze of
the guest.

This patch brings back the test, and properly handles the
in-place consumption APIs used by NetVSC (see get_next_pkt_raw(),
put_pkt_raw() and commit_rd_index()).

Fixes: a389fcfd2cb5 ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")

Signed-off-by: Dexuan Cui 
Reported-by: Rolf Neugebauer 
Tested-by: Rolf Neugebauer 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Signed-off-by: K. Y. Srinivasan 
Cc: Rolf Neugebauer 
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the host

2017-02-14T23:25:38+00:00

commit 3372592a140db69fd63837e81f048ab4abf8111e upstream.

Signal the host when we determine the host is to be signaled -
on th read path. The currrent code determines the need to signal in the
ringbuffer code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan 
Cc: Rolf Neugebauer 
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: vmbus: On write cleanup the logic to interrupt the host

2017-02-14T23:25:38+00:00

commit 1f6ee4e7d83586c8b10bd4f2f4346353d04ce884 upstream.

Signal the host when we determine the host is to be signaled.
The currrent code determines the need to signal in the ringbuffer
code and actually issues the signal elsewhere. This can result
in the host viewing this interrupt as spurious since the host may also
poll the channel. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan 
Cc: Rolf Neugebauer 
Signed-off-by: Greg Kroah-Hartman

cpumask: use nr_cpumask_bits for parsing functions

2017-02-14T23:25:34+00:00

commit 4d59b6ccf000862beed6fc0765d3209f98a8d8a2 upstream.

Commit 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits. While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.

nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS. We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland. As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.

This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.

Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.

Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: Tejun Heo
Reported-by: Martin Steigerwald
Debugged-by: Ben Hutchings
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

irqdomain: Avoid activating interrupts more than once

2017-02-09T07:08:31+00:00

commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c upstream.

Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).

This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").

While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.

Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: Andre Przywara 
Signed-off-by: Marc Zyngier 
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

percpu-refcount: fix reference leak during percpu-atomic transition

2017-02-09T07:08:28+00:00

commit 966d2b04e070bc040319aaebfec09e0144dc3341 upstream.

percpu_ref_tryget() and percpu_ref_tryget_live() should return
"true" IFF they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set,
e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
is bool so the reference may actually be acquired but the routines
return "false" which results in a reference leak since the caller
assumes it does not need to do a corresponding percpu_ref_put().

This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).
Sample stack trace:

__switch_to+0x2c0/0x450
__schedule+0x2f8/0x970
schedule+0x48/0xc0
blk_mq_freeze_queue_wait+0x94/0x120
blk_mq_queue_reinit_work+0xb8/0x180
blk_mq_queue_reinit_prepare+0x84/0xa0
cpuhp_invoke_callback+0x17c/0x600
cpuhp_up_callbacks+0x58/0x150
_cpu_up+0xf0/0x1c0
do_cpu_up+0x120/0x150
cpu_subsys_online+0x64/0xe0
device_online+0xb4/0x120
online_store+0xb4/0xc0
dev_attr_store+0x68/0xa0
sysfs_kf_write+0x80/0xb0
kernfs_fop_write+0x17c/0x250
__vfs_write+0x6c/0x1e0
vfs_write+0xd0/0x270
SyS_write+0x6c/0x110
system_call+0x38/0xe0

Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

The fix is to make the tryget routines use an actual boolean internally instead
of the atomic long result truncated to a int.

Fixes: e625305b3907 percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751
Signed-off-by: Douglas Miller 
Reviewed-by: Jens Axboe 
Signed-off-by: Tejun Heo 
Fixes: e625305b3907 ("percpu-refcount: make percpu_ref based on longs instead of ints")
Signed-off-by: Greg Kroah-Hartman

base/memory, hotplug: fix a kernel oops in show_valid_zones()

2017-02-09T07:08:28+00:00

commit a96dfddbcc04336bbed50dc2b24823e45e09e80c upstream.

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani 
Cc: Greg Kroah-Hartman 
Cc: Zhang Zhen 
Cc: Reza Arbab 
Cc: David Rientjes 
Cc: Dan Williams 
Signed-off-by: Greg Kroah-Hartman 

Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds