linux-toradex.git, branch v3.16.3

Linux 3.16.3

2014-09-17T17:22:16+00:00

KEYS: Fix termination condition in assoc array garbage collection

2014-09-17T16:22:23+00:00

commit 95389b08d93d5c06ec63ab49bd732b0069b7c35e upstream.

This fixes CVE-2014-3631.

It is possible for an associative array to end up with a shortcut node at the
root of the tree if there are more than fan-out leaves in the tree, but they
all crowd into the same slot in the lowest level (ie. they all have the same
first nibble of their index keys).

When assoc_array_gc() returns back up the tree after scanning some leaves, it
can fall off of the root and crash because it assumes that the back pointer
from a shortcut (after label ascend_old_tree) must point to a normal node -
which isn't true of a shortcut node at the root.

Should we find we're ascending rootwards over a shortcut, we should check to
see if the backpointer is zero - and if it is, we have completed the scan.

This particular bug cannot occur if the root node is not a shortcut - ie. if
you have fewer than 17 keys in a keyring or if you have at least two keys that
sit into separate slots (eg. a keyring and a non keyring).

This can be reproduced by:

	ring=`keyctl newring bar @s`
	for ((i=1; i<=18; i++)); do last_key=`keyctl newring foo$i $ring`; done
	keyctl timeout $last_key 2

Doing this:

	echo 3 >/proc/sys/kernel/keys/gc_delay

first will speed things up.

If we do fall off of the top of the tree, we get the following oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [] assoc_array_gc+0x2f7/0x540
PGD dae15067 PUD cfc24067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_nat xt_mark nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_ni
CPU: 0 PID: 26011 Comm: kworker/0:1 Not tainted 3.14.9-200.fc20.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: events key_garbage_collector
task: ffff8800918bd580 ti: ffff8800aac14000 task.ti: ffff8800aac14000
RIP: 0010:[] [] assoc_array_gc+0x2f7/0x540
RSP: 0018:ffff8800aac15d40  EFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800aaecacc0
RDX: ffff8800daecf440 RSI: 0000000000000001 RDI: ffff8800aadc2bc0
RBP: ffff8800aac15da8 R08: 0000000000000001 R09: 0000000000000003
R10: ffffffff8136ccc7 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000070 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000000db10d000 CR4: 00000000000006f0
Stack:
 ffff8800aac15d50 0000000000000011 ffff8800aac15db8 ffffffff812e2a70
 ffff880091a00600 0000000000000000 ffff8800aadc2bc3 00000000cd42c987
 ffff88003702df20 ffff88003702dfa0 0000000053b65c09 ffff8800aac15fd8
Call Trace:
 [] ? keyring_detect_cycle_iterator+0x30/0x30
 [] keyring_gc+0x75/0x80
 [] key_garbage_collector+0x154/0x3c0
 [] process_one_work+0x176/0x430
 [] worker_thread+0x11b/0x3a0
 [] ? rescuer_thread+0x3b0/0x3b0
 [] kthread+0xd8/0xf0
 [] ? insert_kthread_work+0x40/0x40
 [] ret_from_fork+0x7c/0xb0
 [] ? insert_kthread_work+0x40/0x40
Code: 08 4c 8b 22 0f 84 bf 00 00 00 41 83 c7 01 49 83 e4 fc 41 83 ff 0f 4c 89 65 c0 0f 8f 5a fe ff ff 48 8b 45 c0 4d 63 cf 49 83 c1 02 <4e> 8b 34 c8 4d 85 f6 0f 84 be 00 00 00 41 f6 c6 01 0f 84 92
RIP  [] assoc_array_gc+0x2f7/0x540
 RSP 
CR2: 0000000000000018
---[ end trace 1129028a088c0cbd ]---

Signed-off-by: David Howells 
Acked-by: Don Zickus 
Signed-off-by: James Morris 
Signed-off-by: Greg Kroah-Hartman

KEYS: Fix use-after-free in assoc_array_gc()

2014-09-17T16:22:22+00:00

commit 27419604f51a97d497853f14142c1059d46eb597 upstream.

An edit script should be considered inaccessible by a function once it has
called assoc_array_apply_edit() or assoc_array_cancel_edit().

However, assoc_array_gc() is accessing the edit script just after the
gc_complete: label.

Reported-by: Andreea-Cristina Bernat 
Signed-off-by: David Howells 
Reviewed-by: Andreea-Cristina Bernat 
cc: shemming@brocade.com
cc: paulmck@linux.vnet.ibm.com
Signed-off-by: James Morris 
Signed-off-by: Greg Kroah-Hartman

CIFS: Fix SMB2 readdir error handling

2014-09-17T16:22:22+00:00

commit 52755808d4525f4d5b86d112d36ffc7a46f3fb48 upstream.

SMB2 servers indicates the end of a directory search with
STATUS_NO_MORE_FILE error code that is not processed now.
This causes generic/257 xfstest to fail. Fix this by triggering
the end of search by this error code in SMB2_query_directory.

Also when negotiating CIFS protocol we tell the server to close
the search automatically at the end and there is no need to do
it itself. In the case of SMB2 protocol, we need to close it
explicitly - separate close directory checks for different
protocols.

Signed-off-by: Pavel Shilovsky 
Signed-off-by: Steve French 
Signed-off-by: Greg Kroah-Hartman

vfs: fix bad hashing of dentries

2014-09-17T16:22:22+00:00

commit 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 upstream.

Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10.  This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested).  I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

      Old hash table had 1000000 entries, 0 dupes, 0 max dupes
      New hash table had 12628 entries, 987372 dupes, 900 max dupes
      We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data.  That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible.  We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik 
Cc: Al Viro 
Cc: Christoph Hellwig 
Cc: Chris Mason 
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

drm/nouveau: Bump version from 1.1.1 to 1.1.2

2014-09-17T16:22:22+00:00

commit 7820e5eef0faa4a5e10834296680827f7ce78a89 upstream.

Linux 3.16 fixed multiple bugs in kms pageflip completion events
and timestamping, which were originally introduced in Linux 3.13.

These fixes have been backported to all stable kernels since 3.13.

However, the userspace nouveau-ddx needs to be aware if it is
running on a kernel on which these bugs are fixed, or not.

Bump the patchlevel of the drm driver version to signal this,
so backporting this patch to stable 3.13+ kernels will give the
ddx the required info.

Signed-off-by: Mario Kleiner 
Signed-off-by: Ben Skeggs 
Signed-off-by: Greg Kroah-Hartman

drm/nouveau: Dis/Enable vblank irqs during suspend/resume.

2014-09-17T16:22:21+00:00

commit 9cba5efab5a8145ae6c52ea273553f069c294482 upstream.

Vblank irqs don't get disabled during suspend or driver
unload, which causes irq delivery after "suspend" or
driver unload, at least until the gpu is powered off.
This could race with drm_vblank_cleanup() in the case
of nouveau and cause a use-after-free bug if the driver
is unloaded.

More annoyingly during everyday use, at least on nv50
display engine (likely also others), vblank irqs are
off after a resume from suspend, but the drm doesn't
know this, so all vblank related functionality is dead
after a resume. E.g., all windowed OpenGL clients will
hang at swapbuffers time, as well as many fullscreen
clients in many cases. This makes suspend/resume useless
if one wants to use any OpenGL apps after the resume.

In Linux 3.16, drm_vblank_on() was added, complementing
the older drm_vblank_off()  to solve these problems
elegantly, so use those calls in nouveaus suspend/resume
code.

For kernels 3.8 - 3.15, we need to cherry-pick the
drm_vblank_on() patch to support this patch.

Signed-off-by: Mario Kleiner 
Signed-off-by: Ben Skeggs 
Signed-off-by: Greg Kroah-Hartman

IB/srp: Fix deadlock between host removal and multipathd

2014-09-17T16:22:21+00:00

commit bcc05910359183b431da92713e98eed478edf83a upstream.

If scsi_remove_host() is invoked after a SCSI device has been blocked,
if the fast_io_fail_tmo or dev_loss_tmo work gets scheduled on the
workqueue executing srp_remove_work() and if an I/O request is
scheduled after the SCSI device had been blocked by e.g. multipathd
then the following deadlock can occur:

    kworker/6:1     D ffff880831f3c460     0   195      2 0x00000000
    Call Trace:
     [] schedule+0x29/0x70
     [] schedule_timeout+0x10f/0x2a0
     [] msleep+0x2f/0x40
     [] __blk_drain_queue+0x4e/0x180
     [] blk_cleanup_queue+0x225/0x230
     [] __scsi_remove_device+0x62/0xe0 [scsi_mod]
     [] scsi_forget_host+0x6f/0x80 [scsi_mod]
     [] scsi_remove_host+0x7a/0x130 [scsi_mod]
     [] srp_remove_work+0x95/0x180 [ib_srp]
     [] process_one_work+0x1ea/0x6c0
     [] worker_thread+0x11b/0x3a0
     [] kthread+0xed/0x110
     [] ret_from_fork+0x7c/0xb0
    multipathd      D ffff880096acc460     0  5340      1 0x00000000
    Call Trace:
     [] schedule+0x29/0x70
     [] schedule_timeout+0x10f/0x2a0
     [] io_schedule_timeout+0x9b/0xf0
     [] wait_for_completion_io_timeout+0xdc/0x110
     [] blk_execute_rq+0x9b/0x100
     [] sg_io+0x1a5/0x450
     [] scsi_cmd_ioctl+0x2a1/0x430
     [] scsi_cmd_blk_ioctl+0x42/0x50
     [] sd_ioctl+0xbe/0x140 [sd_mod]
     [] blkdev_ioctl+0x234/0x840
     [] block_ioctl+0x41/0x50
     [] do_vfs_ioctl+0x300/0x520
     [] SyS_ioctl+0x41/0x80
     [] tracesys+0xd0/0xd5

Fix this by scheduling removal work on another workqueue than the
transport layer timers.

Signed-off-by: Bart Van Assche 
Reviewed-by: Sagi Grimberg 
Reviewed-by: David Dillow 
Cc: Sebastian Parschauer 
Signed-off-by: Roland Dreier 
Signed-off-by: Greg Kroah-Hartman

dm table: propagate QUEUE_FLAG_NO_SG_MERGE

2014-09-17T16:22:21+00:00

commit 200612ec33e555a356eebc717630b866ae2b694f upstream.

Commit 05f1dd5 ("block: add queue flag for disabling SG merging")
introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
default in blk_mq_init_queue for mq-enabled devices.  The effect of
the flag is to bypass the SG segment merging.  Instead, the
bio->bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggerred this BUG_ON, in
virtio_queue_rq:

        BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk->sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
bio_add_page.  That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than bi_vcnt
(and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
is submitted, it gets cloned.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                bio->bi_phys_segments = bio->bi_vcnt;

and now we've set bio->bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

The right way to fix this is to propagate the queue flag up the stack.

The rules for propagating the flag are simple:
- if the flag is set for any underlying device, it must be set for the
  upper device
- consequently, if the flag is not set for any underlying device, it
  should not be set for the upper device.

Signed-off-by: Jeff Moyer 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()

2014-09-17T16:22:21+00:00

commit 40ddbf5069bd4e11447c0088fc75318e0aac53f0 upstream.

commit 65b97cf6b8de introduced in v3.7 caused a regression
by using a reversed CS_MASK thus causing omap_calculate_ecc to
always fail. As the NAND base driver never checks for .calculate()'s
return value, the zeroed ECC values are used as is without showing
any error to the user. However, this won't work and the NAND device
won't be guarded by any error code.

Fix the issue by using the correct mask.

Code was tested on omap3beagle using the following procedure
- flash the primary bootloader (MLO) from the kernel to the first
NAND partition using nandwrite.
- boot the board from NAND. This utilizes OMAP ROM loader that
relies on 1-bit Hamming code ECC.

Fixes: 65b97cf6b8de (mtd: nand: omap2: handle nand on gpmc)

Signed-off-by: Roger Quadros 
Signed-off-by: Tony Lindgren 
Signed-off-by: Greg Kroah-Hartman