linux-toradex.git/block, branch Colibri_T30_LinuxImageV2.1Beta2_20140206

block: initialize request_queue's numa node during

2012-01-11T17:26:34+00:00

commit 5151412dd4338b273afdb107c3772528e9e67d92 upstream.

struct request_queue is allocated with __GFP_ZERO so its "node" field is
zero before initialization.  This causes an oops if node 0 is offline in
the page allocator because its zonelists are not initialized.  From Dave
Young's dmesg:

	SRAT: Node 1 PXM 2 0-d0000000
	SRAT: Node 1 PXM 2 100000000-330000000
	SRAT: Node 0 PXM 1 330000000-630000000
	Initmem setup node 1 0000000000000000-000000000affb000
	...
	Built 1 zonelists in Node order, mobility grouping on.
	...
	BUG: unable to handle kernel paging request at 0000000000001c08
	IP: [] __alloc_pages_nodemask+0xb5/0x870

and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
zonelist->_zonerefs.

The fix is to initialize q->node at the time of allocation so the correct
node is passed to the slab allocator later.

Since blk_init_allocated_queue_node() is no longer needed, merge it with
blk_init_allocated_queue().
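The failure mode and the fix can be illustrated with a small userspace sketch. All names here (`node_online[]`, `alloc_on_node()`, `struct queue_sketch`, `alloc_queue_zeroed()`, `alloc_queue_node()`) are hypothetical stand-ins, not kernel APIs. `alloc_on_node()` simply refuses an offline node, where the real page allocator instead oopsed on an uninitialized zonelist. A zero-filled struct silently "chooses" node 0, while recording the node at allocation time does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NODES 2

/* Hypothetical node map mirroring the report: node 0 offline, node 1 online. */
static bool node_online[MAX_NODES] = { false, true };

/* Hypothetical node-aware allocator: fails for a node without initialized
 * memory (the kernel instead dereferenced a NULL zonelist). */
static void *alloc_on_node(size_t size, int node)
{
	if (node < 0 || node >= MAX_NODES || !node_online[node])
		return NULL;
	return malloc(size);
}

struct queue_sketch {
	int node; /* NUMA node later used for per-queue allocations */
};

/* Old behaviour: a zeroed allocation leaves node == 0 until init runs. */
static struct queue_sketch *alloc_queue_zeroed(void)
{
	return calloc(1, sizeof(struct queue_sketch));
}

/* Fixed behaviour: record the requested node at allocation time. */
static struct queue_sketch *alloc_queue_node(int node_id)
{
	struct queue_sketch *q = calloc(1, sizeof(*q));

	if (q)
		q->node = node_id;
	return q;
}
```

With node 0 offline, any later allocation that trusts the zeroed `node` field fails, while the queue allocated with `alloc_queue_node(1)` keeps allocating from the online node.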

[rientjes@google.com: changelog, initializing q->node]
Reported-by: Dave Young 
Signed-off-by: Mike Snitzer 
Signed-off-by: David Rientjes 
Tested-by: Dave Young 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

Change-Id: I24b14588aef6226f3bcdf37e78af61cbe9a31fd2
Reviewed-on: http://git-master/r/74168
Reviewed-by: Varun Wadekar 
Tested-by: Varun Wadekar

cfq-iosched: fix cfq_cic_link() race condition

2012-01-11T17:23:31+00:00

commit 5eb46851de3904cd1be9192fdacb8d34deadc1fc upstream.

cfq_cic_link() has a race condition. When several processes that share an ioc
issue I/O to the same block device simultaneously, cfq_cic_link() sometimes
returns -EEXIST. The race can stall I/O through the following steps:

step  1: Process A: Issue an I/O to /dev/sda
step  2: Process A: Get an ioc (iocA here) in get_io_context() which is not
		    yet linked with a cic for the device
step  3: Process A: Get a new cic for the device (cicA here) in
		    cfq_alloc_io_context()

step  4: Process B: Issue an I/O to /dev/sda
step  5: Process B: Get iocA in get_io_context() since processes A and B
		    share the same ioc
step  6: Process B: Get a new cic for the device (cicB here) in
		    cfq_alloc_io_context() since iocA has not been linked with a
		    cic for the device yet

step  7: Process A: Link cicA to iocA in cfq_cic_link()
step  8: Process A: Dispatch I/O to the driver and finish it

step  9: Process B: Try to link cicB to iocA in cfq_cic_link()
		    This fails with the "cfq: cic link failed!" kernel
		    message, since iocA was already linked with cicA at step 7.
step 10: Process B: Wait for I/O to finish in get_request_wait()
		    The function never wakes up when there is no further I/O
		    to the device.

When cfq_cic_link() returns -EEXIST, the ioc is already linked with a cic for
the device, so on -EEXIST, retry cfq_cic_lookup() instead of failing.
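The retry rule can be sketched in userspace by replaying steps 7 and 9 single-threaded. The types and functions below (`struct ioc_sketch`, `struct cic_sketch`, `cic_lookup()`, `cic_link()`, `link_or_reuse()`) are hypothetical simplifications: one device, one cic slot per ioc, and no locking or RCU, so this only models the -EEXIST contract, not cfq's actual data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for cfq's io context (ioc)
 * and per-device cic: a single device, no locking, no RCU. */
struct cic_sketch { int owner; };
struct ioc_sketch { struct cic_sketch *cic; };

static struct cic_sketch *cic_lookup(struct ioc_sketch *ioc)
{
	return ioc->cic;
}

/* Models the -EEXIST contract: linking fails if a cic is already linked. */
static int cic_link(struct ioc_sketch *ioc, struct cic_sketch *cic)
{
	if (ioc->cic)
		return -EEXIST;
	ioc->cic = cic;
	return 0;
}

/* Fixed caller logic: on -EEXIST, free our own cic and reuse the one
 * another process linked first, instead of failing the I/O. */
static struct cic_sketch *link_or_reuse(struct ioc_sketch *ioc,
					struct cic_sketch *mine)
{
	int ret = cic_link(ioc, mine);

	if (ret == 0)
		return mine;
	free(mine);
	return ret == -EEXIST ? cic_lookup(ioc) : NULL;
}
```

Replaying the interleaving, process A links cicA first; process B's link of cicB then hits -EEXIST, and B ends up using cicA rather than waiting forever.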

Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

Change-Id: I679b98b517dcddd7c3568081b50948a786884ad1
Reviewed-on: http://git-master/r/74162
Reviewed-by: Varun Wadekar 
Tested-by: Varun Wadekar

cfq-iosched: free cic_index if blkio_alloc_blkg_stats fails

2012-01-11T17:23:11+00:00

commit 2984ff38ccf6cbc02a7a996a36c7d6f69f3c6146 upstream.

If allocating the blkg stats fails, we free cfqd and cfqg,
but we need to free the IDA index cfqd->cic_index as well.
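The fix follows the usual goto-unwinding pattern: each error label undoes exactly what succeeded before it, in reverse order. In this sketch, `ida_get_sketch()`/`ida_put_sketch()` are a hypothetical 8-slot allocator standing in for the kernel IDA, and `fail_stats` simulates blkio_alloc_blkg_stats() failing; none of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical 8-slot index allocator standing in for the kernel IDA. */
static unsigned char ida_bits;

static int ida_get_sketch(void)
{
	for (int i = 0; i < 8; i++) {
		if (!(ida_bits & (1u << i))) {
			ida_bits |= (unsigned char)(1u << i);
			return i;
		}
	}
	return -1;
}

static void ida_put_sketch(int i)
{
	ida_bits &= (unsigned char)~(1u << i);
}

struct cfqd_sketch { int cic_index; void *stats; };

/* `fail_stats` simulates the stats allocation failing. Each label undoes
 * only what succeeded before it, in reverse order of setup. */
static struct cfqd_sketch *init_queue_sketch(bool fail_stats)
{
	struct cfqd_sketch *cfqd = malloc(sizeof(*cfqd));

	if (!cfqd)
		return NULL;
	cfqd->cic_index = ida_get_sketch();
	if (cfqd->cic_index < 0)
		goto out_free_cfqd;
	cfqd->stats = fail_stats ? NULL : malloc(16);
	if (!cfqd->stats)
		goto out_put_index; /* the release step the patch adds */
	return cfqd;

out_put_index:
	ida_put_sketch(cfqd->cic_index);
out_free_cfqd:
	free(cfqd);
	return NULL;
}
```

Without the `out_put_index` step, every failed initialization would permanently leak one index.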

Signed-off-by: majianpeng 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

Change-Id: Ie0b58526fabbd53e2343f9ee0474f2070d717967
Reviewed-on: http://git-master/r/74161
Reviewed-by: Varun Wadekar 
Tested-by: Varun Wadekar

block: genhd: Add disk/partition specific uevent callbacks for partition info

2011-12-01T05:38:14+00:00

For disk devices, a new uevent parameter 'NPARTS' specifies the number
of partitions detected by the kernel. Partition devices get 'PARTN', which
specifies the partition's index in the table.
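Uevent parameters are carried as KEY=value strings. The sketch below only shows the shape of the two new variables; `format_disk_uevent()` and `format_part_uevent()` are hypothetical helpers writing into a caller buffer, whereas the real callbacks append to the kernel's uevent environment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers building the KEY=value strings the uevent carries.
 * The real callbacks append to the kernel uevent environment instead. */
static int format_disk_uevent(char *buf, size_t len, unsigned int nparts)
{
	return snprintf(buf, len, "NPARTS=%u", nparts);
}

static int format_part_uevent(char *buf, size_t len, unsigned int partn)
{
	return snprintf(buf, len, "PARTN=%u", partn);
}
```

Userspace (for example a udev rule) can then match on the NPARTS or PARTN key like any other uevent variable.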

Signed-off-by: San Mehat

block: Always check length of all iov entries in blk_rq_map_user_iov()

2011-11-21T22:35:29+00:00

commit 6b76106d8ef31111d6fc469564b83b5f5542794f upstream.

Even after commit 5478755616ae2ef1ce144dded589b62b2a50d575
("block: check for proper length of iov entries earlier ...")
we still won't check for zero-length entries after an unaligned
entry.  Remove the break-statement, so all entries are checked.
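The fixed loop can be sketched as follows: an unaligned entry only sets a flag, and the loop keeps going so that a zero-length entry anywhere in the vector is still rejected. `check_iovs()` is a hypothetical name, and `align_mask` stands in for the queue's DMA alignment mask; the return convention (-EINVAL, else 1 for unaligned, 0 for aligned) is chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical sketch of the fixed validation loop. Returns -EINVAL for any
 * zero-length entry, otherwise 1 if some entry is unaligned, 0 if none is. */
static int check_iovs(const struct iovec *iov, int count, uintptr_t align_mask)
{
	int unaligned = 0;

	for (int i = 0; i < count; i++) {
		uintptr_t uaddr = (uintptr_t)iov[i].iov_base;

		if (iov[i].iov_len == 0)
			return -EINVAL;
		/* No break here: keep going so every length is checked. */
		if (uaddr & align_mask)
			unaligned = 1;
	}
	return unaligned;
}
```

With the old `break`, the vector `{unaligned, 8 bytes}, {aligned, 0 bytes}` would have stopped at the first entry and never seen the zero-length one.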

Signed-off-by: Ben Hutchings 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blk-flush: move the queue kick into

2011-11-11T17:44:36+00:00

commit e67b77c791ca2778198c9e7088f3266ed2da7a55 upstream.

A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
(block: fix flush machinery for stacking drivers with differring
flush flags) applied.  It turns out that an empty flush request
can be sent into blk_insert_flush.  When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall.  The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush.  A better solution is to push the queue
kick up into blk_insert_cloned_request.

This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.
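Why a missing kick stalls I/O can be shown with a toy queue: an inserted request stays pending until something runs the queue, and if no other I/O arrives, nothing ever does. The model below (`struct kick_queue_sketch`, `run_queue_sketch()`, `insert_cloned_sketch()` and its `kick` flag) is entirely hypothetical and is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue model: inserted requests sit in `pending` until
 * something runs ("kicks") the queue and moves them to `dispatched`. */
struct kick_queue_sketch { int pending; int dispatched; };

static void run_queue_sketch(struct kick_queue_sketch *q)
{
	q->dispatched += q->pending;
	q->pending = 0;
}

/* `kick` selects the behaviour: false models an insert path that never
 * runs the queue; true models doing the kick in the insert path itself. */
static void insert_cloned_sketch(struct kick_queue_sketch *q, bool kick)
{
	q->pending++;
	if (kick)
		run_queue_sketch(q);
}
```

Placing the kick in the insert path means every caller gets it, instead of relying on a special case for one combination of data and flush flags.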

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout 
Signed-off-by: Jeff Moyer 
Acked-by: Tejun Heo 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blk-flush: fix invalid BUG_ON in blk_insert_flush

2011-11-11T17:44:34+00:00

commit 834f9f61a525d2f6d3d0c93894e26326c8d3ceed upstream.

A user reported a regression due to commit
4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required that a
single bio be attached to the request.  In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.
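The relaxed invariant can be expressed as a predicate: a request with no bio has both `bio` and `biotail` NULL, so "zero or exactly one bio" reduces to `bio == biotail`. The structs and both predicates below are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal request: a singly linked bio chain with head/tail. */
struct bio_sketch { struct bio_sketch *next; };
struct request_sketch { struct bio_sketch *bio, *biotail; };

/* Old assumption: exactly one bio must be attached. */
static bool old_flush_layout_ok(const struct request_sketch *rq)
{
	return rq->bio != NULL && rq->bio == rq->biotail;
}

/* Relaxed assumption: zero bios (empty flush) or exactly one. With no
 * bio both pointers are NULL, so the equality still holds. */
static bool new_flush_layout_ok(const struct request_sketch *rq)
{
	return rq->bio == rq->biotail;
}
```

An empty flush fails the old check and passes the new one, while a request with two chained bios is still rejected.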

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout 
Signed-off-by: Jeff Moyer 
Acked-by: Tejun Heo 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: make gendisk hold a reference to its queue

2011-11-11T17:44:30+00:00

commit f992ae801a7dec34a4ed99a6598bbbbfb82af4fb upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[]  [] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [] lock_acquire+0x95/0x140
  [] _raw_spin_lock+0x3b/0x50
  [] bdi_lock_two+0x5c/0x70
  [] bdev_inode_switch_bdi+0x4c/0xf0
  [] __blkdev_put+0x11b/0x1d0
  [] __blkdev_put+0x160/0x1d0
  [] blkdev_put+0x5f/0x190
  [] kill_block_super+0x4d/0x80
  [] deactivate_locked_super+0x45/0x70
  [] deactivate_super+0x4a/0x70
  [] mntput_no_expire+0xed/0x130
  [] sys_umount+0x7e/0x3a0
  [] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi, and bdev_inode_switch_bdi() ends up
dereferencing the already-freed bdi.

Even setting this bug aside, a disk that does not hold onto its
associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo 
Cc: Jens Axboe 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: Free queue resources at blk_release_queue()

2011-09-28T14:07:01+00:00

A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources, e.g. by calling elevator_exit(), that are not checked for
in normal operation. These calls should therefore move to the destructor
blk_release_queue(), since at that point all remaining references are
gone. However, doing so requires disconnecting any externally supplied
queue_lock, as the driver might free the lock after calling
blk_cleanup_queue().
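Disconnecting an external lock amounts to pointing the queue's lock pointer back at a lock embedded in the queue itself, so the later release path never touches memory the driver may already have freed. The struct and function names below are hypothetical and only illustrate the idea, not the patch's code.

```c
#include <assert.h>

/* Hypothetical lock type; a real queue would use a spinlock. */
struct lock_sketch { int dummy; };

struct queue_lock_sketch {
	struct lock_sketch internal_lock; /* lock embedded in the queue */
	struct lock_sketch *queue_lock;   /* may point at a driver's lock */
};

/* At cleanup time, switch away from an externally supplied lock so the
 * deferred release path only ever uses the queue's own lock. */
static void cleanup_queue_sketch(struct queue_lock_sketch *q)
{
	if (q->queue_lock != &q->internal_lock)
		q->queue_lock = &q->internal_lock;
}
```

After cleanup, the driver is free to release its own lock even though the queue itself may live on until its last reference is dropped.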

Signed-off-by: Hannes Reinecke 
Signed-off-by: Jens Axboe

blk-cgroup: be able to remove the record of unplugged device

2011-09-21T08:22:10+00:00

The bug is that we're not able to remove the device from the blkio
cgroup's per-device control files if it gets unplugged.

To reproduce the bug:

  # mount -t cgroup -o blkio xxx /cgroup
  # cd /cgroup
  # echo "8:0 1000" > blkio.throttle.read_bps_device
  # unplug the device
  # cat blkio.throttle.read_bps_device
  8:0	1000
  # echo "8:0 0" > blkio.throttle.read_bps_device
  -bash: echo: write error: No such device

With this patch applied, removing the record of the unplugged device succeeds.
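The intended behaviour can be sketched as a rule table in which writing a value of 0 deletes the entry without requiring the device to exist, while non-zero values still do. This is an illustration of the semantics only; `struct bps_rule`, `rules[]` and `write_rule()` are hypothetical, and the `device_present` flag stands in for the kernel's device lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_RULES 4

/* Hypothetical per-device throttle table, keyed by major:minor. */
struct bps_rule {
	unsigned int major, minor;
	unsigned long long bps;
	bool used;
};
static struct bps_rule rules[MAX_RULES];

/* Parses "MAJ:MIN BPS". A value of 0 deletes the rule and is accepted even
 * when the device is gone; non-zero values still require the device. */
static int write_rule(const char *buf, bool device_present)
{
	unsigned int major, minor;
	unsigned long long bps;
	int i;

	if (sscanf(buf, "%u:%u %llu", &major, &minor, &bps) != 3)
		return -EINVAL;

	for (i = 0; i < MAX_RULES; i++) {
		if (rules[i].used && rules[i].major == major &&
		    rules[i].minor == minor)
			break;
	}

	if (bps == 0) {
		if (i < MAX_RULES)
			rules[i].used = false;
		return 0;
	}
	if (!device_present)
		return -ENODEV; /* "No such device" */
	if (i == MAX_RULES) {
		for (i = 0; i < MAX_RULES && rules[i].used; i++)
			;
		if (i == MAX_RULES)
			return -ENOSPC;
	}
	rules[i] = (struct bps_rule){ major, minor, bps, true };
	return 0;
}
```

In this model, the reproduction above ends differently: after the device is unplugged, `"8:0 0"` removes the stale record instead of failing with "No such device".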

Thanks for the comments of Paul, Zefan, and Vivek.

Signed-off-by: Wanlong Gao 
Cc: Li Zefan 
Cc: Paul Menage 
Acked-by: Vivek Goyal 
Cc: Jens Axboe 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Jens Axboe