<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/block, branch Colibri_T30_LinuxImageV2.1Beta2_20140206</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>block: initialize request_queue's numa node during</title>
<updated>2012-01-11T17:26:34+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2011-11-23T09:59:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b5f50e1779db150d7c907f1ecf3caa7e31458787'/>
<id>b5f50e1779db150d7c907f1ecf3caa7e31458787</id>
<content type='text'>
commit 5151412dd4338b273afdb107c3772528e9e67d92 upstream.

struct request_queue is allocated with __GFP_ZERO so its "node" field is
zero before initialization.  This causes an oops if node 0 is offline in
the page allocator because its zonelists are not initialized.  From Dave
Young's dmesg:

	SRAT: Node 1 PXM 2 0-d0000000
	SRAT: Node 1 PXM 2 100000000-330000000
	SRAT: Node 0 PXM 1 330000000-630000000
	Initmem setup node 1 0000000000000000-000000000affb000
	...
	Built 1 zonelists in Node order, mobility grouping on.
	...
	BUG: unable to handle kernel paging request at 0000000000001c08
	IP: [&lt;ffffffff8111c355&gt;] __alloc_pages_nodemask+0xb5/0x870

and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
zonelist-&gt;_zonerefs.

The fix is to initialize q-&gt;node at the time of allocation so the correct
node is passed to the slab allocator later.

Since blk_init_allocated_queue_node() is no longer needed, merge it with
blk_init_allocated_queue().

[rientjes@google.com: changelog, initializing q-&gt;node]
Reported-by: Dave Young &lt;dyoung@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Tested-by: Dave Young &lt;dyoung@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: I24b14588aef6226f3bcdf37e78af61cbe9a31fd2
Reviewed-on: http://git-master/r/74168
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5151412dd4338b273afdb107c3772528e9e67d92 upstream.

struct request_queue is allocated with __GFP_ZERO so its "node" field is
zero before initialization.  This causes an oops if node 0 is offline in
the page allocator because its zonelists are not initialized.  From Dave
Young's dmesg:

	SRAT: Node 1 PXM 2 0-d0000000
	SRAT: Node 1 PXM 2 100000000-330000000
	SRAT: Node 0 PXM 1 330000000-630000000
	Initmem setup node 1 0000000000000000-000000000affb000
	...
	Built 1 zonelists in Node order, mobility grouping on.
	...
	BUG: unable to handle kernel paging request at 0000000000001c08
	IP: [&lt;ffffffff8111c355&gt;] __alloc_pages_nodemask+0xb5/0x870

and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
zonelist-&gt;_zonerefs.

The fix is to initialize q-&gt;node at the time of allocation so the correct
node is passed to the slab allocator later.

Since blk_init_allocated_queue_node() is no longer needed, merge it with
blk_init_allocated_queue().

[rientjes@google.com: changelog, initializing q-&gt;node]
Reported-by: Dave Young &lt;dyoung@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Tested-by: Dave Young &lt;dyoung@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: I24b14588aef6226f3bcdf37e78af61cbe9a31fd2
Reviewed-on: http://git-master/r/74168
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cfq-iosched: fix cfq_cic_link() race condition</title>
<updated>2012-01-11T17:23:31+00:00</updated>
<author>
<name>Yasuaki Ishimatsu</name>
<email>isimatu.yasuaki@jp.fujitsu.com</email>
</author>
<published>2011-12-02T09:07:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2be95e9bbff7680659d9a17f5d04d2c2c638edb9'/>
<id>2be95e9bbff7680659d9a17f5d04d2c2c638edb9</id>
<content type='text'>
commit 5eb46851de3904cd1be9192fdacb8d34deadc1fc upstream.

cfq_cic_link() has a race condition. When processes that share an ioc
issue I/O to the same block device simultaneously, cfq_cic_link() sometimes
returns -EEXIST. The race can stall I/O through the following steps:

step  1: Process A: Issue an I/O to /dev/sda
step  2: Process A: Get an ioc (iocA here) in get_io_context() which is not
		    yet linked with a cic for the device
step  3: Process A: Get a new cic for the device (cicA here) in
		    cfq_alloc_io_context()

step  4: Process B: Issue an I/O to /dev/sda
step  5: Process B: Get iocA in get_io_context() since process A and B share the
		    same ioc
step  6: Process B: Get a new cic for the device (cicB here) in
		    cfq_alloc_io_context() since iocA has not been linked with a
		    cic for the device yet

step  7: Process A: Link cicA to iocA in cfq_cic_link()
step  8: Process A: Dispatch I/O to driver and finish it

step  9: Process B: Try to link cicB to iocA in cfq_cic_link()
		    This fails with the "cfq: cic link failed!" kernel
		    message, since iocA was already linked with cicA at step 7.
step 10: Process B: Wait for the I/O to finish in get_request_wait()
		    The function never wakes up when there is no I/O to the
		    device.

When cfq_cic_link() returns -EEXIST, the ioc has already been linked with a
cic. So when cfq_cic_link() returns -EEXIST, retry cfq_cic_lookup().
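
The retry can be illustrated with a small user-space model. This is only
a sketch in Python, not the kernel code: IoContext, link_cic() and
get_cic() are hypothetical names, and the race is simulated with a hook
instead of real concurrent processes.

```python
import errno


class IoContext:
    """Hypothetical stand-in for an ioc: maps a device key to one cic."""

    def __init__(self):
        self.cics = {}

    def lookup(self, dev):
        # Models cfq_cic_lookup(): returns the linked cic or None.
        return self.cics.get(dev)

    def link_cic(self, dev, cic):
        # Models cfq_cic_link(): fails with -EEXIST if a cic is already linked.
        if dev in self.cics:
            return -errno.EEXIST
        self.cics[dev] = cic
        return 0


def get_cic(ioc, dev, new_cic, before_link=None):
    """Look up the cic for dev, linking new_cic if none is linked yet."""
    while True:
        cic = ioc.lookup(dev)
        if cic is not None:
            return cic
        if before_link is not None:
            before_link()  # hook: lets another "process" win the race
        ret = ioc.link_cic(dev, new_cic)
        if ret == 0:
            return new_cic
        if ret != -errno.EEXIST:
            raise OSError(-ret, "cic link failed")
        # -EEXIST: someone else linked first, so retry the lookup.


ioc = IoContext()
# Process A links cicA while process B sits between lookup and link.
winner = get_cic(ioc, "sda", "cicB",
                 before_link=lambda: ioc.link_cic("sda", "cicA"))
print(winner)
```

In this model process B ends up using the cic that process A linked,
instead of failing and waiting for I/O that never arrives.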

Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: I679b98b517dcddd7c3568081b50948a786884ad1
Reviewed-on: http://git-master/r/74162
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5eb46851de3904cd1be9192fdacb8d34deadc1fc upstream.

cfq_cic_link() has a race condition. When processes that share an ioc
issue I/O to the same block device simultaneously, cfq_cic_link() sometimes
returns -EEXIST. The race can stall I/O through the following steps:

step  1: Process A: Issue an I/O to /dev/sda
step  2: Process A: Get an ioc (iocA here) in get_io_context() which is not
		    yet linked with a cic for the device
step  3: Process A: Get a new cic for the device (cicA here) in
		    cfq_alloc_io_context()

step  4: Process B: Issue an I/O to /dev/sda
step  5: Process B: Get iocA in get_io_context() since process A and B share the
		    same ioc
step  6: Process B: Get a new cic for the device (cicB here) in
		    cfq_alloc_io_context() since iocA has not been linked with a
		    cic for the device yet

step  7: Process A: Link cicA to iocA in cfq_cic_link()
step  8: Process A: Dispatch I/O to driver and finish it

step  9: Process B: Try to link cicB to iocA in cfq_cic_link()
		    This fails with the "cfq: cic link failed!" kernel
		    message, since iocA was already linked with cicA at step 7.
step 10: Process B: Wait for the I/O to finish in get_request_wait()
		    The function never wakes up when there is no I/O to the
		    device.

When cfq_cic_link() returns -EEXIST, the ioc has already been linked with a
cic. So when cfq_cic_link() returns -EEXIST, retry cfq_cic_lookup().

Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: I679b98b517dcddd7c3568081b50948a786884ad1
Reviewed-on: http://git-master/r/74162
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cfq-iosched: free cic_index if blkio_alloc_blkg_stats fails</title>
<updated>2012-01-11T17:23:11+00:00</updated>
<author>
<name>majianpeng</name>
<email>majianpeng@gmail.com</email>
</author>
<published>2011-11-30T14:47:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4b0dccaad0b0cc6148b8f2c61f9364f56c35f4fb'/>
<id>4b0dccaad0b0cc6148b8f2c61f9364f56c35f4fb</id>
<content type='text'>
commit 2984ff38ccf6cbc02a7a996a36c7d6f69f3c6146 upstream.

If we fail to allocate the blkg stats, we free cfqd and cfqg.
But we need to free the IDA cfqd-&gt;cic_index as well.

Signed-off-by: majianpeng &lt;majianpeng@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: Ie0b58526fabbd53e2343f9ee0474f2070d717967
Reviewed-on: http://git-master/r/74161
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2984ff38ccf6cbc02a7a996a36c7d6f69f3c6146 upstream.

If we fail to allocate the blkg stats, we free cfqd and cfqg.
But we need to free the IDA cfqd-&gt;cic_index as well.

Signed-off-by: majianpeng &lt;majianpeng@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

Change-Id: Ie0b58526fabbd53e2343f9ee0474f2070d717967
Reviewed-on: http://git-master/r/74161
Reviewed-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
Tested-by: Varun Wadekar &lt;vwadekar@nvidia.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: genhd: Add disk/partition specific uevent callbacks for partition info</title>
<updated>2011-12-01T05:38:14+00:00</updated>
<author>
<name>San Mehat</name>
<email>san@google.com</email>
</author>
<published>2009-10-10T16:35:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cccdf161d25abf929efa998b42b5436ad32f3420'/>
<id>cccdf161d25abf929efa998b42b5436ad32f3420</id>
<content type='text'>
For disk devices, a new uevent parameter 'NPARTS' specifies the number
of partitions detected by the kernel. Partition devices get 'PARTN', which
specifies the partition's index in the table.

Signed-off-by: San Mehat &lt;san@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For disk devices, a new uevent parameter 'NPARTS' specifies the number
of partitions detected by the kernel. Partition devices get 'PARTN', which
specifies the partition's index in the table.

Signed-off-by: San Mehat &lt;san@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Always check length of all iov entries in blk_rq_map_user_iov()</title>
<updated>2011-11-21T22:35:29+00:00</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2011-11-13T18:58:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=742b42b9e87b64b2d2b54826dd3761897e71502d'/>
<id>742b42b9e87b64b2d2b54826dd3761897e71502d</id>
<content type='text'>
commit 6b76106d8ef31111d6fc469564b83b5f5542794f upstream.

Even after commit 5478755616ae2ef1ce144dded589b62b2a50d575
("block: check for proper length of iov entries earlier ...")
we still won't check for zero-length entries after an unaligned
entry.  Remove the break-statement, so all entries are checked.
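
The effect of removing the break can be sketched in user space. This is
a Python model, not the kernel loop: scan_iov() and its alignment
parameter are hypothetical, and entries are plain (base, length) pairs.

```python
import errno


def scan_iov(iov, alignment):
    """Return (error, unaligned) for a list of (base, length) entries.

    alignment is a hypothetical required byte alignment of each base.
    """
    unaligned = False
    for base, length in iov:
        if length == 0:
            return -errno.EINVAL, unaligned
        if base % alignment != 0:
            # No break here: a later entry may still have zero length.
            unaligned = True
    return 0, unaligned


# An unaligned entry followed by a zero-length entry is still rejected.
entries = [(0x1001, 512), (0x2000, 0)]
print(scan_iov(entries, 512))
```

With a break after the unaligned entry, the zero-length entry in this
example would never be examined.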

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6b76106d8ef31111d6fc469564b83b5f5542794f upstream.

Even after commit 5478755616ae2ef1ce144dded589b62b2a50d575
("block: check for proper length of iov entries earlier ...")
we still won't check for zero-length entries after an unaligned
entry.  Remove the break-statement, so all entries are checked.

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-flush: move the queue kick into</title>
<updated>2011-11-11T17:44:36+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2011-10-17T10:57:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e94d1d00463f6fdcef956828ff698a2bce3f7a8f'/>
<id>e94d1d00463f6fdcef956828ff698a2bce3f7a8f</id>
<content type='text'>
commit e67b77c791ca2778198c9e7088f3266ed2da7a55 upstream.

A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
(block: fix flush machinery for stacking drivers with differring
flush flags) applied.  It turns out that an empty flush request
can be sent into blk_insert_flush.  When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall.  The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush.  A better solution is to push the queue
kick up into blk_insert_cloned_request.

This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout &lt;christophe@saout.de&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e67b77c791ca2778198c9e7088f3266ed2da7a55 upstream.

A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
(block: fix flush machinery for stacking drivers with differring
flush flags) applied.  It turns out that an empty flush request
can be sent into blk_insert_flush.  When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall.  The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush.  A better solution is to push the queue
kick up into blk_insert_cloned_request.

This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout &lt;christophe@saout.de&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-flush: fix invalid BUG_ON in blk_insert_flush</title>
<updated>2011-11-11T17:44:34+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2011-10-17T10:57:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d16a63a89dcb0cece02469b3bb17062c3d48656f'/>
<id>d16a63a89dcb0cece02469b3bb17062c3d48656f</id>
<content type='text'>
commit 834f9f61a525d2f6d3d0c93894e26326c8d3ceed upstream.

A user reported[1] a regression due to commit
4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required a
single bio be attached to the request.  In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout &lt;christophe@saout.de&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 834f9f61a525d2f6d3d0c93894e26326c8d3ceed upstream.

A user reported[1] a regression due to commit
4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required a
single bio be attached to the request.  In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout &lt;christophe@saout.de&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: make gendisk hold a reference to its queue</title>
<updated>2011-11-11T17:44:30+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-17T11:42:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6022736e9bb4ff26f0e5556814aa43c4057e7e86'/>
<id>6022736e9bb4ff26f0e5556814aa43c4057e7e86</id>
<content type='text'>
commit f992ae801a7dec34a4ed99a6598bbbbfb82af4fb upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 &gt; /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[&lt;ffffffff810d0879&gt;]  [&lt;ffffffff810d0879&gt;] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [&lt;ffffffff810d2845&gt;] lock_acquire+0x95/0x140
  [&lt;ffffffff81aed87b&gt;] _raw_spin_lock+0x3b/0x50
  [&lt;ffffffff811573bc&gt;] bdi_lock_two+0x5c/0x70
  [&lt;ffffffff811c2f6c&gt;] bdev_inode_switch_bdi+0x4c/0xf0
  [&lt;ffffffff811c3fcb&gt;] __blkdev_put+0x11b/0x1d0
  [&lt;ffffffff811c4010&gt;] __blkdev_put+0x160/0x1d0
  [&lt;ffffffff811c40df&gt;] blkdev_put+0x5f/0x190
  [&lt;ffffffff8118f18d&gt;] kill_block_super+0x4d/0x80
  [&lt;ffffffff8118f4a5&gt;] deactivate_locked_super+0x45/0x70
  [&lt;ffffffff8119003a&gt;] deactivate_super+0x4a/0x70
  [&lt;ffffffff811ac4ad&gt;] mntput_no_expire+0xed/0x130
  [&lt;ffffffff811acf2e&gt;] sys_umount+0x7e/0x3a0
  [&lt;ffffffff81aeeeab&gt;] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi, and bdev_inode_switch_bdi() ends up
dereferencing the already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f992ae801a7dec34a4ed99a6598bbbbfb82af4fb upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 &gt; /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[&lt;ffffffff810d0879&gt;]  [&lt;ffffffff810d0879&gt;] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [&lt;ffffffff810d2845&gt;] lock_acquire+0x95/0x140
  [&lt;ffffffff81aed87b&gt;] _raw_spin_lock+0x3b/0x50
  [&lt;ffffffff811573bc&gt;] bdi_lock_two+0x5c/0x70
  [&lt;ffffffff811c2f6c&gt;] bdev_inode_switch_bdi+0x4c/0xf0
  [&lt;ffffffff811c3fcb&gt;] __blkdev_put+0x11b/0x1d0
  [&lt;ffffffff811c4010&gt;] __blkdev_put+0x160/0x1d0
  [&lt;ffffffff811c40df&gt;] blkdev_put+0x5f/0x190
  [&lt;ffffffff8118f18d&gt;] kill_block_super+0x4d/0x80
  [&lt;ffffffff8118f4a5&gt;] deactivate_locked_super+0x45/0x70
  [&lt;ffffffff8119003a&gt;] deactivate_super+0x4a/0x70
  [&lt;ffffffff811ac4ad&gt;] mntput_no_expire+0xed/0x130
  [&lt;ffffffff811acf2e&gt;] sys_umount+0x7e/0x3a0
  [&lt;ffffffff81aeeeab&gt;] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi, and bdev_inode_switch_bdi() ends up
dereferencing the already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: Free queue resources at blk_release_queue()</title>
<updated>2011-09-28T14:07:01+00:00</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.de</email>
</author>
<published>2011-09-28T14:07:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=777eb1bf15b8532c396821774bf6451e563438f5'/>
<id>777eb1bf15b8532c396821774bf6451e563438f5</id>
<content type='text'>
A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources, e.g. by calling elevator_exit(), which are not checked for
in normal operation. So we should rather move these calls to the
destructor function blk_release_queue(), as at that point all remaining
references are gone. However, in doing so we have to ensure that any
externally supplied queue_lock is disconnected, as the driver might free
up the lock after the call to blk_cleanup_queue().

Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources, e.g. by calling elevator_exit(), which are not checked for
in normal operation. So we should rather move these calls to the
destructor function blk_release_queue(), as at that point all remaining
references are gone. However, in doing so we have to ensure that any
externally supplied queue_lock is disconnected, as the driver might free
up the lock after the call to blk_cleanup_queue().

Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-cgroup: be able to remove the record of unplugged device</title>
<updated>2011-09-21T08:22:10+00:00</updated>
<author>
<name>Wanlong Gao</name>
<email>gaowanlong@cn.fujitsu.com</email>
</author>
<published>2011-09-21T08:22:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d11bb4462c4cc6ddd45c6927c617ad79fa6fb8fc'/>
<id>d11bb4462c4cc6ddd45c6927c617ad79fa6fb8fc</id>
<content type='text'>
The bug is that we cannot remove a device from the blkio cgroup's
per-device control files once it has been unplugged.

To reproduce the bug:

  # mount -t cgroup -o blkio xxx /cgroup
  # cd /cgroup
  # echo "8:0 1000" &gt; blkio.throttle.read_bps_device
  # unplug the device
  # cat blkio.throttle.read_bps_device
  8:0	1000
  # echo "8:0 0" &gt; blkio.throttle.read_bps_device
  -bash: echo: write error: No such device

After patching, the device removal will succeed.

Thanks to Paul, Zefan, and Vivek for their comments.

Signed-off-by: Wanlong Gao &lt;gaowanlong@cn.fujitsu.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bug is that we cannot remove a device from the blkio cgroup's
per-device control files once it has been unplugged.

To reproduce the bug:

  # mount -t cgroup -o blkio xxx /cgroup
  # cd /cgroup
  # echo "8:0 1000" &gt; blkio.throttle.read_bps_device
  # unplug the device
  # cat blkio.throttle.read_bps_device
  8:0	1000
  # echo "8:0 0" &gt; blkio.throttle.read_bps_device
  -bash: echo: write error: No such device

After patching, the device removal will succeed.

Thanks to Paul, Zefan, and Vivek for their comments.

Signed-off-by: Wanlong Gao &lt;gaowanlong@cn.fujitsu.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
