<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/block_dev.c, branch v3.3.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>block: Fix NULL pointer dereference in sd_revalidate_disk</title>
<updated>2012-03-02T09:38:33+00:00</updated>
<author>
<name>Jun'ichi Nomura</name>
<email>j-nomura@ce.jp.nec.com</email>
</author>
<published>2012-03-02T09:38:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe316bf2d5847bc5dd975668671a7b1067603bc7'/>
<id>fe316bf2d5847bc5dd975668671a7b1067603bc7</id>
<content type='text'>
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However it ends up calling driver's revalidate_disk without open
and could cause oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               &lt;scsi_device torn down&gt;
      rescan_partitions
        sd_revalidate_disk
          &lt;oops&gt;
Oopses are reported here:
http://marc.info/?l=linux-scsi&amp;m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.

Reported-by: Huajun Li &lt;huajun.li.lee@gmail.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However it ends up calling driver's revalidate_disk without open
and could cause oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               &lt;scsi_device torn down&gt;
      rescan_partitions
        sd_revalidate_disk
          &lt;oops&gt;
Oopses are reported here:
http://marc.info/?l=linux-scsi&amp;m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.

Reported-by: Huajun Li &lt;huajun.li.lee@gmail.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: cache request_queue in struct block_device</title>
<updated>2012-01-13T04:13:12+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2012-01-13T01:20:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=87192a2a49c475cf322cb143e0fa63b0102d8567'/>
<id>87192a2a49c475cf322cb143e0fa63b0102d8567</id>
<content type='text'>
This makes it possible to get from the inode to the request_queue with one
less cache miss.  Used in followon optimization.

The livetime of the pointer is the same as the gendisk.

This assumes that the queue will always stay the same in the gendisk while
it's visible to block_devices.  I think that's safe correct?

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This makes it possible to get from the inode to the request_queue with one
less cache miss.  Used in followon optimization.

The livetime of the pointer is the same as the gendisk.

This assumes that the queue will always stay the same in the gendisk while
it's visible to block_devices.  I think that's safe correct?

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block_dev: Suppress bdev_cache_init() kmemleak warninig</title>
<updated>2012-01-10T18:08:55+00:00</updated>
<author>
<name>Sergey Senozhatsky</name>
<email>sergey.senozhatsky@gmail.com</email>
</author>
<published>2012-01-09T23:43:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ace8577aeb438025ecf642f5eda3aa551d251951'/>
<id>ace8577aeb438025ecf642f5eda3aa551d251951</id>
<content type='text'>
Kmemleak reports the following warning in bdev_cache_init()
[    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
[    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
[    0.003909] kmemleak:   min_count = 1
[    0.003988] kmemleak:   count = 0
[    0.004066] kmemleak:   flags = 0x1
[    0.004144] kmemleak:   checksum = 0
[    0.004224] kmemleak:   backtrace:
[    0.004303]      [&lt;ffffffff814755ac&gt;] kmemleak_alloc+0x21/0x3e
[    0.004446]      [&lt;ffffffff811100ba&gt;] kmem_cache_alloc+0xca/0x1dc
[    0.004592]      [&lt;ffffffff811371b1&gt;] alloc_vfsmnt+0x1f/0x198
[    0.004736]      [&lt;ffffffff811375c5&gt;] vfs_kern_mount+0x36/0xd2
[    0.004879]      [&lt;ffffffff8113929a&gt;] kern_mount_data+0x18/0x32
[    0.005025]      [&lt;ffffffff81ab9075&gt;] bdev_cache_init+0x51/0x81
[    0.005169]      [&lt;ffffffff81ab8abf&gt;] vfs_caches_init+0x101/0x10d
[    0.005313]      [&lt;ffffffff81a9bae3&gt;] start_kernel+0x344/0x383
[    0.005456]      [&lt;ffffffff81a9b2a7&gt;] x86_64_start_reservations+0xae/0xb2
[    0.005602]      [&lt;ffffffff81a9b3ad&gt;] x86_64_start_kernel+0x102/0x111
[    0.005747]      [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff
[    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
[    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
[    0.008856] Call Trace:
[    0.008934]  [&lt;ffffffff81118704&gt;] ? find_and_get_object+0x44/0x118
[    0.009023]  [&lt;ffffffff81118fe6&gt;] paint_ptr+0x57/0x8f
[    0.009109]  [&lt;ffffffff81475935&gt;] kmemleak_not_leak+0x23/0x42
[    0.009195]  [&lt;ffffffff81ab9096&gt;] bdev_cache_init+0x72/0x81
[    0.009282]  [&lt;ffffffff81ab8abf&gt;] vfs_caches_init+0x101/0x10d
[    0.009368]  [&lt;ffffffff81a9bae3&gt;] start_kernel+0x344/0x383
[    0.009466]  [&lt;ffffffff81a9b2a7&gt;] x86_64_start_reservations+0xae/0xb2
[    0.009555]  [&lt;ffffffff81a9b140&gt;] ? early_idt_handlers+0x140/0x140
[    0.009643]  [&lt;ffffffff81a9b3ad&gt;] x86_64_start_kernel+0x102/0x111

due to attempt to mark pointer to `struct vfsmount' as a gray object, which
is embedded into `struct mount' returned from alloc_vfsmnt().

Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as
suggested by Al Viro.

Signed-off-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kmemleak reports the following warning in bdev_cache_init()
[    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
[    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
[    0.003909] kmemleak:   min_count = 1
[    0.003988] kmemleak:   count = 0
[    0.004066] kmemleak:   flags = 0x1
[    0.004144] kmemleak:   checksum = 0
[    0.004224] kmemleak:   backtrace:
[    0.004303]      [&lt;ffffffff814755ac&gt;] kmemleak_alloc+0x21/0x3e
[    0.004446]      [&lt;ffffffff811100ba&gt;] kmem_cache_alloc+0xca/0x1dc
[    0.004592]      [&lt;ffffffff811371b1&gt;] alloc_vfsmnt+0x1f/0x198
[    0.004736]      [&lt;ffffffff811375c5&gt;] vfs_kern_mount+0x36/0xd2
[    0.004879]      [&lt;ffffffff8113929a&gt;] kern_mount_data+0x18/0x32
[    0.005025]      [&lt;ffffffff81ab9075&gt;] bdev_cache_init+0x51/0x81
[    0.005169]      [&lt;ffffffff81ab8abf&gt;] vfs_caches_init+0x101/0x10d
[    0.005313]      [&lt;ffffffff81a9bae3&gt;] start_kernel+0x344/0x383
[    0.005456]      [&lt;ffffffff81a9b2a7&gt;] x86_64_start_reservations+0xae/0xb2
[    0.005602]      [&lt;ffffffff81a9b3ad&gt;] x86_64_start_kernel+0x102/0x111
[    0.005747]      [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff
[    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
[    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
[    0.008856] Call Trace:
[    0.008934]  [&lt;ffffffff81118704&gt;] ? find_and_get_object+0x44/0x118
[    0.009023]  [&lt;ffffffff81118fe6&gt;] paint_ptr+0x57/0x8f
[    0.009109]  [&lt;ffffffff81475935&gt;] kmemleak_not_leak+0x23/0x42
[    0.009195]  [&lt;ffffffff81ab9096&gt;] bdev_cache_init+0x72/0x81
[    0.009282]  [&lt;ffffffff81ab8abf&gt;] vfs_caches_init+0x101/0x10d
[    0.009368]  [&lt;ffffffff81a9bae3&gt;] start_kernel+0x344/0x383
[    0.009466]  [&lt;ffffffff81a9b2a7&gt;] x86_64_start_reservations+0xae/0xb2
[    0.009555]  [&lt;ffffffff81a9b140&gt;] ? early_idt_handlers+0x140/0x140
[    0.009643]  [&lt;ffffffff81a9b3ad&gt;] x86_64_start_kernel+0x102/0x111

due to attempt to mark pointer to `struct vfsmount' as a gray object, which
is embedded into `struct mount' returned from alloc_vfsmnt().

Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as
suggested by Al Viro.

Signed-off-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: move code out of buffer.c</title>
<updated>2012-01-04T03:54:07+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-09-16T06:31:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ff01bb4832651c6d25ac509a06a10fcbd75c461c'/>
<id>ff01bb4832651c6d25ac509a06a10fcbd75c461c</id>
<content type='text'>
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: fix the stupidity with i_dentry in inode destructors</title>
<updated>2012-01-04T03:52:40+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-12-12T20:51:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6b520e0565422966cdf1c3759bd73df77b0f248c'/>
<id>6b520e0565422966cdf1c3759bd73df77b0f248c</id>
<content type='text'>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from -&gt;destroy_inode() instances...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from -&gt;destroy_inode() instances...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>trim fs/internal.h</title>
<updated>2012-01-04T03:52:35+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-11-22T02:15:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f47ec3f28354795f000c14bf18ed967ec81a3ec3'/>
<id>f47ec3f28354795f000c14bf18ed967ec81a3ec3</id>
<content type='text'>
some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block</title>
<updated>2011-11-05T00:22:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-11-05T00:22:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3d0a8d10cfb4cc3d1877c29a866ee7d8a46aa2fa'/>
<id>3d0a8d10cfb4cc3d1877c29a866ee7d8a46aa2fa</id>
<content type='text'>
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
  virtio-blk: use ida to allocate disk index
  hpsa: add small delay when using PCI Power Management to reset for kump
  cciss: add small delay when using PCI Power Management to reset for kump
  xen/blkback: Fix two races in the handling of barrier requests.
  xen/blkback: Check for proper operation.
  xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
  xen/blkback: Report VBD_WSECT (wr_sect) properly.
  xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
  xen-blkfront: plug device number leak in xlblk_init() error path
  xen-blkfront: If no barrier or flush is supported, use invalid operation.
  xen-blkback: use kzalloc() in favor of kmalloc()+memset()
  xen-blkback: fixed indentation and comments
  xen-blkfront: fix a deadlock while handling discard response
  xen-blkfront: Handle discard requests.
  xen-blkback: Implement discard requests ('feature-discard')
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
  drivers/block/loop.c: emit uevent on auto release
  drivers/block/cpqarray.c: use pci_dev-&gt;revision
  loop: always allow userspace partitions and optionally support automatic scanning
  ...

Fic up trivial header file includsion conflict in drivers/block/loop.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
  virtio-blk: use ida to allocate disk index
  hpsa: add small delay when using PCI Power Management to reset for kump
  cciss: add small delay when using PCI Power Management to reset for kump
  xen/blkback: Fix two races in the handling of barrier requests.
  xen/blkback: Check for proper operation.
  xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
  xen/blkback: Report VBD_WSECT (wr_sect) properly.
  xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
  xen-blkfront: plug device number leak in xlblk_init() error path
  xen-blkfront: If no barrier or flush is supported, use invalid operation.
  xen-blkback: use kzalloc() in favor of kmalloc()+memset()
  xen-blkback: fixed indentation and comments
  xen-blkfront: fix a deadlock while handling discard response
  xen-blkfront: Handle discard requests.
  xen-blkback: Implement discard requests ('feature-discard')
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
  drivers/block/loop.c: emit uevent on auto release
  drivers/block/cpqarray.c: use pci_dev-&gt;revision
  loop: always allow userspace partitions and optionally support automatic scanning
  ...

Fic up trivial header file includsion conflict in drivers/block/loop.c
</pre>
</div>
</content>
</entry>
<entry>
<title>block: make gendisk hold a reference to its queue</title>
<updated>2011-10-19T12:31:07+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:31:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=523e1d399ce0e23bec562abe2b2f8d297af81161'/>
<id>523e1d399ce0e23bec562abe2b2f8d297af81161</id>
<content type='text'>
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 &gt; /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[&lt;ffffffff810d0879&gt;]  [&lt;ffffffff810d0879&gt;] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [&lt;ffffffff810d2845&gt;] lock_acquire+0x95/0x140
  [&lt;ffffffff81aed87b&gt;] _raw_spin_lock+0x3b/0x50
  [&lt;ffffffff811573bc&gt;] bdi_lock_two+0x5c/0x70
  [&lt;ffffffff811c2f6c&gt;] bdev_inode_switch_bdi+0x4c/0xf0
  [&lt;ffffffff811c3fcb&gt;] __blkdev_put+0x11b/0x1d0
  [&lt;ffffffff811c4010&gt;] __blkdev_put+0x160/0x1d0
  [&lt;ffffffff811c40df&gt;] blkdev_put+0x5f/0x190
  [&lt;ffffffff8118f18d&gt;] kill_block_super+0x4d/0x80
  [&lt;ffffffff8118f4a5&gt;] deactivate_locked_super+0x45/0x70
  [&lt;ffffffff8119003a&gt;] deactivate_super+0x4a/0x70
  [&lt;ffffffff811ac4ad&gt;] mntput_no_expire+0xed/0x130
  [&lt;ffffffff811acf2e&gt;] sys_umount+0x7e/0x3a0
  [&lt;ffffffff81aeeeab&gt;] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 &gt; /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[&lt;ffffffff810d0879&gt;]  [&lt;ffffffff810d0879&gt;] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [&lt;ffffffff810d2845&gt;] lock_acquire+0x95/0x140
  [&lt;ffffffff81aed87b&gt;] _raw_spin_lock+0x3b/0x50
  [&lt;ffffffff811573bc&gt;] bdi_lock_two+0x5c/0x70
  [&lt;ffffffff811c2f6c&gt;] bdev_inode_switch_bdi+0x4c/0xf0
  [&lt;ffffffff811c3fcb&gt;] __blkdev_put+0x11b/0x1d0
  [&lt;ffffffff811c4010&gt;] __blkdev_put+0x160/0x1d0
  [&lt;ffffffff811c40df&gt;] blkdev_put+0x5f/0x190
  [&lt;ffffffff8118f18d&gt;] kill_block_super+0x4d/0x80
  [&lt;ffffffff8118f4a5&gt;] deactivate_locked_super+0x45/0x70
  [&lt;ffffffff8119003a&gt;] deactivate_super+0x4a/0x70
  [&lt;ffffffff811ac4ad&gt;] mntput_no_expire+0xed/0x130
  [&lt;ffffffff811acf2e&gt;] sys_umount+0x7e/0x3a0
  [&lt;ffffffff81aeeeab&gt;] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Avoid dereferencing a 'request_queue' after last close.</title>
<updated>2011-09-10T07:20:21+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2011-09-10T07:20:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94007751bb02797ba87bac7aacee2731ac2039a3'/>
<id>94007751bb02797ba87bac7aacee2731ac2039a3</id>
<content type='text'>
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed.  The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* -&gt;release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock.  This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.

So move the called to bdev_inode_switch_bdi before the call to
-&gt;release.

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: stable@kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed.  The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* -&gt;release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock.  This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.

So move the called to bdev_inode_switch_bdi before the call to
-&gt;release.

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: stable@kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add GENHD_FL_NO_PART_SCAN</title>
<updated>2011-08-23T18:01:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-08-23T18:01:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d27769ec3df1a8de9ca450d2dcd72d1ab259ba32'/>
<id>d27769ec3df1a8de9ca450d2dcd72d1ab259ba32</id>
<content type='text'>
There are cases where suppressing partition scan is useful - e.g. for
lo devices and pseudo SATA devices which advertise to be a disk but
get upset on partition scan (some port multiplier control devices show
such behavior).

This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
regardless of the number of possible partitions.  disk_partitionable()
is renamed to disk_part_scan_enabled() as suppressing partition scan
doesn't imply the device can't be partitioned using
BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
directly tests disk_max_parts() to maintain backward-compatibility.

-v2: Updated to make it clear that only partition scan is suppressed
     not partitioning itself as suggested by Kay Sievers.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are cases where suppressing partition scan is useful - e.g. for
lo devices and pseudo SATA devices which advertise to be a disk but
get upset on partition scan (some port multiplier control devices show
such behavior).

This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
regardless of the number of possible partitions.  disk_partitionable()
is renamed to disk_part_scan_enabled() as suppressing partition scan
doesn't imply the device can't be partitioned using
BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
directly tests disk_max_parts() to maintain backward-compatibility.

-v2: Updated to make it clear that only partition scan is suppressed
     not partitioning itself as suggested by Kay Sievers.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
