linux-toradex.git/fs/block_dev.c, branch v3.3.5

block: Fix NULL pointer dereference in sd_revalidate_disk

2012-03-02T09:38:33+00:00

Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However, it ends up calling the driver's revalidate_disk without the
device having been opened, which can cause an oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               
      rescan_partitions
        sd_revalidate_disk
          
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and uses it for the -ENOMEDIUM case.
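
The split described above can be illustrated with a small userspace model.
This is only a sketch: struct model_disk and every model_* function are
hypothetical names invented for the illustration, and a -1 return stands in
for the point where the real kernel could oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a disk; NOT the kernel's struct gendisk. */
struct model_disk {
	int nr_parts;          /* partition structures currently known */
	bool driver_attached;  /* false once the device has been removed */
	int revalidate_calls;  /* how often the driver callback ran */
	int uevents;           /* change notifications raised */
};

/* Models the driver's revalidate callback; -1 marks the unsafe case. */
int model_revalidate(struct model_disk *d)
{
	assert(d != NULL);
	if (!d->driver_attached)
		return -1;
	d->revalidate_calls++;
	return 0;
}

/* Full rescan: drops partitions, calls into the driver, re-reads the table. */
int model_rescan_partitions(struct model_disk *d, int parts_on_media)
{
	assert(d != NULL);
	d->nr_parts = 0;
	if (model_revalidate(d) != 0)
		return -1;
	d->nr_parts = parts_on_media;
	d->uevents++;
	return 0;
}

/* Invalidation only: drops partitions and notifies, never calls the driver. */
int model_invalidate_partitions(struct model_disk *d)
{
	assert(d != NULL);
	d->nr_parts = 0;
	d->uevents++;
	return 0;
}
```

In this model, the -ENOMEDIUM path would call model_invalidate_partitions(),
so the driver callback is never reached for a device that may already be gone.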

Reported-by: Huajun Li 
Signed-off-by: Jun'ichi Nomura 
Acked-by: Tejun Heo 
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

vfs: cache request_queue in struct block_device

2012-01-13T04:13:12+00:00

This makes it possible to get from the inode to the request_queue with one
less cache miss.  Used in a follow-on optimization.

The lifetime of the pointer is the same as that of the gendisk.

This assumes that the queue will always stay the same in the gendisk while
it's visible to block_devices.  I think that's safe, correct?
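
A minimal userspace sketch of the caching idea, assuming the pointer is
filled in at open time and cleared on the last close.  All model_* names are
hypothetical; only the bd_disk and bd_queue field names are borrowed from the
patch subject matter, and the structs are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_queue { int id; };
struct model_gendisk { struct model_queue *queue; };
struct model_bdev {
	struct model_gendisk *bd_disk;
	struct model_queue *bd_queue;  /* cached copy of bd_disk->queue */
};

/* On open, cache the queue pointer next to the disk pointer. */
void model_bdev_open(struct model_bdev *b, struct model_gendisk *disk)
{
	assert(b != NULL && disk != NULL);
	b->bd_disk = disk;
	b->bd_queue = disk->queue;
}

/* On last close, drop both so the cache never outlives the disk. */
void model_bdev_last_close(struct model_bdev *b)
{
	assert(b != NULL);
	b->bd_disk = NULL;
	b->bd_queue = NULL;
}

/* Two dependent loads: bdev, then gendisk. */
struct model_queue *model_queue_via_disk(const struct model_bdev *b)
{
	return b->bd_disk->queue;
}

/* One load: the gendisk is not touched. */
struct model_queue *model_queue_cached(const struct model_bdev *b)
{
	return b->bd_queue;
}
```

The cached lookup is only valid under the assumption stated above, namely
that the gendisk's queue does not change while the block_device can see it.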

Signed-off-by: Andi Kleen 
Acked-by: Jeff Moyer 
Cc: Jens Axboe 
Cc: Christoph Hellwig 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

block_dev: Suppress bdev_cache_init() kmemleak warning

2012-01-10T18:08:55+00:00

Kmemleak reports the following warning in bdev_cache_init():
[    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
[    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
[    0.003909] kmemleak:   min_count = 1
[    0.003988] kmemleak:   count = 0
[    0.004066] kmemleak:   flags = 0x1
[    0.004144] kmemleak:   checksum = 0
[    0.004224] kmemleak:   backtrace:
[    0.004303]      [] kmemleak_alloc+0x21/0x3e
[    0.004446]      [] kmem_cache_alloc+0xca/0x1dc
[    0.004592]      [] alloc_vfsmnt+0x1f/0x198
[    0.004736]      [] vfs_kern_mount+0x36/0xd2
[    0.004879]      [] kern_mount_data+0x18/0x32
[    0.005025]      [] bdev_cache_init+0x51/0x81
[    0.005169]      [] vfs_caches_init+0x101/0x10d
[    0.005313]      [] start_kernel+0x344/0x383
[    0.005456]      [] x86_64_start_reservations+0xae/0xb2
[    0.005602]      [] x86_64_start_kernel+0x102/0x111
[    0.005747]      [] 0xffffffffffffffff
[    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
[    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
[    0.008856] Call Trace:
[    0.008934]  [] ? find_and_get_object+0x44/0x118
[    0.009023]  [] paint_ptr+0x57/0x8f
[    0.009109]  [] kmemleak_not_leak+0x23/0x42
[    0.009195]  [] bdev_cache_init+0x72/0x81
[    0.009282]  [] vfs_caches_init+0x101/0x10d
[    0.009368]  [] start_kernel+0x344/0x383
[    0.009466]  [] x86_64_start_reservations+0xae/0xb2
[    0.009555]  [] ? early_idt_handlers+0x140/0x140
[    0.009643]  [] x86_64_start_kernel+0x102/0x111

due to an attempt to mark a pointer to `struct vfsmount' as a gray object;
that struct is embedded in the `struct mount' returned from alloc_vfsmnt().

Make `bd_mnt' static, avoiding the need to tell kmemleak to mark it gray, as
suggested by Al Viro.

Signed-off-by: Sergey Senozhatsky 
Signed-off-by: Al Viro

fs: move code out of buffer.c

2012-01-04T03:54:07+00:00

Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin 
Cc: Al Viro 
Cc: Christoph Hellwig 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro

vfs: fix the stupidity with i_dentry in inode destructors

2012-01-04T03:52:40+00:00

Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
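
As a rough illustration of why per-allocation initialization removes the
need for destructor boilerplate, here is a userspace sketch.  The model_*
types and functions are hypothetical and only mimic the shape of a list head
embedded in an inode; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal doubly linked list head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

void model_init_list_head(struct model_list_head *h)
{
	assert(h != NULL);
	h->next = h;
	h->prev = h;
}

bool model_list_empty(const struct model_list_head *h)
{
	return h->next == h;
}

/* Hypothetical inode with only the field relevant here. */
struct model_inode {
	struct model_list_head i_dentry;
};

/*
 * Runs on every allocation, so a recycled object always starts with an
 * empty list and destructors no longer have to reset it themselves.
 */
void model_inode_init_always(struct model_inode *inode)
{
	assert(inode != NULL);
	model_init_list_head(&inode->i_dentry);
}
```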

Signed-off-by: Al Viro

trim fs/internal.h

2012-01-04T03:52:35+00:00

some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...

Signed-off-by: Al Viro

Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block

2011-11-05T00:22:14+00:00

* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
  virtio-blk: use ida to allocate disk index
  hpsa: add small delay when using PCI Power Management to reset for kump
  cciss: add small delay when using PCI Power Management to reset for kump
  xen/blkback: Fix two races in the handling of barrier requests.
  xen/blkback: Check for proper operation.
  xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
  xen/blkback: Report VBD_WSECT (wr_sect) properly.
  xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
  xen-blkfront: plug device number leak in xlblk_init() error path
  xen-blkfront: If no barrier or flush is supported, use invalid operation.
  xen-blkback: use kzalloc() in favor of kmalloc()+memset()
  xen-blkback: fixed indentation and comments
  xen-blkfront: fix a deadlock while handling discard response
  xen-blkfront: Handle discard requests.
  xen-blkback: Implement discard requests ('feature-discard')
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
  drivers/block/loop.c: emit uevent on auto release
  drivers/block/cpqarray.c: use pci_dev->revision
  loop: always allow userspace partitions and optionally support automatic scanning
  ...

Fix up trivial header file inclusion conflict in drivers/block/loop.c

block: make gendisk hold a reference to its queue

2011-10-19T12:31:07+00:00

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[]  [] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [] lock_acquire+0x95/0x140
  [] _raw_spin_lock+0x3b/0x50
  [] bdi_lock_two+0x5c/0x70
  [] bdev_inode_switch_bdi+0x4c/0xf0
  [] __blkdev_put+0x11b/0x1d0
  [] __blkdev_put+0x160/0x1d0
  [] blkdev_put+0x5f/0x190
  [] kill_block_super+0x4d/0x80
  [] deactivate_locked_super+0x45/0x70
  [] deactivate_super+0x4a/0x70
  [] mntput_no_expire+0xed/0x130
  [] sys_umount+0x7e/0x3a0
  [] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.
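
The reference-counting rule can be sketched in userspace as follows.  This
is a simplified model under stated assumptions: the model_* names are
hypothetical, the counter is a plain int rather than a kobject reference,
and "freed" is a flag instead of an actual free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted queue; a flag records when it would be freed. */
struct model_queue {
	int refcount;
	bool freed;
};

void model_queue_get(struct model_queue *q)
{
	assert(q != NULL && !q->freed);
	q->refcount++;
}

void model_queue_put(struct model_queue *q)
{
	assert(q != NULL && q->refcount > 0);
	if (--q->refcount == 0)
		q->freed = true;
}

/* Hypothetical disk that pins its queue for its whole lifetime. */
struct model_disk {
	struct model_queue *queue;
};

/* Adding the disk takes an extra reference to the queue. */
void model_add_disk(struct model_disk *d, struct model_queue *q)
{
	assert(d != NULL);
	model_queue_get(q);
	d->queue = q;
}

/* Releasing the disk drops that reference again. */
void model_disk_release(struct model_disk *d)
{
	assert(d != NULL && d->queue != NULL);
	model_queue_put(d->queue);
	d->queue = NULL;
}
```

With the extra reference in place, the device dropping its base reference
no longer frees the queue while the disk is still reachable.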

Signed-off-by: Tejun Heo 
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Avoid dereferencing a 'request_queue' after last close.

2011-09-10T07:20:21+00:00

On the last close of an 'md' device which has been stopped, the device
is destroyed and in particular the request_queue is freed.  The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.

Since commit f758eeabeb96f878c860e8f110f94ec8820822a9
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock.  This causes the last
close on an md device to sometimes take a spin_lock which lives in
freed memory - which results in an oops.

So move the call to bdev_inode_switch_bdi before the call to
->release.
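
The effect of the reordering can be shown with a small userspace model.  It
is a sketch only: the model_* names are hypothetical, the free is modelled as
immediate (the worst case of the asynchronous free described above), and a -1
return marks the access to freed memory that would oops in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bdi that lives inside a queue which release may free. */
struct model_bdi {
	bool freed;
	int lock_taken;  /* times the lock inside the bdi was taken */
};

/* Switching away from the old bdi needs its lock; -1 if already freed. */
int model_switch_bdi(struct model_bdi *old)
{
	assert(old != NULL);
	if (old->freed)
		return -1;
	old->lock_taken++;
	return 0;
}

/* Models ->release destroying the device, and with it the bdi. */
void model_release(struct model_bdi *bdi)
{
	assert(bdi != NULL);
	bdi->freed = true;
}

/* Old order: release first, then touch the (possibly freed) bdi. */
int model_put_release_first(struct model_bdi *bdi)
{
	model_release(bdi);
	return model_switch_bdi(bdi);
}

/* New order: finish with the bdi before release can free it. */
int model_put_switch_first(struct model_bdi *bdi)
{
	int rc = model_switch_bdi(bdi);

	model_release(bdi);
	return rc;
}
```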

Cc: Christoph Hellwig 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: Wu Fengguang 
Acked-by: Wu Fengguang 
Cc: stable@kernel.org
Signed-off-by: NeilBrown

block: add GENHD_FL_NO_PART_SCAN

2011-08-23T18:01:04+00:00

There are cases where suppressing partition scan is useful - e.g. for
loop devices and pseudo SATA devices which advertise themselves as disks
but get upset on partition scan (some port multiplier control devices show
such behavior).

This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
regardless of the number of possible partitions.  disk_partitionable()
is renamed to disk_part_scan_enabled() as suppressing partition scan
doesn't imply the device can't be partitioned using
BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
directly tests disk_max_parts() to maintain backward-compatibility.
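
A minimal userspace sketch of the distinction between "can have partitions"
and "partition scan enabled".  The struct, the flag value, and the model_*
helpers are assumptions made for the illustration; in particular, the model
treats the maximum partition count as a plain field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the kernel's actual value is not assumed here. */
#define MODEL_FL_NO_PART_SCAN 0x1u

/* Hypothetical disk with only the fields relevant to the check. */
struct model_disk {
	int minors;          /* number of partitions the disk can hold */
	unsigned int flags;
};

/* How many partitions the disk could have, scan or no scan. */
int model_disk_max_parts(const struct model_disk *d)
{
	assert(d != NULL);
	return d->minors;
}

/*
 * Scanning needs room for partitions AND the absence of the flag.
 * The flag does not change model_disk_max_parts(), so partitions can
 * still be added by hand from userland.
 */
bool model_disk_part_scan_enabled(const struct model_disk *d)
{
	return model_disk_max_parts(d) > 1 &&
	       !(d->flags & MODEL_FL_NO_PART_SCAN);
}
```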

-v2: Updated to make it clear that only partition scan is suppressed,
     not partitioning itself, as suggested by Kay Sievers.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Signed-off-by: Jens Axboe