<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/block_dev.c, branch v4.4.93</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>fs/block_dev: always invalidate cleancache in invalidate_bdev()</title>
<updated>2017-05-20T12:27:01+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2017-05-03T21:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7aad381af8c37b18a68c86d87650025552914dca'/>
<id>7aad381af8c37b18a68c86d87650025552914dca</id>
<content type='text'>
commit a5f6a6a9c72eac38a7fadd1a038532bc8516337c upstream.

invalidate_bdev() calls cleancache_invalidate_inode() iff -&gt;nrpages != 0
which doen't make any sense.

Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
regardless of mapping-&gt;nrpages value.

Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Alexey Kuznetsov &lt;kuznet@virtuozzo.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Nikolay Borisov &lt;n.borisov.lkml@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a5f6a6a9c72eac38a7fadd1a038532bc8516337c upstream.

invalidate_bdev() calls cleancache_invalidate_inode() iff -&gt;nrpages != 0
which doen't make any sense.

Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
regardless of mapping-&gt;nrpages value.

Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache")
Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Alexey Kuznetsov &lt;kuznet@virtuozzo.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Nikolay Borisov &lt;n.borisov.lkml@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: get rid of blk_integrity_revalidate()</title>
<updated>2017-05-14T11:32:59+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-04-18T16:43:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a4c6a08906f8c8df19ee2b3514fa76be64ddc83'/>
<id>4a4c6a08906f8c8df19ee2b3514fa76be64ddc83</id>
<content type='text'>
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi-&gt;profile)
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities &amp;=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
[idryomov@gmail.com: backport to &lt; 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi-&gt;profile)
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities &amp;=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
[idryomov@gmail.com: backport to &lt; 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: protect iterate_bdevs() against concurrent close</title>
<updated>2017-01-09T07:07:47+00:00</updated>
<author>
<name>Rabin Vincent</name>
<email>rabinv@axis.com</email>
</author>
<published>2016-12-01T08:18:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f4f02a856a92e1a2990a8d0f55d08a78a29c3cc0'/>
<id>f4f02a856a92e1a2990a8d0f55d08a78a29c3cc0</id>
<content type='text'>
commit af309226db916e2c6e08d3eba3fa5c34225200c4 upstream.

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev-&gt;b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[&lt;ffffffff81314790&gt;]  [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [&lt;ffffffff8112e7f5&gt;] __filemap_fdatawrite_range+0x85/0x90
  [&lt;ffffffff8112e81f&gt;] filemap_fdatawrite+0x1f/0x30
  [&lt;ffffffff811b25d6&gt;] fdatawrite_one_bdev+0x16/0x20
  [&lt;ffffffff811bc402&gt;] iterate_bdevs+0xf2/0x130
  [&lt;ffffffff811b2763&gt;] sys_sync+0x63/0x90
  [&lt;ffffffff815d4272&gt;] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 &lt;48&gt; 8b 80 08 05 00 00 5d
 RIP  [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
  RSP &lt;ffff880009f5fe68&gt;
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_devs():

 while :; do head -c1 /dev/nullb0; done &gt; /dev/null &amp; while :; do sync; done

Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.

Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
Reported-and-tested-by: Wei Fang &lt;fangwei1@huawei.com&gt;
Signed-off-by: Rabin Vincent &lt;rabinv@axis.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit af309226db916e2c6e08d3eba3fa5c34225200c4 upstream.

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev-&gt;b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[&lt;ffffffff81314790&gt;]  [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [&lt;ffffffff8112e7f5&gt;] __filemap_fdatawrite_range+0x85/0x90
  [&lt;ffffffff8112e81f&gt;] filemap_fdatawrite+0x1f/0x30
  [&lt;ffffffff811b25d6&gt;] fdatawrite_one_bdev+0x16/0x20
  [&lt;ffffffff811bc402&gt;] iterate_bdevs+0xf2/0x130
  [&lt;ffffffff811b2763&gt;] sys_sync+0x63/0x90
  [&lt;ffffffff815d4272&gt;] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 &lt;48&gt; 8b 80 08 05 00 00 5d
 RIP  [&lt;ffffffff81314790&gt;] blk_get_backing_dev_info+0x10/0x20
  RSP &lt;ffff880009f5fe68&gt;
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_devs():

 while :; do head -c1 /dev/nullb0; done &gt; /dev/null &amp; while :; do sync; done

Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.

Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
Reported-and-tested-by: Wei Fang &lt;fangwei1@huawei.com&gt;
Signed-off-by: Rabin Vincent &lt;rabinv@axis.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block_dev: don't test bdev-&gt;bd_contains when it is not stable</title>
<updated>2017-01-06T10:16:11+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2016-12-12T15:21:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d80411dea6a43adc8fd92c9c39e367f44aba1be9'/>
<id>d80411dea6a43adc8fd92c9c39e367f44aba1be9</id>
<content type='text'>
commit bcc7f5b4bee8e327689a4d994022765855c807ff upstream.

bdev-&gt;bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a parition with -&gt;bd_openers == 0
it sets
  bdev-&gt;bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, -&gt;bd_openers will be &gt; 0
and then -&gt;bd_contains is stable.

When FMODE_EXCL is used, blkdev_get() calls
   bd_start_claiming() -&gt;  bd_prepare_to_claim() -&gt; bd_may_claim()

This call happens before __blkdev_get() is called, so -&gt;bd_contains
is not stable.  So bd_may_claim() cannot safely use -&gt;bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().

This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.

The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the -&gt;bd_mutex, and sets bdev-&gt;bd_contains = bdev;

Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get().  This should fail because the
whole device has a holder, but because bdev-&gt;bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.

The first thread then sets bdev-&gt;bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
			BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.

Fix this by removing the dependency on -&gt;bd_contains in
bd_may_claim().  As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.

Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bcc7f5b4bee8e327689a4d994022765855c807ff upstream.

bdev-&gt;bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a parition with -&gt;bd_openers == 0
it sets
  bdev-&gt;bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, -&gt;bd_openers will be &gt; 0
and then -&gt;bd_contains is stable.

When FMODE_EXCL is used, blkdev_get() calls
   bd_start_claiming() -&gt;  bd_prepare_to_claim() -&gt; bd_may_claim()

This call happens before __blkdev_get() is called, so -&gt;bd_contains
is not stable.  So bd_may_claim() cannot safely use -&gt;bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().

This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.

The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the -&gt;bd_mutex, and sets bdev-&gt;bd_contains = bdev;

Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get().  This should fail because the
whole device has a holder, but because bdev-&gt;bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.

The first thread then sets bdev-&gt;bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
			BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.

Fix this by removing the dependency on -&gt;bd_contains in
bd_may_claim().  As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.

Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: detach bdev inode from its wb in __blkdev_put()</title>
<updated>2015-12-04T18:02:17+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-11-20T21:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=43d1c0eb7e11919f85200d2fce211173526f7304'/>
<id>43d1c0eb7e11919f85200d2fce211173526f7304</id>
<content type='text'>
Since 52ebea749aae ("writeback: make backing_dev_info host
cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
gets attached to a wb (struct bdi_writeback).  Detaching happens on
evict, in inode_detach_wb() called from __destroy_inode(), and involves
updating wb.

However, detaching an internal bdev inode from its wb in
__destroy_inode() is too late.  Its bdi and by extension root wb are
embedded into struct request_queue, which has different lifetime rules
and can be freed long before the final bdput() is called (can be from
__fput() of a corresponding /dev inode, through dput() - evict() -
bd_forget().  bdevs hold onto the underlying disk/queue pair only while
opened; as soon as bdev is closed all bets are off.  In fact,
disk/queue can be gone before __blkdev_put() even returns:

1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
1500 {
...
1518         if (bdev-&gt;bd_contains == bdev) {
1519                 if (disk-&gt;fops-&gt;release)
1520                         disk-&gt;fops-&gt;release(disk, mode);

[ Driver puts its references to disk/queue ]

1521         }
1522         if (!bdev-&gt;bd_openers) {
1523                 struct module *owner = disk-&gt;fops-&gt;owner;
1524
1525                 disk_put_part(bdev-&gt;bd_part);
1526                 bdev-&gt;bd_part = NULL;
1527                 bdev-&gt;bd_disk = NULL;
1528                 if (bdev != bdev-&gt;bd_contains)
1529                         victim = bdev-&gt;bd_contains;
1530                 bdev-&gt;bd_contains = NULL;
1531
1532                 put_disk(disk);

[ We put ours, the queue is gone
  The last bdput() would result in a write to invalid memory ]

1533                 module_put(owner);
...
1539 }

Since bdev inodes are special anyway, detach them in __blkdev_put()
after clearing inode's dirty bits, turning the problematic
inode_detach_wb() in __destroy_inode() into a noop.

add_disk() grabs its disk-&gt;queue since 523e1d399ce0 ("block: make
gendisk hold a reference to its queue"), so the old -&gt;release comment
is removed in favor of the new inode_detach_wb() comment.

Cc: stable@vger.kernel.org # 4.2+, needs backporting
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since 52ebea749aae ("writeback: make backing_dev_info host
cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
gets attached to a wb (struct bdi_writeback).  Detaching happens on
evict, in inode_detach_wb() called from __destroy_inode(), and involves
updating wb.

However, detaching an internal bdev inode from its wb in
__destroy_inode() is too late.  Its bdi and by extension root wb are
embedded into struct request_queue, which has different lifetime rules
and can be freed long before the final bdput() is called (can be from
__fput() of a corresponding /dev inode, through dput() - evict() -
bd_forget().  bdevs hold onto the underlying disk/queue pair only while
opened; as soon as bdev is closed all bets are off.  In fact,
disk/queue can be gone before __blkdev_put() even returns:

1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
1500 {
...
1518         if (bdev-&gt;bd_contains == bdev) {
1519                 if (disk-&gt;fops-&gt;release)
1520                         disk-&gt;fops-&gt;release(disk, mode);

[ Driver puts its references to disk/queue ]

1521         }
1522         if (!bdev-&gt;bd_openers) {
1523                 struct module *owner = disk-&gt;fops-&gt;owner;
1524
1525                 disk_put_part(bdev-&gt;bd_part);
1526                 bdev-&gt;bd_part = NULL;
1527                 bdev-&gt;bd_disk = NULL;
1528                 if (bdev != bdev-&gt;bd_contains)
1529                         victim = bdev-&gt;bd_contains;
1530                 bdev-&gt;bd_contains = NULL;
1531
1532                 put_disk(disk);

[ We put ours, the queue is gone
  The last bdput() would result in a write to invalid memory ]

1533                 module_put(owner);
...
1539 }

Since bdev inodes are special anyway, detach them in __blkdev_put()
after clearing inode's dirty bits, turning the problematic
inode_detach_wb() in __destroy_inode() into a noop.

add_disk() grabs its disk-&gt;queue since 523e1d399ce0 ("block: make
gendisk hold a reference to its queue"), so the old -&gt;release comment
is removed in favor of the new inode_detach_wb() comment.

Cc: stable@vger.kernel.org # 4.2+, needs backporting
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: protect rw_page against device teardown</title>
<updated>2015-11-19T21:47:10+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-11-19T21:29:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2e6edc95382cc36423aff18a237173ad62d5ab52'/>
<id>2e6edc95382cc36423aff18a237173ad62d5ab52</id>
<content type='text'>
Fix use after free crashes like the following:

 general protection fault: 0000 [#1] SMP
 Call Trace:
  [&lt;ffffffffa0050216&gt;] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
  [&lt;ffffffffa0050ba2&gt;] pmem_rw_page+0x42/0x80 [nd_pmem]
  [&lt;ffffffff8128fd90&gt;] bdev_read_page+0x50/0x60
  [&lt;ffffffff812972f0&gt;] do_mpage_readpage+0x510/0x770
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff811d86dc&gt;] ? lru_cache_add+0x1c/0x50
  [&lt;ffffffff81297657&gt;] mpage_readpages+0x107/0x170
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8129058d&gt;] blkdev_readpages+0x1d/0x20
  [&lt;ffffffff811d615f&gt;] __do_page_cache_readahead+0x28f/0x310
  [&lt;ffffffff811d6039&gt;] ? __do_page_cache_readahead+0x169/0x310
  [&lt;ffffffff811c5abd&gt;] ? pagecache_get_page+0x2d/0x1d0
  [&lt;ffffffff811c76f6&gt;] filemap_fault+0x396/0x530
  [&lt;ffffffff811f816e&gt;] __do_fault+0x4e/0xf0
  [&lt;ffffffff811fce7d&gt;] handle_mm_fault+0x11bd/0x1b50

Cc: &lt;stable@vger.kernel.org&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Acked-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
[willy: symmetry fixups]
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix use after free crashes like the following:

 general protection fault: 0000 [#1] SMP
 Call Trace:
  [&lt;ffffffffa0050216&gt;] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
  [&lt;ffffffffa0050ba2&gt;] pmem_rw_page+0x42/0x80 [nd_pmem]
  [&lt;ffffffff8128fd90&gt;] bdev_read_page+0x50/0x60
  [&lt;ffffffff812972f0&gt;] do_mpage_readpage+0x510/0x770
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff811d86dc&gt;] ? lru_cache_add+0x1c/0x50
  [&lt;ffffffff81297657&gt;] mpage_readpages+0x107/0x170
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8128fd20&gt;] ? I_BDEV+0x20/0x20
  [&lt;ffffffff8129058d&gt;] blkdev_readpages+0x1d/0x20
  [&lt;ffffffff811d615f&gt;] __do_page_cache_readahead+0x28f/0x310
  [&lt;ffffffff811d6039&gt;] ? __do_page_cache_readahead+0x169/0x310
  [&lt;ffffffff811c5abd&gt;] ? pagecache_get_page+0x2d/0x1d0
  [&lt;ffffffff811c76f6&gt;] filemap_fault+0x396/0x530
  [&lt;ffffffff811f816e&gt;] __do_fault+0x4e/0xf0
  [&lt;ffffffff811fce7d&gt;] handle_mm_fault+0x11bd/0x1b50

Cc: &lt;stable@vger.kernel.org&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Acked-by: Matthew Wilcox &lt;willy@linux.intel.com&gt;
[willy: symmetry fixups]
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/block_dev.c: Remove WARN_ON() when inode writeback fails</title>
<updated>2015-11-11T16:36:57+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2015-11-09T16:23:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dbd3ca50753e70e09cad747dce23b1a7683a3342'/>
<id>dbd3ca50753e70e09cad747dce23b1a7683a3342</id>
<content type='text'>
If a block device is hot removed and later last reference to device
is put, we try to writeback the dirty inode. But device is gone and
that writeback fails.

Currently we do a WARN_ON() which does not seem to be the right thing.
Convert it to a ratelimited kernel warning.

Reported-by: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
[jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a block device is hot removed and later last reference to device
is put, we try to writeback the dirty inode. But device is gone and
that writeback fails.

Currently we do a WARN_ON() which does not seem to be the right thing.
Convert it to a ratelimited kernel warning.

Reported-by: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
[jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Inline blk_integrity in struct gendisk</title>
<updated>2015-10-21T20:42:42+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2015-10-21T17:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=25520d55cdb6ee289abc68f553d364d22478ff54'/>
<id>25520d55cdb6ee289abc68f553d364d22478ff54</id>
<content type='text'>
Up until now the_integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

 - Moving the blk_integrity definition to genhd.h.

 - Inlining blk_integrity in struct gendisk.

 - Removing the dynamic allocation code.

 - Adding helper functions which allow gendisk to set up and tear down
   the integrity sysfs dir when a disk is added/deleted.

 - Adding a blk_integrity_revalidate() callback for updating the stable
   pages bdi setting.

 - The calls that depend on whether a device has an integrity profile or
   not now key off of the bi-&gt;profile pointer.

 - Simplifying the integrity support routines in DM (Mike Snitzer).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Up until now the_integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

 - Moving the blk_integrity definition to genhd.h.

 - Inlining blk_integrity in struct gendisk.

 - Removing the dynamic allocation code.

 - Adding helper functions which allow gendisk to set up and tear down
   the integrity sysfs dir when a disk is added/deleted.

 - Adding a blk_integrity_revalidate() callback for updating the stable
   pages bdi setting.

 - The calls that depend on whether a device has an integrity profile or
   not now key off of the bi-&gt;profile pointer.

 - Simplifying the integrity support routines in DM (Mike Snitzer).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blockdev: don't set S_DAX for misaligned partitions</title>
<updated>2015-09-16T00:08:05+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2015-08-14T20:15:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f0b2e563bc419df7c1b3d2f494574c25125f6aed'/>
<id>f0b2e563bc419df7c1b3d2f494574c25125f6aed</id>
<content type='text'>
The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.

Cc: &lt;stable@vger.kernel.org&gt;
Suggested-by: Boaz Harrosh &lt;boaz@plexistor.com&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.

Cc: &lt;stable@vger.kernel.org&gt;
Suggested-by: Boaz Harrosh &lt;boaz@plexistor.com&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2015-09-09T00:52:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-09T00:52:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f6f7a6369203fa3e07efb7f35cfd81efe9f25b07'/>
<id>f6f7a6369203fa3e07efb7f35cfd81efe9f25b07</id>
<content type='text'>
Merge second patch-bomb from Andrew Morton:
 "Almost all of the rest of MM.  There was an unusually large amount of
  MM material this time"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (141 commits)
  zpool: remove no-op module init/exit
  mm: zbud: constify the zbud_ops
  mm: zpool: constify the zpool_ops
  mm: swap: zswap: maybe_preload &amp; refactoring
  zram: unify error reporting
  zsmalloc: remove null check from destroy_handle_cache()
  zsmalloc: do not take class lock in zs_shrinker_count()
  zsmalloc: use class-&gt;pages_per_zspage
  zsmalloc: consider ZS_ALMOST_FULL as migrate source
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: use shrinker to trigger auto-compaction
  zsmalloc: account the number of compacted pages
  zsmalloc/zram: introduce zs_pool_stats api
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: always keep per-class stats
  zsmalloc: drop unused variable `nr_to_migrate'
  mm/memblock.c: fix comment in __next_mem_range()
  mm/page_alloc.c: fix type information of memoryless node
  memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge second patch-bomb from Andrew Morton:
 "Almost all of the rest of MM.  There was an unusually large amount of
  MM material this time"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (141 commits)
  zpool: remove no-op module init/exit
  mm: zbud: constify the zbud_ops
  mm: zpool: constify the zpool_ops
  mm: swap: zswap: maybe_preload &amp; refactoring
  zram: unify error reporting
  zsmalloc: remove null check from destroy_handle_cache()
  zsmalloc: do not take class lock in zs_shrinker_count()
  zsmalloc: use class-&gt;pages_per_zspage
  zsmalloc: consider ZS_ALMOST_FULL as migrate source
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: use shrinker to trigger auto-compaction
  zsmalloc: account the number of compacted pages
  zsmalloc/zram: introduce zs_pool_stats api
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: always keep per-class stats
  zsmalloc: drop unused variable `nr_to_migrate'
  mm/memblock.c: fix comment in __next_mem_range()
  mm/page_alloc.c: fix type information of memoryless node
  memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
  ...
</pre>
</div>
</content>
</entry>
</feed>
