<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/block, branch v4.4.93</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>fix unbalanced page refcounting in bio_map_user_iov</title>
<updated>2017-10-18T07:20:42+00:00</updated>
<author>
<name>Vitaly Mayatskikh</name>
<email>v.mayatskih@gmail.com</email>
</author>
<published>2017-09-22T05:18:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=399c46095eb5d41934c8a99732028a9f1cf0ac50'/>
<id>399c46095eb5d41934c8a99732028a9f1cf0ac50</id>
<content type='text'>
commit 95d78c28b5a85bacbc29b8dba7c04babb9b0d467 upstream.

bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.

Signed-off-by: Vitaly Mayatskikh &lt;v.mayatskih@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 95d78c28b5a85bacbc29b8dba7c04babb9b0d467 upstream.

bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.

Signed-off-by: Vitaly Mayatskikh &lt;v.mayatskih@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>partitions/efi: Fix integer overflow in GPT size calculation</title>
<updated>2017-10-08T08:14:18+00:00</updated>
<author>
<name>Alden Tondettar</name>
<email>alden.tondettar@gmail.com</email>
</author>
<published>2017-01-15T22:31:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8e8c3d4bb62950c37f086be7d3d775b4879c30df'/>
<id>8e8c3d4bb62950c37f086be7d3d775b4879c30df</id>
<content type='text'>
[ Upstream commit c5082b70adfe8e1ea1cf4a8eff92c9f260e364d2 ]

If a GUID Partition Table claims to have more than 2**25 entries, the
calculation of the partition table size in alloc_read_gpt_entries() will
overflow a 32-bit integer and not enough space will be allocated for the
table.

Nothing seems to get written out of bounds, but later efi_partition() will
read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing
information to /proc/partitions and uevents.

The problem exists on both 64-bit and 32-bit platforms.

Fix the overflow and also print a meaningful debug message if the table
size is too large.

Signed-off-by: Alden Tondettar &lt;alden.tondettar@gmail.com&gt;
Acked-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit c5082b70adfe8e1ea1cf4a8eff92c9f260e364d2 ]

If a GUID Partition Table claims to have more than 2**25 entries, the
calculation of the partition table size in alloc_read_gpt_entries() will
overflow a 32-bit integer and not enough space will be allocated for the
table.

Nothing seems to get written out of bounds, but later efi_partition() will
read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing
information to /proc/partitions and uevents.

The problem exists on both 64-bit and 32-bit platforms.

Fix the overflow and also print a meaningful debug message if the table
size is too large.

Signed-off-by: Alden Tondettar &lt;alden.tondettar@gmail.com&gt;
Acked-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bsg-lib: don't free job in bsg_prepare_job</title>
<updated>2017-10-05T07:41:46+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-09-07T11:54:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=668cee82cd28d2c38a99f7cbddf3b3fd58f257b9'/>
<id>668cee82cd28d2c38a99f7cbddf3b3fd58f257b9</id>
<content type='text'>
commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream.

The job structure is allocated as part of the request, so we should not
free it in the error path of bsg_prepare_job.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream.

The job structure is allocated as part of the request, so we should not
free it in the error path of bsg_prepare_job.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: Relax a check in blk_start_queue()</title>
<updated>2017-09-27T09:00:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@wdc.com</email>
</author>
<published>2017-08-17T20:12:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=30e81e7fe197dd14d5b7653c75140ea75fe5c3d4'/>
<id>30e81e7fe197dd14d5b7653c75140ea75fe5c3d4</id>
<content type='text'>
commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream.

Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:

WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
 skd_unquiesce_dev+0x12a/0x1d0 [skd]
 skd_complete_internal+0x1e7/0x5a0 [skd]
 skd_complete_other+0xc2/0xd0 [skd]
 skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
 skd_isr+0x14f/0x180 [skd]
 irq_forced_thread_fn+0x2a/0x70
 irq_thread+0x144/0x1a0
 kthread+0x125/0x140
 ret_from_fork+0x2a/0x40

Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: Paolo 'Blaisorblade' Giarrusso &lt;blaisorblade@yahoo.it&gt;
Cc: Andrew Morton &lt;akpm@osdl.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream.

Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:

WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
 skd_unquiesce_dev+0x12a/0x1d0 [skd]
 skd_complete_internal+0x1e7/0x5a0 [skd]
 skd_complete_other+0xc2/0xd0 [skd]
 skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
 skd_isr+0x14f/0x180 [skd]
 irq_forced_thread_fn+0x2a/0x70
 irq_thread+0x144/0x1a0
 kthread+0x125/0x140
 ret_from_fork+0x2a/0x40

Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: Paolo 'Blaisorblade' Giarrusso &lt;blaisorblade@yahoo.it&gt;
Cc: Andrew Morton &lt;akpm@osdl.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix module reference leak on put_disk() call for cgroups throttle</title>
<updated>2017-07-05T12:37:16+00:00</updated>
<author>
<name>Roman Pen</name>
<email>roman.penyaev@profitbricks.com</email>
</author>
<published>2016-02-09T19:33:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21d7c733251a5afd87374d3928738026c93e6261'/>
<id>21d7c733251a5afd87374d3928738026c93e6261</id>
<content type='text'>
commit 39a169b62b415390398291080dafe63aec751e0a upstream.

get_disk(),get_gendisk() calls have non explicit side effect: they
increase the reference on the disk owner module.

The following is the correct sequence how to get a disk reference and
to put it:

    disk = get_gendisk(...);

    /* use disk */

    owner = disk-&gt;fops-&gt;owner;
    put_disk(disk);
    module_put(owner);

fs/block_dev.c is aware of this required module_put() call, but f.e.
blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put
a module reference.  To see a leakage in action cgroups throttle config
can be used.  In the following script I'm removing throttle for /dev/ram0
(actually this is NOP, because throttle was never set for this device):

    # lsmod | grep brd
    brd                     5175  0
    # i=100; while [ $i -gt 0 ]; do echo "1:0 0" &gt; \
        /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \
    done
    # lsmod | grep brd
    brd                     5175  100

Now brd module has 100 references.

The issue is fixed by calling module_put() just right away put_disk().

Signed-off-by: Roman Pen &lt;roman.penyaev@profitbricks.com&gt;
Cc: Gi-Oh Kim &lt;gi-oh.kim@profitbricks.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 39a169b62b415390398291080dafe63aec751e0a upstream.

get_disk(),get_gendisk() calls have non explicit side effect: they
increase the reference on the disk owner module.

The following is the correct sequence how to get a disk reference and
to put it:

    disk = get_gendisk(...);

    /* use disk */

    owner = disk-&gt;fops-&gt;owner;
    put_disk(disk);
    module_put(owner);

fs/block_dev.c is aware of this required module_put() call, but f.e.
blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put
a module reference.  To see a leakage in action cgroups throttle config
can be used.  In the following script I'm removing throttle for /dev/ram0
(actually this is NOP, because throttle was never set for this device):

    # lsmod | grep brd
    brd                     5175  0
    # i=100; while [ $i -gt 0 ]; do echo "1:0 0" &gt; \
        /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \
    done
    # lsmod | grep brd
    brd                     5175  100

Now brd module has 100 references.

The issue is fixed by calling module_put() just right away put_disk().

Signed-off-by: Roman Pen &lt;roman.penyaev@profitbricks.com&gt;
Cc: Gi-Oh Kim &lt;gi-oh.kim@profitbricks.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>partitions/msdos: FreeBSD UFS2 file systems are not recognized</title>
<updated>2017-06-17T04:39:34+00:00</updated>
<author>
<name>Richard</name>
<email>richard@aaazen.com</email>
</author>
<published>2017-05-21T19:27:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b28c21baf28a854011d0d896f861995843e7a72e'/>
<id>b28c21baf28a854011d0d896f861995843e7a72e</id>
<content type='text'>
commit 223220356d5ebc05ead9a8d697abb0c0a906fc81 upstream.

The code in block/partitions/msdos.c recognizes FreeBSD, OpenBSD
and NetBSD partitions and does a reasonable job picking out OpenBSD
and NetBSD UFS subpartitions.

But for FreeBSD the subpartitions are always "bad".

    Kernel: &lt;bsd:bad subpartition - ignored

Though all 3 of these BSD systems use UFS as a file system, only
FreeBSD uses relative start addresses in the subpartition
declarations.

The following patch fixes this for FreeBSD partitions and leaves
the code for OpenBSD and NetBSD intact:

Signed-off-by: Richard Narron &lt;comet.berkeley@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 223220356d5ebc05ead9a8d697abb0c0a906fc81 upstream.

The code in block/partitions/msdos.c recognizes FreeBSD, OpenBSD
and NetBSD partitions and does a reasonable job picking out OpenBSD
and NetBSD UFS subpartitions.

But for FreeBSD the subpartitions are always "bad".

    Kernel: &lt;bsd:bad subpartition - ignored

Though all 3 of these BSD systems use UFS as a file system, only
FreeBSD uses relative start addresses in the subpartition
declarations.

The following patch fixes this for FreeBSD partitions and leaves
the code for OpenBSD and NetBSD intact:

Signed-off-by: Richard Narron &lt;comet.berkeley@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix blk_integrity_register to use template's interval_exp if not 0</title>
<updated>2017-05-20T12:27:00+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2017-04-22T21:22:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21cb4dc57b3ca380abda80dbdac7eaff9c565ba7'/>
<id>21cb4dc57b3ca380abda80dbdac7eaff9c565ba7</id>
<content type='text'>
commit 2859323e35ab5fc42f351fbda23ab544eaa85945 upstream.

When registering an integrity profile: if the template's interval_exp is
not 0 use it, otherwise use the ilog2() of logical block size of the
provided gendisk.

This fixes a long-standing DM linear target bug where it cannot pass
integrity data to the underlying device if its logical block size
conflicts with the underlying device's logical block size.

Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2859323e35ab5fc42f351fbda23ab544eaa85945 upstream.

When registering an integrity profile: if the template's interval_exp is
not 0 use it, otherwise use the ilog2() of logical block size of the
provided gendisk.

This fixes a long-standing DM linear target bug where it cannot pass
integrity data to the underlying device if its logical block size
conflicts with the underlying device's logical block size.

Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: get rid of blk_integrity_revalidate()</title>
<updated>2017-05-14T11:32:59+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-04-18T16:43:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a4c6a08906f8c8df19ee2b3514fa76be64ddc83'/>
<id>4a4c6a08906f8c8df19ee2b3514fa76be64ddc83</id>
<content type='text'>
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi-&gt;profile)
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities &amp;=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
[idryomov@gmail.com: backport to &lt; 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi-&gt;profile)
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk-&gt;queue-&gt;backing_dev_info-&gt;capabilities &amp;=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
[idryomov@gmail.com: backport to &lt; 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix del_gendisk() vs blkdev_ioctl crash</title>
<updated>2017-04-27T07:09:34+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-12-29T22:02:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ddbac9aa800b99761a05ba520d62a0d8063a5b8'/>
<id>6ddbac9aa800b99761a05ba520d62a0d8063a5b8</id>
<content type='text'>
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk-&gt;driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear -&gt;driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its -&gt;parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)-&gt;parent, but lets fix it globally since
-&gt;driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [&lt;ffffffffa00340a8&gt;] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [&lt;ffffffff8143e5c7&gt;] rescan_partitions+0x87/0x2c0
  [&lt;ffffffff810f37f9&gt;] ? __lock_is_held+0x49/0x70
  [&lt;ffffffff81438c62&gt;] __blkdev_reread_part+0x72/0xb0
  [&lt;ffffffff81438cc5&gt;] blkdev_reread_part+0x25/0x40
  [&lt;ffffffff8143982d&gt;] blkdev_ioctl+0x4fd/0x9c0
  [&lt;ffffffff811246c9&gt;] ? current_kernel_time64+0x69/0xd0
  [&lt;ffffffff812916dd&gt;] block_ioctl+0x3d/0x50
  [&lt;ffffffff81264c38&gt;] do_vfs_ioctl+0x308/0x560
  [&lt;ffffffff8115dbd1&gt;] ? __audit_syscall_entry+0xb1/0x100
  [&lt;ffffffff810031d6&gt;] ? do_audit_syscall_entry+0x66/0x70
  [&lt;ffffffff81264f09&gt;] SyS_ioctl+0x79/0x90
  [&lt;ffffffff81902672&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Reported-by: Robert Hu &lt;robert.hu@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk-&gt;driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear -&gt;driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its -&gt;parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)-&gt;parent, but lets fix it globally since
-&gt;driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [&lt;ffffffffa00340a8&gt;] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [&lt;ffffffff8143e5c7&gt;] rescan_partitions+0x87/0x2c0
  [&lt;ffffffff810f37f9&gt;] ? __lock_is_held+0x49/0x70
  [&lt;ffffffff81438c62&gt;] __blkdev_reread_part+0x72/0xb0
  [&lt;ffffffff81438cc5&gt;] blkdev_reread_part+0x25/0x40
  [&lt;ffffffff8143982d&gt;] blkdev_ioctl+0x4fd/0x9c0
  [&lt;ffffffff811246c9&gt;] ? current_kernel_time64+0x69/0xd0
  [&lt;ffffffff812916dd&gt;] block_ioctl+0x3d/0x50
  [&lt;ffffffff81264c38&gt;] do_vfs_ioctl+0x308/0x560
  [&lt;ffffffff8115dbd1&gt;] ? __audit_syscall_entry+0xb1/0x100
  [&lt;ffffffff810031d6&gt;] ? do_audit_syscall_entry+0x66/0x70
  [&lt;ffffffff81264f09&gt;] SyS_ioctl+0x79/0x90
  [&lt;ffffffff81902672&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Reported-by: Robert Hu &lt;robert.hu@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: Avoid memory reclaim when remapping queues</title>
<updated>2017-04-18T05:14:37+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@linux.vnet.ibm.com</email>
</author>
<published>2016-12-06T15:31:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f4522e36edaa9ec0cada0daa5c2628db762dd3d9'/>
<id>f4522e36edaa9ec0cada0daa5c2628db762dd3d9</id>
<content type='text'>
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Cc: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Cc: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@linux.vnet.ibm.com&gt;
Cc: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Cc: Douglas Miller &lt;dougmill@linux.vnet.ibm.com&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
