<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs, branch v4.4.129</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>writeback: safer lock nesting</title>
<updated>2018-04-24T07:32:12+00:00</updated>
<author>
<name>Greg Thelen</name>
<email>gthelen@google.com</email>
</author>
<published>2018-04-20T21:55:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f051f8986a89d0c482ea1dfc96bc226fb12389f'/>
<id>6f051f8986a89d0c482ea1dfc96bc226fb12389f</id>
<content type='text'>
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &amp;locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    &lt;interrupts mistakenly enabled&gt;
                                    &lt;interrupt&gt;
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    &lt;deadlock&gt;
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 &gt; a/memory.move_charge_at_immigrate
  echo 1 &gt; b/memory.move_charge_at_immigrate
  (
    echo $BASHPID &gt; a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &amp;
  while true; do
    sync
  done &amp;
  sleep 1h &amp;
  SLEEP=$!
  while true; do
    echo $SLEEP &gt; a/cgroup.procs
    echo $SLEEP &gt; b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Reported-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v4.2+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &amp;locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    &lt;interrupts mistakenly enabled&gt;
                                    &lt;interrupt&gt;
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    &lt;deadlock&gt;
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 &gt; a/memory.move_charge_at_immigrate
  echo 1 &gt; b/memory.move_charge_at_immigrate
  (
    echo $BASHPID &gt; a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &amp;
  while true; do
    sync
  done &amp;
  sleep 1h &amp;
  SLEEP=$!
  while true; do
    echo $SLEEP &gt; a/cgroup.procs
    echo $SLEEP &gt; b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Reported-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Wang Long &lt;wanglong19@meituan.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v4.2+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fanotify: fix logic of events on child</title>
<updated>2018-04-24T07:32:11+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2018-04-04T20:42:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=87d7ccbf09a7846818c30edf9bc8192719bfb05f'/>
<id>87d7ccbf09a7846818c30edf9bc8192719bfb05f</id>
<content type='text'>
commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 upstream.

When event on child inodes are sent to the parent inode mark and
parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
will not be delivered to the listener process. However, if the same
process also has a mount mark, the event to the parent inode will be
delivered regadless of the mount mark mask.

This behavior is incorrect in the case where the mount mark mask does
not contain the specific event type. For example, the process adds
a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).

A modify event on a file inside that directory (and inside that mount)
should not create a FAN_MODIFY event, because neither of the marks
requested to get that event on the file.

Fixes: 1968f5eed54c ("fanotify: use both marks when possible")
Cc: stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[natechancellor: Fix small conflict due to lack of 3cd5eca8d7a2f]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 upstream.

When event on child inodes are sent to the parent inode mark and
parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
will not be delivered to the listener process. However, if the same
process also has a mount mark, the event to the parent inode will be
delivered regadless of the mount mark mask.

This behavior is incorrect in the case where the mount mark mask does
not contain the specific event type. For example, the process adds
a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).

A modify event on a file inside that directory (and inside that mount)
should not create a FAN_MODIFY event, because neither of the marks
requested to get that event on the file.

Fixes: 1968f5eed54c ("fanotify: use both marks when possible")
Cc: stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[natechancellor: Fix small conflict due to lack of 3cd5eca8d7a2f]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: bugfix for mmaped pages in mpage_release_unused_pages()</title>
<updated>2018-04-24T07:32:11+00:00</updated>
<author>
<name>wangguang</name>
<email>wang.guang55@zte.com.cn</email>
</author>
<published>2016-09-15T15:32:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a529f29a3ea9cb7e2c90053c02c8213f527374df'/>
<id>a529f29a3ea9cb7e2c90053c02c8213f527374df</id>
<content type='text'>
commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream.

Pages clear buffers after ext4 delayed block allocation failed,
However, it does not clean its pte_dirty flag.
if the pages unmap ,in cording to the pte_dirty ,
unmap_page_range may try to call __set_page_dirty,

which may lead to the bugon at
mpage_prepare_extent_to_map:head = page_buffers(page);.

This patch just call clear_page_dirty_for_io to clean pte_dirty
at mpage_release_unused_pages for pages mmaped.

Steps to reproduce the bug:

（1） mmap a file in ext4
	addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
	       	            fd, 0);
	memset(addr, 'i', 4096);

（2） return EIO at

	ext4_writepages-&gt;mpage_map_and_submit_extent-&gt;mpage_map_one_extent

which causes this log message to be print:

                ext4_msg(sb, KERN_CRIT,
                        "Delayed block allocation failed for "
                        "inode %lu at logical offset %llu with"
                        " max blocks %u with error %d",
                        inode-&gt;i_ino,
                        (unsigned long long)map-&gt;m_lblk,
                        (unsigned)map-&gt;m_len, -err);

(3）Unmap the addr cause warning at

	__set_page_dirty:WARN_ON_ONCE(warn &amp;&amp; !PageUptodate(page));

(4) wait for a minute,then bugon happen.

Cc: stable@vger.kernel.org
Signed-off-by: wangguang &lt;wangguang03@zte.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
[@nathanchance: Resolved conflict from lack of 09cbfeaf1a5a6]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream.

Pages clear buffers after ext4 delayed block allocation failed,
However, it does not clean its pte_dirty flag.
if the pages unmap ,in cording to the pte_dirty ,
unmap_page_range may try to call __set_page_dirty,

which may lead to the bugon at
mpage_prepare_extent_to_map:head = page_buffers(page);.

This patch just call clear_page_dirty_for_io to clean pte_dirty
at mpage_release_unused_pages for pages mmaped.

Steps to reproduce the bug:

（1） mmap a file in ext4
	addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
	       	            fd, 0);
	memset(addr, 'i', 4096);

（2） return EIO at

	ext4_writepages-&gt;mpage_map_and_submit_extent-&gt;mpage_map_one_extent

which causes this log message to be print:

                ext4_msg(sb, KERN_CRIT,
                        "Delayed block allocation failed for "
                        "inode %lu at logical offset %llu with"
                        " max blocks %u with error %d",
                        inode-&gt;i_ino,
                        (unsigned long long)map-&gt;m_lblk,
                        (unsigned)map-&gt;m_len, -err);

(3）Unmap the addr cause warning at

	__set_page_dirty:WARN_ON_ONCE(warn &amp;&amp; !PageUptodate(page));

(4) wait for a minute,then bugon happen.

Cc: stable@vger.kernel.org
Signed-off-by: wangguang &lt;wangguang03@zte.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
[@nathanchance: Resolved conflict from lack of 09cbfeaf1a5a6]
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>autofs: mount point create should honour passed in mode</title>
<updated>2018-04-24T07:32:11+00:00</updated>
<author>
<name>Ian Kent</name>
<email>raven@themaw.net</email>
</author>
<published>2018-04-20T21:55:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ce98dd37cc5d5bcfde9ae899113e5f90d93f8acf'/>
<id>ce98dd37cc5d5bcfde9ae899113e5f90d93f8acf</id>
<content type='text'>
commit 1e6306652ba18723015d1b4967fe9de55f042499 upstream.

The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
cause selinux dac_override denials.

But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.

Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
Signed-off-by: Ian Kent &lt;raven@themaw.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1e6306652ba18723015d1b4967fe9de55f042499 upstream.

The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
cause selinux dac_override denials.

But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.

Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
Signed-off-by: Ian Kent &lt;raven@themaw.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Don't leak MNT_INTERNAL away from internal mounts</title>
<updated>2018-04-24T07:32:11+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2018-04-20T02:03:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d10a274ada79aa5612126a77f10fa0c3af33bdb6'/>
<id>d10a274ada79aa5612126a77f10fa0c3af33bdb6</id>
<content type='text'>
commit 16a34adb9392b2fe4195267475ab5b472e55292c upstream.

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring &lt;aring@mojatatu.com&gt;
Bisected-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Tested-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Tested-by: Alexander Aring &lt;aring@mojatatu.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 16a34adb9392b2fe4195267475ab5b472e55292c upstream.

We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring &lt;aring@mojatatu.com&gt;
Bisected-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Tested-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Tested-by: Alexander Aring &lt;aring@mojatatu.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>jffs2_kill_sb(): deal with failed allocations</title>
<updated>2018-04-24T07:32:11+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2018-04-03T03:56:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2154ecea6d72f9290d201fed637afb5178db5665'/>
<id>2154ecea6d72f9290d201fed637afb5178db5665</id>
<content type='text'>
commit c66b23c2840446a82c389e4cb1a12eb2a71fa2e4 upstream.

jffs2_fill_super() might fail to allocate jffs2_sb_info;
jffs2_kill_sb() must survive that.

Cc: stable@kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c66b23c2840446a82c389e4cb1a12eb2a71fa2e4 upstream.

jffs2_fill_super() might fail to allocate jffs2_sb_info;
jffs2_kill_sb() must survive that.

Cc: stable@kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()</title>
<updated>2018-04-24T07:32:10+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2017-01-12T02:50:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9b06cce3ca4d60d442c39bfa7c058b71b1cee6c2'/>
<id>9b06cce3ca4d60d442c39bfa7c058b71b1cee6c2</id>
<content type='text'>
commit c755e251357a0cee0679081f08c3f4ba797a8009 upstream.

The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c.  With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4eeedca.

The deadlock can be reproduced via:

   dmesg -n 7
   mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
   mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
   mkdir /vdc/a
   umount /vdc
   mount -t ext4 /dev/vdc /vdc
   echo foo &gt; /vdc/a/foo

and looks like this:

[   11.158815]
[   11.160276] =============================================
[   11.161960] [ INFO: possible recursive locking detected ]
[   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W
[   11.161960] ---------------------------------------------
[   11.161960] bash/2519 is trying to acquire lock:
[   11.161960]  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1225a4b&gt;] ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]
[   11.161960] but task is already holding lock:
[   11.161960]  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1227941&gt;] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960]
[   11.161960] other info that might help us debug this:
[   11.161960]  Possible unsafe locking scenario:
[   11.161960]
[   11.161960]        CPU0
[   11.161960]        ----
[   11.161960]   lock(&amp;ei-&gt;xattr_sem);
[   11.161960]   lock(&amp;ei-&gt;xattr_sem);
[   11.161960]
[   11.161960]  *** DEADLOCK ***
[   11.161960]
[   11.161960]  May be due to missing lock nesting notation
[   11.161960]
[   11.161960] 4 locks held by bash/2519:
[   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [&lt;c11a2414&gt;] mnt_want_write+0x1e/0x3e
[   11.161960]  #1:  (&amp;type-&gt;i_mutex_dir_key){++++++}, at: [&lt;c119508b&gt;] path_openat+0x338/0x67a
[   11.161960]  #2:  (jbd2_handle){++++..}, at: [&lt;c123314a&gt;] start_this_handle+0x582/0x622
[   11.161960]  #3:  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1227941&gt;] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960]
[   11.161960] stack backtrace:
[   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
[   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[   11.161960] Call Trace:
[   11.161960]  dump_stack+0x72/0xa3
[   11.161960]  __lock_acquire+0xb7c/0xcb9
[   11.161960]  ? kvm_clock_read+0x1f/0x29
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  lock_acquire+0x106/0x18a
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  down_write+0x39/0x72
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ? _raw_read_unlock+0x22/0x2c
[   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
[   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
[   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
[   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_try_add_inline_entry+0x69/0x152
[   11.161960]  ext4_add_entry+0xa3/0x848
[   11.161960]  ? __brelse+0x14/0x2f
[   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
[   11.161960]  ext4_add_nondir+0x17/0x5b
[   11.161960]  ext4_create+0xcf/0x133
[   11.161960]  ? ext4_mknod+0x12f/0x12f
[   11.161960]  lookup_open+0x39e/0x3fb
[   11.161960]  ? __wake_up+0x1a/0x40
[   11.161960]  ? lock_acquire+0x11e/0x18a
[   11.161960]  path_openat+0x35c/0x67a
[   11.161960]  ? sched_clock_cpu+0xd7/0xf2
[   11.161960]  do_filp_open+0x36/0x7c
[   11.161960]  ? _raw_spin_unlock+0x22/0x2c
[   11.161960]  ? __alloc_fd+0x169/0x173
[   11.161960]  do_sys_open+0x59/0xcc
[   11.161960]  SyS_open+0x1d/0x1f
[   11.161960]  do_int80_syscall_32+0x4f/0x61
[   11.161960]  entry_INT80_32+0x2f/0x2f
[   11.161960] EIP: 0xb76ad469
[   11.161960] EFLAGS: 00000286 CPU: 0
[   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq)
Reported-by: George Spelvin &lt;linux@sciencehorizons.net&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c755e251357a0cee0679081f08c3f4ba797a8009 upstream.

The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c.  With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4eeedca.

The deadlock can be reproduced via:

   dmesg -n 7
   mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
   mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
   mkdir /vdc/a
   umount /vdc
   mount -t ext4 /dev/vdc /vdc
   echo foo &gt; /vdc/a/foo

and looks like this:

[   11.158815]
[   11.160276] =============================================
[   11.161960] [ INFO: possible recursive locking detected ]
[   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W
[   11.161960] ---------------------------------------------
[   11.161960] bash/2519 is trying to acquire lock:
[   11.161960]  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1225a4b&gt;] ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]
[   11.161960] but task is already holding lock:
[   11.161960]  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1227941&gt;] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960]
[   11.161960] other info that might help us debug this:
[   11.161960]  Possible unsafe locking scenario:
[   11.161960]
[   11.161960]        CPU0
[   11.161960]        ----
[   11.161960]   lock(&amp;ei-&gt;xattr_sem);
[   11.161960]   lock(&amp;ei-&gt;xattr_sem);
[   11.161960]
[   11.161960]  *** DEADLOCK ***
[   11.161960]
[   11.161960]  May be due to missing lock nesting notation
[   11.161960]
[   11.161960] 4 locks held by bash/2519:
[   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [&lt;c11a2414&gt;] mnt_want_write+0x1e/0x3e
[   11.161960]  #1:  (&amp;type-&gt;i_mutex_dir_key){++++++}, at: [&lt;c119508b&gt;] path_openat+0x338/0x67a
[   11.161960]  #2:  (jbd2_handle){++++..}, at: [&lt;c123314a&gt;] start_this_handle+0x582/0x622
[   11.161960]  #3:  (&amp;ei-&gt;xattr_sem){++++..}, at: [&lt;c1227941&gt;] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960]
[   11.161960] stack backtrace:
[   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
[   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[   11.161960] Call Trace:
[   11.161960]  dump_stack+0x72/0xa3
[   11.161960]  __lock_acquire+0xb7c/0xcb9
[   11.161960]  ? kvm_clock_read+0x1f/0x29
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  lock_acquire+0x106/0x18a
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  down_write+0x39/0x72
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ? _raw_read_unlock+0x22/0x2c
[   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
[   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
[   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
[   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_try_add_inline_entry+0x69/0x152
[   11.161960]  ext4_add_entry+0xa3/0x848
[   11.161960]  ? __brelse+0x14/0x2f
[   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
[   11.161960]  ext4_add_nondir+0x17/0x5b
[   11.161960]  ext4_create+0xcf/0x133
[   11.161960]  ? ext4_mknod+0x12f/0x12f
[   11.161960]  lookup_open+0x39e/0x3fb
[   11.161960]  ? __wake_up+0x1a/0x40
[   11.161960]  ? lock_acquire+0x11e/0x18a
[   11.161960]  path_openat+0x35c/0x67a
[   11.161960]  ? sched_clock_cpu+0xd7/0xf2
[   11.161960]  do_filp_open+0x36/0x7c
[   11.161960]  ? _raw_spin_unlock+0x22/0x2c
[   11.161960]  ? __alloc_fd+0x169/0x173
[   11.161960]  do_sys_open+0x59/0xcc
[   11.161960]  SyS_open+0x1d/0x1f
[   11.161960]  do_int80_syscall_32+0x4f/0x61
[   11.161960]  entry_INT80_32+0x2f/0x2f
[   11.161960] EIP: 0xb76ad469
[   11.161960] EFLAGS: 00000286 CPU: 0
[   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq)
Reported-by: George Spelvin &lt;linux@sciencehorizons.net&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix crashes in dioread_nolock mode</title>
<updated>2018-04-24T07:32:10+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-02-19T05:33:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b9b98c26705b8d5fba8f15faeb923b1c6f48d223'/>
<id>b9b98c26705b8d5fba8f15faeb923b1c6f48d223</id>
<content type='text'>
commit 74dae4278546b897eb81784fdfcce872ddd8b2b8 upstream.

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 74dae4278546b897eb81784fdfcce872ddd8b2b8 upstream.

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: don't allow r/w mounts if metadata blocks overlap the superblock</title>
<updated>2018-04-24T07:32:09+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2018-03-30T02:10:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4845fefe6d13a28702f92102dff11c1b773f00d0'/>
<id>4845fefe6d13a28702f92102dff11c1b773f00d0</id>
<content type='text'>
commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 upstream.

If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty.  So disallow r/w mounts
for file systems corrupted in this particular way.

Backport notes:
3.18.y is missing bc98a42c1f7d ("VFS: Convert sb-&gt;s_flags &amp; MS_RDONLY to sb_rdonly(sb)")
and e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
so we simply use the sb MS_RDONLY check from pre bc98a42c1f7d in place of the sb_rdonly
function used in the upstream variant of the patch.

Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Harsh Shandilya &lt;harsh@prjkt.io&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 upstream.

If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty.  So disallow r/w mounts
for file systems corrupted in this particular way.

Backport notes:
3.18.y is missing bc98a42c1f7d ("VFS: Convert sb-&gt;s_flags &amp; MS_RDONLY to sb_rdonly(sb)")
and e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
so we simply use the sb MS_RDONLY check from pre bc98a42c1f7d in place of the sb_rdonly
function used in the upstream variant of the patch.

Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Harsh Shandilya &lt;harsh@prjkt.io&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fail ext4_iget for root directory if unallocated</title>
<updated>2018-04-24T07:32:07+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2018-03-30T01:56:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=990251318b97ed7153d9adbf633035536c7d685b'/>
<id>990251318b97ed7153d9adbf633035536c7d685b</id>
<content type='text'>
commit 8e4b5eae5decd9dfe5a4ee369c22028f90ab4c44 upstream.

If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.

This issue has been assigned CVE-2018-1092.

https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777

Reported-by: Wen Xu &lt;wen.xu@gatech.edu&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8e4b5eae5decd9dfe5a4ee369c22028f90ab4c44 upstream.

If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.

This issue has been assigned CVE-2018-1092.

https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777

Reported-by: Wen Xu &lt;wen.xu@gatech.edu&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
