<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/nilfs2, branch v3.14.34</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>nilfs2: fix deadlock of segment constructor over I_SYNC flag</title>
<updated>2015-02-11T06:54:47+00:00</updated>
<author>
<name>Ryusuke Konishi</name>
<email>konishi.ryusuke@lab.ntt.co.jp</email>
</author>
<published>2015-02-05T20:25:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=52e87609d9d3ea34cadb5676e8ea85d025ac9632'/>
<id>52e87609d9d3ea34cadb5676e8ea85d025ac9632</id>
<content type='text'>
commit 7ef3ff2fea8bf5e4a21cef47ad87710a3d0fdb52 upstream.

Nilfs2 eventually hangs in a stress test with fsstress program.  This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode-&gt;i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode-&gt;i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion.  On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode-&gt;i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop-&gt;drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation.  So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7ef3ff2fea8bf5e4a21cef47ad87710a3d0fdb52 upstream.

Nilfs2 eventually hangs in a stress test with fsstress program.  This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode-&gt;i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode-&gt;i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion.  On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode-&gt;i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop-&gt;drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation.  So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races</title>
<updated>2015-01-16T14:59:33+00:00</updated>
<author>
<name>Ryusuke Konishi</name>
<email>konishi.ryusuke@lab.ntt.co.jp</email>
</author>
<published>2014-12-10T23:54:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1cde12540bf232899dbf162427083b4982e4bf54'/>
<id>1cde12540bf232899dbf162427083b4982e4bf54</id>
<content type='text'>
commit 705304a863cc41585508c0f476f6d3ec28cf7e00 upstream.

Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
of insert_inode_locked() and a bug of a check for dead inodes needs to
be fixed.

If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

If nilfs_iget() is called before nilfs_new_inode() calls
insert_inode_locked4(), it will create an in-core inode and read its
data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
zero and fail at nilfs_read_inode_common(), which will lead it to call
iget_failed() and cleanly fail.

However, this sanity check doesn't work as expected for reused on-disk
inodes because they leave a non-zero value in i_mode field and it
hinders the test of i_nlink.  This patch also fixes the issue by
removing the test on i_mode that nilfs2 doesn't need.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 705304a863cc41585508c0f476f6d3ec28cf7e00 upstream.

Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
of insert_inode_locked() and a bug of a check for dead inodes needs to
be fixed.

If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

If nilfs_iget() is called before nilfs_new_inode() calls
insert_inode_locked4(), it will create an in-core inode and read its
data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
zero and fail at nilfs_read_inode_common(), which will lead it to call
iget_failed() and cleanly fail.

However, this sanity check doesn't work as expected for reused on-disk
inodes because they leave a non-zero value in i_mode field and it
hinders the test of i_nlink.  This patch also fixes the issue by
removing the test on i_mode that nilfs2 doesn't need.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix data loss with mmap()</title>
<updated>2014-10-05T21:52:21+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-09-25T23:05:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=550a7377dad61b91022a820f99cd99b91f08a298'/>
<id>550a7377dad61b91022a820f99cd99b91f08a298</id>
<content type='text'>
commit 56d7acc792c0d98f38f22058671ee715ff197023 upstream.

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i &lt; write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC))
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount.  This can be reproduced a 100% of the time.  The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 56d7acc792c0d98f38f22058671ee715ff197023 upstream.

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i &lt; write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC))
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount.  This can be reproduced a 100% of the time.  The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block</title>
<updated>2014-01-30T19:19:05+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-01-30T19:19:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f568849edac8611d603e00bd6cbbcfea09395ae6'/>
<id>f568849edac8611d603e00bd6cbbcfea09395ae6</id>
<content type='text'>
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page-&gt;list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page-&gt;list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: add comments for ioctls</title>
<updated>2014-01-24T00:37:00+00:00</updated>
<author>
<name>Vyacheslav Dubeyko</name>
<email>slava@dubeyko.com</email>
</author>
<published>2014-01-23T23:55:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d623a9420c9ae2b748ba458c0e9d59084419fce0'/>
<id>d623a9420c9ae2b748ba458c0e9d59084419fce0</id>
<content type='text'>
Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2
specific ioctls in Documentation/filesystems/nilfs2.txt.

Signed-off-by: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Reviewed-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Wenliang Fan &lt;fanwlexca@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2
specific ioctls in Documentation/filesystems/nilfs2.txt.

Signed-off-by: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Reviewed-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Wenliang Fan &lt;fanwlexca@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/nilfs2: fix integer overflow in nilfs_ioctl_wrap_copy()</title>
<updated>2014-01-24T00:37:00+00:00</updated>
<author>
<name>Wenliang Fan</name>
<email>fanwlexca@gmail.com</email>
</author>
<published>2014-01-23T23:55:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4b15d61718f0d1bd3bc32e15bffb25a31c1d5782'/>
<id>4b15d61718f0d1bd3bc32e15bffb25a31c1d5782</id>
<content type='text'>
The local variable 'pos' in nilfs_ioctl_wrap_copy function can overflow if
a large number was passed to argv-&gt;v_index from userspace and the sum of
argv-&gt;v_index and argv-&gt;v_nmembs exceeds the maximum value of __u64 type
integer (= ~(__u64)0 = 18446744073709551615).

Here, argv-&gt;v_index is a 64-bit width argument to specify the start
position of target data items (such as segment number, checkpoint number,
or virtual block address of nilfs), and argv-&gt;v_nmembs gives the total
number of the items that userland programs (such as lssu, lscp, or
cleanerd) want to get information about, which also gives the maximum
element count of argv-&gt;v_base[] array.

nilfs_ioctl_wrap_copy() calls dofunc() repeatedly and increments the
position variable 'pos' at the end of each iteration if dofunc() itself
didn't update 'pos':

      if (pos == ppos)
              pos += n;

This patch prevents the overflow here by rejecting pairs of a start
position (argv-&gt;v_index) and a total count (argv-&gt;v_nmembs) which leads to
the overflow.

[konishi.ryusuke@lab.ntt.co.jp: fix signedness issue]
Signed-off-by: Wenliang Fan &lt;fanwlexca@gmail.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The local variable 'pos' in nilfs_ioctl_wrap_copy function can overflow if
a large number was passed to argv-&gt;v_index from userspace and the sum of
argv-&gt;v_index and argv-&gt;v_nmembs exceeds the maximum value of __u64 type
integer (= ~(__u64)0 = 18446744073709551615).

Here, argv-&gt;v_index is a 64-bit width argument to specify the start
position of target data items (such as segment number, checkpoint number,
or virtual block address of nilfs), and argv-&gt;v_nmembs gives the total
number of the items that userland programs (such as lssu, lscp, or
cleanerd) want to get information about, which also gives the maximum
element count of argv-&gt;v_base[] array.

nilfs_ioctl_wrap_copy() calls dofunc() repeatedly and increments the
position variable 'pos' at the end of each iteration if dofunc() itself
didn't update 'pos':

      if (pos == ppos)
              pos += n;

This patch prevents the overflow here by rejecting pairs of a start
position (argv-&gt;v_index) and a total count (argv-&gt;v_nmembs) which leads to
the overflow.

[konishi.ryusuke@lab.ntt.co.jp: fix signedness issue]
Signed-off-by: Wenliang Fan &lt;fanwlexca@gmail.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix segctor bug that causes file system corruption</title>
<updated>2014-01-15T07:19:42+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-01-15T01:56:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=70f2fe3a26248724d8a5019681a869abdaf3e89a'/>
<id>70f2fe3a26248724d8a5019681a869abdaf3e89a</id>
<content type='text'>
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Reviewed-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Reviewed-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Abstract out bvec iterator</title>
<updated>2013-11-24T06:33:47+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kmo@daterainc.com</email>
</author>
<published>2013-10-11T22:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4f024f3797c43cb4b73cd2c50cec728842d0e49e'/>
<id>4f024f3797c43cb4b73cd2c50cec728842d0e49e</id>
<content type='text'>
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: "Ed L. Cashin" &lt;ecashin@coraid.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Lars Ellenberg &lt;drbd-dev@lists.linbit.com&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Yehuda Sadeh &lt;yehuda@inktank.com&gt;
Cc: Sage Weil &lt;sage@inktank.com&gt;
Cc: Alex Elder &lt;elder@inktank.com&gt;
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris &lt;josh.h.morris@us.ibm.com&gt;
Cc: Philip Kelleher &lt;pjk1939@linux.vnet.ibm.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh &lt;bharrosh@panasas.com&gt;
Cc: Benny Halevy &lt;bhalevy@tonian.com&gt;
Cc: "James E.J. Bottomley" &lt;JBottomley@parallels.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Nicholas A. Bellinger" &lt;nab@linux-iscsi.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Chris Mason &lt;chris.mason@fusionio.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Jaegeuk Kim &lt;jaegeuk.kim@samsung.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Dave Kleikamp &lt;shaggy@kernel.org&gt;
Cc: Joern Engel &lt;joern@logfs.org&gt;
Cc: Prasad Joshi &lt;prasadjoshi.linux@gmail.com&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: KONISHI Ryusuke &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Ben Myers &lt;bpm@sgi.com&gt;
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Asai Thambi S P &lt;asamymuthupa@micron.com&gt;
Cc: Selvan Mani &lt;smani@micron.com&gt;
Cc: Sam Bradshaw &lt;sbradshaw@micron.com&gt;
Cc: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Cc: "Roger Pau Monné" &lt;roger.pau@citrix.com&gt;
Cc: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Stefano Stabellini &lt;stefano.stabellini@eu.citrix.com&gt;
Cc: Ian Campbell &lt;Ian.Campbell@citrix.com&gt;
Cc: Sebastian Ott &lt;sebott@linux.vnet.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
Cc: Jerome Marchand &lt;jmarchand@redhat.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Peng Tao &lt;tao.peng@emc.com&gt;
Cc: Andy Adamson &lt;andros@netapp.com&gt;
Cc: fanchaoting &lt;fanchaoting@cn.fujitsu.com&gt;
Cc: Jie Liu &lt;jeff.liu@oracle.com&gt;
Cc: Sunil Mushran &lt;sunil.mushran@gmail.com&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Cc: Pankaj Kumar &lt;pankaj.km@samsung.com&gt;
Cc: Dan Magenheimer &lt;dan.magenheimer@oracle.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;6
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: "Ed L. Cashin" &lt;ecashin@coraid.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Lars Ellenberg &lt;drbd-dev@lists.linbit.com&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Geoff Levand &lt;geoff@infradead.org&gt;
Cc: Yehuda Sadeh &lt;yehuda@inktank.com&gt;
Cc: Sage Weil &lt;sage@inktank.com&gt;
Cc: Alex Elder &lt;elder@inktank.com&gt;
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris &lt;josh.h.morris@us.ibm.com&gt;
Cc: Philip Kelleher &lt;pjk1939@linux.vnet.ibm.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Cc: Neil Brown &lt;neilb@suse.de&gt;
Cc: Alasdair Kergon &lt;agk@redhat.com&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh &lt;bharrosh@panasas.com&gt;
Cc: Benny Halevy &lt;bhalevy@tonian.com&gt;
Cc: "James E.J. Bottomley" &lt;JBottomley@parallels.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Nicholas A. Bellinger" &lt;nab@linux-iscsi.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Chris Mason &lt;chris.mason@fusionio.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Jaegeuk Kim &lt;jaegeuk.kim@samsung.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Dave Kleikamp &lt;shaggy@kernel.org&gt;
Cc: Joern Engel &lt;joern@logfs.org&gt;
Cc: Prasad Joshi &lt;prasadjoshi.linux@gmail.com&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: KONISHI Ryusuke &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Ben Myers &lt;bpm@sgi.com&gt;
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Guo Chao &lt;yan@linux.vnet.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Asai Thambi S P &lt;asamymuthupa@micron.com&gt;
Cc: Selvan Mani &lt;smani@micron.com&gt;
Cc: Sam Bradshaw &lt;sbradshaw@micron.com&gt;
Cc: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Cc: "Roger Pau Monné" &lt;roger.pau@citrix.com&gt;
Cc: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Stefano Stabellini &lt;stefano.stabellini@eu.citrix.com&gt;
Cc: Ian Campbell &lt;Ian.Campbell@citrix.com&gt;
Cc: Sebastian Ott &lt;sebott@linux.vnet.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Cc: Nitin Gupta &lt;ngupta@vflare.org&gt;
Cc: Jerome Marchand &lt;jmarchand@redhat.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Peng Tao &lt;tao.peng@emc.com&gt;
Cc: Andy Adamson &lt;andros@netapp.com&gt;
Cc: fanchaoting &lt;fanchaoting@cn.fujitsu.com&gt;
Cc: Jie Liu &lt;jeff.liu@oracle.com&gt;
Cc: Sunil Mushran &lt;sunil.mushran@gmail.com&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Cc: Pankaj Kumar &lt;pankaj.km@samsung.com&gt;
Cc: Dan Magenheimer &lt;dan.magenheimer@oracle.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;6
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix issue with race condition of competition between segments for dirty blocks</title>
<updated>2013-09-30T21:31:02+00:00</updated>
<author>
<name>Vyacheslav Dubeyko</name>
<email>slava@dubeyko.com</email>
</author>
<published>2013-09-30T20:45:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f42ec3941560f0902fe3671e36f2c20ffd3af0a'/>
<id>7f42ec3941560f0902fe3671e36f2c20ffd3af0a</id>
<content type='text'>
Many NILFS2 users were reported about strange file system corruption
(for example):

   NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
   NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are consequence of file system's issue that takes
place more earlier.  Fortunately, Jerome Poulin &lt;jeromepoulin@gmail.com&gt;
and Anton Eliasson &lt;devel@antoneliasson.se&gt; were reported about another
issue not so recently.  These reports describe the issue with segctor
thread's crash:

  BUG: unable to handle kernel paging request at 0000000000004c83
  IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

  Call Trace:
   nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
   nilfs_segctor_construct+0x17b/0x290 [nilfs2]
   nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
   kthread+0xc0/0xd0
   ret_from_fork+0x7c/0xb0

These two issues have one reason.  This reason can raise third issue
too.  Third issue results in hanging of segctor thread with eating of
100% CPU.

REPRODUCING PATH:

One of the possible way or the issue reproducing was described by
Jermoe me Poulin &lt;jeromepoulin@gmail.com&gt;:

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e &gt; lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} &gt; /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux &gt; ps-aux-crashed.log
13. sysrq+W
14. sysrq+E  wait for everything to terminate
15. sysrq+SUSB

Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue is reproduced not stable [60% - 80%].  It is very important to
have proper environment for the issue reproducing.  The critical
conditions for successful reproducing:

(1) It should have big modified file by mmap() way.

(2) This file should have the count of dirty blocks are greater that
    several segments in size (for example, two or three) from time to time
    during processing.

(3) It should be intensive background activity of files modification
    in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason of crash is not valid
page address:

  NILFS [nilfs_segctor_complete_write]:2100 bh-&gt;b_count 0, bh-&gt;b_blocknr 13895680, bh-&gt;b_size 13897727, bh-&gt;b_page 0000000000001a82
  NILFS [nilfs_segctor_complete_write]:2101 segbuf-&gt;sb_segnum 6783

Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
number.  And b_blocknr with b_size values look like block numbers.  So,
buffer_head's pointer points on not proper address value.

Detailed investigation of the issue is discovered such picture:

  [-----------------------------SEGMENT 6783-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111149024, segbuf-&gt;sb_segnum 6783

  [-----------------------------SEGMENT 6784-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh-&gt;b_count 1, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh-&gt;b_assoc_buffers.next ffff8802174a6798, bh-&gt;b_assoc_buffers.prev ffff880221cffee8
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh-&gt;b_count 1, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf-&gt;sb_segnum 6784
  NILFS [nilfs_segbuf_submit_bh]:577 bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880218bcdf50
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111150080, segbuf-&gt;sb_segnum 6784, segbuf-&gt;sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111164416, segbuf-&gt;sb_segnum 6784, segbuf-&gt;sb_nbio 15

  [-----------------------------SEGMENT 6785-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh-&gt;b_count 2, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh-&gt;b_assoc_buffers.next ffff880219277e80, bh-&gt;b_assoc_buffers.prev ffff880221cffc88
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh-&gt;b_count 2, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf-&gt;sb_segnum 6785
  NILFS [nilfs_segbuf_submit_bh]:577 bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880222cc7ee8
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111165440, segbuf-&gt;sb_segnum 6785, segbuf-&gt;sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111177728, segbuf-&gt;sb_segnum 6785, segbuf-&gt;sb_nbio 12

  NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6783
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6784
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6785

  NILFS [nilfs_segctor_complete_write]:2100 bh-&gt;b_count 0, bh-&gt;b_blocknr 13895680, bh-&gt;b_size 13897727, bh-&gt;b_page 0000000000001a82

  BUG: unable to handle kernel paging request at 0000000000001a82
  IP: [&lt;ffffffffa024d0f2&gt;] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in list.  Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer.  Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase.  Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:

  [SEGMENT 6784]: bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880218bcdf50
  [SEGMENT 6785]: bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but prev pointer has changed.  It means
that buffer_head has next pointer from one list but prev pointer from
another.  Such modification can be made several times.  And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every proccessed dirty block;

(2) checking of BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().

Reported-by: Jerome Poulin &lt;jeromepoulin@gmail.com&gt;
Reported-by: Anton Eliasson &lt;devel@antoneliasson.se&gt;
Cc: Paul Fertser &lt;fercerpav@gmail.com&gt;
Cc: ARAI Shun-ichi &lt;hermes@ceres.dti.ne.jp&gt;
Cc: Piotr Szymaniak &lt;szarpaj@grubelek.pl&gt;
Cc: Juan Barry Manuel Canham &lt;Linux@riotingpacifist.net&gt;
Cc: Zahid Chowdhury &lt;zahid.chowdhury@starsolutions.com&gt;
Cc: Elmer Zhang &lt;freeboy6716@gmail.com&gt;
Cc: Kenneth Langga &lt;klangga@gmail.com&gt;
Signed-off-by: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Acked-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Many NILFS2 users were reported about strange file system corruption
(for example):

   NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
   NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are consequence of file system's issue that takes
place more earlier.  Fortunately, Jerome Poulin &lt;jeromepoulin@gmail.com&gt;
and Anton Eliasson &lt;devel@antoneliasson.se&gt; were reported about another
issue not so recently.  These reports describe the issue with segctor
thread's crash:

  BUG: unable to handle kernel paging request at 0000000000004c83
  IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

  Call Trace:
   nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
   nilfs_segctor_construct+0x17b/0x290 [nilfs2]
   nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
   kthread+0xc0/0xd0
   ret_from_fork+0x7c/0xb0

These two issues have one reason.  This reason can raise third issue
too.  Third issue results in hanging of segctor thread with eating of
100% CPU.

REPRODUCING PATH:

One of the possible way or the issue reproducing was described by
Jermoe me Poulin &lt;jeromepoulin@gmail.com&gt;:

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e &gt; lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} &gt; /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux &gt; ps-aux-crashed.log
13. sysrq+W
14. sysrq+E  wait for everything to terminate
15. sysrq+SUSB

Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue is reproduced not stable [60% - 80%].  It is very important to
have proper environment for the issue reproducing.  The critical
conditions for successful reproducing:

(1) It should have big modified file by mmap() way.

(2) This file should have the count of dirty blocks are greater that
    several segments in size (for example, two or three) from time to time
    during processing.

(3) It should be intensive background activity of files modification
    in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason of crash is not valid
page address:

  NILFS [nilfs_segctor_complete_write]:2100 bh-&gt;b_count 0, bh-&gt;b_blocknr 13895680, bh-&gt;b_size 13897727, bh-&gt;b_page 0000000000001a82
  NILFS [nilfs_segctor_complete_write]:2101 segbuf-&gt;sb_segnum 6783

Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
number.  And b_blocknr with b_size values look like block numbers.  So,
buffer_head's pointer points on not proper address value.

Detailed investigation of the issue is discovered such picture:

  [-----------------------------SEGMENT 6783-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111149024, segbuf-&gt;sb_segnum 6783

  [-----------------------------SEGMENT 6784-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh-&gt;b_count 1, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh-&gt;b_assoc_buffers.next ffff8802174a6798, bh-&gt;b_assoc_buffers.prev ffff880221cffee8
  NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh-&gt;b_count 1, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf-&gt;sb_segnum 6784
  NILFS [nilfs_segbuf_submit_bh]:577 bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880218bcdf50
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111150080, segbuf-&gt;sb_segnum 6784, segbuf-&gt;sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111164416, segbuf-&gt;sb_segnum 6784, segbuf-&gt;sb_nbio 15

  [-----------------------------SEGMENT 6785-------------------------------]
  NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
  NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
  NILFS [nilfs_lookup_dirty_data_buffers]:782 bh-&gt;b_count 2, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_lookup_dirty_data_buffers]:783 bh-&gt;b_assoc_buffers.next ffff880219277e80, bh-&gt;b_assoc_buffers.prev ffff880221cffc88
  NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
  NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
  NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
  NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
  NILFS [nilfs_segbuf_submit_bh]:575 bh-&gt;b_count 2, bh-&gt;b_page ffffea000709b000, page-&gt;index 0, i_ino 1033103, i_size 25165824
  NILFS [nilfs_segbuf_submit_bh]:576 segbuf-&gt;sb_segnum 6785
  NILFS [nilfs_segbuf_submit_bh]:577 bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880222cc7ee8
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111165440, segbuf-&gt;sb_segnum 6785, segbuf-&gt;sb_nbio 0
  [----------] ditto
  NILFS [nilfs_segbuf_submit_bio]:464 bio-&gt;bi_sector 111177728, segbuf-&gt;sb_segnum 6785, segbuf-&gt;sb_nbio 12

  NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6783
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6784
  NILFS [nilfs_segbuf_wait]:676 segbuf-&gt;sb_segnum 6785

  NILFS [nilfs_segctor_complete_write]:2100 bh-&gt;b_count 0, bh-&gt;b_blocknr 13895680, bh-&gt;b_size 13897727, bh-&gt;b_page 0000000000001a82

  BUG: unable to handle kernel paging request at 0000000000001a82
  IP: [&lt;ffffffffa024d0f2&gt;] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in list.  Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer.  Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase.  Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:

  [SEGMENT 6784]: bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880218bcdf50
  [SEGMENT 6785]: bh-&gt;b_assoc_buffers.next ffff880218a0d5f8, bh-&gt;b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but prev pointer has changed.  It means
that buffer_head has next pointer from one list but prev pointer from
another.  Such modification can be made several times.  And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every proccessed dirty block;

(2) checking of BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().

Reported-by: Jerome Poulin &lt;jeromepoulin@gmail.com&gt;
Reported-by: Anton Eliasson &lt;devel@antoneliasson.se&gt;
Cc: Paul Fertser &lt;fercerpav@gmail.com&gt;
Cc: ARAI Shun-ichi &lt;hermes@ceres.dti.ne.jp&gt;
Cc: Piotr Szymaniak &lt;szarpaj@grubelek.pl&gt;
Cc: Juan Barry Manuel Canham &lt;Linux@riotingpacifist.net&gt;
Cc: Zahid Chowdhury &lt;zahid.chowdhury@starsolutions.com&gt;
Cc: Elmer Zhang &lt;freeboy6716@gmail.com&gt;
Cc: Kenneth Langga &lt;klangga@gmail.com&gt;
Signed-off-by: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Acked-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>truncate: drop 'oldsize' truncate_pagecache() parameter</title>
<updated>2013-09-12T22:38:02+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-09-12T22:13:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7caef26767c1727d7abfbbbfbe8b2bb473430d48'/>
<id>7caef26767c1727d7abfbbbfbe8b2bb473430d48</id>
<content type='text'>
truncate_pagecache() doesn't care about old size since commit
cedabed49b39 ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: OGAWA Hirofumi &lt;hirofumi@mail.parknet.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
truncate_pagecache() doesn't care about old size since commit
cedabed49b39 ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: OGAWA Hirofumi &lt;hirofumi@mail.parknet.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
