linux-toradex.git/fs/jbd2, branch v3.10.2

jbd2: fix theoretical race in jbd2__journal_restart

2013-07-22T01:21:22+00:00

commit 39c04153fda8c32e85b51c96eb5511a326ad7609 upstream.

Once we decrement transaction->t_updates, if this is the last handle
holding the transaction from closing, and once we release the
t_handle_lock spinlock, it's possible for the transaction to commit
and be released.  In practice with normal kernels, this probably won't
happen, since the commit happens in a separate kernel thread and it's
unlikely this could all happen within the space of a few CPU cycles.

On the other hand, with a real-time kernel, this could potentially
happen, so save the tid found in transaction->t_tid before we release
t_handle_lock.  It would require an insane configuration, such as one
where the jbd2 thread was set to a very high real-time priority,
perhaps because a high priority real-time thread is trying to read or
write to a file system.  But some people who use real-time kernels
have been known to do insane things, including controlling
laser-wielding industrial robots.  :-)

Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

jbd2: move superblock checksum calculation to jbd2_write_superblock()

2013-07-22T01:21:22+00:00

commit fe52d17cdd343ac43c85cf72940a58865b9d3bfb upstream.

Some of the functions which modify the jbd2 superblock were not
updating the checksum before calling jbd2_write_superblock().  Move
the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
that the checksum is calculated consistently.

Signed-off-by: "Theodore Ts'o" 
Cc: Darrick J. Wong 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2013-05-02T00:51:54+00:00

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

2013-05-01T15:04:12+00:00

Pull ext4 updates from Ted Ts'o:
 "Mostly performance and bug fixes, plus some cleanups.  The one new
  feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
  allows installation of a hidden inode designed for boot loaders."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
  ext4: fix type-widening bug in inode table readahead code
  ext4: add check for inodes_count overflow in new resize ioctl
  ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
  ext4: fix online resizing for ext3-compat file systems
  jbd2: trace when lock_buffer in do_get_write_access takes a long time
  ext4: mark metadata blocks using bh flags
  buffer: add BH_Prio and BH_Meta flags
  ext4: mark all metadata I/O with REQ_META
  ext4: fix readdir error in case inline_data+^dir_index.
  ext4: fix readdir error in the case of inline_data+dir_index
  jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  ext4: mext_insert_extents should update extent block checksum
  ext4: move quota initialization out of inode allocation transaction
  ext4: reserve xattr index for Rich ACL support
  jbd2: reduce journal_head size
  ext4: clear buffer_uninit flag when submitting IO
  ext4: use io_end for multiple bios
  ext4: make ext4_bio_write_page() use BH_Async_Write flags
  ext4: Use kstrtoul() instead of parse_strtoul()
  ext4: defragmentation code cleanup
  ...

fs/buffer.c: remove unnecessary init operation after allocating buffer_head.

2013-04-29T22:54:39+00:00

bh allocation uses kmem_cache_zalloc() so we needn't call
'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations.

Signed-off-by: Jianpeng Ma 
Cc: Jan Kara 
Cc: Theodore Ts'o 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd2: trace when lock_buffer in do_get_write_access takes a long time

2013-04-21T20:47:54+00:00

While investigating interactivity problems it was clear that processes
sometimes stall for long periods of times if an attempt is made to
lock a buffer which is undergoing writeback.  It would stall in
a trace looking something like

[] __lock_buffer+0x2e/0x30
[] do_get_write_access+0x43f/0x4b0
[] jbd2_journal_get_write_access+0x2b/0x50
[] __ext4_journal_get_write_access+0x39/0x80
[] ext4_reserve_inode_write+0x78/0xa0
[] ext4_mark_inode_dirty+0x49/0x220
[] ext4_dirty_inode+0x41/0x60
[] __mark_inode_dirty+0x4e/0x2d0
[] update_time+0x79/0xc0
[] file_update_time+0x98/0x100
[] __generic_file_aio_write+0x17c/0x3b0
[] generic_file_aio_write+0x7a/0xf0
[] ext4_file_write+0x83/0xd0
[] do_sync_write+0xa3/0xe0
[] vfs_write+0xae/0x180
[] sys_write+0x4d/0x90
[] system_call_fastpath+0x1a/0x1f
[] 0xffffffffffffffff

Signed-off-by: Mel Gorman 
Signed-off-by: "Theodore Ts'o"

jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset

2013-04-19T21:49:23+00:00

The jbd2_alloc_handle() function is only called by new_handle().  So
this commit uses kmem_cache_zalloc() instead of
kmem_cache_alloc()/memset().

Signed-off-by: Zheng Liu 
Signed-off-by: "Theodore Ts'o"

procfs: new helper - PDE_DATA(inode)

2013-04-09T18:13:32+00:00

The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data.  Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro

jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback

2013-04-04T02:06:52+00:00

The following race is possible:

[kjournald2]                              other_task
jbd2_journal_commit_transaction()
  j_state = T_FINISHED;
  spin_unlock(&journal->j_list_lock);
                                         ->jbd2_journal_remove_checkpoint()
					   ->jbd2_journal_free_transaction();
					     ->kmem_cache_free(transaction)
  ->j_commit_callback(journal, transaction);
    -> USE_AFTER_FREE

WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
Call Trace:
 [] warn_slowpath_common+0xad/0xf0
 [] warn_slowpath_fmt+0x46/0x50
 [] ? ext4_journal_commit_callback+0x99/0xc0
 [] __list_del_entry+0x1c0/0x250
 [] ext4_journal_commit_callback+0x6f/0xc0
 [] jbd2_journal_commit_transaction+0x23a6/0x2570
 [] ? try_to_del_timer_sync+0x82/0xa0
 [] ? del_timer_sync+0x91/0x1e0
 [] kjournald2+0x19f/0x6a0
 [] ? wake_up_bit+0x40/0x40
 [] ? bit_spin_lock+0x80/0x80
 [] kthread+0x10e/0x120
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x7c/0xb0
 [] ? __init_kthread_worker+0x70/0x70

In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk.  This makes callback longer and race
window becomes wider.

In order to fix this we should mark transaction as finished only after
callbacks have completed

Signed-off-by: Dmitry Monakhov 
Signed-off-by: "Theodore Ts'o" 
Cc: stable@vger.kernel.org

ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

2013-04-04T02:02:52+00:00

In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.

Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().

Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time.  To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.

As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started.  This
should have a small (but probably not measurable) improvement for
ext4's scalability.

Signed-off-by: "Theodore Ts'o" 
Reported-by: Ben Hutchings 
Reported-by: George Barnett 
Cc: stable@vger.kernel.org