linux-toradex.git/fs, branch v3.0.33

vfs: make AIO use the proper rw_verify_area() area helpers

2012-06-01T07:12:53+00:00

commit a70b52ec1aaeaf60f4739edb1b422827cb6f3893 upstream.

We had for some reason overlooked the AIO interface, and it didn't use
the proper rw_verify_area() helper function that checks (for example)
mandatory locking on the file, and that the size of the access doesn't
cause us to overflow the provided offset limits etc.

Instead, AIO did just the security_file_permission() thing (that
rw_verify_area() also does) directly.

This fixes it to do all the proper helper functions, which not only
means that now mandatory file locking works with AIO too, we can
actually remove lines of code.

Reported-by: Manish Honap 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

block: don't mark buffers beyond end of disk as mapped

2012-06-01T07:12:52+00:00

commit 080399aaaf3531f5b8761ec0ac30ff98891e8686 upstream.

Hi,

We have a bug report open where a squashfs image mounted on ppc64 would
exhibit errors due to trying to read beyond the end of the disk.  It can
easily be reproduced by doing the following:

[root@ibm-p750e-02-lp3 ~]# ls -l install.img
-rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
[root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
[root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
dd: reading `/dev/loop0': Input/output error
277376+0 records in
277376+0 records out
142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s

In dmesg, you'll find the following:

squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   43.106012] attempt to access beyond end of device
[   43.106029] loop0: rw=0, want=277410, limit=277408
[   43.106039] Buffer I/O error on device loop0, logical block 138704
[   43.106053] attempt to access beyond end of device
[   43.106057] loop0: rw=0, want=277412, limit=277408
[   43.106061] Buffer I/O error on device loop0, logical block 138705
[   43.106066] attempt to access beyond end of device
[   43.106070] loop0: rw=0, want=277414, limit=277408
[   43.106073] Buffer I/O error on device loop0, logical block 138706
[   43.106078] attempt to access beyond end of device
[   43.106081] loop0: rw=0, want=277416, limit=277408
[   43.106085] Buffer I/O error on device loop0, logical block 138707
[   43.106089] attempt to access beyond end of device
[   43.106093] loop0: rw=0, want=277418, limit=277408
[   43.106096] Buffer I/O error on device loop0, logical block 138708
[   43.106101] attempt to access beyond end of device
[   43.106104] loop0: rw=0, want=277420, limit=277408
[   43.106108] Buffer I/O error on device loop0, logical block 138709
[   43.106112] attempt to access beyond end of device
[   43.106116] loop0: rw=0, want=277422, limit=277408
[   43.106120] Buffer I/O error on device loop0, logical block 138710
[   43.106124] attempt to access beyond end of device
[   43.106128] loop0: rw=0, want=277424, limit=277408
[   43.106131] Buffer I/O error on device loop0, logical block 138711
[   43.106135] attempt to access beyond end of device
[   43.106139] loop0: rw=0, want=277426, limit=277408
[   43.106143] Buffer I/O error on device loop0, logical block 138712
[   43.106147] attempt to access beyond end of device
[   43.106151] loop0: rw=0, want=277428, limit=277408
[   43.106154] Buffer I/O error on device loop0, logical block 138713
[   43.106158] attempt to access beyond end of device
[   43.106162] loop0: rw=0, want=277430, limit=277408
[   43.106166] attempt to access beyond end of device
[   43.106169] loop0: rw=0, want=277432, limit=277408
...
[   43.106307] attempt to access beyond end of device
[   43.106311] loop0: rw=0, want=277470, limit=2774

Squashfs manages to read in the end block(s) of the disk during the
mount operation.  Then, when dd reads the block device, it leads to
block_read_full_page being called with buffers that are beyond end of
disk, but are marked as mapped.  Thus, it would end up submitting read
I/O against them, resulting in the errors mentioned above.  I fixed the
problem by modifying init_page_buffers to only set the buffer mapped if
it fell inside of i_size.

Cheers,
Jeff

Signed-off-by: Jeff Moyer 
Acked-by: Nick Piggin 

--

Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

wake up s_wait_unfrozen when ->freeze_fs fails

2012-05-21T16:40:05+00:00

commit e1616300a20c80396109c1cf013ba9a36055a3da upstream.

dd slept infinitely when fsfeeze failed because of EIO.
To fix this problem, if ->freeze_fs fails, freeze_super() wakes up
the tasks waiting for the filesystem to become unfrozen.

When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(),
the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen.

However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then
freeze_super() returns an error number. In this case, FITHAW ioctl returns
EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up
s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely.

Signed-off-by: Kazuya Mio 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

ext4: fix error handling on inode bitmap corruption

2012-05-21T16:40:04+00:00

commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream.

When insert_inode_locked() fails in ext4_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares filesystem error and does not call unlock_new_inode().

Signed-off-by: Jan Kara 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

ext3: Fix error handling on inode bitmap corruption

2012-05-21T16:40:04+00:00

commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream.

When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().

Reviewed-by: Eric Sandeen 
Signed-off-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

NFSv4: Revalidate uid/gid after open

2012-05-21T16:40:04+00:00

This is a shorter (and more appropriate for stable kernels) analog to
the following upstream commit:

commit 6926afd1925a54a13684ebe05987868890665e2b
Author: Trond Myklebust 
Date:   Sat Jan 7 13:22:46 2012 -0500

    NFSv4: Save the owner/group name string when doing open

    ...so that we can do the uid/gid mapping outside the asynchronous RPC
    context.
    This fixes a bug in the current NFSv4 atomic open code where the client
    isn't able to determine what the true uid/gid fields of the file are,
    (because the asynchronous nature of the OPEN call denies it the ability
    to do an upcall) and so fills them with default values, marking the
    inode as needing revalidation.
    Unfortunately, in some cases, the VFS will do some additional sanity
    checks on the file, and may override the server's decision to allow
    the open because it sees the wrong owner/group fields.

    Signed-off-by: Trond Myklebust 

Without this patch, logging into two different machines with home
directories mounted over NFS4 and then running "vim" and typing ":q"
in each reliably produces the following error on the second machine:

	E137: Viminfo file is not writable: /users/system/rtheys/.viminfo

This regression was introduced by 80e52aced138 ("NFSv4: Don't do
idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32
cycle) --- after the OPEN call, .viminfo has the default values for
st_uid and st_gid (0xfffffffe) cached because we do not want to let
rpciod wait for an idmapper upcall to fill them in.

The fix used in mainline is to save the owner and group as strings and
perform the upcall in _nfs4_proc_open outside the rpciod context,
which takes about 600 lines.  For stable, we can do something similar
with a one-liner: make open check for the stale fields and make a
(synchronous) GETATTR call to fill them when needed.

Trond dictated the patch, I typed it in, and Rik tested it.

Addresses http://bugs.debian.org/659111 and
          https://bugzilla.redhat.com/789298

Reported-by: Rik Theys 
Explained-by: David Flyn 
Signed-off-by: Jonathan Nieder 
Tested-by: Rik Theys 
Signed-off-by: Greg Kroah-Hartman

ext4: avoid deadlock on sync-mounted FS w/o journal

2012-05-21T16:40:04+00:00

commit c1bb05a657fb3d8c6179a4ef7980261fae4521d7 upstream.

Processes hang forever on a sync-mounted ext2 file system that
is mounted with the ext4 module (default in Fedora 16).

I can reproduce this reliably by mounting an ext2 partition with
"-o sync" and opening a new file an that partition with vim. vim
will hang in "D" state forever.  The same happens on ext4 without
a journal.

I am attaching a small patch here that solves this issue for me.
In the sync mounted case without a journal,
ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
can't be called with buffer lock held.

Also move mb_cache_entry_release inside lock to avoid race
fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix
Note too that ext2 fixed this same problem in 2006 with
b2f49033 [PATCH] fix deadlock in ext2

Signed-off-by: Martin.Wilck@ts.fujitsu.com
[sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
Signed-off-by: Eric Sandeen 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

jffs2: Fix lock acquisition order bug in gc path

2012-05-21T16:40:03+00:00

commit 226bb7df3d22bcf4a1c0fe8206c80cc427498eae upstream.

The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex.  This fixes a case in which the
acquisition order was erroneously reversed.  This issue was caught by
the following lockdep splat:

   =======================================================
   [ INFO: possible circular locking dependency detected ]
   3.0.5 #1
   -------------------------------------------------------
   jffs2_gcd_mtd6/299 is trying to acquire lock:
    (&c->alloc_sem){+.+.+.}, at: [] jffs2_garbage_collect_pass+0x314/0x890

   but task is already holding lock:
    (&(&c->erase_completion_lock)->rlock){+.+...}, at: [] jffs2_garbage_collect_pass+0x308/0x890

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
          [] validate_chain+0xe6c/0x10bc
          [] __lock_acquire+0x54c/0xba4
          [] lock_acquire+0xa4/0x114
          [] _raw_spin_lock+0x3c/0x4c
          [] jffs2_garbage_collect_pass+0x4c/0x890
          [] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [] kthread+0x98/0xa0
          [] kernel_thread_exit+0x0/0x8

   -> #0 (&c->alloc_sem){+.+.+.}:
          [] print_circular_bug+0x70/0x2c4
          [] validate_chain+0x1034/0x10bc
          [] __lock_acquire+0x54c/0xba4
          [] lock_acquire+0xa4/0x114
          [] mutex_lock_nested+0x74/0x33c
          [] jffs2_garbage_collect_pass+0x314/0x890
          [] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [] kthread+0x98/0xa0
          [] kernel_thread_exit+0x0/0x8

   other info that might help us debug this:

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&(&c->erase_completion_lock)->rlock);
                                  lock(&c->alloc_sem);
                                  lock(&(&c->erase_completion_lock)->rlock);
     lock(&c->alloc_sem);

    *** DEADLOCK ***

   1 lock held by jffs2_gcd_mtd6/299:
    #0:  (&(&c->erase_completion_lock)->rlock){+.+...}, at: [] jffs2_garbage_collect_pass+0x308/0x890

   stack backtrace:
   [] (unwind_backtrace+0x0/0x100) from [] (dump_stack+0x20/0x24)
   [] (dump_stack+0x20/0x24) from [] (print_circular_bug+0x1c8/0x2c4)
   [] (print_circular_bug+0x1c8/0x2c4) from [] (validate_chain+0x1034/0x10bc)
   [] (validate_chain+0x1034/0x10bc) from [] (__lock_acquire+0x54c/0xba4)
   [] (__lock_acquire+0x54c/0xba4) from [] (lock_acquire+0xa4/0x114)
   [] (lock_acquire+0xa4/0x114) from [] (mutex_lock_nested+0x74/0x33c)
   [] (mutex_lock_nested+0x74/0x33c) from [] (jffs2_garbage_collect_pass+0x314/0x890)
   [] (jffs2_garbage_collect_pass+0x314/0x890) from [] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
   [] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [] (kthread+0x98/0xa0)
   [] (kthread+0x98/0xa0) from [] (kernel_thread_exit+0x0/0x8)

This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'.

Signed-off-by: Josh Cartwright 
Signed-off-by: Artem Bityutskiy 
Signed-off-by: David Woodhouse 
Signed-off-by: Greg Kroah-Hartman

hfsplus: Fix potential buffer overflows

2012-05-07T15:56:50+00:00

commit 6f24f892871acc47b40dd594c63606a17c714f77 upstream.

Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few
potential buffer overflows in the hfs filesystem.  But as Timo Warns
pointed out, these changes also need to be made on the hfsplus
filesystem as well.

Reported-by: Timo Warns 
Acked-by: WANG Cong 
Cc: Alexey Khoroshilov 
Cc: Miklos Szeredi 
Cc: Sage Weil 
Cc: Eugene Teo 
Cc: Roman Zippel 
Cc: Al Viro 
Cc: Christoph Hellwig 
Cc: Alexey Dobriyan 
Cc: Dave Anderson 
Cc: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Linus Torvalds

autofs: make the autofsv5 packet file descriptor use a packetized pipe

2012-05-07T15:56:37+00:00

commit 64f371bc3107e69efce563a3d0f0e6880de0d537 upstream.

The autofs packet size has had a very unfortunate size problem on x86:
because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
because the packet data was not 8-byte aligned, the size of the autofsv5
packet structure differed between 32-bit and 64-bit modes despite
looking otherwise identical (300 vs 304 bytes respectively).

We first fixed that up by making the 64-bit compat mode know about this
problem in commit a32744d4abae ("autofs: work around unhappy compat
problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
64-bit kernel because everything then worked the same way as on a 32-bit
kernel.

But it turned out that 'automount' had actually known and worked around
this problem in user space, so fixing the kernel to do the proper 32-bit
compatibility handling actually *broke* 32-bit automount on a 64-bit
kernel, because it knew that the packet sizes were wrong and expected
those incorrect sizes.

As a result, we ended up reverting that compatibility mode fix, and
thus breaking systemd again, in commit fcbf94b9dedd.

With both automount and systemd doing a single read() system call, and
verifying that they get *exactly* the size they expect but using
different sizes, it seemed that fixing one of them inevitably seemed to
break the other.  At one point, a patch I seriously considered applying
from Michael Tokarev did a "strcmp()" to see if it was automount that
was doing the operation.  Ugly, ugly.

However, a prettier solution exists now thanks to the packetized pipe
mode.  By marking the communication pipe as being packetized (by simply
setting the O_DIRECT flag), we can always just write the bigger packet
size, and if user-space does a smaller read, it will just get that
partial end result and the extra alignment padding will simply be thrown
away.

This makes both automount and systemd happy, since they now get the size
they asked for, and the kernel side of autofs simply no longer needs to
care - it could pad out the packet arbitrarily.

Of course, if there is some *other* user of autofs (please, please,
please tell me it ain't so - and we haven't heard of any) that tries to
read the packets with multiple writes, that other user will now be
broken - the whole point of the packetized mode is that one system call
gets exactly one packet, and you cannot read a packet in pieces.

Tested-by: Michael Tokarev 
Cc: Alan Cox 
Cc: David Miller 
Cc: Ian Kent 
Cc: Thomas Meyer 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman