linux-toradex.git/fs, branch v2.6.28.3

epoll: drop max_user_instances and rely only on max_user_watches

2009-02-02T17:53:26+00:00

commit 9df04e1f25effde823a600e755b51475d438f56b upstream.

Linus suggested to put limits where the money is, and max_user_watches
already does that w/out the need of max_user_instances.  That has the
advantage to mitigate the potential DoS while allowing pretty generous
default behavior.

Allowing top 4% of low memory (per user) to be allocated in epoll watches,
we have:

LOMEM    MAX_WATCHES (per user)
512MB    ~178000
1GB      ~356000
2GB      ~712000

A box with 512MB of lomem, will meet some challenge in hitting 180K
watches, socket buffers math teaches us.  No more max_user_instances
limits then.

Signed-off-by: Davide Libenzi 
Cc: Willy Tarreau 
Cc: Michael Kerrisk 
Cc: Bron Gondwana 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ext3: Add sanity check to make_indexed_dir

2009-02-02T17:53:24+00:00

commit a21102b55c4f8dfd3adb4a15a34cd62237b46039 upstream.

Make sure the rec_len field in the '..' entry is sane, lest we overrun
the directory block and cause a kernel oops on a purposefully
corrupted filesystem.

This fixes a bug related to a bug originally reported by Sami Liedes
for ext4 at:

http://bugzilla.kernel.org/show_bug.cgi?id=12430

Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

sysfs: fix problems with binary files

2009-02-02T17:53:20+00:00

commit 4503efd0891c40e30928afb4b23dc3f99c62a6b2 upstream.

Some sysfs binary files don't like having 0 passed to them as a size.
Fix this up at the root by just returning to the vfs if userspace asks
us for a zero sized buffer.

Thanks to Pavel Roskin for pointing this out.

Reported-by: Pavel Roskin 
Signed-off-by: Greg Kroah-Hartman

fuse: fix NULL deref in fuse_file_alloc()

2009-02-02T17:53:18+00:00

commit bb875b38dc5e343bdb696b2eab8233e4d195e208 upstream.

ff is set to NULL and then dereferenced on line 65.  Compile tested only.

Signed-off-by: Dan Carpenter 
Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

fuse: fix missing fput on error

2009-02-02T17:53:18+00:00

commit 3ddf1e7f57237ac7c5d5bfb7058f1ea4f970b661 upstream.

Fix the leaking file reference if allocation or initialization of
fuse_conn failed.

Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

fuse: destroy bdi on umount

2009-02-02T17:53:18+00:00

commit 26c3679101dbccc054dcf370143941844ba70531 upstream.

If a fuse filesystem is unmounted but the device file descriptor
remains open and a new mount reuses the old device number, then the
mount fails with EEXIST and the following warning is printed in the
kernel log:

  WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d()
  sysfs: duplicate filename '0:15' can not be created

The cause is that the bdi belonging to the fuse filesystem was
destoryed only after the device file was released.  Fix this by
calling bdi_destroy() from fuse_put_super() instead.

Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

inotify: clean up inotify_read and fix locking problems

2009-02-02T17:53:18+00:00

commit 3632dee2f8b8a9720329f29eeaa4ec4669a3aff8 upstream.

If userspace supplies an invalid pointer to a read() of an inotify
instance, the inotify device's event list mutex is unlocked twice.
This causes an unbalance which effectively leaves the data structure
unprotected, and we can trigger oopses by accessing the inotify
instance from different tasks concurrently.

The best fix (contributed largely by Linus) is a total rewrite
of the function in question:

On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
> The thing to notice is that:
>
>  - locking is done in just one place, and there is no question about it
>   not having an unlock.
>
>  - that whole double-while(1)-loop thing is gone.
>
>  - use multiple functions to make nesting and error handling sane
>
>  - do error testing after doing the things you always need to do, ie do
>   this:
>
>        mutex_lock(..)
>        ret = function_call();
>        mutex_unlock(..)
>
>        .. test ret here ..
>
>   instead of doing conditional exits with unlocking or freeing.
>
> So if the code is written in this way, it may still be buggy, but at least
> it's not buggy because of subtle "forgot to unlock" or "forgot to free"
> issues.
>
> This _always_ unlocks if it locked, and it always frees if it got a
> non-error kevent.

Cc: John McCutchan 
Cc: Robert Love 
Signed-off-by: Vegard Nossum 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs: sys_sync fix

2009-01-25T00:41:52+00:00

commit 856bf4d717feb8c55d4e2f817b71ebb70cfbc67b upstream.

s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.

This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync.  This was replaced
by __put_super_and_need_restart.

Signed-off-by: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs: sync_sb_inodes fix

2009-01-25T00:41:52+00:00

commit 38f21977663126fef53f5585e7f1653d8ebe55c4 upstream.

Fix data integrity semantics required by sys_sync, by iterating over all
inodes and waiting for any writeback pages after the initial writeout.
Comments explain the exact problem.

Signed-off-by: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs: remove WB_SYNC_HOLD

2009-01-25T00:41:51+00:00

commit 4f5a99d64c17470a784a6c68064207d82e3e74a5 upstream.

Remove WB_SYNC_HOLD.  The primary motiviation is the design of my
anti-starvation code for fsync.  It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes.  It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.

Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback.  In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.

It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise?  For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.

Fixing the existing data integrity problem will come next.

Signed-off-by: Nick Piggin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman